Missing metadata key in results?

Good afternoon from the Philippines. I’m pretty new to Pinecone. When I run this code, the ‘text’ metadata field seems to be missing from the results:

# Embed and index dataset
batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    batch = data.iloc[i:i_end]
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    texts = [x['chunk'] for _, x in batch.iterrows()]
    embeds = embed_model.embed_documents(texts)
    metadata = [{'text': x['chunk'], 'source': x['source'], 'title': x['title']} for _, x in batch.iterrows()]
    index.upsert(vectors=zip(ids, embeds, metadata))

text_field = "text"

# Initialize vector store
vectorstore = PineconeVectorStore(index, embed_model, text_field)

# Define function to augment prompt with context
def augment_prompt(query: str):
    results = vectorstore.similarity_search(query, k=3)
    for result in results:
        print("Available keys in metadata:", result.metadata.keys())
    source_knowledge = "\n".join([x.metadata.get('text', 'No text available in metadata') for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

It returns these results:
Available keys in metadata: dict_keys(['source', 'title'])
Available keys in metadata: dict_keys(['source', 'title'])
Available keys in metadata: dict_keys(['source', 'title'])

In my vector DB, I’ve already upserted the text, source, and title with no issue. The problem only appears when I retrieve the metadata. I’m trying to build a chatbot, if that’s relevant.
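
A possible explanation, offered as a sketch rather than a confirmed diagnosis: LangChain's Pinecone vector store appears to move the field named by text_field out of the metadata and into each returned document's page_content, which would leave only 'source' and 'title' in result.metadata. The snippet below imitates that behavior in plain Python. FakeDocument and to_document are hypothetical stand-ins, not LangChain classes, and the stored record is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_document(match_metadata: dict, text_key: str = "text") -> FakeDocument:
    """Imitate the assumed wrapper behavior: pop the text key into page_content."""
    metadata = dict(match_metadata)  # copy so the caller's dict is left untouched
    text = metadata.pop(text_key, "")
    return FakeDocument(page_content=text, metadata=metadata)


# Hypothetical record shaped like the metadata upserted above
stored = {"text": "chunk body", "source": "https://example.org/paper", "title": "Example"}
doc = to_document(stored)

print(sorted(doc.metadata))  # the text key is no longer among the metadata keys
print(doc.page_content)      # the chunk text is held here instead
```

If that is what is happening, building source_knowledge from x.page_content instead of x.metadata.get('text', ...) should recover the chunk text.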


I’m having a similar issue. I keep getting errors saying that the ‘text’ field doesn’t exist in the metadata dictionary of a query result. This only happens for the last dictionary in the results.

This is the error I get when I use a list comprehension to collect just the ‘text’ fields from the metadata:

PineconeApiAttributeError: ScoredVector has no attribute 'text' at ['['received_data', 'matches', 9]']['text']

I have no idea where ‘received_data’ came from.

This is the structure of my query results. I have 10 of these dictionaries inside the list:

{'id': '82f2.........980c',
              'metadata': {'filename': 'abc.pdf',
                           'page_number': 120,
                           'text': 'Hello there'
              }
}

I ran the following code for debugging:

for i, entry in enumerate(query_results['matches']):
    has_metadata = 'metadata' in entry
    has_text = has_metadata and 'text' in entry['metadata']
    print(f"Index {i}: has_metadata={has_metadata}, has_text={has_text}")

And got the following results:

Index 0: has_metadata=True, has_text=True
Index 1: has_metadata=True, has_text=True
Index 2: has_metadata=True, has_text=True
Index 3: has_metadata=True, has_text=True
Index 4: has_metadata=True, has_text=True
Index 5: has_metadata=True, has_text=True
Index 6: has_metadata=True, has_text=True
Index 7: has_metadata=True, has_text=True
Index 8: has_metadata=True, has_text=True
Index 9: has_metadata=True, has_text=False

It shows that only the last entry is missing the ‘text’ field for some reason. But I manually checked the query result, and that chunk does have a ‘text’ metadata field.

I also noticed that len(query_results) returns 10, but a for loop over query_results only prints the first 9 entries. It’s almost as if the last one doesn’t exist.

Does anyone have any idea why this is happening or how to resolve it?
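
One way to narrow this down, sketched under the assumption that each match can be read like a plain dictionary (as in the debugging loop above): copy each match's metadata before reading from it, so that nothing which later pops or mutates keys can affect the check, and record which indices lack 'text'. The collect_texts helper and the sample query_results below are hypothetical, not part of the Pinecone client or real output.

```python
def collect_texts(matches, text_key="text"):
    """Return (texts, missing_indices) without mutating the matches."""
    texts, missing = [], []
    for i, match in enumerate(matches):
        # Shallow copy so later code that pops keys cannot change what we read
        metadata = dict(match.get("metadata") or {})
        if text_key in metadata:
            texts.append(metadata[text_key])
        else:
            missing.append(i)
    return texts, missing


# Hypothetical stand-in for a query response, using plain dicts
query_results = {
    "matches": [
        {"id": "a", "metadata": {"filename": "abc.pdf", "page_number": 120, "text": "Hello there"}},
        {"id": "b", "metadata": {"filename": "abc.pdf", "page_number": 121}},
    ]
}

texts, missing = collect_texts(query_results["matches"])
print(texts, missing)
```

If the real results report a missing index here even though the stored record has a 'text' field, that would suggest something between the query and this check is removing the key, which is worth ruling out before suspecting the index itself.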