Good afternoon from the Philippines. I’m pretty new to Pinecone. When I run the code below, the `text` metadata field seems to be missing:
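```python
from tqdm.auto import tqdm

# Embed and index dataset
# (`data` is a pandas DataFrame; `embed_model` and `index` are created earlier)
batch_size = 100
for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i + batch_size)
    batch = data.iloc[i:i_end]
    # Build unique IDs, the texts to embed, and the metadata for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    texts = [x['chunk'] for _, x in batch.iterrows()]
    embeds = embed_model.embed_documents(texts)
    metadata = [{'text': x['chunk'], 'source': x['source'], 'title': x['title']} for _, x in batch.iterrows()]
    # Each vector is upserted as an (id, values, metadata) tuple
    index.upsert(vectors=zip(ids, embeds, metadata))
```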
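To rule out the indexing step, here is a minimal offline sketch of what each batch sends to `index.upsert`: a list of `(id, values, metadata)` tuples, a shape the Pinecone Python client accepts. The `build_records` helper, the sample rows, and the two-dimensional fake vectors are hypothetical stand-ins for `data` and `embed_model.embed_documents`; the sketch only checks that every metadata dict includes `text`.

```python
from typing import Dict, List, Tuple

# (id, values, metadata): the tuple shape passed to index.upsert
Record = Tuple[str, List[float], Dict[str, str]]


def build_records(rows: List[Dict[str, str]], embeds: List[List[float]]) -> List[Record]:
    """Hypothetical helper: pair each row with its embedding as an upsert tuple."""
    if len(rows) != len(embeds):
        raise ValueError("rows and embeddings must have the same length")
    records: List[Record] = []
    for row, values in zip(rows, embeds):
        record_id = f"{row['doi']}-{row['chunk-id']}"
        metadata = {"text": row["chunk"], "source": row["source"], "title": row["title"]}
        # Refuse to build a record whose chunk text is empty
        if not metadata["text"]:
            raise ValueError(f"record {record_id} has no text")
        records.append((record_id, values, metadata))
    return records


# Made-up sample rows and fake 2-dimensional vectors, for illustration only
rows = [
    {"doi": "0000.0001", "chunk-id": "0", "chunk": "First chunk.", "source": "example", "title": "Example"},
    {"doi": "0000.0001", "chunk-id": "1", "chunk": "Second chunk.", "source": "example", "title": "Example"},
]
fake_embeds = [[0.1, 0.2], [0.3, 0.4]]

records = build_records(rows, fake_embeds)
print(records[0][0])          # 0000.0001-0
print(sorted(records[0][2]))  # ['source', 'text', 'title']
# With a real index this batch would be sent as: index.upsert(vectors=records)
```

Materializing the records as a list, rather than passing the `zip` object straight to `upsert`, also makes it easy to inspect a batch before it is sent.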
```python
from langchain_pinecone import PineconeVectorStore

text_field = "text"

# Initialize vector store
vectorstore = PineconeVectorStore(index, embed_model, text_field)

# Define function to augment prompt with context
def augment_prompt(query: str):
    results = vectorstore.similarity_search(query, k=3)
    for result in results:
        print("Available keys in metadata:", result.metadata.keys())
    source_knowledge = "\n".join([x.metadata.get('text', 'No text available in metadata') for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query.
Contexts:
{source_knowledge}
Query: {query}"""
    return augmented_prompt
```
It returns the following output:

```
Available keys in metadata: dict_keys(['source', 'title'])
Available keys in metadata: dict_keys(['source', 'title'])
Available keys in metadata: dict_keys(['source', 'title'])
```
In my vector DB, I’ve already upserted the text, source, and title with no issues. The problem only appears when I retrieve the metadata: `source` and `title` come back, but `text` does not. I’m trying to build a chatbot, if that’s relevant.
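A likely cause, assuming LangChain’s `PineconeVectorStore` behaves as in the `langchain_pinecone` source: on retrieval it pops the `text_key` field (here `"text"`) out of each match’s metadata and returns it as `Document.page_content`, leaving only `source` and `title` in `metadata`. The offline sketch below imitates that conversion with a minimal `Document` dataclass and a hypothetical `to_document` helper (neither is LangChain’s own code) and builds the prompt from `page_content`. Unlike the original, this `augment_prompt` takes the search results as an argument so it runs without Pinecone.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (for illustration only)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_document(match_metadata: Dict[str, Any], text_key: str = "text") -> Document:
    """Hypothetical helper imitating how the store moves text_key out of metadata."""
    metadata = dict(match_metadata)   # copy so the stored dict is left untouched
    text = metadata.pop(text_key)     # "text" leaves the metadata (KeyError if absent)...
    return Document(page_content=text, metadata=metadata)  # ...and becomes page_content


def augment_prompt(query: str, results: List[Document]) -> str:
    # Read the chunk text from page_content, not from metadata['text']
    source_knowledge = "\n".join(doc.page_content for doc in results)
    return f"""Using the contexts below, answer the query.
Contexts:
{source_knowledge}
Query: {query}"""


# Made-up metadata as it would be stored in the index
stored = [
    {"text": "Chunk one.", "source": "example", "title": "Example"},
    {"text": "Chunk two.", "source": "example", "title": "Example"},
]
docs = [to_document(md) for md in stored]
print(docs[0].metadata.keys())  # dict_keys(['source', 'title'])
print(augment_prompt("What is this about?", docs))
```

In the original `augment_prompt`, the equivalent change would be joining `x.page_content for x in results` instead of `x.metadata.get('text', ...)`.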