I am having trouble querying my Pinecone index with a metadata filter. The same query works without the filter. Any help or advice would be greatly appreciated!
I upserted the data into Pinecone as follows:
# Split each document's metadata string into chunks and tag every chunk
# with file='restaurants'.
for doc in tqdm(docs_restaurant):
    chunks = text_splitter.split_text(str(doc.metadata))
    for i, chunk in enumerate(chunks):
        documents.append({
            'id': f'{doc.page_content}_{i}',
            'text': chunk,
            'metadata': {
                'file': 'restaurants'
            }
        })

# Send the documents to the /upsert endpoint in batches of batch_size.
for i in tqdm(range(0, len(documents), batch_size)):
    i_end = min(len(documents), i + batch_size)
    res = s.post(
        f"{endpoint_url}/upsert",
        headers=headers,
        json={
            "documents": documents[i:i_end]
        }
    )
The data is in Pinecone. I can query it successfully with no filter, like this:
query_response = self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter=None,
    include_metadata=True,
)
This returns the following:
{'matches': [{'id': 'Del Taco_0_1',
              'metadata': {'document_id': 'Del Taco_0',
                           'file': 'restaurants',
                           'text': "pic Beyond Burritos', 'Desserts & Shakes', "
                                   "'Tacos', 'Meals', '20 Under $2 Menu']}"},
              'score': 0.786260903,
              'values': []},
             {'id': 'Minos Take Out_0_1',
              'metadata': {'document_id': 'Minos Take Out_0',
                           'file': 'restaurants',
                           'text': "izers', 'Wraps & Pitas', 'Light Meals']}"},
              'score': 0.7722193,
              'values': []}],
 'namespace': ''}
However, when I run the same query with a filter on the `file` field:
self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter={'file': {'$eq': 'restaurants'}},
    include_metadata=True,
)
It returns no matches:
{'matches': [], 'namespace': ''}
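To make my expectation concrete, here is a minimal client-side sketch of what I understand `$eq` to mean, applied to two of the unfiltered matches from the response earlier in this post. The helper name `matches_eq_filter` and the trimmed `sample_matches` list are my own, for illustration only. This is not Pinecone's implementation, just the equality check I assumed the server-side filter would perform, and it only handles `$eq`.

```python
# Hypothetical client-side stand-in for the `$eq` filter.
# Not part of the Pinecone client; it only illustrates the check I expected.
def matches_eq_filter(metadata, flt):
    """Return True if every {'field': {'$eq': value}} condition holds for metadata."""
    for field, condition in flt.items():
        if field not in metadata or metadata[field] != condition['$eq']:
            return False
    return True


# Two matches copied from the unfiltered response, with the 'text' field trimmed.
sample_matches = [
    {'id': 'Del Taco_0_1',
     'metadata': {'document_id': 'Del Taco_0', 'file': 'restaurants'}},
    {'id': 'Minos Take Out_0_1',
     'metadata': {'document_id': 'Minos Take Out_0', 'file': 'restaurants'}},
]

flt = {'file': {'$eq': 'restaurants'}}

# Keep the ids of the records whose metadata satisfies the filter.
kept = [m['id'] for m in sample_matches if matches_eq_filter(m['metadata'], flt)]
print(kept)
```

Both records pass this check, which is why I expected the filtered query to return them rather than an empty list.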