Unable to query Pinecone vector index using filter

anthonyjohnarena · June 10, 2023, 9:43pm

I am having issues querying my Pinecone index with a filter. The same query works without the filter. Any help or advice would be greatly appreciated!

I upserted into Pinecone in the following way:

for doc in tqdm(docs_restaurant):
    chunks = text_splitter.split_text(str(doc.metadata))
    for i, chunk in enumerate(chunks):
        documents.append({
            'id': f'{doc.page_content}_{i}',
            'text': chunk,
            'metadata': {
                'file': 'restaurants'
            }
        })

for i in tqdm(range(0, len(documents), batch_size)):
    i_end = min(len(documents), i+batch_size)
    res = s.post(
        f"{endpoint_url}/upsert",
        headers=headers,
        json={
            "documents": documents[i:i_end]
        }
    )

The data is in Pinecone. I can query it successfully with no filter, like this:

query_response = self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter=None,
    include_metadata=True,
)

Which returns the following:

{'matches': [{'id': 'Del Taco_0_1',
              'metadata': {'document_id': 'Del Taco_0',
                           'file': 'restaurants',
                           'text': "pic Beyond Burritos', 'Desserts & Shakes', "
                                   "'Tacos', 'Meals', '20 Under $2 Menu']}"},
              'score': 0.786260903,
              'values': []},
             {'id': 'Minos Take Out_0_1',
              'metadata': {'document_id': 'Minos Take Out_0',
                           'file': 'restaurants',
                           'text': "izers', 'Wraps & Pitas', 'Light Meals']}"},
              'score': 0.7722193,
              'values': []}],
 'namespace': ''}

However, when I try this:

self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter={'file': {'$eq': 'restaurants'}},
    include_metadata=True,
)

It returns nothing:

{'matches': [], 'namespace': ''}
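
For reference, here is a minimal sketch of a client-side sanity check: it takes the unfiltered response and tests each match's metadata against the same filter. The helper name matches_eq_filter is made up for illustration, it only imitates the '$eq' operator locally (it is not how Pinecone evaluates filters), and it assumes the response is a plain dict shaped like the output above, with the text fields left out for brevity.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def matches_eq_filter(metadata, metadata_filter):
    """Return True if metadata satisfies every {'field': {'$eq': value}} clause.

    Hypothetical helper: only the '$eq' operator is handled here.
    """
    for field, condition in metadata_filter.items():
        if not isinstance(condition, dict) or set(condition) != {"$eq"}:
            raise ValueError(f"only '$eq' conditions are supported, got {condition!r}")
        if metadata.get(field, _MISSING) != condition["$eq"]:
            return False
    return True


# Unfiltered response, abbreviated from the output shown above.
response = {
    "matches": [
        {"id": "Del Taco_0_1",
         "metadata": {"document_id": "Del Taco_0", "file": "restaurants"}},
        {"id": "Minos Take Out_0_1",
         "metadata": {"document_id": "Minos Take Out_0", "file": "restaurants"}},
    ],
    "namespace": "",
}

metadata_filter = {"file": {"$eq": "restaurants"}}

for match in response["matches"]:
    print(match["id"], matches_eq_filter(match["metadata"], metadata_filter))
```

Both matches pass this local check, so the metadata returned by the unfiltered query does contain 'file': 'restaurants' with the exact value I am filtering on.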

dra · June 12, 2023, 3:36am

Hi,
I could not tell what is wrong just by looking at the code.

Do you get the expected results when you send the same query with curl?
I tested a filtered query with similar data and it works fine for me.
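
If curl is not handy, something like the following Python sketch builds the same kind of raw request against the index's REST /query endpoint, bypassing the client library. The function name build_query_request, the host, the API key and the vector are placeholders I made up; the body fields (vector, topK, includeMetadata, filter) and the Api-Key header follow the REST API as I understand it, so please check them against the current API reference before relying on this.

```python
import json
import urllib.request


def build_query_request(index_host, api_key, vector, top_k, metadata_filter=None):
    """Build (but do not send) a POST request for the index's /query endpoint."""
    body = {
        "vector": vector,
        "topK": top_k,
        "includeMetadata": True,
    }
    # Only include the filter when one is given, so both cases can be compared.
    if metadata_filter is not None:
        body["filter"] = metadata_filter
    return urllib.request.Request(
        url=f"https://{index_host}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholders only: substitute your own index host, API key and a query
# embedding whose length matches the dimension of your index.
request = build_query_request(
    index_host="YOUR-INDEX-HOST",
    api_key="YOUR-API-KEY",
    vector=[0.0, 0.0, 0.0],
    top_k=3,
    metadata_filter={"file": {"$eq": "restaurants"}},
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request).read()
```

If the raw request returns matches and the client call does not, the problem is somewhere in the code path between your application and the index rather than in the stored data.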