I have a serverless Pinecone index. I need to retrieve all records that match a metadata filter, without using vector search.
To work around the Pinecone API requiring a query vector, I run the query with a dummy vector and add the necessary metadata filter:
res = index_pc.query(
    vector=[0] * 1536,
    top_k=200,
    include_metadata=True,
    include_values=False,
    filter={
        "account_id": {"$eq": '0011T00002SlOYFQA3'},
        "call_date": {"$eq": '2024-11-15'},
    },
)
With this I receive 127 rows, which is not all of the matching rows, even though the number of matching rows is lower than the top_k of 200 that I set. I can find 164 rows that satisfy the metadata filter.
The missing rows do exist and do satisfy the filter. I can confirm this by querying with the ids that were missing from the dummy-vector query:
import json
import pandas as pd

# Query by the id of a record that the dummy-vector query did not return
res = index_pc.query(
    id='1ec0f24389216f0f0dc460941971d2b0',
    top_k=1,
    filter={
        "account_id": {"$eq": '0011T00002SlOYFQA3'},
        "call_date": {"$eq": '2024-11-15'},
    },
    include_metadata=True,
    include_values=False,
)
json_string = json.dumps(res.to_dict(), indent=2)
df_temp = pd.DataFrame(json.loads(json_string)['matches'])
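The comparison between the rows that are expected and the rows that the dummy-vector query returns could be made explicit with a small helper. This is only a minimal sketch: `find_missing_ids` is a hypothetical name, the ids and matches below are made-up placeholders, and it assumes the matches have the same shape as `res.to_dict()['matches']`, that is, a list of dicts each carrying an `id` key.

```python
def find_missing_ids(expected_ids, matches):
    """Return the expected ids that do not appear in the query matches, sorted."""
    returned_ids = {match["id"] for match in matches}
    return sorted(set(expected_ids) - returned_ids)


# Placeholder data for illustration only (not real index contents).
expected_ids = ["a1", "b2", "c3", "d4"]
matches = [{"id": "a1", "score": 0.0}, {"id": "c3", "score": 0.0}]

missing = find_missing_ids(expected_ids, matches)
print(missing)  # prints ['b2', 'd4']
```

With real data, `matches` would come from `res.to_dict()['matches']` of the dummy-vector query, and `expected_ids` from whatever source lists the 164 rows.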
What could be the reason? The support bot told me that this is ANN (approximate nearest neighbor) behavior, but I don't think it should behave this way when the filter matches fewer rows than top_k.