Missing vectors when using filters

@stanislav.novokhatko and @jocelyn I’m encountering the same issue, with a slight difference in input parameters. My query includes both a vector and metadata filters.

query_dense = [......]  # (openai/text-embedding-small)
topK = 5000
results = index.query(
    top_k=topK,
    include_metadata=True,
    vector=query_dense,
    namespace=namespace,
    filter={
        "category_child_id": {
            "$in": [
                102513,
                102744,
                # ..........
            ]
        }
    }
)

Issue:

  • I have 334 IDs of type Number, all of which exist in the index. However, the query only returned 304 results, leaving out 30 IDs that are still present in the index.
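To pin down exactly which IDs are missing, the filter list can be diffed against the metadata of the returned matches. The sketch below is only an illustration: `missing_filter_ids` is a hypothetical helper (not part of the Pinecone client), the third ID in the sample data is made up, and values are compared as floats on the assumption that numeric metadata may come back as floating-point numbers.

```python
def missing_filter_ids(expected_ids, returned_metadata, field="category_child_id"):
    """Return the expected IDs that no returned match carries in `field`."""
    # Compare as floats so 102513 and 102513.0 count as the same ID.
    seen = {float(md[field]) for md in returned_metadata if field in md}
    return sorted(i for i in set(expected_ids) if float(i) not in seen)


# Illustrative data only: 102800 is a made-up ID, not one from the real index.
expected = [102513, 102744, 102800]
metadata = [{"category_child_id": 102513.0}, {"category_child_id": 102744}]

missing = missing_filter_ids(expected, metadata)
print(missing)
```

With a real response, `returned_metadata` could be built with something like `[m["metadata"] for m in results["matches"]]`, assuming each match exposes its metadata as a mapping.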

Questions:

  • Does Pinecone intentionally prune results?
  • I ran a simple experiment by creating a namespace with 50 unique records (988 records in the namespace). When I queried with a vector and metadata filter on all 50 IDs (same as the code above), it returned only 21 results—despite the same topK value.
  • Does Pinecone apply any internal threshold or weightage to the result set based on the input vector?

Hello aneerudh,

Thank you for your post. I hope you do not mind me moving it. I believe it’s a different issue from the one you tagged it on to.

  • Does Pinecone intentionally prune results?
  • Does Pinecone apply any internal threshold or weightage to the result set based on the input vector?

Have a read over Serverless Architecture - Query Planners, as it covers both these points. TL;DR: in some cases there will be pruning, but in this case I would expect all the results to be returned.

  • I ran a simple experiment by creating a namespace with 50 unique records (988 records in the namespace). When I queried with a vector and metadata filter on all 50 IDs (same as the code above), it returned only 21 results—despite the same topK value.

Would it be possible to share this test, including the upserting of the records?
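Something along these lines would be enough. This is only a rough sketch under assumptions, not the original test: `build_records` and `run_repro` are hypothetical helpers, the vectors are random placeholders, `dim` must be changed to match the index dimension, and every record is given a unique `category_child_id` so that filtering on 50 of them should match exactly 50 records. `index` is expected to be a Pinecone index handle exposing `upsert` and `query`.

```python
import random


def build_records(total=988, dim=8, seed=0):
    """Build `total` records, each with a unique numeric category_child_id."""
    rng = random.Random(seed)  # fixed seed so the test data is reproducible
    return [
        {
            "id": f"rec-{i}",
            "values": [rng.random() for _ in range(dim)],  # placeholder vector
            "metadata": {"category_child_id": 100000 + i},
        }
        for i in range(total)
    ]


def run_repro(index, namespace, records, filter_ids, top_k=5000, batch_size=100):
    """Upsert all records in batches, then query with an $in filter.

    Returns the number of matches the query reports.
    """
    for start in range(0, len(records), batch_size):
        index.upsert(vectors=records[start:start + batch_size], namespace=namespace)
    response = index.query(
        vector=records[0]["values"],
        top_k=top_k,
        include_metadata=True,
        namespace=namespace,
        filter={"category_child_id": {"$in": filter_ids}},
    )
    return len(response["matches"])


records = build_records()
# Filter on the first 50 IDs; the expectation is 50 matches back.
filter_ids = [r["metadata"]["category_child_id"] for r in records[:50]]
```

Calling `run_repro(index, namespace, records, filter_ids)` against a fresh namespace and comparing the returned count with `len(filter_ids)` would show whether the shortfall reproduces.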