Search latency with filtering on metadata field with high granularity

mrmac · March 16, 2023, 9:14am

Hi,

I’m wondering whether anyone has experience with the following use case in Pinecone:
We have ~1M vectors and want to filter on a metadata field that is essentially a document ID (~800k unique values). The filter is an “$in” clause: in the query we pass the list of IDs that should be considered. Here is a dummy example:

```python
index.query(
    vector=[0.1, 0.2, 0.3],
    filter={
        "doc_id": {"$in": [1,2,3,4,5]},
    },
    top_k=10
)
```
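
For reference, the filter above asks for the best `top_k` matches among only those vectors whose `doc_id` appears in the list. A minimal brute-force sketch of those semantics in plain Python, assuming dot-product similarity and a hypothetical in-memory list of records (this is not a description of how Pinecone evaluates filters internally):

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for index records: (record id, vector, metadata).
Record = Tuple[str, List[float], Dict[str, int]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity (assumed metric; an index may use cosine or euclidean)."""
    return sum(x * y for x, y in zip(a, b))


def filtered_top_k(records: List[Record], query: Sequence[float],
                   allowed_doc_ids: Sequence[int], top_k: int = 10) -> List[Tuple[str, float]]:
    """Exact top-k over records whose doc_id is in allowed_doc_ids,
    i.e. what filter={"doc_id": {"$in": [...]}} asks for."""
    allowed = set(allowed_doc_ids)  # set gives O(1) membership checks
    candidates = [
        (rid, dot(vec, query))
        for rid, vec, meta in records
        if meta.get("doc_id") in allowed
    ]
    # Highest similarity first, then keep the best top_k.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


records = [
    ("a", [1.0, 0.0, 0.0], {"doc_id": 1}),
    ("b", [0.0, 1.0, 0.0], {"doc_id": 2}),
    ("c", [0.0, 0.0, 1.0], {"doc_id": 7}),  # excluded: 7 is not in the list
]
result = filtered_top_k(records, [0.1, 0.2, 0.3], [1, 2, 3, 4, 5], top_k=10)
print(result)
```

Record "c" scores highest on its own but is dropped because its `doc_id` is not in the allowed list.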

We see very high query latency when we pass a longer list (>1k values) as the filter for doc_id.
Does anyone know a workaround for this issue, or is this kind of query a limitation of Pinecone?

Thank you!

diego.barbosa · August 5, 2023, 10:31pm

Same problem here: with more than 1,000 to 2,000 values, the index almost crashes. I would appreciate functionality that allows filtering by ID.
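
One possible workaround (untested against Pinecone here, and not an official feature) is to split the long ID list into smaller batches, run one filtered query per batch, and merge the per-batch results by score. A minimal sketch, assuming a hypothetical `query_fn` that wraps the `index.query(...)` call from the first post with `filter={"doc_id": {"$in": batch}}` and returns `(id, score)` pairs where a higher score is better; `fake_query` below is a hypothetical local stand-in so the sketch runs without an index:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Match = Tuple[str, float]  # (record id, similarity score)


def chunked(ids: Sequence[int], size: int) -> List[List[int]]:
    """Split ids into consecutive, disjoint batches of at most `size` items."""
    return [list(ids[i:i + size]) for i in range(0, len(ids), size)]


def query_in_batches(query_fn: Callable[[List[int]], List[Match]],
                     doc_ids: Sequence[int], top_k: int = 10,
                     batch_size: int = 500, max_workers: int = 4) -> List[Match]:
    """Run one filtered query per batch (in parallel threads) and merge the top_k.

    query_fn is hypothetical: it should issue a single query restricted to the
    given batch of doc_ids with the same top_k and return (id, score) pairs.
    """
    batches = chunked(doc_ids, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_batch = list(pool.map(query_fn, batches))
    all_matches = [m for matches in per_batch for m in matches]
    # Keep the best top_k matches across all batches.
    return heapq.nlargest(top_k, all_matches, key=lambda m: m[1])


# Hypothetical stand-in for the index: each doc_id maps to one (record id, score).
fake_scores = {d: (f"rec-{d}", d / 1000) for d in range(2000)}


def fake_query(batch: List[int]) -> List[Match]:
    matches = [fake_scores[d] for d in batch]
    return heapq.nlargest(10, matches, key=lambda m: m[1])


result = query_in_batches(fake_query, list(range(2000)), top_k=3)
print([rid for rid, _ in result])  # ['rec-1999', 'rec-1998', 'rec-1997']
```

Because the batches partition the IDs, every global top-k match also ranks in the top-k of its own batch, so the merge is exact when each per-batch search is exact; with approximate search it is approximate as well. Whether several smaller queries actually beat one long `$in` list on latency would need to be measured.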