Does Pinecone support filtering by vector ID?

steve1 · August 23, 2023, 3:04pm

Let’s say I have a set of candidate vector IDs (e.g., from an external system), and I want to sort them by similarity score to a given query. How can I represent this as a Pinecone query?

This does not seem to be well supported by Metadata Filtering, as the docs say:

High cardinality consumes more memory: Pinecone indexes metadata to allow for filtering. If the metadata contains many unique values — such as a unique identifier for each vector — the index will consume significantly more memory. Consider using selective metadata indexing to avoid indexing high-cardinality metadata that is not needed for filtering.
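For reference, the metadata-filtering route the docs warn about would look roughly like this: each vector is upserted with a copy of its own ID in a metadata field (e.g. `metadata={"doc_id": "a"}`), and the query restricts results with an `$in` filter. This is a sketch only; the field name `doc_id` and the helper names are hypothetical, and `index` is assumed to be an index object from the Pinecone Python client. Every value of that field is unique, which is exactly the high-cardinality case quoted above, and very large candidate sets may also run into the service's `top_k` and filter-size limits.

```python
from typing import Any, Sequence

# Hypothetical metadata field holding a copy of each vector's own ID.
ID_FIELD = "doc_id"


def build_id_filter(candidate_ids: Sequence[str], field: str = ID_FIELD) -> dict[str, Any]:
    """Build a Pinecone-style metadata filter matching any of the given IDs."""
    # dict.fromkeys drops duplicates while keeping the caller's order.
    unique_ids = list(dict.fromkeys(candidate_ids))
    return {field: {"$in": unique_ids}}


def query_within_ids(index: Any, query_vector: Sequence[float], candidate_ids: Sequence[str]):
    """Rank only the candidate IDs by similarity, via metadata filtering.

    Assumes the vectors were upserted with metadata={ID_FIELD: <their own id>}.
    """
    id_filter = build_id_filter(candidate_ids)
    n_unique = len(id_filter[ID_FIELD]["$in"])
    return index.query(
        vector=list(query_vector),
        top_k=n_unique,  # request every candidate back, ranked by score
        filter=id_filter,
        include_values=False,
    )
```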

Cory_Pinecone · August 23, 2023, 3:11pm

To clarify, you want to run a query using a vector to get similar ones but limit the results to a fixed set of vector IDs. Is that right?

steve1 · August 23, 2023, 4:21pm

Hey @Cory_Pinecone, yes, that’s exactly right!

Cory_Pinecone · August 23, 2023, 5:26pm

Interesting. Filtering against a list of vector IDs isn’t supported, and as you rightly pointed out, high-cardinality metadata can impact overall index performance. But there might be an option using sparse-dense vectors instead: you could encode the vector IDs into the sparse vectors and use them to filter in the query.
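A rough sketch of what "encoding the vector IDs into the sparse vectors" could look like, assuming each ID is hashed to one sparse dimension. The SHA-256 hashing scheme, `ID_SPACE`, and all helper names are hypothetical choices, not a Pinecone feature; the only part taken from Pinecone is the `{"indices": [...], "values": [...]}` shape its client uses for sparse values, and sparse-dense queries need a `dotproduct` index. Note that this boosts candidates rather than strictly excluding everything else: a non-candidate contributes 0 from the ID dimensions but still scores on the dense part (and on any other sparse terms).

```python
import hashlib
from typing import Iterable

# Hypothetical size of the index range used for hashed IDs. Pinecone sparse
# indices are unsigned 32-bit ints, so this stays below 2**32.
ID_SPACE = 2**31


def id_to_sparse_index(vector_id: str, id_space: int = ID_SPACE) -> int:
    """Deterministically map a vector ID to one sparse dimension (may collide)."""
    digest = hashlib.sha256(vector_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % id_space


def merge_sparse(*parts: dict) -> dict:
    """Sum sparse vectors given as {"indices": [...], "values": [...]}."""
    acc: dict[int, float] = {}
    for part in parts:
        for i, v in zip(part["indices"], part["values"]):
            acc[i] = acc.get(i, 0.0) + v
    indices = sorted(acc)
    return {"indices": indices, "values": [acc[i] for i in indices]}


def id_sparse(ids: Iterable[str], weight: float = 1.0) -> dict:
    """Sparse vector with `weight` on the hashed dimension of each unique ID."""
    unique_ids = list(dict.fromkeys(ids))
    return merge_sparse({"indices": [id_to_sparse_index(i) for i in unique_ids],
                         "values": [weight] * len(unique_ids)})


def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors (how the sparse part is scored)."""
    b_map = dict(zip(b["indices"], b["values"]))
    return sum(v * b_map.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))
```

At upsert time, each record's sparse values would be `merge_sparse(existing_terms, id_sparse([record_id]))`; at query time, the sparse query would be `id_sparse(candidate_ids, weight=...)`. If an existing sparse encoder uses the same index range, ID dimensions can collide with term dimensions, which is one reason this gets awkward when sparse vectors are already in use.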

This notebook gives an example of doing something similar for e-commerce search, filtering on different types of products, but the principle is basically the same.

steve1 · August 23, 2023, 5:49pm

Thanks for the response, @Cory_Pinecone. So in that case, I would add all of the vector IDs to the query itself? Since I’m already using sparse vectors in this project, I worry that adding the IDs might make it hard to tune the alpha parameter, but it’s worth a shot.
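To make the alpha concern concrete: Pinecone's hybrid-search examples weight a query as a convex combination, multiplying dense values by `alpha` and sparse values by `1 - alpha`, so the score against a `dotproduct` index is `alpha * dense_dot + (1 - alpha) * sparse_dot`. The helpers below reimplement that convention for illustration (they are not imported from any library), and the dimension numbers are hypothetical: 7 stands for an ordinary sparse term, 42 for a hashed candidate ID. The ID dimensions share the same `1 - alpha` factor as the real term weights, so they compete with them.

```python
from typing import Sequence


def hybrid_scale(dense: Sequence[float], sparse: dict, alpha: float):
    """Convex weighting: dense values * alpha, sparse values * (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_sparse = {"indices": list(sparse["indices"]),
                     "values": [v * (1 - alpha) for v in sparse["values"]]}
    return [v * alpha for v in dense], scaled_sparse


def hybrid_score(q_dense: Sequence[float], q_sparse: dict,
                 d_dense: Sequence[float], d_sparse: dict, alpha: float) -> float:
    """Score one document the way a dotproduct index would after scaling."""
    qd, qs = hybrid_scale(q_dense, q_sparse, alpha)
    dense_part = sum(a * b for a, b in zip(qd, d_dense))
    d_map = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(v * d_map.get(i, 0.0) for i, v in zip(qs["indices"], qs["values"]))
    return dense_part + sparse_part


# Query: dense vector plus sparse term 7 and candidate-ID dimension 42.
q_dense, q_sparse = [1.0, 0.0], {"indices": [7, 42], "values": [1.0, 1.0]}
candidate = ([0.5, 0.5], {"indices": [42], "values": [1.0]})  # has the ID dim
non_candidate = ([1.0, 0.0], {"indices": [7], "values": [1.0]})  # matches the term
print(hybrid_score(q_dense, q_sparse, *candidate, alpha=0.75))
print(hybrid_score(q_dense, q_sparse, *non_candidate, alpha=0.75))
```

In this toy case, at `alpha = 0.75` the candidate scores 0.625 and the non-candidate 1.0, so the ID boost alone does not guarantee candidates rank first; raising the ID weight or lowering `alpha` changes that, but it also shifts the balance between dense and term-based scoring.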

hetnon.freitas · April 7, 2024, 4:21am

Having the very same challenge here. It’s a bit baffling that this is not readily available as a function.
What I’m doing is fetching the vectors by their IDs, calculating the similarity manually, and then ranking them. That solved it for me, but I’m still in disbelief that this isn’t implemented yet and that I have to do it manually. Feels like I’m doing something wrong.
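A minimal sketch of that fetch-and-rank workaround, assuming the index uses the cosine metric and that the Pinecone Python client's `index.fetch(ids=...)` returns an object whose `.vectors` maps each found ID to a record with a `.values` list (check this against your client version). `FETCH_BATCH` is an arbitrary assumed batch size, not a documented limit, and IDs that are not found are silently skipped.

```python
import math
from typing import Any, Sequence

# Assumed number of IDs per fetch request; adjust to the service's limits.
FETCH_BATCH = 100


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: Sequence[float],
                    vectors: dict[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Score every candidate against the query and sort best-first."""
    scored = [(vid, cosine(query, vals)) for vid, vals in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fetch_and_rank(index: Any, query: Sequence[float],
                   ids: Sequence[str]) -> list[tuple[str, float]]:
    """Fetch candidate vectors by ID in batches, then rank them client-side."""
    vectors: dict[str, Sequence[float]] = {}
    for start in range(0, len(ids), FETCH_BATCH):
        response = index.fetch(ids=list(ids[start:start + FETCH_BATCH]))
        for vid, record in response.vectors.items():
            vectors[vid] = record.values
    return rank_candidates(query, vectors)
```

This works for modest candidate sets, but every candidate vector crosses the network and is scored in the client, which is the overhead people in this thread are objecting to.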

john2 · May 31, 2024, 6:13pm

Are there any plans to support this in the near future?
This is absolutely a required feature for us and we’ll have to use a different DB if it is not supported.

And since we use sparse vectors for our similarity query (similar to the issue @steve1 had), filtering by IDs in the sparse vector isn’t a great workaround for us.

john2 · May 31, 2024, 6:16pm

We are also forced to implement the same solution as hetnon, but there is little point in us using Pinecone and its vector indexing if we need to calculate all the similarity scores manually.

tjlabs.prod · June 28, 2024, 12:50am

It makes no sense for a vector database not to support this feature, because the whole point is not having to calculate the similarity scores ourselves.

adamdmurphy4 · March 12, 2025, 1:03pm

Any update on this now that Pinecone is serverless? I assume memory isn’t much of an issue anymore. But is query speed slow when you do this filtering?

Did anyone find other vector DBs that support this filtering out of the box?

Qdrant offers filtering as a core feature
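For comparison, Qdrant can express this directly with a `has_id` filter condition. The sketch below only builds a JSON request body for Qdrant's classic REST search endpoint (`POST /collections/<collection>/points/search`); the collection name, IDs, and vector values are hypothetical, and Qdrant point IDs must be unsigned integers or UUIDs.

```python
import json
from typing import Sequence, Union

PointId = Union[int, str]  # Qdrant point IDs: unsigned ints or UUID strings


def qdrant_search_body(query: Sequence[float], candidate_ids: Sequence[PointId]) -> dict:
    """Request body restricting a vector search to the given point IDs."""
    ids = list(dict.fromkeys(candidate_ids))  # de-duplicate, keep order
    return {
        "vector": list(query),
        "filter": {"must": [{"has_id": ids}]},
        "limit": len(ids),  # return every candidate, ranked by similarity
    }


# Body to send to the search endpoint of a (hypothetical) collection.
print(json.dumps(qdrant_search_body([0.1, 0.2, 0.3], [1, 3, 5])))
```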