Filtering: The Missing WHERE Clause in Vector Search

discobot · February 1, 2022, 7:58pm

Vector similarity search makes massive datasets searchable in fractions of a second. Yet despite the brilliance and utility of this technology, often what seem to be the most straightforward problems are the most difficult to solve. Such as filtering.

Filtering tops the list of problems that seem simple but are actually incredibly complex. Applying fast-but-accurate filters when performing a vector search (i.e., nearest-neighbor search) on massive datasets is a surprisingly stubborn problem.

This is a companion discussion topic for the original entry at https://www.pinecone.io/learn/vector-search-filtering/

rom1504 · February 21, 2022, 2:37am

Hey,
Great article, thanks for writing it!
I’ve been thinking about single-stage filtering for a long time and I’m really interested in how you did it.
Do you intend to do another post explaining that?
Thanks
rom1504

greg · February 22, 2022, 6:45pm

Hey @rom1504 it’s on our “to-write” list which unfortunately sits below the “to-build” list. So no promises on when, but eventually yes! For now there’s no replacement for trying and seeing it for yourself.

earyj · March 23, 2022, 3:15pm

This is a great article, but I have a few small questions.
In Milvus, there is a hybrid search method that first uses the metadata index to filter the data and build a bitmap of which vectors pass the filter, then simply checks whether each candidate vector is in the bitmap during the normal ANN search. I think this won’t be too slow.
By the way, could you give some idea of how you merge the indexes into one? Is it by merging attributes into the vectors?
No offense, just curious about this question.
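
To make the bitmap idea above concrete, here is a minimal sketch in plain Python. It is only an illustration of the general technique, not Milvus's actual implementation: the names `build_bitmap` and `filtered_search` are hypothetical, and an exhaustive scan stands in for the candidate traversal a real ANN index would do.

```python
import heapq
import math


def build_bitmap(metadata, predicate):
    """Return one boolean per vector: True if its metadata passes the filter."""
    return [predicate(m) for m in metadata]


def filtered_search(query, vectors, bitmap, k):
    """Return the ids of the k nearest vectors whose bitmap bit is set.

    Assumption: a real ANN index would visit only a subset of candidates
    (graph neighbours, IVF cells, ...). The full scan here is a stand-in.
    """
    heap = []  # holds (-distance, id) for the k best candidates so far
    for idx, vec in enumerate(vectors):
        if not bitmap[idx]:
            continue  # filtered out: skip before computing any distance
        dist = math.dist(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif -dist > heap[0][0]:
            # closer than the worst candidate currently kept
            heapq.heapreplace(heap, (-dist, idx))
    # sort by ascending distance
    return [idx for _, idx in sorted(heap, reverse=True)]


# Hypothetical toy data: four 2-d vectors with a "lang" attribute.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
metadata = [{"lang": "en"}, {"lang": "fr"}, {"lang": "en"}, {"lang": "en"}]

bitmap = build_bitmap(metadata, lambda m: m["lang"] == "en")
result = filtered_search((0.9, 0.1), vectors, bitmap, k=2)
```

In this toy data the unfiltered nearest neighbour is vector 1, but its bit is unset, so the search returns the nearest vectors that pass the filter. The bitmap check itself is cheap; the open question is how well the index still behaves when the filter excludes most candidates.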

devpp · October 19, 2022, 5:14pm

Can you provide more technical details on single-stage filtering?
Also, I'm curious: is single-stage filtering applicable to a specific ANN algorithm, like HNSW or faiss_ivf?

toon.cappuyns · June 20, 2023, 5:47am

Can single-stage filtering be used on a hybrid index?