Nearest Neighbor Indexes for Similarity Search

discobot · February 1, 2022, 7:58pm

Vector similarity search is a game-changer in the world of search. It allows us to efficiently search a huge range of media, from GIFs to articles — with incredible accuracy in sub-second timescales for billion+ size datasets.

One of the key components to efficient search is flexibility. And for that we have a wide range of search indexes available to us — there is no ‘one-size-fits-all’ in similarity search.

This is a companion discussion topic for the original entry at https://www.pinecone.io/learn/vector-indexes/

Prabhat · July 7, 2022, 11:47am

How does efConstruction have an impact on search time for IVFHNSW (as mentioned in this description)? I am observing something in my experiments but can’t figure out why.

jamesbriggs · July 12, 2022, 3:50pm

efConstruction primarily effects your build time (eg larger efConstruction == longer build time). It has been some time since working with HNSW so take this with a pinch of salt, different efConstruction values create a more or less optimized graph, so search times could be slower or faster because of that.

Generally, greater efConstruction means more optimized graph, and faster search times. But I don’t think there is a perfect correlation.

Jerome · July 22, 2022, 5:54pm

Hi guys,
regarding the efConstruction parameter, I have a question about the performance graphs shown in the HNSW section. How can M be greater than efConstruction ? In my understanding of the paper algorithms 1 and 2 by Malkov and Yashumin, during the node insertion, the edges are created between the inserted node and the M (or Mmax) nearest neighbors among the efConstruction ones returned for a given layer… So it seems to me that M is always less or equal to efConstruction. Same for the efSearch parameter which is by construction less or equal to the efConstruction, the maximum possible value for M (Mmax).
Thanks for your clarification
Best regards
Jerome

tang-hi · February 9, 2023, 5:41am

Great article
I found a typo in
This is a strong result — 90% of the performance could certainly be a reasonable sacrifice to performance if we get improved search-times.

It should be accuracy?

don · May 29, 2023, 2:57pm

Two problems with the sample code.

I think ‘data’ was defined in the previous lesson. When I try to call the add function to an index, I get an assertion error from faiss. Is there a modification to ‘data’ that would make this work?

Later, a new variable is used ‘wb’. It doesn’t seem to be defined anywhere.

yangtaohehe · June 6, 2023, 1:56pm

Hi, I can’t found any ‘Voronoi diagrams’ info in source code of Faiss(IndexIVF.h and IdexIVF.cpp)
So, why did you said “It(IVF) works on the concept of Voronoi diagrams”