Locality Sensitive Hashing (LSH): The Illustrated Guide

Locality sensitive hashing (LSH) is a widely popular technique used in approximate nearest neighbor (ANN) search. The solution to efficient similarity search is a profitable one — it is at the core of several billion (and even trillion) dollar companies.

Big names like Google, Netflix, Amazon, Spotify, Uber, and countless more rely on similarity search for many of their core functions.


This is a companion discussion topic for the original entry at https://www.pinecone.io/learn/locality-sensitive-hashing/

How can you get which values are candidate pairs from the bucket, like the bucket contains the signatures and the candidate pairs are in the form of signatures so i actually do not know which ID should i check

I’m not sure for the LSH index specifically, but for other indexes there is a get_ids method that you can call on the index that returns a “swig object” that you convert to numpy with faiss.vector_to_array. I can also see here that it is possible to retrieve the data from index.codes for LSH, but I believe this the vector codes themselves (not IDs)