Hello, Pinecone community,
I’m writing to ask whether it’s possible to do a vector similarity search with a score threshold instead of just a top K, similar to what Weaviate offers (Vector similarity search | Weaviate - Vector Database). For my use case, I need to return all of the relevant documents to ingest into my LLM pipeline, whether that’s 100 or 10,000.
Hi @johnkongtcheu, and thank you for your question!
While this is not currently supported natively, you can achieve the same result with a post-retrieval filtering step, like so:
# Perform the search; top_k caps how many matches come back,
# so set it high enough to cover every document you might keep
response = index.query(queries=[query_embedding], top_k=10, include_values=True)
# Define the score cutoff
score_cutoff = 0.8
# Keep only matches at or above the cutoff
filtered_results = [result for result in response['results'][0]['matches'] if result['score'] >= score_cutoff]
# Print the filtered results
for result in filtered_results:
    print(f"ID: {result['id']}, Score: {result['score']}")
I hope that helps!
Best,
Zack
Hi @ZacharyProser
Is it possible to set a threshold in the function index.query()?
@loganju2000 we don’t have this functionality today. What is your use case for a threshold query?
Hi, thank you @gdj0nes and @ZacharyProser for the help on this question. I’m aware of the post-processing that can be done with Pinecone. However, in my use case I may need to retrieve 100K+ documents from Pinecone at a time for LLM ingestion, and performing the cutoff as a post-processing step incurs a latency penalty from the network transfer, which limits the usefulness of Pinecone’s fast queries.
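Concretely, the workaround today looks something like this (a rough sketch; the top_k and cutoff values are placeholders), and every one of those matches still has to travel over the network before the filter runs:
# Over-fetch with a large top_k, keep the payload minimal, then apply the cutoff client-side
# (top_k and the 0.8 cutoff below are placeholder values)
response = index.query(queries=[query_embedding], top_k=10000, include_values=False, include_metadata=False)
matches = response['results'][0]['matches']
relevant = [m for m in matches if m['score'] >= 0.8]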
@gdj0nes I was trying to integrate RAG into an LLM pipeline, where I need to retrieve the most relevant data from the index.