Is it possible to query Pinecone for all (or the top N) results whose similarity to a vector (or id) is over a certain threshold, e.g. 0.90?
If so, can you point me to an example or to the relevant documentation?
Thank you in advance.
From what I know, you set top_k when querying to retrieve the top_k most similar vectors. You can't set a threshold at query time, but each match in the response comes with its similarity score, so you can filter the retrieved results, sort them, and so on in your app.
The score appears in the query response. Something like this should work in Python:
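import openai

threshold = 0.9

def filter_by_score(d):
    # Keep only matches whose similarity score is above the threshold.
    return d["score"] > threshold

def query_vdb(s, filters=None, k=3):
    # Assumes `index` is an already-initialized Pinecone index and the OpenAI API key is configured.
    v = openai.Embedding.create(input=s, engine='text-embedding-ada-002')['data'][0]['embedding']
    q = index.query(vector=v, top_k=k, filter=filters, include_metadata=True)
    # top_k caps how many candidates come back; the threshold is applied afterwards.
    return list(filter(filter_by_score, q['matches']))

query_vdb("this is my query")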
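If you want to try the thresholding step on its own, here is a minimal sketch that needs no API calls. The helper name `filter_matches`, its `top_n` parameter, and the sample data are made up for illustration. It assumes each match is dict-like with a "score" key and that a higher score means more similar (as with cosine similarity); for a distance-style metric the comparison would presumably need to be reversed.

```python
from typing import Any, Dict, List, Optional


def filter_matches(
    matches: List[Dict[str, Any]],
    threshold: float,
    top_n: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Keep matches scoring strictly above `threshold`, best first.

    Hypothetical helper: assumes higher score = more similar.
    """
    kept = [m for m in matches if m["score"] > threshold]
    kept.sort(key=lambda m: m["score"], reverse=True)
    # Optionally cap the number of results after thresholding.
    return kept if top_n is None else kept[:top_n]


# Made-up sample data shaped like the 'matches' list of a query response.
sample = [
    {"id": "a", "score": 0.97},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.88},
]

print([m["id"] for m in filter_matches(sample, 0.9)])  # ['a', 'b']
```

Since the threshold is only applied after the query returns, top_k has to be large enough to cover every result you might want to keep; anything ranked below top_k never reaches the filter.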