Using similarity in queries

flemming.madsen · September 12, 2023, 9:39am

Is it possible to query Pinecone for all (or the top N) results whose similarity to a given vector (or id) is above a certain threshold, e.g. 0.90?

If so, can you point me to an example or to the relevant documentation?
Thank you in advance.

adumont · September 14, 2023, 5:55am

From what I know, you set top_k when querying to retrieve the top_k most similar vectors. You can't set a threshold at query time, but each returned match comes with its similarity score, so you can filter or sort the retrieved results in your app.

tjensen · September 14, 2023, 4:48pm

The score appears in the query response. Something like this should work in Python:

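import openai  # legacy openai<1.0 SDK, which provides openai.Embedding

# `index` is assumed to be an already-initialised Pinecone index object
threshold = 0.9

def filter_by_score(match):
    # Keep only matches whose similarity score exceeds the threshold
    return match["score"] > threshold

def query_vdb(s, filters=None, k=3):
    # Embed the query text, then fetch the k nearest vectors
    v = openai.Embedding.create(input=s, engine='text-embedding-ada-002')['data'][0]['embedding']
    q = index.query(vector=v, top_k=k,
                    filter=filters,
                    include_metadata=True)
    # The threshold is applied client-side, after the query returns
    return list(filter(filter_by_score, q['matches']))

query_vdb("this is my query")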
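
Since the threshold is applied client-side, a query with a small top_k can return fewer matches than actually clear the threshold. One way around that, sketched below, is to request a larger top_k than you need and then filter and truncate locally. This assumes each match is dict-like with a "score" key, as in the snippet above. The helper name `matches_above` and the sample data are made up for illustration, and the `>=` comparison assumes a metric where a higher score means more similar (cosine or dot product); for Euclidean distance the comparison would likely need to flip.

```python
from typing import Any, Dict, List, Optional


def matches_above(matches: List[Dict[str, Any]],
                  threshold: float,
                  limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Return matches scoring at least `threshold`, best first, at most `limit` of them."""
    kept = [m for m in matches if m["score"] >= threshold]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return kept if limit is None else kept[:limit]


# Hypothetical stand-in for q["matches"] from a query made with a large top_k
sample = [
    {"id": "a", "score": 0.97},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.85},
]

print([m["id"] for m in matches_above(sample, 0.90)])  # ['a', 'b']
```

Passing `limit=N` gives the "top N over the threshold" variant of the original question; leaving it as `None` returns everything that cleared the threshold among the matches that were fetched.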