Let’s say I am searching within 30 vectors, selected by a metadata filter, and I provide an embedding to get the top 5 most similar matches, but I am not getting any results. Is there a similarity threshold in place by default?
Hi @siddharth, and welcome to the Pinecone community forums!
Thank you for your question.
There’s no default threshold, but as our Query Data guide explains:
“Depending on your data and your query, you may get fewer than top_k results. This happens when top_k is larger than the number of possible matching vectors for your query.”
Could you share more about:
- Your data - what you’re indexing
- How you’re querying it
and any relevant code you have? If you do share your code, please be careful not to include any secrets such as your Pinecone API key.
That would help us debug further.
Hope this helps!
Best,
Zack
The pinecone_index has around 350,000 vectors, of which around 40 pass the filter here. The data comes from a custom model that produces a 64-dimensional vector for any audio file. So I am querying with one vector embedding (for an audio file) to find the closest match among the remaining 40 audio files (or vectors).
Here is the code I have -
compatible_vectors = pinecone_index.query(
    vector=vector_embedding,
    top_k=5,
    include_metadata=True,
    filter={
        "$and": [
            {"field1": field1_filter},
            {"filename": {"$in": list_of_filenames}},
        ]
    },
)
My question mainly has two parts:
- Shouldn’t Pinecone at least return the closest matches anyway, since I can compute the cosine distances between the vector_embedding above and the other 40 embeddings?
- Can anything be done from the user side to mitigate this?
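One possible client-side check, sketched here under assumptions rather than as a recommended fix: if the roughly 40 candidate embeddings are also available locally, the expected ranking can be computed by brute force and compared with what the query returns. The names cosine_similarity, local_top_k, and the toy candidates dictionary are hypothetical, and the three 3-dimensional vectors only stand in for the real 64-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_top_k(query, candidates, top_k=5):
    """Rank candidates (id -> vector) by cosine similarity to query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Slicing never pads: with fewer than top_k candidates, all of them are returned.
    return scored[:top_k]


# Hypothetical toy data standing in for the filtered audio embeddings.
candidates = {
    "a.wav": [1.0, 0.0, 0.0],
    "b.wav": [0.9, 0.1, 0.0],
    "c.wav": [0.0, 1.0, 0.0],
}
matches = local_top_k([1.0, 0.0, 0.0], candidates, top_k=5)
print(matches)
```

With only three candidates and top_k=5, this returns three matches rather than five, which mirrors the "fewer than top_k" behavior described in the guide. If the local ranking finds matches for vectors that should pass the filter while the index query returns none, that points toward the filter values or the index itself rather than a similarity threshold.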
@siddharth We recently experienced an incident during which some queries returned no results. Are you still experiencing this behavior now? If so, can you please share the name of the index in question?