Hi, my apologies if this is covered in the docs, but I could not find it. I am using Pinecone with LangChain to do similarity search on some text data.
The documents have a common metadata field, and this field is high cardinality but not unique to the specific document.
Is there a way to tell the Pinecone API to return only the result with the highest score for a unique metadata value? Right now there’s a possibility that my top_n could include only results with the same value in that metadata field, but with the data I am working with I actually don’t want any duplicates, only the top scoring result.
Just to expand on it a bit - I expect queries to return multiple “possible matches”, but I only want the top result for each unique value for a particular metadata field. For example, in this set of results (score descending):
Hey Chris - you’re thinking like me: “this is so easy to do in SQL.” I’m not affiliated with Pinecone, but going by the documentation on metadata filtering, I’d suggest you either use your own logic that loops through your result set, or use 3 metadata queries (from the metadata filtering docs):
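For the "own logic" option, here is a minimal sketch of the loop in Python. It is not taken from the Pinecone docs. It assumes each match is a dict-like object with `score` and `metadata` keys, that a higher score means a better match (true for cosine or dot-product style scores, not for distance metrics), and that the metadata value is a plain string or number. The function name `top_match_per_value` and the field name `source` are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def top_match_per_value(
    matches: Iterable[Dict[str, Any]],
    field: str,
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Keep only the best-scoring match for each distinct metadata value.

    Assumptions (hypothetical, adjust to your data):
    - each match has a numeric "score" and a "metadata" dict
    - a higher score is a better match
    - the metadata value is hashable (a string or number, not a list)
    Matches that lack the field are skipped.
    """
    best: Dict[Any, Dict[str, Any]] = {}
    for match in matches:
        metadata = match.get("metadata") or {}
        if field not in metadata:
            continue
        value = metadata[field]
        # Replace the stored match only if this one scores strictly higher.
        if value not in best or match["score"] > best[value]["score"]:
            best[value] = match
    # Return the survivors in descending score order.
    ranked = sorted(best.values(), key=lambda m: m["score"], reverse=True)
    return ranked if limit is None else ranked[:limit]


# Made-up results, score descending, with a hypothetical "source" field.
matches = [
    {"id": "a", "score": 0.93, "metadata": {"source": "x"}},
    {"id": "b", "score": 0.91, "metadata": {"source": "x"}},
    {"id": "c", "score": 0.88, "metadata": {"source": "y"}},
    {"id": "d", "score": 0.80, "metadata": {"source": "z"}},
    {"id": "e", "score": 0.79, "metadata": {"source": "y"}},
]

unique = top_match_per_value(matches, "source")
print([m["id"] for m in unique])
```

Since the filtering happens after the query, the list that comes back can be shorter than what you asked for. You would probably want to request more results than you need (a larger `top_k`) and then trim with `limit`.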