List query results in a specific order other than by score

s170559 · July 2, 2024, 4:45pm

Hello

I’m interested in knowing whether it’s possible to sort query results by a specific parameter other than the similarity score.

Currently, I retrieve the IDs of the xx newest items from my relational database and use them as input filters to list the corresponding vectors, which ensures I get the latest entries. Ideally, I’d like to combine the capabilities of the relational database and the vector database to get the best results.
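For reference, the two-step workaround described above might look roughly like this. It is a minimal sketch, assuming a SQL table named `items` with `id` and `created_at` columns (both hypothetical) and assuming each vector also stores its relational ID in a metadata field, here called `doc_id` (also hypothetical). The resulting dict uses Pinecone's `$in` metadata filter operator and would be passed as the `filter` argument of a query.

```python
import sqlite3


def newest_ids(conn: sqlite3.Connection, limit: int) -> list[str]:
    """Return the IDs of the `limit` newest rows, newest first."""
    rows = conn.execute(
        "SELECT id FROM items ORDER BY created_at DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in rows]


def id_filter(ids: list[str], field: str = "doc_id") -> dict:
    """Build a Pinecone-style metadata filter matching any of `ids`.

    Assumes each vector stores its relational ID in a metadata field
    (hypothetically named `doc_id`)."""
    return {field: {"$in": ids}}


# Demo with an in-memory table (table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 100), ("b", 300), ("c", 200)],
)
print(id_filter(newest_ids(conn, 2)))  # {'doc_id': {'$in': ['b', 'c']}}
```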

Essentially, I want to sort the results directly by a time parameter stored as metadata on my vectors, similar to SQL’s ORDER BY.

The idea is to display a list of documents (vectors). If the user has not entered a query for a similarity search, the system should default to showing the newest documents in the list. If the user has also selected a suboption, it should default to the newest documents within that suboption.

I could do this with a relational database, but I would love to skip that step.
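The desired default behavior can be expressed as a small client-side function. This is a minimal sketch of the post-processing approach, not a Pinecone feature: `list_documents` is a hypothetical helper, and it assumes each record's metadata holds a `type` field for the suboption and a numeric `UnixDateTime` timestamp (field names borrowed from the chunk metadata described later in this thread).

```python
from typing import Optional


def list_documents(
    records: list[dict],
    scores: Optional[dict[str, float]] = None,
    suboption: Optional[str] = None,
    limit: int = 10,
) -> list[str]:
    """Return record IDs in display order.

    records: dicts with "id" and "metadata" (assumed to hold "type" and "UnixDateTime")
    scores: similarity scores keyed by ID, or None when the user has no query
    suboption: optional value that metadata["type"] must equal
    """
    if suboption is not None:
        records = [r for r in records if r["metadata"].get("type") == suboption]
    if scores is None:
        # No search query: newest first, by Unix timestamp.
        ordered = sorted(records, key=lambda r: r["metadata"]["UnixDateTime"], reverse=True)
    else:
        # Search query present: highest similarity first; unscored records are dropped.
        scored = [r for r in records if r["id"] in scores]
        ordered = sorted(scored, key=lambda r: scores[r["id"]], reverse=True)
    return [r["id"] for r in ordered[:limit]]


docs = [
    {"id": "d1", "metadata": {"type": "report", "UnixDateTime": 1719900000}},
    {"id": "d2", "metadata": {"type": "memo", "UnixDateTime": 1720500000}},
    {"id": "d3", "metadata": {"type": "report", "UnixDateTime": 1721000000}},
]
print(list_documents(docs))                                 # ['d3', 'd2', 'd1']
print(list_documents(docs, suboption="report"))             # ['d3', 'd1']
print(list_documents(docs, scores={"d1": 0.9, "d2": 0.4}))  # ['d1', 'd2']
```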

Regards

Oscar

zeke_pinecone · July 8, 2024, 4:03pm

Hi @s170559, thanks for your post. This is an interesting use case, and I’ll be sharing the details with our product team for their consideration.

At this time, query results must be sorted as a post-processing step. Likewise, any filtering beyond what our metadata filtering supports must be performed in post-processing.
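As a rough illustration of this post-processing approach, one option is to push what metadata filtering can express (a time window and the suboption) into the query's `filter`, then sort the returned matches client-side. This is a minimal sketch, assuming numeric `UnixDateTime` and string `type` metadata fields and matches shaped like dicts with `id`, `score` and `metadata` keys; the helper names are hypothetical. The query still returns the top-k by similarity within the window, so the result is the newest of those matches, not necessarily the newest vectors overall.

```python
import time
from typing import Optional


def recent_filter(days: int, suboption: Optional[str] = None,
                  now: Optional[float] = None) -> dict:
    """Build a Pinecone-style metadata filter for vectors newer than `days` days,
    optionally restricted to one `type`. Assumes a numeric `UnixDateTime` field."""
    now = time.time() if now is None else now
    conditions = [{"UnixDateTime": {"$gte": int(now - days * 86400)}}]
    if suboption is not None:
        conditions.append({"type": {"$eq": suboption}})
    # Only wrap in $and when there is more than one condition.
    return conditions[0] if len(conditions) == 1 else {"$and": conditions}


def sort_by_time(matches: list[dict], newest_first: bool = True) -> list[dict]:
    """Post-processing: reorder query matches by their `UnixDateTime` metadata."""
    return sorted(matches, key=lambda m: m["metadata"]["UnixDateTime"],
                  reverse=newest_first)


# Example with hand-written matches shaped like query results (values are made up).
matches = [
    {"id": "a", "score": 0.91, "metadata": {"UnixDateTime": 1720000000}},
    {"id": "b", "score": 0.85, "metadata": {"UnixDateTime": 1721000000}},
]
print(recent_filter(30, "report", now=1721000000))
print([m["id"] for m in sort_by_time(matches)])  # ['b', 'a']
```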

s170559 · July 21, 2024, 10:40am

Thanks for the answer.
I also have another question, if that’s okay.

I use Pinecone for both a RAG chatbot, which works really well, and as a tool to search and retrieve documents. These documents are usually long and split into 20-200 chunks, with each chunk having metadata like type, date, UnixDateTime, title, ChunkNumber and TotalChunkNumber of the document.
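For concreteness, the per-chunk metadata described above could be laid out as upsert records like the following. This is a sketch only: the `doc_id#chunkN` ID scheme, the 1-based `ChunkNumber`, and the helper name `make_chunk_records` are assumptions. The `id`/`values`/`metadata` dict shape is the one commonly passed to the Pinecone Python client's `index.upsert(vectors=...)`.

```python
def make_chunk_records(doc_id: str, title: str, date: str, unix_ts: int,
                       doc_type: str, embeddings: list[list[float]]) -> list[dict]:
    """Turn one document's chunk embeddings into upsert-style records.

    The "<doc_id>#chunk<N>" ID scheme and 1-based ChunkNumber are assumptions."""
    total = len(embeddings)
    return [
        {
            "id": f"{doc_id}#chunk{n}",
            "values": vec,
            "metadata": {
                "type": doc_type,
                "date": date,
                "UnixDateTime": unix_ts,
                "title": title,
                "ChunkNumber": n,
                "TotalChunkNumber": total,
            },
        }
        for n, vec in enumerate(embeddings, start=1)
    ]


records = make_chunk_records("doc42", "Annual report", "2024-07-01", 1719792000,
                             "report", [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(records[0]["id"], records[0]["metadata"]["TotalChunkNumber"])  # doc42#chunk1 3
```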

When I search for documents, I retrieve the top results. The issue is that this returns multiple chunks from the same document, but I only want one chunk per document: the one with the highest matching score. Currently, I retrieve the top 50 to 100 results and then filter them down to a unique list based on metadata such as title, date, and a dynamic parameter. I use this dynamic parameter in my URL to retrieve the entire document.
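The post-filtering step described above (keep only the best chunk per document) can be done in a single pass. A minimal sketch, assuming matches are dicts with `score` and `metadata` keys and that `title` plus `date` identify a document; the function name is hypothetical, and the dynamic URL parameter could be added to `key_fields` if it is stored in metadata.

```python
def best_chunk_per_document(matches: list[dict],
                            key_fields: tuple = ("title", "date")) -> list[dict]:
    """Keep only the highest-scoring chunk for each document.

    A document is identified by the tuple of its `key_fields` metadata values."""
    best: dict[tuple, dict] = {}
    for m in matches:
        doc_key = tuple(m["metadata"].get(f) for f in key_fields)
        if doc_key not in best or m["score"] > best[doc_key]["score"]:
            best[doc_key] = m
    # Re-sort so the output is ordered by score regardless of input order.
    return sorted(best.values(), key=lambda m: m["score"], reverse=True)


hits = [
    {"id": "a#2", "score": 0.92, "metadata": {"title": "A", "date": "2024-07-01"}},
    {"id": "a#7", "score": 0.90, "metadata": {"title": "A", "date": "2024-07-01"}},
    {"id": "b#1", "score": 0.88, "metadata": {"title": "B", "date": "2024-06-15"}},
    {"id": "a#3", "score": 0.95, "metadata": {"title": "A", "date": "2024-07-01"}},
]
print([m["id"] for m in best_chunk_per_document(hits)])  # ['a#3', 'b#1']
```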

Is there a way to prefilter the output to avoid post-filtering while still searching across all chunks? I could set a metadata filter to match only chunk number 1, but then the remaining chunks of each document would never be searched, which defeats the purpose.
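One possible alternative to oversampling, offered as a hedged sketch rather than a Pinecone feature, is to query in rounds and exclude documents already found with a `$nin` metadata filter on `title`. Within each round the first chunk seen for a new title is that document's best chunk, and documents found in later rounds cannot outscore earlier ones, so score order is preserved. `query_fn` is a hypothetical stand-in: in practice it would wrap an `index.query(...)` call with `top_k`, `filter` and `include_metadata=True`; here an in-memory fake is used so the example runs. Each extra round costs one more query.

```python
from typing import Callable, Optional

# (top_k, filter or None) -> matches sorted best-first
QueryFn = Callable[[int, Optional[dict]], list[dict]]


def distinct_documents(query_fn: QueryFn, n_docs: int, top_k: int = 20,
                       max_rounds: int = 10) -> list[dict]:
    """Collect up to `n_docs` documents, one best chunk each, by re-querying
    with already-found titles excluded via a `$nin` metadata filter."""
    found: list[dict] = []
    seen_titles: list[str] = []
    for _ in range(max_rounds):
        if len(found) >= n_docs:
            break
        flt = {"title": {"$nin": seen_titles}} if seen_titles else None
        matches = query_fn(top_k, flt)
        if not matches:
            break
        for m in matches:  # best-first, so the first hit per title is its best chunk
            title = m["metadata"]["title"]
            if title not in seen_titles:
                seen_titles.append(title)
                found.append(m)
                if len(found) >= n_docs:
                    break
    return found


# In-memory stand-in for an index query (hypothetical data).
CHUNKS = [
    {"id": "A#1", "score": 0.95, "metadata": {"title": "A"}},
    {"id": "A#2", "score": 0.94, "metadata": {"title": "A"}},
    {"id": "A#3", "score": 0.93, "metadata": {"title": "A"}},
    {"id": "B#1", "score": 0.80, "metadata": {"title": "B"}},
    {"id": "C#4", "score": 0.70, "metadata": {"title": "C"}},
]


def fake_query(top_k: int, flt: Optional[dict]) -> list[dict]:
    excluded = set(flt["title"]["$nin"]) if flt else set()
    hits = [c for c in CHUNKS if c["metadata"]["title"] not in excluded]
    return sorted(hits, key=lambda c: c["score"], reverse=True)[:top_k]


print([m["id"] for m in distinct_documents(fake_query, n_docs=3, top_k=2)])
# ['A#1', 'B#1', 'C#4']
```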

Here is a link to my website, which has the autocomplete I use to search for and find documents.

I hope this makes sense.