Need page number links instead of the full document URL from Pinecone search

info6 · January 10, 2024, 6:11am

I am using Pinecone for my RAG model. It returns results with the file URL, but not a link to a specific page number. How can I fix that?

Additional Info:
I am using the AWS Knowledge Base functionality with Pinecone. I upload my PDF to an AWS S3 bucket, then use Titan Embeddings G1 - Text and Pinecone.

Currently, Pinecone creates embeddings that point back to the whole document, not to specific pages. Is there a way to make Pinecone (or AWS/Titan) do this through some configuration?

Cory_Pinecone · January 31, 2024, 8:51pm

Hi @info6. Just to clear up some confusion: Pinecone doesn't generate embeddings (unless you're using Canopy, and even then it calls OpenAI or whichever other model provider you're using). In this case, Titan generates the embeddings, and you store them in Pinecone.

What a query returns depends on the context captured in the stored vectors. If you embed an entire document at once, there's no good way to tell which page a given embedding vector came from. Instead, chunk by page (or even by paragraph) and store the page number as metadata alongside each vector in the index. That way, you can easily refer back to the page later.
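A minimal sketch of page-level chunking in Python. It assumes you already have each page's text extracted (for example with a PDF library) and that you run ingestion yourself rather than through a managed sync. `embed` is a hypothetical placeholder for your embedding call (Titan, in this case), and `chunk_pages`, `build_records`, and `page_link` are illustrative names, not part of any library. Each record uses the `id`/`values`/`metadata` dict shape that the Pinecone Python client's `upsert` accepts, and `page_link` appends a `#page=N` fragment, which most browser PDF viewers use to open the file at that page.

```python
from typing import Callable, Dict, List

# Hypothetical signature for an embedding call (e.g. Titan via Bedrock): text -> vector.
EmbedFn = Callable[[str], List[float]]


def chunk_pages(pages: List[str], by_paragraph: bool = False) -> List[Dict]:
    """Split extracted page texts into chunks, tagging each with its 1-based page number."""
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        if by_paragraph:
            # Treat blank lines as paragraph breaks; drop empty pieces.
            pieces = [p.strip() for p in text.split("\n\n") if p.strip()]
        else:
            # One chunk per page; skip pages with no text.
            pieces = [text.strip()] if text.strip() else []
        for i, piece in enumerate(pieces):
            chunks.append({"page": page_number, "chunk": i, "text": piece})
    return chunks


def build_records(
    doc_id: str,
    doc_url: str,
    pages: List[str],
    embed: EmbedFn,
    by_paragraph: bool = False,
) -> List[Dict]:
    """Build upsert-style records: one vector per chunk, with the page number kept as metadata."""
    records = []
    for c in chunk_pages(pages, by_paragraph):
        records.append({
            "id": f"{doc_id}-p{c['page']}-c{c['chunk']}",
            "values": embed(c["text"]),
            "metadata": {"source": doc_url, "page": c["page"], "text": c["text"]},
        })
    return records


def page_link(metadata: Dict) -> str:
    """Turn a match's metadata into a URL that opens the PDF at the stored page."""
    return f"{metadata['source']}#page={metadata['page']}"


if __name__ == "__main__":
    # Placeholder embedding; swap in a real model call.
    def fake_embed(text: str) -> List[float]:
        return [float(len(text)), 0.0]

    pages = ["Intro text.", "Results.\n\nMore results.", ""]
    records = build_records("report", "https://example.com/report.pdf", pages, fake_embed, by_paragraph=True)
    print(len(records))  # 3
    print(page_link(records[1]["metadata"]))  # https://example.com/report.pdf#page=2
```

At query time, request metadata with the matches (`include_metadata=True` in the Python client's `query`) and pass each match's metadata to `page_link` to get a page-level URL instead of the bare document URL.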

There’s also this tool, which was just announced the other day. Full disclosure, this is not a Pinecone-created tool, so we wouldn’t be able to give direct help on any questions about using it. But it promises to do exactly what you’re looking for.