Hello Pinecone Community,
Let’s take the example of a mid-size vector database with 50k vectors. When you retrieve the most relevant documents for a given query, you get back the vectors, not the original documents. To retrieve the original documents, you have two options:
- Either attach the original document to the metadata, so Pinecone returns the most relevant vectors with their documents included in the metadata.
- Or store only the ID of the document in the metadata, keep the original document in a separate database, and query that database in a second stage to fetch the document associated with the ID you got from Pinecone.
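The two shapes can be sketched like this, with the second-stage lookup simulated by an in-memory dict standing in for a real document database (the IDs and field names are illustrative, not prescribed by Pinecone):

```python
# Option A: embed the source text directly in the vector's metadata.
vector_a = {
    "id": "doc-42-chunk-0",
    "values": [0.1, 0.2, 0.3],  # embedding, truncated for the example
    "metadata": {"text": "The original chunk text goes here..."},
}

# Option B: store only a foreign key; keep documents in a separate store.
document_store = {"doc-42": "The original chunk text goes here..."}  # stand-in for a real DB
vector_b = {
    "id": "doc-42-chunk-0",
    "values": [0.1, 0.2, 0.3],
    "metadata": {"doc_id": "doc-42"},
}

# Second-stage lookup after Pinecone returns the match:
original_text = document_store[vector_b["metadata"]["doc_id"]]
print(original_text)  # same text Option A would have returned inline
```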
Which solution do you recommend?
I guess the first solution is more expensive in terms of metadata size (while the second requires maintaining a second database). Is it also slower, due to the metadata attached to each vector?
Hi @paulphilippelouis.pa, and welcome to the Pinecone community forums!
Thank you for your excellent question.
You’ve already correctly identified many of the key tradeoffs. I would say it really comes down to your requirements, your use case, and what you are trying to build.
For most document-backed Retrieval Augmented Generation pipelines, attaching the original source text of your chunk as metadata to your vectors is going to work very well, be manageable from a code perspective, and not add appreciable overhead or latency.
If you are building a system where security is paramount, you may prefer to practice defense in depth by intentionally storing only foreign keys to full records in your metadata. That way, even if your entire application were compromised, malicious actors would need to do more work to obtain sensitive records.
Meanwhile, the Pinecone API supports optionally sending or omitting both vectors and metadata on a per-request basis, so once you reach the scale where this latency is likely to become noticeable, you have additional tools to manage it.
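One rough way to see why omitting metadata helps at scale: serialize a match with and without its metadata attached and compare the sizes. The response shapes below are illustrative (real responses carry more fields, and the wire format isn't literally `json.dumps` of a dict), but the proportions tell the story:

```python
import json

# A query match as it might look with metadata included vs. omitted
# (shapes are illustrative only).
match_with_meta = {
    "id": "doc-42-chunk-0",
    "score": 0.92,
    "metadata": {"text": "A few hundred words of source text... " * 20},
}
match_without_meta = {"id": "doc-42-chunk-0", "score": 0.92}

size_with = len(json.dumps(match_with_meta).encode("utf-8"))
size_without = len(json.dumps(match_without_meta).encode("utf-8"))
print(size_with, size_without)  # the inline text dominates the payload
```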
For prototyping, small apps, small-to-medium RAG pipelines, chatbots, etc., I’d recommend starting out by adding your text chunk to your metadata, because it simplifies referring to that information within your application.
Hope that helps, and great question!
Also curious to hear others’ opinions here - what has been working for you?
Best,
Zack
Hi @ZacharyProser,
Thank you for your response. Great analysis, @paulphilippelouis.pa; I am sure many of us have this same question.
One problem with attaching the original document to the metadata is hitting the size limit. HTTP response body: {"code":3,"message":"Metadata size is 363060 bytes, which exceeds the limit of 40960 bytes per vector","details":[]}
So is it fair to say that this approach is not suitable for large documents?
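One way to guard against that error is to measure the serialized metadata before upserting. A minimal sketch, using the 40,960-byte figure from the error message above (the exact on-the-wire encoding may differ slightly from `json.dumps`, so treat this as an approximation):

```python
import json

PINECONE_METADATA_LIMIT = 40960  # bytes per vector, per the error message above

def metadata_size_bytes(metadata: dict) -> int:
    """Approximate the serialized size of a metadata payload in bytes."""
    return len(json.dumps(metadata).encode("utf-8"))

big_doc = {"text": "x" * 363060}    # roughly the size from the error message
small_chunk = {"text": "x" * 1000}  # a chunk-sized payload

print(metadata_size_bytes(big_doc) > PINECONE_METADATA_LIMIT)       # True
print(metadata_size_bytes(small_chunk) <= PINECONE_METADATA_LIMIT)  # True
```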
Hi @ashish.tyagi,
Thank you for your insightful follow-up question. You are correct: it’s critical to consider the size limits of a request to Pinecone.
Part of the reason we chunk larger documents is to achieve more granular and accurate retrieval of source texts. It has the added benefit of reducing the overall payload size sent to Pinecone in each request.
You can read our recommended chunking strategies here.
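To illustrate the idea, here is a minimal fixed-size character chunker with overlap; real pipelines often split on sentence or token boundaries instead, so treat this as a sketch rather than a recommended strategy:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into fixed-size character chunks with some overlap,
    so each chunk's metadata stays well under the per-vector limit."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "lorem ipsum " * 500  # ~6,000 characters
chunks = chunk_text(doc)
print(len(chunks))                           # a handful of small chunks
print(all(len(c) <= 1000 for c in chunks))   # True: each fits in metadata
```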
Hope this helps!
Best,
Zack