Hybrid search for large documents

I’m attempting to build a hybrid search engine based on the method you described in this example (https://www.pinecone.io/docs/examples/basic-hybrid-search/), passing word tokens as metadata.

However, my content is mainly large documents, so my metadata size is exceeding the limit of 10,240 bytes per vector.

Is there a workaround for this issue?

Hi @christian.adib,

You might want to take a look at @jamesbriggs’ article on hybrid search, where he goes into detail about combining dense and sparse vectors rather than metadata tokens. This requires our new hybrid index, which is currently in private preview. If you’d like to sign up, I can share your email with our Product team and get you added; just let me know either way. For documents this large, dense and sparse vectors are the way to go, rather than metadata tokens.
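
To give a rough idea of what combining dense and sparse vectors means, here is a minimal sketch in plain Python. It is not the hybrid index's API; the function names (`sparse_vector`, `hybrid_score`), the term-frequency weighting, and the `alpha` blending parameter are all assumptions made for illustration. The point is that each document's tokens are stored as a sparse vector alongside its dense embedding, so nothing has to fit into the metadata field.

```python
import math
from collections import Counter


def sparse_vector(tokens):
    """Build an L2-normalised term-frequency vector as {token: weight}.

    Term frequency is an illustrative choice; a real system might use
    BM25 or a learned sparse encoder instead.
    """
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {token: count / norm for token, count in counts.items()}


def dense_dot(a, b):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(weight * b[token] for token, weight in a.items() if token in b)


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """Blend dense and sparse similarity.

    alpha=1.0 is purely dense (semantic), alpha=0.0 is purely sparse
    (keyword). The convex combination is an assumed scoring rule.
    """
    dense_part = dense_dot(q_dense, d_dense)
    sparse_part = sparse_dot(q_sparse, d_sparse)
    return alpha * dense_part + (1 - alpha) * sparse_part


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" and token lists, purely hypothetical.
    query_dense = [1.0, 0.0]
    query_sparse = sparse_vector(["hybrid", "search"])
    doc_dense = [0.8, 0.6]
    doc_sparse = sparse_vector(["hybrid", "search", "large", "documents"])
    print(hybrid_score(query_dense, query_sparse, doc_dense, doc_sparse))
```

Because the sparse vector only stores the tokens that actually occur, a long document's keyword signal stays compact compared with listing every token in metadata.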


Thanks for this, Cory. I’m definitely interested.

My email is: christian.adib@gmail.com

Great, I’ve shared your details with our Product team! Someone should be in touch with you regarding next steps soon, if they haven’t been already.