Hybrid search for large documents

christian.adib · November 15, 2022, 4:39pm

I’m attempting to build a hybrid search engine based on the method you described in this example (https://www.pinecone.io/docs/examples/basic-hybrid-search/), passing word tokens as metadata.

However, my content consists mainly of large documents, so the token metadata for each vector exceeds the limit of 10,240 bytes per vector.
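To see how quickly a token list outgrows that limit, here is a rough size check. It is a minimal sketch that assumes metadata size is measured as the UTF-8 length of the compact JSON encoding. The exact accounting on the server side may differ, and `metadata_size_bytes` and `fits_limit` are hypothetical helpers, not part of any client library.

```python
import json

# Per-vector metadata limit cited in the question.
METADATA_LIMIT_BYTES = 10_240


def metadata_size_bytes(metadata: dict) -> int:
    """Approximate metadata size as the UTF-8 length of compact JSON (assumption)."""
    return len(json.dumps(metadata, separators=(",", ":")).encode("utf-8"))


def fits_limit(metadata: dict, limit: int = METADATA_LIMIT_BYTES) -> bool:
    """Return True if the approximate serialized size is within the limit."""
    return metadata_size_bytes(metadata) <= limit


# A modest 4,000-token document already overshoots the limit several times over.
doc = "hybrid search " * 2000
meta = {"tokens": doc.split()}
print(metadata_size_bytes(meta), fits_limit(meta))
```

Because the token list grows linearly with document length, chunking alone only delays the problem for very long texts.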

Is there a workaround for this issue?

Cory_Pinecone · November 15, 2022, 5:08pm

Hi @christian.adib,

You might want to take a look at @jamesbriggs’ article on hybrid search, where he goes into detail about combining dense and sparse vectors rather than relying on metadata tokens. This requires our new hybrid index, which is currently in private preview. If you’re interested in signing up, I can share your email with our Product team and get you added; just let me know either way. Combining dense and sparse vectors is the way to handle documents this large, rather than storing tokens in metadata.
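The idea of replacing metadata tokens with a sparse vector can be sketched in plain Python. This is an illustrative sketch, not the hybrid index API. The sparse side here is plain term counts with hashed indices, a simple stand-in for the BM25 or SPLADE encoders usually used. The `alpha` weighting is one common convex combination of the dense and sparse scores. All function names are hypothetical.

```python
import hashlib
from collections import Counter


def token_index(token: str, dim: int = 2**18) -> int:
    """Map a token to a stable sparse index (hashing trick; collisions are possible)."""
    # md5 gives a stable hash across runs, unlike Python's salted built-in hash().
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % dim


def sparse_from_tokens(tokens, dim: int = 2**18) -> dict:
    """Build a sparse vector of term counts: one entry per distinct (hashed) term."""
    counts = Counter(token_index(t, dim) for t in tokens)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return {"indices": indices, "values": values}


def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors in indices/values form."""
    b_map = dict(zip(b["indices"], b["values"]))
    return sum(v * b_map.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


def dense_dot(a, b) -> float:
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha: float = 0.5) -> float:
    """Weight dense similarity by alpha and sparse similarity by (1 - alpha)."""
    return alpha * dense_dot(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)


doc_sparse = sparse_from_tokens(("hybrid search " * 2000).split())
query_sparse = sparse_from_tokens(["search"])
print(len(doc_sparse["indices"]))  # distinct terms, not total tokens
print(hybrid_score([0.1, 0.9], [0.2, 0.8], query_sparse, doc_sparse, alpha=0.7))
```

The stored sparse vector holds one index/value pair per distinct term, so its size grows with vocabulary rather than with raw document length. That is why it scales better than a token list in metadata.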

Cory

christian.adib · November 15, 2022, 6:08pm

Thanks for this, Cory. Definitely am interested.

My email is: christian.adib@gmail.com

Cory_Pinecone · November 15, 2022, 8:47pm

Great, I’ve shared your details with our Product team! Someone should be in touch with you about next steps soon, if they haven’t been already.

Cory