Changes in docs related to hybrid search

saadk8 · April 10, 2025, 3:22pm

Okay, so I have a use case that involves implementing hybrid search. Previously, I implemented hybrid search using llama-index (docs llamaindex ai/en/stable/examples/vector_stores/PineconeIndexDemo-Hybrid/) with Pinecone as the vector DB. The index that was created on Pinecone was of the dense type. Because I used llama-index, I didn’t have much control over, or insight into, what was happening apart from specifying the dense and sparse embedding models (the sparse embedding model I used was prithivida/Splade_PP_en_v1). Now I am looking to build a custom implementation without llama-index. I saw the docs at the two URLs below:

docs pinecone.io/guides/indexes/pods/understanding-hybrid-search
and after clicking “create a hybrid index”, I was routed to:
docs pinecone.io/guides/indexes/create-an-index

I am now getting the impression that I have to create two different indexes (sparse and dense) on Pinecone and then query them. I couldn’t find any resource on how to combine the results; if you have any relevant documentation, please share it with me. I also checked out all three notebooks at this URL:

github com/pinecone-io/examples/tree/master/learn/search/hybrid-search
but they combine both the sparse and dense vectors in a single index. So I am still curious and confused about how to combine the results if I use two different indexes (a sparse index and a dense index), or whether I should proceed with the method presented in these notebooks. Another thing I noticed is that when creating an index in the Pinecone dashboard, I now need to specify whether the vectors are sparse or dense, which suggests the approach in these notebooks may be deprecated, though I don’t know. Please clear up this confusion; I would be really thankful. Also, if you need any more information from my end, I will be more than happy to provide it if that helps resolve my issue. Thanks

jenna · April 18, 2025, 5:18pm

Hi @saadk8 - Thank you for reaching out and pointing out some of the confusion in our documentation and notebooks! We’re working to get this straightened out.

In the meantime, I would suggest approaching this with two separate indexes: a sparse index and a dense index. You’re right that you’d have to search each index independently, then rerank, merge, and de-duplicate the results. However, this two-index approach gives you the most flexibility and control long-term. For example, you could add keyword search to an existing workload by leaving the dense-only index as is and adding records to a sparse-only index. You can review how querying is done in this code example.
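To make the merge and de-duplicate step concrete, here is a minimal sketch in plain Python. It assumes you have already queried each index and reduced each response to an ordered list of record IDs. The fusion method shown is reciprocal rank fusion, which is one common choice and not necessarily what the linked example uses. The function name `merge_results`, the constant `k=60`, and the sample IDs are all illustrative assumptions.

```python
from collections import defaultdict


def merge_results(dense_hits, sparse_hits, k=60, top_n=5):
    """Fuse two ranked lists of record IDs with reciprocal rank fusion.

    dense_hits / sparse_hits: record IDs ordered best-first, as returned
    by the dense and sparse index queries (hypothetical inputs).
    """
    fused = defaultdict(float)
    for hits in (dense_hits, sparse_hits):
        seen = set()
        rank = 0
        for record_id in hits:
            if record_id in seen:
                continue  # ignore repeats within a single result list
            seen.add(record_id)
            rank += 1
            # An ID found in both lists accumulates a score from each,
            # so it appears only once in the merged output.
            fused[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up IDs standing in for real query results.
dense_hits = ["doc-3", "doc-1", "doc-7"]
sparse_hits = ["doc-1", "doc-9", "doc-3"]
merged = merge_results(dense_hits, sparse_hits)
```

Rank-based fusion avoids comparing raw dense and sparse scores directly, since the two are on different scales. The merged candidates could then be passed to a reranking model if you want a final relevance ordering.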

It is possible to use a single sparse-dense index as you were doing previously, by creating the sparse and dense embeddings separately and then upserting them together into a dense index, but we recommend the two-index approach.
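For the single-index route, a rough sketch of how one record could carry both embeddings is below. The field names (`id`, `values`, `sparse_values` with `indices` and `values`) reflect the record shape used in the hybrid-search notebooks as I understand it, so please verify them against the current docs. The helper name `build_hybrid_record` and the toy numbers are hypothetical, and the upsert call itself is left out.

```python
def build_hybrid_record(record_id, dense_vector, sparse_weights, metadata=None):
    """Combine separately computed dense and sparse embeddings into one record.

    dense_vector: list of floats from the dense embedding model.
    sparse_weights: dict mapping token index -> weight from the sparse model
    (for example, SPLADE output). Zero weights are dropped.
    """
    indices = sorted(i for i, w in sparse_weights.items() if w != 0)
    record = {
        "id": record_id,
        "values": [float(v) for v in dense_vector],
        "sparse_values": {
            "indices": indices,
            "values": [float(sparse_weights[i]) for i in indices],
        },
    }
    if metadata:
        record["metadata"] = dict(metadata)
    return record


# Toy values only; real vectors come from your embedding models.
record = build_hybrid_record(
    "doc-1",
    dense_vector=[0.1, 0.2, 0.3],
    sparse_weights={1012: 0.8, 2047: 0.0, 530: 1.5},
    metadata={"source": "example"},
)
```

A list of such records would then be passed to the index's upsert call in batches.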