I’m currently exploring advanced search capabilities within Pinecone and have a couple of questions:
BM25 Parameters in BM25Encoder: I’m interested in the specifics of the BM25Encoder in pinecone_text, particularly the k and b parameters. Could anyone share what the default values are for these parameters?
Hybrid Search Logic Using Sparse-Dense Vectors: Moving on to hybrid search strategies, I am curious about the methodology behind integrating sparse and dense vectors, especially regarding the computation of BM25 scores in such scenarios. How does Pinecone handle this blend, and what is the logic behind the BM25 score calculation? If normalization is used, what kind is employed?
Any insights or detailed explanations on these topics would be greatly appreciated, as they would significantly contribute to optimizing my implementation and understanding of Pinecone’s search functionalities.
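For context on the first question, here is a minimal sketch of the classic BM25 term weight, showing where k1 and b enter the formula. This is the textbook Okapi BM25 (k1=1.2, b=0.75 are the conventional defaults from the literature), not a copy of what BM25Encoder actually does internally — whether the encoder uses these exact defaults is what I'm asking:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Classic BM25 weight for a single term in a single document.

    tf          term frequency in the document
    doc_len     length of the document (in tokens)
    avg_doc_len average document length in the corpus
    df          number of documents containing the term
    n_docs      total number of documents
    k1          caps how fast the score grows with term frequency
    b           how strongly document length is normalized (0 = not at all)
    """
    # IDF: rarer terms weigh more (the +1.0 keeps it non-negative)
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # b blends in document-length normalization; k1 saturates tf
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

With b > 0, the same term frequency scores lower in a longer-than-average document; with b = 0, document length is ignored entirely.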
pinecone-text is open source, so feel free to have a look at the code yourself if you like.
Hi @patrick1
Thank you for the detailed response.
I appreciate you sharing the default values for the BM25Encoder parameters.
Thanks again for taking the time to help me out!
I am also interested in the “fusion” technique used to combine sparse and dense results. Is it a linear combination, RRF (Reciprocal Rank Fusion), or some other algorithm? Simply put: how do the two result sets get combined into one?
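For reference, since I mentioned it: here is a minimal sketch of RRF as described in the literature. Nothing here is Pinecone-specific (whether Pinecone uses RRF at all is exactly my question); the function name is my own:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.

    Each document's fused score is the sum, over every list it appears in,
    of 1 / (k + rank), with rank starting at 1. k=60 is the constant used
    in the original RRF paper. Returns ids sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The appeal of RRF is that it only looks at ranks, so the sparse and dense score scales never need to be made comparable.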
Edit:
The blog post mentioned by Patrick only describes an alpha/weight parameter. Any deeper insight into how this calculation is done?
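My current reading of the alpha parameter is a convex combination applied client-side before a single query: the dense vector is scaled by alpha and the sparse values by (1 - alpha), so the final dot-product score works out to alpha * dense_score + (1 - alpha) * sparse_score. A sketch of that interpretation (the function name is mine, and I can't confirm this is exactly what happens server-side):

```python
def weight_by_alpha(dense, sparse_values, alpha):
    """Scale a dense vector and sparse values for an alpha-weighted hybrid query.

    alpha = 1.0 -> pure dense search; alpha = 0.0 -> pure sparse search.
    Because the index scores by dot product, pre-scaling the query this way
    yields alpha * dense_score + (1 - alpha) * sparse_score per match.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = [v * (1 - alpha) for v in sparse_values]
    return scaled_dense, scaled_sparse
```

If this is right, it is a linear combination of raw scores rather than a rank-based fusion like RRF, which would make the answer to my earlier question "linear, via query-side scaling".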