Hi, I’m considering implementing SPLADE to improve average retrieval performance over dense-vector search alone. I’m working from James Briggs’ YouTube tutorial "Medical Search Engine with SPLADE + Sentence Transformers in Python".
A couple of questions:
- James mentions that the similarity metric should be dotproduct, yet when creating indexes in Pinecone I see you can also specify euclidean or cosine. From my research, cosine is better for semantic search over paragraphs. Can I use a cosine index with SPLADE?
- How does the actual ranking work inside Pinecone? Instead of just a dense_vector, both the dense_vector and the sparse_vector are considered. What formula combines them into the final ranking score?
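To make both questions concrete, here is a minimal sketch of my current understanding. It is an assumption, not Pinecone's documented internals: I'm assuming the hybrid score is simply the dense dot product plus the sparse dot product, and that an `alpha` weight (a hypothetical parameter here) trades the two off by scaling the query vectors. All function names are my own, for illustration only. The first print shows why the metric matters: cosine ignores vector length while dot product does not, though the two agree when vectors are unit-normalized.

```python
import math

# A sparse vector as {vocabulary_index: weight}; SPLADE output is mostly zeros,
# so only the non-zero entries are stored.
SparseVec = dict[int, float]


def dense_dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product after dividing out both vector lengths."""
    return dense_dot(a, b) / (math.sqrt(dense_dot(a, a)) * math.sqrt(dense_dot(b, b)))


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product of two sparse vectors; only indices present in both contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


def hybrid_scale(dense: list[float], sparse: SparseVec, alpha: float):
    """Hypothetical helper: weight the dense part by alpha and the sparse part by 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [v * alpha for v in dense], {i: w * (1 - alpha) for i, w in sparse.items()}


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse) -> float:
    """ASSUMED scoring: dense dot product plus sparse dot product."""
    return dense_dot(q_dense, d_dense) + sparse_dot(q_sparse, d_sparse)


if __name__ == "__main__":
    # Same direction, different length: cosine ignores magnitude, dot product does not.
    print(cosine([3.0, 4.0], [6.0, 8.0]), dense_dot([3.0, 4.0], [6.0, 8.0]))

    q_dense, d_dense = [0.6, 0.8], [0.8, 0.6]  # unit-length, so dot equals cosine
    q_sparse, d_sparse = {10: 1.2, 42: 0.5}, {42: 2.0, 99: 0.3}
    sq_dense, sq_sparse = hybrid_scale(q_dense, q_sparse, alpha=0.75)
    # Expected to equal 0.75 * 0.96 (dense) + 0.25 * 1.0 (sparse), up to float rounding
    print(hybrid_score(sq_dense, sq_sparse, d_dense, d_sparse))
```

If the score really is a plain sum of dot products, it is linear, so scaling only the query by `alpha` would be enough to re-weight dense vs. sparse without re-indexing documents. That linearity is also my guess as to why dotproduct (rather than cosine) would be required, but I'd appreciate confirmation.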
Thank you!