Normalized score when using hybrid search (which currently only supports dot product instead of cosine similarity)

Hi there,

Is there a way to get a normalized score back from a hybrid search query with BM25 and dense embeddings?

I can normalize it after the request, but the values are quite spread out ([1.0, 0.15, 0.143, 0.14, 0]).
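For reference, the client-side post-processing mentioned above can be a simple min-max rescaling of the scores returned by the query (a sketch; the `scores` list is just the example values from this post):

```python
def min_max_normalize(scores):
    """Rescale a list of raw scores into the range [0, 1] via min-max normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to normalize.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Example scores from a hybrid query (hypothetical data)
scores = [1.0, 0.15, 0.143, 0.14, 0]
normalized = min_max_normalize(scores)
```

Note that min-max rescaling preserves the relative spread, so it cannot by itself make hybrid scores cluster the way cosine scores do.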

When I use cosine similarity, the values are in the range [0.88, 0.87, 0.85, 0.83]. This is what I would also need for hybrid search.

The error message I get when I request a normalized cosine similarity is:

HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Fri, 25 Aug 2023 07:38:39 GMT', 'x-envoy-upstream-service-time': '1', 'content-length': '86', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Index configuration does not support sparse values","details":[]}

Does anybody have a hint on how to achieve this?

Thanks in advance,
Mike

Hi @Mike,

First of all, it is important to note that Pinecone only supports hybrid search with dot product as the similarity metric. This is the reason behind the error you encountered. To achieve normalized scores with hybrid search that incorporates BM25, here’s what you can do:

  • Dense Vectors: use an embedding model that yields unit-normalized vectors. For unit vectors, the dot product equals the cosine similarity, so you won’t need the cosine metric. For instance, OpenAI’s ada-002 embeddings are already normalized, and many sentence transformer models can be configured to produce normalized embeddings.

  • BM25 Values: If you’re using the pinecone-text library, the BM25 scores are already normalized to the range [0, 1]. However, it’s worth noting that the nature of BM25 scores differs from that of dense vectors. While embedding models are designed to generate scores ranging from 0 to 1, reflecting the likelihood of a document’s relevance to the query, BM25 normalization merely divides by the maximum possible score a document can achieve for a given query. This means the score distribution for BM25 might vary significantly from one query to another.
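The dense-vector point above relies on a simple identity: for unit-length vectors, dot product and cosine similarity coincide. A minimal sketch of normalizing embeddings to unit length (the vectors here are illustrative):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; for unit vectors, dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Two hypothetical embeddings, normalized before upsert/query
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# Dot product of the normalized vectors equals the cosine of the originals
dot = sum(x * y for x, y in zip(a, b))
```

If your embedding model already emits unit vectors (as ada-002 does), this step is a no-op, and a dotproduct index gives you cosine-like scores for the dense part.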

It’s worth noting that the convex combination of hybrid scores is essentially a heuristic and may behave differently for individual use cases. The best approach is to experiment with your data and see what works well. If your objective behind normalization is to filter results based on a certain threshold, you might want to consider using a re-ranker and filtering results based on its logits.
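The convex combination mentioned above is commonly implemented by weighting the dense and sparse vectors with a single parameter alpha before querying. A sketch (the function name `hybrid_scale` and the input values are illustrative, not part of any official API):

```python
def hybrid_scale(dense, sparse_values, alpha):
    """Weight dense and sparse vectors by a convex combination.

    alpha = 1.0 -> pure dense search, alpha = 0.0 -> pure sparse (BM25) search.
    `sparse_values` uses the {"indices": [...], "values": [...]} layout.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse_values["indices"],
        "values": [v * (1 - alpha) for v in sparse_values["values"]],
    }
    return scaled_dense, scaled_sparse

# Hypothetical query vectors, weighted 50/50 between dense and sparse
dense_q, sparse_q = hybrid_scale([1.0, 2.0], {"indices": [3], "values": [4.0]}, 0.5)
```

Since alpha directly shifts the score distribution, it is one of the knobs worth tuning in the experimentation suggested above.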

I hope this clarifies things.

Best regards,
Amnon