General Overview: We are currently encountering issues while fetching chunks from the Pinecone database using a hybrid-search approach. The following code snippet illustrates the method we are using to scale our hybrid search query:
def hybrid_scale(self, query, alpha: float):
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hsparse = {
        "indices": query["sparse_vector"]["indices"],
        "values": [v * (1 - alpha) for v in query["sparse_vector"]["values"]],
    }
    hdense = [v * alpha for v in query["vector"]]
    return {"vector": hdense, "sparse_vector": hsparse}
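As a sanity check, here is the same scaling as a standalone function applied to a toy query (all vector values are made up for illustration):

```python
def hybrid_scale(query, alpha: float):
    """Standalone copy of the method above, for illustration."""
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hsparse = {
        "indices": query["sparse_vector"]["indices"],
        "values": [v * (1 - alpha) for v in query["sparse_vector"]["values"]],
    }
    hdense = [v * alpha for v in query["vector"]]
    return {"vector": hdense, "sparse_vector": hsparse}

# Toy query: a 2-dim dense vector and a 2-entry BM25 sparse vector
query = {
    "vector": [0.2, 0.4],
    "sparse_vector": {"indices": [3, 7], "values": [1.0, 2.0]},
}

scaled = hybrid_scale(query, alpha=0.5)
# At alpha = 0.5 both sides are halved:
# dense [0.1, 0.2], sparse values [0.5, 1.0]
```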
The Problem: Our pipeline uses the BM25 model for sparse embeddings and the jinaai/jina-embeddings-v2-base-en model for dense embeddings.
Given a user query like: “I want to know more about the features introduced in v1.42,” we aim to retrieve all chunks related to v1.42 that discuss the features introduced.
However, we are experiencing the following issues:
Imbalance in Relevance: When using an alpha value of 0.1, the retrieved chunks are related to feature updates in general, but they do not specifically pertain to v1.42.
Loss of Semantic Relevance: When we decrease the alpha value to 0.01, we do get chunks related to v1.42, but they lack semantic relevance, thus failing to capture the contextual information we need.
Irrelevant Chunks: When using an alpha value greater than 0.1, the retrieved chunks do not discuss v1.42 at all, losing the keyword matching capability entirely.
In essence, adjusting the alpha value slightly leads to a significant trade-off between keyword matching and semantic relevance.
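One way to see why tiny alpha changes can flip the ranking: the combined score is a convex combination of raw scores on very different scales, so the crossover between "keyword wins" and "semantic wins" can sit at a very small alpha. A toy calculation with made-up scores (not from our data) reproduces the knife-edge pattern:

```python
# Made-up raw scores for two hypothetical chunks (illustration only):
# chunk A mentions "v1.42" (keyword hit, small raw sparse score here),
# chunk B is only semantically related (no keyword hit at all).
scores = {
    "A": {"dense": 0.30, "sparse": 0.05},
    "B": {"dense": 0.85, "sparse": 0.00},
}

def hybrid(chunk, alpha):
    s = scores[chunk]
    return alpha * s["dense"] + (1 - alpha) * s["sparse"]

for alpha in (0.10, 0.01):
    winner = max(scores, key=lambda c: hybrid(c, alpha))
    print(alpha, winner)
# With these numbers the crossover sits near alpha ~ 0.08, so
# alpha = 0.10 ranks B (semantic-only) first while alpha = 0.01
# ranks A (keyword hit) first: a tiny alpha change flips the result.
```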
Expected Behavior: We expect the hybrid search to balance both the keyword matching (v1.42) and the semantic relevance (features introduced) efficiently, without compromising one for the other.
Request for Support: We are seeking guidance on how to fine-tune our hybrid search parameters or any alternative approach to achieve a better balance between sparse and dense embeddings. Any insights, suggestions, or examples of how others have tackled similar issues would be greatly appreciated.
We are also using hybrid-search retrieval with a single hybrid index, and we use a hybrid score normalization method similar to yours.
These are the issues we have been facing when we set alpha < 0.5:
When alpha is set below 0.5, the ranking should ideally favor keyword search, but the top chunks are always returned based on semantic match.
Even when a chunk contains no keyword match at all, it is returned on top with a higher score than chunks that do contain the keywords, so semantic matching is effectively given more weight.
Chunks that have both some semantic match and keywords from the query are still ranked lower than chunks with only a semantic match.
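A useful first diagnostic for this symptom is to isolate each side. With the convex-combination scaling used in the snippet further down, alpha = 0 zeroes out the dense vector (pure keyword search) and alpha = 1 zeroes out the sparse values (pure semantic search). A sketch with made-up vectors:

```python
def hybrid_score_norm(dense, sparse, alpha: float):
    # Standalone copy of the scaling function shown below, for illustration.
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hs = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return [v * alpha for v in dense], hs

dense = [0.2, 0.4]                                   # made-up dense query vector
sparse = {"indices": [3, 7], "values": [1.0, 2.0]}   # made-up BM25 vector

# alpha = 0: dense side becomes all zeros -> pure keyword search
hdense0, hsparse0 = hybrid_score_norm(dense, sparse, alpha=0.0)
# alpha = 1: sparse values become all zeros -> pure semantic search
hdense1, hsparse1 = hybrid_score_norm(dense, sparse, alpha=1.0)

# Querying the index with each extreme shows whether the sparse side alone
# actually ranks the keyword-bearing chunks on top; if it does, the blending
# (not the BM25 encoding) is what needs tuning.
```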
We are using BM25 encoding to embed sparse vectors and text-embedding-3-large to embed dense vectors in a single hybrid index.
To apply a linear weighting to both vector types, we use this approach:
def hybrid_score_norm(dense, sparse, alpha: float):
    """Hybrid score using a convex combination

    alpha * dense + (1 - alpha) * sparse

    Args:
        dense: Array of floats representing the dense vector
        sparse: a dict of `indices` and `values`
        alpha: scale between 0 and 1
    """
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hs = {
        'indices': sparse['indices'],
        'values': [v * (1 - alpha) for v in sparse['values']]
    }
    return [v * alpha for v in dense], hs
So tuning the hybrid search with alpha values < 0.5 isn't working well for us; the relevance of the results is not satisfactory.
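One idea we are considering, in case it helps frame answers: L2-normalizing the raw BM25 values before blending, so both sides contribute on a comparable scale (dense similarities are bounded, while raw BM25 values are not). A sketch of this variant, not validated on our data:

```python
import math

def hybrid_score_norm_l2(dense, sparse, alpha: float):
    """Variant of hybrid_score_norm that L2-normalizes the sparse values
    before blending, so BM25's unbounded magnitudes don't swamp (or get
    swamped by) the dense similarities. A suggestion to try, not part of
    the original pipeline."""
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    norm = math.sqrt(sum(v * v for v in sparse["values"])) or 1.0
    hs = {
        "indices": sparse["indices"],
        "values": [(v / norm) * (1 - alpha) for v in sparse["values"]],
    }
    return [v * alpha for v in dense], hs
```

With this change, alpha behaves more like a true mixing dial, since both score components live on similar scales.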
What type of indexing were you using to store the dense and sparse vectors on Pinecone?
Pinecone recommends using separate indexes for sparse and dense vectors: search each index separately, combine and deduplicate the results, and use a reranking model to rank the chunks.
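That recommendation can be sketched roughly as follows; the merge step is pure Python, while the two index queries and the reranker are placeholders for whatever client and model you use:

```python
def merge_and_dedupe(sparse_hits, dense_hits, top_k=5):
    """Combine results from a sparse index and a dense index, dedupe by
    chunk id, and return the top candidates for reranking.
    Assumed hit format: {"id": str, "text": str, "score": float}."""
    seen = {}
    for hit in sparse_hits + dense_hits:
        # Keep the best score seen for each chunk id
        if hit["id"] not in seen or hit["score"] > seen[hit["id"]]["score"]:
            seen[hit["id"]] = hit
    candidates = list(seen.values())
    # Placeholder for a real reranker: a cross-encoder would score each
    # (query, chunk) pair here; we just sort by the kept retrieval score.
    candidates.sort(key=lambda h: h["score"], reverse=True)
    return candidates[:top_k]
```

In practice the sort at the end would be replaced by the reranker's scores, since raw sparse and dense scores are not directly comparable.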
Unfortunately, embeddings are a black box, so it is very difficult to determine what's going wrong here. I would suggest trying different sparse embeddings, such as SPLADE instead of BM25; SPLADE has given us better performance.
We also ended up fixing the alpha value at 0.5, which gave us the best performance.
It'll probably be easier to debug this issue if you can share a few examples of your chunks.