Sparse Dense Retrieval metrics


We’ve implemented sparse-dense retrieval in Pinecone and it seems to work well for our specific use case. However, there is one issue we’re not sure how to deal with: the similarity scores. We consistently get dotproduct similarity scores above 1. Assuming this is indeed measuring cosine similarity, what is going on here? Cosine similarity is supposed to max out at 1.

We’re looking into implementing an observability feature but we need to learn how to interpret these sparse / dense values.

Any help greatly appreciated.

import numpy as np

def normalize(v):
    # Scale v to unit length (L2 norm of 1); a zero vector is returned unchanged.
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
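To illustrate why the scores can exceed 1, here is a minimal sketch in plain Python (no NumPy, so it runs anywhere). A dot product only equals cosine similarity when both vectors have unit length. The toy vectors, the `dot` and `sparse_dot` helpers, and the `{index: value}` dict layout for sparse vectors are all made up for illustration. The last step assumes the hybrid score is the dense dot product plus the sparse dot product, which is my understanding of how sparse-dense scoring works, so check it against the Pinecone docs. If that holds, normalizing only the dense part still leaves the combined score unbounded.

```python
import math

def normalize(v):
    # Scale v to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def dot(a, b):
    # Dot product of two dense vectors of equal length.
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    # Dot product of two sparse vectors stored as {index: value} dicts
    # (hypothetical layout, for illustration only).
    return sum(val * b[idx] for idx, val in a.items() if idx in b)

# Hypothetical toy data.
query_dense = [3.0, 4.0]
doc_dense = [6.0, 8.0]
query_sparse = {10: 1.5, 42: 0.5}
doc_sparse = {10: 2.0, 99: 1.0}

# Raw dot product: grows with vector magnitude, so it is not capped at 1.
raw = dot(query_dense, doc_dense)

# After normalizing both sides, the dot product is the cosine similarity.
cos = dot(normalize(query_dense), normalize(doc_dense))

# Assumed hybrid score: dense part plus sparse part.
sparse = sparse_dot(query_sparse, doc_sparse)
hybrid = cos + sparse

print(raw, cos, sparse, hybrid)
```

Here the raw dense score is 50 while the normalized one is approximately 1, and the sparse term pushes the combined score back above 1. For observability, it may be more useful to log the dense and sparse contributions separately than to read the combined score as a cosine.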


Thanks. Will try that out and report back!