Sparse Dense Retrieval metrics

Nico · July 10, 2023, 7:43am

Hi,

We’ve implemented sparse/dense retrieval in Pinecone and it seems to work well for our specific use case. However, there is one issue we’re not sure how to deal with: the similarity metrics. We consistently get dotproduct similarity scores above 1. Assuming this is indeed measuring cosine similarity, what is going on here? Supposedly the maximum cosine similarity is 1.

We’re looking into implementing an observability feature but we need to learn how to interpret these sparse / dense values.

Any help greatly appreciated.

LarryStewart2022 · July 14, 2023, 8:48pm

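import numpy as np

def normalize(v):
    # Scale v to unit length; the dot product of two unit vectors is their cosine similarity.
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # leave a zero vector unchanged to avoid dividing by zero
    return v / norm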
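
To show why normalizing matters, here is a minimal sketch using only the standard library. The vectors `query` and `doc` and the helpers `dot` and `unit` are made up for illustration and are not part of any Pinecone API. A raw dot product grows with the vectors' lengths, so it can easily exceed 1; once both vectors are scaled to unit length, the dot product equals the cosine similarity and stays within [-1, 1].

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length sequences.
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Same idea as normalize() above, written without NumPy.
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]

# Hypothetical dense vectors pointing in the same direction.
query = [3.0, 4.0]
doc = [6.0, 8.0]

raw_score = dot(query, doc)                  # depends on vector magnitudes
cosine_score = dot(unit(query), unit(doc))   # bounded by 1 in absolute value

print(raw_score, cosine_score)
```

Note that this only covers a single pair of vectors. If your score also adds a sparse-vector contribution on top of the dense one, normalizing the dense part alone would not necessarily keep the combined score at or below 1; that depends on how your sparse values are scaled, which I am assuming rather than stating as fact.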

Nico · July 14, 2023, 9:34pm

Thanks. Will try that out and report back!