Embedding model for hybrid searches?

In this doc: Vector Similarity Explained | Pinecone, it says:

The basic rule of thumb in selecting the best similarity metric for your Pinecone index is to match it to the one used to train your embedding model.

My understanding is that the following two facts hold:

  • OpenAI models are trained on cosine similarity.
  • Hybrid indexes must use dotproduct metric.

Should I switch to a different embedding model that was trained on dotproduct? Are there any such models? The scores I’ve seen for reasonably good matches are around 30 or 40, which is a weird scale.

Hi @winston1, thanks for your post! Yes, a dotproduct-trained embedding model will perform better with a hybrid search, since the dotproduct distance metric is required for hybrid search.
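As a side note on the metric mismatch and the score scale, here is a minimal sketch rather than Pinecone's actual implementation. It assumes the dense embeddings are unit length (OpenAI's documentation describes its embeddings as normalized to length 1), in which case dot product and cosine similarity give the same value. It also assumes the hybrid score is the dense dot product plus the sparse dot product, and that the sparse weights are unbounded term weights such as BM25-style values. All vectors, indices, and weights below are made up for illustration.

```python
import math

def dot(a, b):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    return sum(w * b[i] for i, w in a.items() if i in b)

# Hypothetical unit-length dense embeddings for a query and a document.
query_dense = normalize([0.2, 0.7, 0.1])
doc_dense = normalize([0.3, 0.6, 0.2])

# On unit vectors, dot product and cosine similarity coincide.
dense_score = dot(query_dense, doc_dense)
assert math.isclose(dense_score, cosine(query_dense, doc_dense))

# Hypothetical sparse vectors; the document weights are not bounded by 1.
query_sparse = {10: 1.0, 42: 1.0}
doc_sparse = {10: 12.5, 42: 20.0, 77: 3.0}
sparse_score = sparse_dot(query_sparse, doc_sparse)  # 12.5 + 20.0 = 32.5

# Assumed hybrid score: dense part (at most 1 here) plus sparse part.
hybrid_score = dense_score + sparse_score
print(dense_score, sparse_score, hybrid_score)
```

Under these assumptions, the dense part of the score stays at or below 1 while the sparse part can be arbitrarily large, which is one plausible reason hybrid scores land in the 30 to 40 range rather than between 0 and 1.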

Not all of the OpenAI models are cosine-trained, though. From our Model Gallery:

Please note the above is a nonexhaustive list.

Here are some additional resources for hybrid search: