Training Sentence Transformers with MNR Loss

Transformer-produced sentence embeddings have come a long way in a very short time. Starting with the slow but accurate similarity prediction of BERT cross-encoders, the world of sentence embeddings was ignited with the introduction of SBERT in 2019 [1]. Since then, many more sentence transformers have been introduced, and these newer models quickly made the original SBERT obsolete.

How did these newer sentence transformers manage to outperform SBERT so quickly? The answer is multiple negatives ranking (MNR) loss.
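To make the idea concrete, here is a minimal sketch of MNR loss in PyTorch. It is an illustrative implementation, not the article's exact code: for each (anchor, positive) pair in a batch, every other positive acts as an in-batch negative, and the loss is cross-entropy over the scaled cosine-similarity matrix. The function name `mnr_loss` and the scale value are assumptions for this example.

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Multiple negatives ranking loss: the i-th anchor's correct match is the
    i-th positive; all other positives in the batch serve as negatives."""
    # Cosine similarity between every anchor and every positive in the batch.
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    scores = a @ p.T * scale  # (batch, batch) similarity matrix
    # Diagonal labels: anchor i should rank positive i highest.
    labels = torch.arange(scores.size(0))
    return F.cross_entropy(scores, labels)

# Toy batch of 4 pairs of 8-dim "sentence embeddings".
torch.manual_seed(0)
loss = mnr_loss(torch.randn(4, 8), torch.randn(4, 8))
print(loss.item())
```

Because the negatives come for free from the rest of the batch, no explicit negative mining is needed, which is a large part of why MNR training is so effective for sentence pairs.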

This is a companion discussion topic for the original entry at

Thanks for the interesting and useful article!
We applied the Fast Fine-Tuning section to our data and got some nice results.
Why did you use only one epoch? Do you recommend increasing the number of epochs to get even better results?


That’s great! It depends on your data and use case, but when you start from a pretrained transformer, you can usually fine-tune for just one epoch and get optimal performance. That is the standard for many sentence transformer models, but not always the case, so it’s best to experiment.

Thanks! We may try other configurations.

On Tue, 12 Jul 2022 at 18:54, James via Pinecone Community <‪‬‏> wrote:

Pretty helpful guide. Thank you for sharing. A couple of questions:

  1. When we are fine-tuning with sentence-transformers, we are not explicitly training a feed-forward network. Is that done in the backend by the library?
  2. What would be our approach when numbers are present? For example,
    “A’s height is 1 ft more than B’s height” compared to
    “A’s height is 2 ft more than B’s height”