Fine-tuning effective dense retrieval models is challenging. Bi-encoders (sentence transformers) are the current best models for dense retrieval in semantic search. Unfortunately, they’re also notoriously data-hungry models that typically require a particular type of labeled training data.
Hard problems like this attract attention. As expected, there is plenty of attention on building ever better techniques for training retrievers.
This is a companion discussion topic for the original entry at https://www.pinecone.io/learn/genq