Fine-tune sentence-transformers model using MNR loss

Hello,

I’m a student working on a thesis about neural retrieval, and I saw Pinecone’s article about the MNR (multiple negatives ranking) loss. I’m trying to fine-tune a sentence-transformers model with MNR loss, but I’m having some trouble implementing it. In my problem I have a set of clinical trials and a set of patient topics (queries), and the idea is to retrieve the most relevant trials for each topic. I have the relevance judgments: a list of (topic, trial) pairs with a score (0 = not relevant, 1 = eligible). Each topic appears in multiple judgments, one per trial, for instance [(topic_1, trial_10, 1), (topic_1, trial_50, 0), …, (topic_2, trial_100, 0), (topic_2, trial_10, 1), …]. Only 10% of these pairs are positive (score = 1, eligible); all the others have score 0 (not relevant). The sentence-transformers documentation says that one should only use positive pairs.
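For concreteness, here is roughly how I am building the training examples (a minimal sketch; `judgments`, `topics`, and `trials` stand in for my own data structures, and the texts are placeholders):

```python
from sentence_transformers import InputExample

# judgments: list of (topic_id, trial_id, score) triples from my relevance file
judgments = [("topic_1", "trial_10", 1), ("topic_1", "trial_50", 0)]

# topics / trials: dicts mapping IDs to the raw texts (placeholders here)
topics = {"topic_1": "patient topic text ..."}
trials = {"trial_10": "clinical trial text ...", "trial_50": "clinical trial text ..."}

# MNR loss expects (anchor, positive) pairs, so I keep only the score == 1 judgments
train_examples = [
    InputExample(texts=[topics[t], trials[d]])
    for t, d, score in judgments
    if score == 1
]
```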

I’m having a hard time understanding how in-batch negatives apply to my case, how to prepare the data, and how to apply the MNR loss. One trial can be relevant for one topic (score = 1) and also be relevant for another topic (anchor). Could you please help me? I’m stuck here. I understood the idea in Pinecone’s article, but I’m very confused about applying it to my data.
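This is what I was planning, based on my reading of the Pinecone article and the sentence-transformers docs (the base model name is just a placeholder; `train_examples` comes from the sketch above):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

# Within each batch, the positives of the other anchors act as in-batch negatives
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```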

Using a concrete example from my data, with a batch size of 5:

TopicID   TrialID       Score
20141     NCT00005485   1
201518    NCT00005485   1
20141     NCT00005127   1
20141     NCT00149227   0
201527    NCT00166231   1

(When feeding the model I use the topic and trial texts, not the IDs.)

In this batch, the trial NCT00166231 would be treated as an in-batch negative for the topic (anchor) 20141, but that same (topic, trial) pair could be a positive instance elsewhere in the dataset (outside this batch). There would also be duplicates, because the trial NCT00005485 appears as a positive for both topics 20141 and 201518. There is also the chance of a batch containing only negative pairs, since only 10% of the instances in the dataset are positive.
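For the duplicate problem, the closest thing I found is the NoDuplicatesDataLoader in sentence-transformers, which, if I read its docs correctly, guarantees that no text appears twice within a batch (it does not address the false-negative problem, as far as I can tell):

```python
from sentence_transformers.datasets import NoDuplicatesDataLoader

# Ensures no text occurs twice in one batch, so NCT00005485 could not show up
# as the positive of both 20141 and 201518 in the same batch
train_dataloader = NoDuplicatesDataLoader(train_examples, batch_size=32)
```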

If someone could help me I would appreciate it very much.

Thank you for your time.

Regards,
João


How do we compute nDCG@10 in the same way as the original GPL paper? Are there other metrics we can use to evaluate the models?
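The closest I have found is sentence-transformers’ InformationRetrievalEvaluator, which reports nDCG@k alongside MRR, MAP, accuracy, and precision/recall, though I am not sure it matches the GPL setup exactly (`queries`, `corpus`, and `relevant_docs` below are placeholders for your own data):

```python
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# queries / corpus: dicts {id: text}; relevant_docs: {query_id: set of doc ids}
queries = {"20141": "patient topic text ..."}
corpus = {"NCT00005485": "clinical trial text ..."}
relevant_docs = {"20141": {"NCT00005485"}}

ir_evaluator = InformationRetrievalEvaluator(
    queries,
    corpus,
    relevant_docs,
    ndcg_at_k=[10],   # nDCG@10, as reported in the GPL paper
    mrr_at_k=[10],    # MRR@10
    map_at_k=[100],   # MAP@100
    name="clinical-trials-dev",
)
score = ir_evaluator(model)  # runs retrieval over the corpus and returns the main score
```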