Is it possible to use SBERT models to measure how much of a query sentence is covered across a whole document? For example, suppose the query sentence is ‘X and Y’, one sentence in the document refers to ‘X’, and a separate sentence elsewhere in the document refers to ‘Y’. Or is the best you can do just to return document sentences 1 and 2 as vaguely similar sentences?
You can encode all of the sentences in the document with SBERT, then run a similarity search using sentence X as the query and return the most similar sentences in the document (say, the top 100). Then set a similarity threshold: any sentence scoring above it is classified as relevant, and anything below it is discarded.
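A minimal sketch of the top-k plus threshold step described above, assuming you already have embeddings (e.g. from SBERT's `model.encode`). The function name `relevant_sentences`, the default threshold of 0.5 and the model name in the comment are illustrative assumptions, not values from the answer; the threshold in particular needs tuning on your own data.

```python
import numpy as np


def relevant_sentences(query_emb, sentence_embs, threshold=0.5, top_k=100):
    """Hypothetical helper: return (sentence_index, cosine_score) pairs for the
    top_k document sentences most similar to the query, keeping only those whose
    score is >= threshold. The 0.5 default is an assumed, untuned value."""
    q = np.asarray(query_emb, dtype=float)
    s = np.asarray(sentence_embs, dtype=float)
    # L2-normalise so the dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    scores = s @ q
    # Indices sorted from most to least similar, truncated to top_k
    order = np.argsort(-scores)[:top_k]
    # Apply the relevance threshold to the top_k candidates
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]


# Usage with SBERT (needs the sentence-transformers package and a model download;
# "all-MiniLM-L6-v2" is just one example model):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# doc_sentences = ["...", "..."]
# hits = relevant_sentences(model.encode("X"), model.encode(doc_sentences))
```

Running the search once per part of the query (once for ‘X’, once for ‘Y’) is one possible way to see which document sentences cover which part; that split is an assumption about how to apply the answer to the question, not something the answer states.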