How to best use Sentence Encoders

How do others handle encoding their data with USE (Universal Sentence Encoder) or SBERT? Say my dataset is 4 sentences per item and 100 items: do you process all 400 sentences at a time, combine the 4 into one long sentence and process 100 at a time, or process each sentence one at a time? I could not determine when you really want to opt for one over the other. Finally, how does this scale to full-blown documents, say a PDF with 20 pages? I believe there is also a limit on the input length these models accept…
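To make the options concrete, here is a minimal sketch of the first two strategies. The `encode` function is just a stub standing in for the real model call (with sentence-transformers it would be `model.encode(list_of_texts)`; the item count and contents are made up):

```python
def encode(texts):
    # Stub encoder: returns one fixed-size "vector" per input text.
    # A real encoder (USE / SBERT) would return dense embeddings here.
    return [[float(len(t)), float(t.count(" "))] for t in texts]

# 100 items, 4 sentences each (dummy data)
items = [["sent a", "sent b", "sent c", "sent d"] for _ in range(100)]

# Option 1: flatten and encode all 400 sentences -> one vector per sentence
flat = [s for item in items for s in item]
per_sentence = encode(flat)   # 400 vectors, fine-grained search

# Option 2: join each item's sentences -> one vector per item
joined = [" ".join(item) for item in items]
per_item = encode(joined)     # 100 vectors, coarser but cheaper to store

print(len(per_sentence), len(per_item))
```

Option 1 gives finer-grained retrieval (you can match a single rule sentence); option 2 gives one vector per item, which is simpler to store but blurs the individual sentences together.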

Let me give a use case for each scenario:

I have a game with complex rules on each card, and the game has thousands of cards. I want to import this dataset, with additional metadata for each card, into the DB for search. Do I process the encoding of the rules one card at a time, several cards batched together, or all of them at once (which may not be possible, since there are thousands of cards)?
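For the card scenario, here is roughly what I imagine batching would look like. Again the encoder is a stub, and the batch size of 64 and the card data are just placeholders:

```python
def batched(seq, size):
    # Yield successive fixed-size chunks of a sequence.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def encode(texts):
    # Stand-in for model.encode(texts); returns dummy vectors.
    return [[float(len(t))] for t in texts]

# Dummy dataset: 1,000 cards with rules text and metadata
cards = [{"name": f"card_{i}", "rules": f"rules text {i}"} for i in range(1000)]

# Encode a manageable batch at a time instead of all cards at once
embeddings = []
for batch in batched(cards, 64):
    embeddings.extend(encode([c["rules"] for c in batch]))

print(len(embeddings))  # one vector per card
```

The idea being that the batch size bounds memory use, so thousands of cards can be processed without ever loading them all into the encoder at once.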

I have 10,000 PDFs I want to search against. Do I:

  • Use the encoder and send in a batch of the paragraphs from each PDF?
  • Use the encoder and send all the sentences in the PDF in one batch? What happens with really large PDFs?
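For the large-PDF case, the only approach I could think of is chunking each paragraph before encoding. A crude sketch, using a word-count cap as a stand-in for the model's real token limit (the 256-word limit and the document contents are purely illustrative):

```python
MAX_WORDS = 256  # stand-in for the encoder's actual token limit

def chunk_paragraph(paragraph, max_words=MAX_WORDS):
    # Split an over-long paragraph into word-count-bounded chunks
    # so each chunk fits under the encoder's input limit.
    words = paragraph.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)] or [""]

# Dummy "PDF": one short paragraph, one paragraph of 600 words
doc = ["short paragraph one", ("word " * 600).strip()]

chunks = [c for p in doc for c in chunk_paragraph(p)]
print(len(chunks))  # the long paragraph is split into multiple chunks
```

Each chunk would then get its own embedding, which sidesteps the input-length limit but multiplies the number of vectors per document.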

I’m not quite sure how to best handle this.