Rerank token calculation

When calling the new Rerank API, I’m getting the following error:

{"error":{"code":"INVALID_ARGUMENT","message":"Request contains a query+document pair with 1417 tokens, which exceeds the maximum token limit of 1024 for each query+document pair."},"status":400}

I’d like to know how the Rerank API calculates the number of tokens. Thanks!

@silas

Hi @falkor, and thanks for your question.

I’ve asked internally for confirmation of how the token count is derived. In the meantime, are you seeing significantly different token counts when you tokenize locally?

What does your data look like? How are you preprocessing it, and how are you constructing the call to the endpoint?

Best,
Zack

Hey @falkor,

Thanks for trying out the Rerank API!

We don’t currently expose token counts in the API, but this snippet should give you an idea of how tokens are counted toward the limit you’re hitting.

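The query is cross-encoded with each document, so each query+document pair is tokenized as a single sequence. The second dimension of the resulting tensor’s shape is the sequence length; because the snippet uses padding=True, every pair is padded to the longest one, so that dimension is the length of the longest pair in the batch.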

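```python
from transformers import AutoTokenizer

# Tokenizer of the reranker model used to illustrate the count
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')

query = "What is a panda bear?"
# The query is paired with each document; each pair becomes one sequence
pairs = [
    [query, 'hello'],
    [query, 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']
]
# padding=True pads every pair to the longest one; return_tensors='pt' requires PyTorch
inputs = tokenizer(pairs, padding=True, return_tensors='pt')
print(inputs['input_ids'].shape)

# >> torch.Size([2, 45])
```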
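Because the snippet above pads every pair to the longest one, the shape only tells you the maximum pair length. To see the length of each individual pair, you can sum the attention mask, which is 1 for real tokens and 0 for padding. `pair_lengths` is a hypothetical helper for illustration; it works on the tokenizer output (lists or PyTorch tensors).

```python
from typing import List, Mapping, Sequence


def pair_lengths(encoded: Mapping[str, Sequence[Sequence[int]]]) -> List[int]:
    """Return the unpadded token length of each query+document pair.

    With padding=True every row is padded to the longest pair, so
    input_ids.shape[1] is only the maximum length. The attention mask
    marks real tokens with 1 and padding with 0, so summing each row
    recovers that pair's true length.
    """
    return [int(sum(row)) for row in encoded["attention_mask"]]
```

With the `inputs` from the snippet, `max(pair_lengths(inputs))` equals `inputs['input_ids'].shape[1]`, while the shorter `hello` pair reports its own, smaller length.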

A longer query or a longer document will each increase the length. Note that the query is included in every pair, so its tokens count against the limit for each document.
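As a client-side safeguard, you could check each pair before sending the request. This is a minimal sketch, assuming the API counts tokens the way the BAAI/bge-reranker-v2-m3 tokenizer does (as the snippet suggests, not confirmed). For this XLM-RoBERTa-style tokenizer, a pair is typically encoded as `<s> query </s></s> document </s>`, so a pair’s length is the query tokens plus the document tokens plus those special tokens. `oversized_pairs` and `make_hf_pair_counter` are hypothetical helpers, not part of any API; the counter is passed in as a callable so the check can be tried without downloading a model.

```python
from typing import Callable, List, Tuple

# Per-pair limit quoted in the 400 error above; it may change as the API evolves.
MAX_PAIR_TOKENS = 1024


def oversized_pairs(
    query: str,
    documents: List[str],
    count_pair_tokens: Callable[[str, str], int],
    limit: int = MAX_PAIR_TOKENS,
) -> List[Tuple[int, int]]:
    """Return (document_index, pair_token_count) for every pair over `limit`.

    The query is counted again for every document, because each
    query+document pair is tokenized as one sequence.
    """
    too_long = []
    for i, doc in enumerate(documents):
        n = count_pair_tokens(query, doc)
        if n > limit:
            too_long.append((i, n))
    return too_long


def make_hf_pair_counter(
    model_name: str = "BAAI/bge-reranker-v2-m3",
) -> Callable[[str, str], int]:
    """Hypothetical helper: count pair tokens with a Hugging Face tokenizer.

    Assumes (not confirmed) that the API tokenizes like this model's tokenizer.
    """
    from transformers import AutoTokenizer  # imported lazily; needs `transformers`

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def count(query: str, document: str) -> int:
        # Encode query and document as one pair: special tokens are added by
        # default and nothing is padded or truncated, so this is the full length.
        return len(tokenizer(query, document)["input_ids"])

    return count
```

For example, `oversized_pairs(query, documents, make_hf_pair_counter())` returns the indices (and token counts) of documents you would need to shorten or split before calling the endpoint. Keeping the counter pluggable means the check does not depend on any one tokenizer.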

That said, we’re still fine-tuning our system limits and safeguards, as well as the API responses. Ideally, we’ll adjust the limit you’re hitting so that it accommodates larger requests much better.

Could you share a bit more about your use case: the number of documents, and the approximate query size and document length? A rough word count is fine if you don’t know the token length.