Pinecone Assistant used a lot of prompt tokens

Sorry for asking such questions, I’m a beginner.

I uploaded a rather big file (about 16,000 characters), and I think it's the cause of the excessive token usage.

Do I have to divide the document into smaller chunks, like I did with the embeddings in the database? For example, I divided the document into 30 smaller chunks of 500 words each and uploaded them to the database. If I do the same with the documents in the assistant, it'll be 30 separate files.

What is the proper way to reduce token usage?
ty

Hi @wibu1892001, thanks for your post, and there’s no need to apologize! Token usage is unrelated to the size of the files you upload to an assistant. Tokens are only consumed by actions that involve an LLM.

Pinecone Assistant consumes input tokens for both planning and retrieval. Input token usage depends on the chat history, the document structure and data density (e.g., how many words are on a page), and the number of documents that meet the filter criteria. Output token usage is the number of tokens generated in the answer.
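If you want to see this breakdown for yourself, the Assistant chat response includes a `usage` object with `prompt_tokens` (input: planning, retrieval, and chat history) and `completion_tokens` (output: the generated answer). Here's a minimal sketch of reading it — the payload below is invented for illustration, not a real response:

```python
# Illustrative chat response payload. The `usage` field mirrors the shape of
# a Pinecone Assistant chat response; the numbers here are made up.
sample_response = {
    "message": {"role": "assistant", "content": "..."},
    "usage": {
        "prompt_tokens": 1450,     # input: planning + retrieval + chat history
        "completion_tokens": 230,  # output: tokens in the generated answer
        "total_tokens": 1680,
    },
}

def summarize_usage(response: dict) -> str:
    """Return a one-line summary of input vs. output token usage."""
    usage = response["usage"]
    return (f"input={usage['prompt_tokens']} "
            f"output={usage['completion_tokens']} "
            f"total={usage['total_tokens']}")

print(summarize_usage(sample_response))
```

Checking `usage` after each chat call is the easiest way to confirm that uploading more (or fewer) files doesn't change your input token count — only what ends up in the prompt does.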

For more details, see Understanding token usage.
