Excessive Input Token Usage in Pinecone Assistant Queries and Cost Concerns

I am exploring Pinecone Assistant for a specific use case and uploaded 5 files, each approximately 5 KB in size, containing around 3,000 characters per file.

When I query using the Assistant console, I noticed that the input token count increases by nearly 20,000 per query. This count grows further as the chat history thread expands. At this rate, 50 queries would result in 1 million input tokens. Given that the cost is $8 per million tokens, this would make just 50 queries cost $8, making the product financially unfeasible for our use case.

Is there a way to optimize token usage so that only the necessary chunks related to each query are considered? This would help significantly reduce the token consumption.

Hi @info12,

Welcome to the Pinecone community!

It’s not currently possible to control number of input tokens, but it’s on our roadmap. We expect to get to it very soon. In the meantime, you could use use the API and send a smaller portion of the chat history in your requests.

Hope that helps. Please let me know.

Best,
Jesse