I am exploring Pinecone Assistant for a specific use case and uploaded 5 files, each approximately 5 KB (around 3,000 characters).
When I query through the Assistant console, the input token count increases by nearly 20,000 per query, and it grows further as the chat history thread expands. At that rate, 50 queries consume about 1 million input tokens, which at $8 per million input tokens works out to roughly $8 for just 50 queries. That makes the product financially unfeasible for our use case.
Is there a way to optimize token usage so that only the chunks relevant to each query are sent to the model? That would significantly reduce token consumption.
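To illustrate the behavior I'm hoping for, here is a rough sketch of the workaround I'd otherwise fall back to: doing retrieval myself against a plain Pinecone index and passing only the top-k chunks (no accumulated history) to the LLM. This is not how Pinecone Assistant works internally; the index name, namespace-free setup, metadata field `text`, and model choices are just assumptions for my own data.

```python
# Sketch: manual retrieval so only the top-k relevant chunks reach the model,
# instead of the full files plus the whole chat thread.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                      # assumes OPENAI_API_KEY is set in the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-docs-index")             # hypothetical index holding chunks of my 5 files

def answer(query: str, top_k: int = 3) -> str:
    # Embed the query (embedding model is just an example choice)
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

    # Retrieve only the top-k chunks rather than every file
    res = index.query(vector=emb, top_k=top_k, include_metadata=True)
    context = "\n\n".join(m.metadata.get("text", "") for m in res.matches)

    # Send just those chunks plus the question, with no accumulated history
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return completion.choices[0].message.content
```

If the Assistant itself can be configured to behave this way (limiting how much context and history it includes per query), that would be ideal, since I'd prefer not to rebuild the retrieval pipeline myself.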