I am exploring Pinecone Assistant for a specific use case and uploaded 5 files, each approximately 5 KB (around 3,000 characters).
When I query through the Assistant console, the input token count increases by nearly 20,000 per query, and it grows further as the chat history thread expands. At that rate, 50 queries consume about 1 million input tokens, which at $8 per million input tokens works out to roughly $8 for just 50 queries. That makes the product financially unfeasible for our use case.
Is there a way to optimize token usage so that only the chunks relevant to each query are sent to the model? That would significantly reduce token consumption.
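To illustrate the behavior I'm hoping for, here is a rough sketch of the workaround I'd otherwise fall back to: doing retrieval myself against a plain Pinecone index and passing only the top-k chunks (no accumulated history) to the LLM. This is not how Pinecone Assistant works internally; the index name, namespace-free setup, metadata field `text`, and model choices are just assumptions for my own data.

```python
# Sketch: manual retrieval so only the top-k relevant chunks reach the model,
# instead of the full files plus the whole chat thread.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                      # assumes OPENAI_API_KEY is set in the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-docs-index")             # hypothetical index holding chunks of my 5 files

def answer(query: str, top_k: int = 3) -> str:
    # Embed the query (embedding model is just an example choice)
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

    # Retrieve only the top-k chunks rather than every file
    res = index.query(vector=emb, top_k=top_k, include_metadata=True)
    context = "\n\n".join(m.metadata.get("text", "") for m in res.matches)

    # Send just those chunks plus the question, with no accumulated history
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return completion.choices[0].message.content
```

If the Assistant itself can be configured to behave this way (limiting how much context and history it includes per query), that would be ideal, since I'd prefer not to rebuild the retrieval pipeline myself.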