Hi there,
I created a Pinecone Assistant for a set of ~1,000 reports.
The reports are in JSON format and small (~5 KB each), and each includes the same 15 fields, e.g. Headline, Summary, Keywords, Dates, and Citations/References. The issue is with Citations. My prompt includes instructions telling the Assistant to look in the "Citations" field when it's asked, for example, 'What reports cite ABC?'
I'm getting very inconsistent results (much worse with Claude than with GPT-4o, so I'm using the latter). Sometimes I get one result back saying 'This is the only report that cites ABC', even when I ask for 'ALL reports that cite…'; other times I get six such reports back (there are actually 24 reports it should return).
I'm wondering if there's something I'm missing. For example, should I set up indexes? (If I read the docs correctly, they say I don't need to for an Assistant.)
Or is there a limit to how many reports Pinecone will search through before returning results?
We recently released a new API to control the number and size of the text snippets returned from search and passed as context to the LLM (docs). To understand how it works and find the best parameters for your use case, I recommend first working with the context API, which runs exactly the same pipeline that generates the context for the LLM in chat calls. Try top_k=30 and a snippet size larger than your report JSONs, so the LLM gets full reports as context.
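As a minimal sketch of experimenting with those parameters (assuming Python and the `requests` library; the host URL, endpoint path, and field names here are placeholders to check against the context API docs linked above):

```python
def build_context_request(query, top_k=30, snippet_size=8192):
    """Build the request body for the context API call.

    top_k=30 asks for more candidate snippets than the default, and
    snippet_size is set larger than a ~5 KB report so each snippet can
    carry a full report JSON.
    """
    return {"query": query, "top_k": top_k, "snippet_size": snippet_size}


def fetch_context(api_key, assistant_name, query,
                  host="https://prod-1-data.ke.pinecone.io"):  # region-specific, assumed
    import requests  # third-party; only needed for the actual call
    resp = requests.post(
        f"{host}/assistant/chat/{assistant_name}/context",
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        json=build_context_request(query),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # inspect the returned snippets directly
```

Inspecting the snippets this returns shows exactly what the LLM would see, which makes it easy to tell whether the missing reports were never retrieved or were retrieved but ignored by the model.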
Hope this helps you get better results. Please let us know how it goes; we are here to help.
Is there any way to control the context in this case? Also, is there a way to stream the response back to my Glide app, or otherwise speed up the 'batch' response? It is currently taking 25-30 seconds to get a response.
From looking at Glide's docs, it seems they don't currently support context options or streaming. We can try to ask for those features to be added (we don't control this integration). Meanwhile, you can try the official SDK.
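With the official Python SDK, streaming might look roughly like this (a sketch assuming the Pinecone SDK with the assistant plugin installed; the `stream` flag and the chunk shape are assumptions to verify against your SDK version):

```python
def collect_stream(chunks, on_delta=None):
    """Assemble the full answer from streamed chunks, calling on_delta
    for each piece so the app can display text as soon as it arrives."""
    parts = []
    for chunk in chunks:
        delta = chunk.get("delta") or ""  # assumed incremental-text field
        if delta:
            if on_delta is not None:
                on_delta(delta)
            parts.append(delta)
    return "".join(parts)


def ask_streaming(api_key, assistant_name, question, on_delta=print):
    # Requires: pip install pinecone pinecone-plugin-assistant
    from pinecone import Pinecone  # imported here so collect_stream stays dependency-free
    pc = Pinecone(api_key=api_key)
    assistant = pc.assistant.Assistant(assistant_name=assistant_name)
    chunks = assistant.chat(
        messages=[{"role": "user", "content": question}],
        stream=True,  # assumed flag: returns an iterator of chunks
    )
    return collect_stream(chunks, on_delta)
```

Even without Glide-side streaming support, consuming the stream server-side and forwarding partial text can cut the perceived wait well below the current 25-30 seconds.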