Am I the only one who finds that answers are not hallucination-free when the Context API is accessed through an MCP server/client, whereas using the Pinecone Assistant client directly gives far more accurate answers?
I was using it in one of my projects, which uses LangChain and LangGraph for a multi-MCP-server routing implementation, and I found the answers to be somewhat degraded (inaccurate) when going through the Context API compared to the answers returned directly by the Pinecone Assistant client.
Could you explain your implementation further? The Context API only returns the snippets of text that, when using the Assistant API directly, would be passed to an LLM to generate responses. So, it can’t hallucinate (insofar as generation is concerned), as it’s directly returning text that is relevant to your query.
However, it’s possible you are sending this context to your own LLM instance, which is generating a response that is hallucinating. Is this what is happening?
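For reference, here is a minimal sketch of the two paths using the Pinecone Assistant Python client (the assistant name, query, and exact response fields are assumptions on my side; adjust them to your setup):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# Assistant chat API: Pinecone retrieves the context AND generates the answer.
chat_resp = assistant.chat(
    messages=[{"role": "user", "content": "What is the warranty period?"}]
)
print(chat_resp.message.content)  # generated answer

# Context API: only the relevant snippets come back; any generation
# (and therefore any hallucination) happens in whatever LLM you pass them to next.
ctx_resp = assistant.context(query="What is the warranty period?")
for snippet in ctx_resp.snippets:
    print(snippet.content)
```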
Let us know how you’ve integrated the Context API, and we can help further!
Please review the three implementations below. In two of them the answer doesn’t hallucinate, but in one it does. I completely agree with what you are saying: the Context API only returns the relevant text that gives the LLM its context, and the rest is up to the LLM. In all three implementations the gpt-4o model was used, and the Pinecone Assistant name was the same throughout. I don’t see any instructions or system prompt that would steer the answer either, but something seems off in implementation 3. Implementation 3 is a hybrid: the Pinecone Assistant client (to save on Context API token spend) + a LangChain/LangGraph agent for multi-tool routing + MCP client/server tool calling; a simplified sketch is below. Let me know if you have any questions or need any further info.
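Roughly, the Pinecone side of implementation 3 is wrapped as a LangChain tool like this (simplified sketch; the assistant name is a placeholder and error handling is omitted):

```python
from pinecone import Pinecone
from langchain_core.tools import tool

pc = Pinecone(api_key="PINECONE_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="my-assistant")  # same assistant as in #1 and #2

@tool
def query_pinecone_knowledge_base(question: str) -> str:
    """Answer knowledge-base questions via the Pinecone Assistant."""
    # Calls the Assistant chat API (not .context), so Pinecone does the
    # retrieval and the generation before the agent ever sees the text.
    resp = assistant.chat(messages=[{"role": "user", "content": question}])
    return resp.message.content
```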
Please check the different answers emanating from the same Pinecone Assistant below:
I’m still catching up, but are you just using Pinecone Assistant in that third application? I don’t see a context call anywhere in that script (you’d do .context for that). If that’s the case, that could explain the difference in generation. In any case, the Context API is not degrading your responses; rather, there’s a difference in the generation between #1, #2, and #3 that is causing the discrepancy. And the #3 implementation includes a combination of techniques outside of Assistant, which may be causing the issue.
Am I understanding this correctly? It seems like both the first direct call and the MCP context tool work, but somehow using the Assistant generation API inside your custom architecture is causing problems.
Yes, your understanding is correct. In the third application, I am using a combination of the Pinecone Assistant and an external MCP for WooCommerce (nothing to do with Pinecone). As I mentioned earlier, I want to save on Context API token usage, so I am not calling the Pinecone Assistant MCP endpoint directly the way I did in the #2 code. That’s why you don’t find any context call anywhere in the #3 code.
The custom architecture uses LangChain and LangGraph not only to wrap the Pinecone Assistant client as a function call but also to implement the routing logic that switches between the Pinecone Assistant and the WooCommerce MCP; a rough sketch of that wiring is below. Does that make sense, and is it feasible? I want to avoid using the Context API, and moreover, since I am not using the Pinecone Assistant MCP, there is no reason to use it.
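The routing side looks roughly like this (the WooCommerce MCP tool loading is only a placeholder here; those tools come from an MCP client, and the agent picks between them and the Pinecone tool):

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Placeholder: in the real code the WooCommerce MCP tools are loaded through
# an MCP client and exposed to the agent as LangChain tools.
woocommerce_tools: list = []

llm = ChatOpenAI(model="gpt-4o")
agent = create_react_agent(llm, [query_pinecone_knowledge_base, *woocommerce_tools])

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What does the warranty cover?"}]}
)
print(result["messages"][-1].content)
```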
Do you know whether the agent actually made a context call to your assistant? I don’t have access to the output, but I do see a print in query_pinecone_knowledge_base. When you use the context call, the response includes references.
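One way to check, assuming a LangGraph-style agent like the one you sketched (and reusing result and assistant from your sketches; the exact field names may differ in your client version):

```python
from langchain_core.messages import ToolMessage

# Inspect which tools actually ran during agent.invoke(...) and what they returned.
for msg in result["messages"]:
    if isinstance(msg, ToolMessage):
        print(msg.name, "->", str(msg.content)[:200])

# A .context call returns snippets, each carrying a reference to its source,
# which you can print or surface to the user.
ctx = assistant.context(query="What does the warranty cover?")
for snippet in ctx.snippets:
    print(snippet.reference)
```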