Am I the only one who finds that answers are not hallucination-free when the Context API is accessed through an MCP server/client, whereas using the Pinecone Assistant client directly gives far more accurate answers?
I was using it in one of my projects, which uses LangChain and LangGraph for a multi-MCP-server routing implementation, and I found the answers to be somewhat degraded (inaccurate) when going through the Context API compared to the answers returned directly by the Pinecone Assistant client.
Could you explain your implementation further? The Context API only returns the snippets of text that, when using the Assistant API directly, would be passed to an LLM to generate responses. So, it can’t hallucinate (insofar as generation is concerned), as it’s directly returning text that is relevant to your query.
However, it’s possible you are sending this context to your own LLM instance, which is generating a response that is hallucinating. Is this what is happening?
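For reference, here is a minimal sketch of the two paths using the Pinecone Assistant Python client (the assistant name, query, and exact response fields are assumptions on my side; adjust them to your setup):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# Assistant chat API: Pinecone retrieves the context AND generates the answer.
chat_resp = assistant.chat(
    messages=[{"role": "user", "content": "What is the warranty period?"}]
)
print(chat_resp.message.content)  # generated answer

# Context API: only the relevant snippets come back; any generation
# (and therefore any hallucination) happens in whatever LLM you pass them to next.
ctx_resp = assistant.context(query="What is the warranty period?")
for snippet in ctx_resp.snippets:
    print(snippet.content)
```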
Let us know how you’ve integrated the Context API, and we can help further!
Please review the three implementations below. In two of them the answer doesn’t hallucinate, but in one it does. I completely agree with what you are saying: the Context API only returns the relevant text that gives the LLM its context, and the rest is up to the LLM. In all three implementations the gpt-4o model was used, and the Pinecone Assistant name was the same throughout. I don’t see any instructions or system prompt that would steer the answer either, but something seems off in implementation 3. Implementation 3 is a hybrid: the Pinecone Assistant client (to save on Context API token spend) + a LangChain/LangGraph agent for multi-tool routing + MCP client/server tool calling; a simplified sketch is below. Let me know if you have any questions or need any further info.
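Roughly, the Pinecone side of implementation 3 is wrapped as a LangChain tool like this (simplified sketch; the assistant name is a placeholder and error handling is omitted):

```python
from pinecone import Pinecone
from langchain_core.tools import tool

pc = Pinecone(api_key="PINECONE_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="my-assistant")  # same assistant as in #1 and #2

@tool
def query_pinecone_knowledge_base(question: str) -> str:
    """Answer knowledge-base questions via the Pinecone Assistant."""
    # Calls the Assistant chat API (not .context), so Pinecone does the
    # retrieval and the generation before the agent ever sees the text.
    resp = assistant.chat(messages=[{"role": "user", "content": question}])
    return resp.message.content
```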
Please check the different answers emanating from the same Pinecone Assistant below:
I’m still catching up, but are you just using Pinecone Assistant in that third application? I don’t see a context call anywhere in that script (you’d do .context for that). If that’s the case, that could explain the difference in generation. In any case, the Context API is not degrading your responses; rather, there’s a difference in the generation between #1, #2, and #3 that is causing the discrepancy. And the #3 implementation includes a combination of techniques outside of Assistant, which may be causing the issue.
Am I understanding this correctly? It seems like both the first direct call and the MCP context tool work, but somehow using the Assistant generation API inside your custom architecture is causing problems.
Yes, your understanding is correct. In the third application, I am using a combination of the Pinecone Assistant and an external MCP for WooCommerce (nothing to do with Pinecone). As I mentioned earlier, I want to save on Context API token usage, so I am not calling the Pinecone Assistant MCP endpoint directly the way I did in the #2 code. That’s why you don’t find any context call anywhere in the #3 code.
The custom architecture uses LangChain and LangGraph not only to wrap the Pinecone Assistant client as a function call but also to implement the routing logic that switches between the Pinecone Assistant and the WooCommerce MCP; a rough sketch of that wiring is below. Does that make sense, and is it feasible? I want to avoid using the Context API, and moreover, since I am not using the Pinecone Assistant MCP, there is no reason to use it.
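The routing side looks roughly like this (the WooCommerce MCP tool loading is only a placeholder here; those tools come from an MCP client, and the agent picks between them and the Pinecone tool):

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Placeholder: in the real code the WooCommerce MCP tools are loaded through
# an MCP client and exposed to the agent as LangChain tools.
woocommerce_tools: list = []

llm = ChatOpenAI(model="gpt-4o")
agent = create_react_agent(llm, [query_pinecone_knowledge_base, *woocommerce_tools])

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What does the warranty cover?"}]}
)
print(result["messages"][-1].content)
```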
Do you know whether the agent actually made a context call to your assistant? I don’t have access to the output, but I do see a print in query_pinecone_knowledge_base. When you use the context call, the response includes references.
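One way to check, assuming a LangGraph-style agent like the one you sketched (and reusing result and assistant from your sketches; the exact field names may differ in your client version):

```python
from langchain_core.messages import ToolMessage

# Inspect which tools actually ran during agent.invoke(...) and what they returned.
for msg in result["messages"]:
    if isinstance(msg, ToolMessage):
        print(msg.name, "->", str(msg.content)[:200])

# A .context call returns snippets, each carrying a reference to its source,
# which you can print or surface to the user.
ctx = assistant.context(query="What does the warranty cover?")
for snippet in ctx.snippets:
    print(snippet.reference)
```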