Urgent Issue with INVALID_ARGUMENT Error and Language Behavior – Workflow Stopped

I am using your RAG Chat Assistant, and I am encountering a critical issue with your API that is halting my workflow. Specifically:

  1. When I ask questions entirely in Turkish, such as:
  • “Bir bar açmayı planlıyorum. İlgili mevzuata göre neler yapmalıyım?” (“I’m planning to open a bar. What should I do according to the relevant regulations?”)
  • “Bir restaurant açmayı planlıyorum. İlgili mevzuata göre neler yapmalıyım?” (the same question for a restaurant)
  • “Bir market açmayı planlıyorum. İlgili mevzuata göre neler yapmalıyım?” (the same question for a shop)
    I receive the error:
    INVALID_ARGUMENT: Request failed due to Content Management Policy.
  2. Interestingly, when I write the same questions half in English and half in Turkish, they work without any issue. For example:
  • “I’m planning to open a bar. İlgili mevzuatına göre neler yapmalıyım?” (the second sentence is Turkish for “What should I do according to the relevant regulations?”)
  • “I’m planning to open a restaurant. İlgili mevzuatına göre neler yapmalıyım?”
  • “I’m planning to open a shop. İlgili mevzuatına göre neler yapmalıyım?”
  3. Furthermore, when I fully translate these questions into English, they also work perfectly:
  • “I’m planning to open a bar. According to the regulations, what should I do?”

This inconsistency is causing my workflow to stop, as it is crucial for my use case to support multi-language input. Could you help me understand:

  • Why the API behaves differently for Turkish and English inputs?
  • Why mixing English and Turkish works, while pure Turkish fails?
  • Whether this is due to Content Management Policy rules or some other configuration issue?

I urgently need this issue resolved, as it is halting my operations.

As a paid subscriber, I would greatly appreciate your prompt attention to this matter.

Thank you for your support and clarification.

Hello ahmettyalcin35, thanks for posting and welcome to the community!

Could you clarify the following in a message?

  • What part of the Pinecone API are you using (query, Assistant, etc.)?
  • What kinds of data are you working with?
  • What embedding model have you chosen to set up your index with?
  • How have you set up your RAG workflow, and with what dependencies? For example, are you constructing the API calls yourself or via LangChain or LlamaIndex?
  • What are you using for your generating LLM, i.e., the model that accepts chunks from Pinecone to answer your questions?

Ideally, please paste a snippet of the code causing the issue, along with the function that creates the response; that would help us diagnose where the problem is coming from.

It seems that the issue may be coming from the generating LLM’s API call rather than from Pinecone, but without seeing the code it’s hard to tell.

Please respond ASAP, and we can continue troubleshooting.

Sincerely,
Arjun

Thank you for your prompt response. Let me clarify my situation further:

  1. Pinecone API Usage: I am using the Assistant functionality from the Pinecone interface directly via the provided assistant playground. The issue arises when asking questions in Turkish or mixed languages.
  2. Data Used: The data primarily relates to regulatory and compliance inquiries. In addition to Pinecone’s own setup, I have also uploaded my own text data, and the system processes queries based on this data as well. For example, asking about the steps to open a restaurant or bar according to local regulations.
  3. Embedding Model: The embedding model setup is default (as pre-configured in Pinecone’s Assistant). I have not manually altered this setting.
  4. RAG Workflow: My workflow uses Pinecone’s default RAG implementation for generating AI responses to queries. In this workflow, both Pinecone’s capabilities and my uploaded text data are used to generate responses. No external tools like LangChain or LlamaIndex are integrated; I am relying purely on Pinecone’s Assistant playground.
  5. Generating LLM: I believe Pinecone’s GPT-4o model is used for response generation. As noted, mixed-language queries such as “I’m planning to open a restaurant. İlgili mevzuatına göre neler yapmalıyım?” are processed successfully, while fully Turkish queries fail.

This issue is new; I am encountering it for the first time. The error occurs even within Pinecone’s own playground, with the message:
INVALID_ARGUMENT: Request failed due to Content Management Policy.

To clarify, I have not shared my API code, as the same issue is reproducible directly within your Assistant interface.

Can you help resolve this issue?

Thank you! This is extremely helpful, and I can see exactly what is going on.

When you use our Assistant API, you utilize our vector database, embedding models, and a third-party generating LLM such as GPT-4o from OpenAI or Claude from Anthropic.

When you submit a question, it gets embedded and queried against our vector database (inside Pinecone), and then the chunks received from the query, along with the query itself, go to the chosen third-party LLM. In this case, that’s OpenAI!
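The flow described above can be sketched as a minimal pipeline. This is an illustrative stand-in, not Pinecone's actual implementation: all three function bodies are placeholders, and in a real deployment `embed` would call the index's embedding model, `query_vector_db` would query the Pinecone index, and `generate` would call the chosen third-party LLM.

```python
# Minimal sketch of the retrieve-then-generate flow described above.
# All function bodies are stand-ins for the real Pinecone/LLM calls.

def embed(question: str) -> list[float]:
    # Stand-in: a real system calls the index's embedding model here.
    return [float(ord(c)) for c in question[:4]]

def query_vector_db(vector: list[float], top_k: int = 3) -> list[str]:
    # Stand-in: a real system queries the Pinecone index and returns
    # the text of the top_k best-matching chunks.
    return [f"chunk-{i}" for i in range(top_k)]

def generate(question: str, chunks: list[str]) -> str:
    # Stand-in: a real system sends the question plus retrieved chunks
    # to the third-party LLM. This is the step where the provider's
    # content moderation can reject the request.
    context = "\n".join(chunks)
    return f"Answer to {question!r} using:\n{context}"

def rag_answer(question: str) -> str:
    vector = embed(question)
    chunks = query_vector_db(vector)
    return generate(question, chunks)
```

The key point for your issue: the moderation check happens in the final `generate` step, outside of Pinecone, after retrieval has already succeeded.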

For various reasons, third-party LLM providers restrict what content can be generated to prevent abuse, and these filters are often predictive classifiers. I suspect you are experiencing a misclassification from OpenAI’s content moderation layer within their API call, or the kind of question you are asking is simply restricted in this language.

Unfortunately, this is not something Pinecone can solve directly, and you may need to direct your questions to the OpenAI forums. You can also try to use the Claude API instead and see if that resolves your issue. If you end up doing so, please follow up here so others may learn from your situation!
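If you move from the playground to your own code, one pragmatic pattern is to catch the policy rejection and retry against the alternative provider. The sketch below is hypothetical: `ModerationError`, `ask_openai`, and `ask_claude` are all stand-ins (the second simulates the behaviour you reported), not real Pinecone, OpenAI, or Anthropic APIs.

```python
# Hypothetical model-fallback wrapper: if the primary provider rejects
# a request on content-policy grounds, retry with an alternative.
# All names below are stand-ins, not real provider APIs.

class ModerationError(Exception):
    """Raised when a provider rejects a request (cf. INVALID_ARGUMENT)."""

def ask_openai(question: str) -> str:
    # Stand-in that simulates the reported behaviour: fully Turkish
    # regulatory questions are rejected by the content filter.
    if "mevzuat" in question and not question.lstrip().startswith("I"):
        raise ModerationError("Request failed due to Content Management Policy.")
    return f"openai-answer: {question}"

def ask_claude(question: str) -> str:
    # Stand-in for the alternative provider.
    return f"claude-answer: {question}"

def ask_with_fallback(question: str) -> str:
    try:
        return ask_openai(question)
    except ModerationError:
        # Primary provider refused; fall back to the alternative.
        return ask_claude(question)
```

With this wrapper, the fully Turkish query that currently fails would transparently fall through to the second provider instead of halting the workflow.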

For your reference, see OpenAI’s usage policies (which trigger the error you are seeing) and Anthropic’s usage policies for Claude.

I’m not sure exactly why OpenAI’s usage policies are triggering on the questions you are asking, but it is clear that they are causing the issue.

Again, I apologize for this happening with your use case, but I hope this information helps you!

Sincerely,
Arjun
