What are the best practices for retrieving answers in different languages from Pinecone when using OpenAI?

I am currently building a Retrieval-Augmented Generation (RAG) setup using Pinecone and OpenAI. My goal is to store data in Pinecone in multiple languages and have OpenAI handle questions in different languages; the answers stored in Pinecone may be in a different language than the query. The setup should support multilingual retrieval and response generation so users can interact seamlessly across languages.

Hi @hachemimohammedamine, and welcome to the Pinecone community forums!

Thank you for your interesting question.

I want to point you to two example Jupyter Notebooks we have that demonstrate using multilingual models alongside Pinecone.

Hopefully, they’re helpful as a reference.

As for your specific use case, you might look into metadata filtering as a way to map the language your end user is querying in to the vectors derived from documents in the target language; see the sketch below.
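Here is a minimal sketch of that approach, assuming a hypothetical index named `multilingual-rag`, a `language` metadata field, and OpenAI's `text-embedding-3-small` model purely as an example embedder (any multilingual embedding model works):

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                        # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("multilingual-rag")            # hypothetical index name

def embed(text: str) -> list[float]:
    # Example embedder; swap in whichever multilingual model you use.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Tag each vector with its language at upsert time.
index.upsert(vectors=[
    {"id": "doc-en-1",
     "values": embed("Refunds are issued within 5 business days."),
     "metadata": {"language": "en", "text": "Refunds are issued within 5 business days."}},
    {"id": "doc-fr-1",
     "values": embed("Les remboursements sont émis sous 5 jours ouvrés."),
     "metadata": {"language": "fr", "text": "Les remboursements sont émis sous 5 jours ouvrés."}},
])

# At query time, restrict retrieval to the language(s) you want answers from.
results = index.query(
    vector=embed("How long do refunds take?"),
    top_k=3,
    filter={"language": {"$in": ["en", "fr"]}},
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score, match.metadata["language"])
```

The retrieved passages can then be handed to your chat model, which can respond in the user's language even when the source text is in another.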

You could also consider storing each language's vectors in a separate namespace, e.g., one for en, one for es, one for fr, and so on (a sketch follows below). That can improve search performance because Pinecone only needs to query a single namespace. Here's our guide to using namespaces.
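A minimal sketch of the namespace approach, with the same illustrative index name and `embed()` helper as above, and namespace names chosen by language code:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("multilingual-rag")  # hypothetical index name

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Keep each language's vectors in its own namespace ("en", "fr", ...).
index.upsert(
    vectors=[{"id": "doc-fr-1",
              "values": embed("Les remboursements sont émis sous 5 jours ouvrés."),
              "metadata": {"text": "Les remboursements sont émis sous 5 jours ouvrés."}}],
    namespace="fr",
)

# Query only the namespace matching the language you want to search,
# e.g. after detecting the end user's language.
results = index.query(
    vector=embed("Combien de temps prennent les remboursements ?"),
    top_k=3,
    namespace="fr",
    include_metadata=True,
)
```

The trade-off versus metadata filtering is that namespaces keep languages fully isolated, so a single query cannot span multiple languages without issuing one query per namespace.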

Hope this helps, and let me know if you have any follow-up questions!

Best,
Zack