Newbie Question: Index with divergent content

jacobgoldenart · March 20, 2023, 8:35pm

Hello, I want to create an index with documentation for all the different coding libraries and APIs that I work with. The only examples I’ve seen are based on a single set of documentation, but I’d like to query my whole toolkit and keep it updated with the latest releases. I will be using LangChain and OpenAI Embeddings to interface with Pinecone. Is it best practice to add all of this documentation from different sources to the same index? I ask because I use these APIs together, so if I’m asking ChatGPT about a web app I’m building, a helpful response might be sourced from (for example) the Pinecone docs, the OpenAI docs, and the LangChain docs. However, I don’t want to fill my prompt with stuff it doesn’t need, like my documentation for CircuitPython. I imagine that LangChain would help with this by working with Pinecone to return only the info relevant to my queries for use as prompt context, but is there anything specific I should do to better help Pinecone in this process? Thanks for reading.

Cory_Pinecone · March 20, 2023, 9:59pm

Hi @jacobgoldenart,

Welcome to the Pinecone community forums!

Yes, you are correct; it’s best practice to store all of those vectors in a single index. That will be the most efficient way to store them. If needed, you can segment them by namespace, but since each query only runs against a single namespace, I’d recommend against that for your use case. You won’t necessarily know which doc has the right answer when you’re looking for it, so separating the docs from each other in the index doesn’t really help when you want to search all of them at once.

Cory
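
To illustrate the point above about a single index versus namespaces, here is a minimal toy sketch in plain Python. It does not call the Pinecone client. The `RECORDS` list, the 3-dimensional vectors, and the `query` helper are all hypothetical stand-ins invented for this example; real embeddings would come from an embedding model and have far more dimensions. The sketch only shows the general idea: similarity search over one shared pool tends to surface the relevant docs from several sources and leave unrelated ones out, while restricting a query to one namespace hides the other sources.

```python
import math

# Hypothetical in-memory stand-in for a vector index: (id, vector, source).
# The vectors are made up for illustration only.
RECORDS = [
    ("pinecone-upsert", [0.9, 0.1, 0.0], "pinecone"),
    ("openai-embeddings", [0.8, 0.3, 0.1], "openai"),
    ("langchain-vectorstores", [0.7, 0.2, 0.2], "langchain"),
    ("circuitpython-gpio", [0.0, 0.1, 0.9], "circuitpython"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query(vector, top_k, namespace=None):
    """Return ids of the top_k most similar records.

    If namespace is given, only records from that source are searched,
    mimicking a query that runs against a single namespace.
    """
    candidates = [r for r in RECORDS if namespace is None or r[2] == namespace]
    ranked = sorted(candidates, key=lambda r: cosine(vector, r[1]), reverse=True)
    return [r[0] for r in ranked[:top_k]]


# Made-up embedding of a question about building a web app.
web_app_question = [1.0, 0.2, 0.1]

# One shared pool: the three related docs rank highest, CircuitPython does not.
single_index = query(web_app_question, top_k=3)

# One namespace per source: the same query only ever sees the Pinecone docs.
one_namespace = query(web_app_question, top_k=3, namespace="pinecone")

print(single_index)
print(one_namespace)
```

In this toy setup the CircuitPython record stays out of the top 3 simply because its vector is far from the query, which is the behavior the original question was hoping for. How well that holds with real embeddings depends on the data and on the `top_k` you choose.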