Hello, I am building an AI agency automation with Pinecone, but I have some doubts. I work with multiple companies, so I need a separate AI model for each one, and each model should retrieve data only from that company's documents. What I don’t understand is whether I have to create a separate index for each company and upload that company’s documents to it, or whether I can have a single index with multiple namespaces, where each namespace belongs to one company and I upload and query its data there. Which is better?
To explain with a database analogy: I want one database divided into collections or tables, where each table or collection is a company with its own data and documents. The companies’ data must not be combined. For example, if a company uses ChatGPT to chat with its data, it should retrieve information only from its own documents and never from another company’s. I need each company to have its own collection of documents.
I don’t want to exaggerate, but let’s say I have 1,000+ companies in the future. Should I have an index for each company?
Generally, if you’re using different embedding models, you’ll need to store the vectors in different indexes. This is because different models often produce vectors of different dimensions, and an index can only have one dimension value for all of its vectors (768, 1536, 512, etc.).
If the models you’re using happen to produce vectors of the same dimension, then you can use a single index to hold them. Just separate them by namespace in that case.
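To make the dimension rule concrete, here is a minimal sketch of how you might plan the layout: one index per embedding dimension, with every company of that dimension sharing it. The company names, the dimensions assigned to them, and the docs-<dimension> index naming are all made up for illustration, and nothing here calls Pinecone.

```python
from collections import defaultdict

# Hypothetical mapping: company -> dimension of the embedding model it uses.
COMPANY_DIMENSIONS = {"acme": 1536, "globex": 1536, "initech": 768}


def plan_indexes(company_dimensions):
    """Group companies by embedding dimension.

    Returns {index_name: [companies]}, where each index holds vectors of a
    single dimension and each company would get its own namespace inside it.
    The "docs-<dimension>" naming scheme is an assumption, not a Pinecone rule.
    """
    plan = defaultdict(list)
    for company, dimension in company_dimensions.items():
        plan[f"docs-{dimension}"].append(company)
    return {name: sorted(companies) for name, companies in plan.items()}


if __name__ == "__main__":
    for index_name, companies in plan_indexes(COMPANY_DIMENSIONS).items():
        print(index_name, companies)
```

If every company uses the same embedding model, the plan collapses to a single index, which is the cheapest case.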
Thanks. To put it in database terms: I want one database divided into collections or tables, where each table or collection is a company with its own data and documents. The companies’ data must not be mixed, so when a company uses ChatGPT to chat with its data, the answers should come only from that company’s documents.
I don’t want to exaggerate, but if I have 1,000+ companies in the future, should I have an index for each one?
Using namespaces will be much cheaper than deploying a separate index per customer, and namespaces will still allow you to logically separate the data. When you submit a query for customer X, you’ll query the namespace that contains only customer X’s documents.
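As a rough sketch of that pattern, the two helpers below write and read one customer's vectors through that customer's namespace. They assume index is an Index object from the Pinecone Python client, whose upsert and query methods accept a namespace argument; the function names, the record tuple format, and the choice to use the company ID directly as the namespace are assumptions made for illustration.

```python
def upsert_company_docs(index, company_id, records):
    """Write one company's vectors into that company's namespace.

    `records` is a list of (doc_id, values, metadata) tuples (a format
    chosen for this sketch). Returns the number of vectors sent.
    """
    if not company_id:
        raise ValueError("company_id must be a non-empty string")
    vectors = [
        {"id": doc_id, "values": values, "metadata": metadata}
        for doc_id, values, metadata in records
    ]
    # The namespace keeps these vectors apart from every other company's.
    index.upsert(vectors=vectors, namespace=company_id)
    return len(vectors)


def query_company_docs(index, company_id, query_vector, top_k=5):
    """Search only the namespace that belongs to `company_id`."""
    if not company_id:
        raise ValueError("company_id must be a non-empty string")
    return index.query(
        vector=query_vector,
        top_k=top_k,
        namespace=company_id,
        include_metadata=True,
    )
```

Because the namespace is derived from the company ID on every call, a query for one company never touches another company's vectors, as long as the ID comes from your own authenticated session and not from user-supplied text.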
As @Cory_Pinecone mentioned, this works only if the embeddings for all customers have the same dimension, which is easiest to guarantee by using the same embedding model across all customers. If you’re using different embedding models per customer, you can’t store their vectors in the same index unless the models all produce the same dimension.
If you need further logical structure underneath each customer, such as separate tables of data (for example, a “knowledge base” table and a “company info” table), you can still use namespaces for that. You could use namespaces like customerX_kb and customerX_info, but only if all the data has the same dimension.
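One way to keep names like customerX_kb and customerX_info consistent is to build them in a single helper. This is only a sketch: the allowed table names and the rule that customer IDs may not contain underscores (so the name can always be split back into its two parts) are assumptions of this example, not Pinecone requirements.

```python
import re

# Hypothetical set of per-customer "tables"; extend it to match your data.
ALLOWED_TABLES = {"kb", "info"}


def namespace_for(customer_id, table):
    """Build a namespace name such as 'customerX_kb'.

    Customer IDs are restricted to letters, digits, and hyphens so the
    underscore unambiguously separates the customer from the table.
    """
    if not re.fullmatch(r"[A-Za-z0-9-]+", customer_id):
        raise ValueError(f"invalid customer id: {customer_id!r}")
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"{customer_id}_{table}"


if __name__ == "__main__":
    print(namespace_for("customerX", "kb"))
    print(namespace_for("customerX", "info"))
```

You would then pass the returned string as the namespace whenever you upsert or query that customer's table.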