Hello Pinecone User Community,
I am currently designing an application that combines a chatbot with advanced search, letting users query and filter both internal and public documentation stored in a vector database. The planned stack is OpenAI/Langchain for the language-model and chatbot layer, Pinecone for vector search, Streamlit for the front end, and Snowflake for relational data management.
I would like feedback on whether this approach is feasible and whether the system design makes sense. Here is a brief overview of the process and structure:
Users will be able to query both internal organization documents and public knowledge-base resources. The process consists of four stages:
- Uploading internal documents to the Pinecone workspace.
- Sending queries to Pinecone, which searches both the internal document summaries and the public knowledge base.
- Retrieving a list of matching public documents with their tables of contents, which users can filter to refine the results.
- Saving the filtered vectors in the application so users can focus on them in chatbot discussions or other outputs.
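The four stages above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not a Pinecone or Langchain implementation: `embed` is a toy hashed bag-of-words stand-in for a real embedding model (such as an OpenAI one), a Python list stands in for the Pinecone index, a dict stands in for per-user app state, and the function names and metadata fields (`doc_id`, `section`, `source`) are hypothetical.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical embedding size for the toy embedder below


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words, unit length."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


index: list[dict] = []  # stands in for the Pinecone index
session: dict = {}      # stands in for per-user app state (e.g. Streamlit session state)


def upload(doc_id: str, section: str, text: str, source: str = "internal") -> None:
    """Stage 1: embed one document section or summary and store it with metadata."""
    index.append({
        "id": f"{doc_id}#{section}",
        "values": embed(text),
        "metadata": {"doc_id": doc_id, "section": section, "source": source},
    })


def query(question: str, top_k: int = 5) -> list[dict]:
    """Stage 2: rank internal and public records together by similarity."""
    q = embed(question)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, r["values"])), r) for r in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]


def public_toc(hits: list[dict]) -> dict[str, list[str]]:
    """Stage 3: group public hits into {doc_id: [sections]} for the user to filter."""
    toc: dict[str, list[str]] = {}
    for r in hits:
        meta = r["metadata"]
        if meta["source"] == "public":
            toc.setdefault(meta["doc_id"], []).append(meta["section"])
    return toc


def save_selection(hits: list[dict], selected_ids: set[str]) -> None:
    """Stage 4: keep only the records the user selected, for later chatbot turns."""
    session["context"] = [r for r in hits if r["id"] in selected_ids]


# Example run
upload("handbook", "leave-policy", "Employees accrue paid leave monthly")
upload("gdpr", "art-17", "Right to erasure of personal data", source="public")
upload("gdpr", "art-20", "Right to data portability", source="public")
hits = query("erasure of personal data", top_k=3)
toc = public_toc(hits)
save_selection(hits, {"gdpr#art-17"})
```

In the real app, `upload` and `query` would become calls to an embedding model plus Pinecone upserts and queries, and `session` would likely live in Streamlit's `st.session_state` so the user's selection survives script reruns.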
The Snowflake database will store user organization details and document metadata in two tables: Document Library and Table of Contents. Users will be able to add, delete, and manage records in these tables.
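One possible shape for the two Snowflake tables is sketched below. It is a hedged sketch: `sqlite3` stands in for Snowflake so the code runs locally, and all table, column, and function names are hypothetical. Keeping a shared `doc_id` that also appears in the Pinecone metadata is what would let the two stores be joined. Note that Snowflake enforces only `NOT NULL` constraints (primary and foreign keys are informational), so the delete helper removes table-of-contents rows explicitly instead of relying on a cascade.

```python
import sqlite3

# sqlite3 stands in for Snowflake so the sketch runs locally.
# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE document_library (
    doc_id  TEXT PRIMARY KEY,
    org_id  TEXT NOT NULL,
    title   TEXT NOT NULL,
    source  TEXT NOT NULL  -- 'internal' or 'public'
);
CREATE TABLE table_of_contents (
    doc_id      TEXT NOT NULL REFERENCES document_library (doc_id),
    section_id  TEXT NOT NULL,
    heading     TEXT NOT NULL,
    position    INTEGER NOT NULL,
    PRIMARY KEY (doc_id, section_id)
);
"""


def add_document(conn, doc_id, org_id, title, source, sections):
    """Insert a document and its ordered (section_id, heading) pairs in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO document_library VALUES (?, ?, ?, ?)",
            (doc_id, org_id, title, source),
        )
        conn.executemany(
            "INSERT INTO table_of_contents VALUES (?, ?, ?, ?)",
            [(doc_id, sid, heading, pos) for pos, (sid, heading) in enumerate(sections)],
        )


def delete_document(conn, doc_id):
    """Remove a document and its TOC rows explicitly; Snowflake does not enforce
    foreign keys, so there is no cascading delete to rely on."""
    with conn:
        conn.execute("DELETE FROM table_of_contents WHERE doc_id = ?", (doc_id,))
        conn.execute("DELETE FROM document_library WHERE doc_id = ?", (doc_id,))


def toc_for(conn, doc_id):
    """Return the table of contents for one document in display order."""
    return conn.execute(
        "SELECT section_id, heading FROM table_of_contents "
        "WHERE doc_id = ? ORDER BY position",
        (doc_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_document(conn, "gdpr", "acme", "GDPR", "public",
             [("art-17", "Right to erasure"), ("art-20", "Right to data portability")])
```

Against Snowflake itself, the same SQL would run through the Snowflake Python connector, whose default placeholder style differs from `?`, and duplicate `doc_id`s would have to be prevented by the application because the primary key is not enforced.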
In the Pinecone index, each record will consist of an embedding vector and its metadata, stored in a namespace. This structure allows relevant documents to be filtered and retrieved efficiently based on user queries.
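To make the namespace-plus-metadata structure concrete, here is a small in-memory mock; it is not the Pinecone client. Records follow Pinecone's `{"id", "values", "metadata"}` shape, and the filter evaluator handles only `$eq`, `$in`, and `$and`, a small subset of Pinecone's filter operators. The assumptions are one namespace per organization (a common multitenancy pattern) and hypothetical metadata fields `doc_id`, `section`, and `source`. The point it illustrates is that a user's table-of-contents selection can be pushed into the query filter, so filtering happens before ranking rather than afterwards.

```python
import math
from typing import Optional

# In-memory mock of a namespaced vector index (not the Pinecone client).
namespaces: dict[str, list[dict]] = {}


def upsert(records: list[dict], namespace: str) -> None:
    """Insert or overwrite records (by id) inside one namespace."""
    ns = namespaces.setdefault(namespace, [])
    new_ids = {r["id"] for r in records}
    ns[:] = [r for r in ns if r["id"] not in new_ids] + records


def matches(meta: dict, flt: dict) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(meta, sub) for sub in cond):
                return False
            continue
        if not isinstance(cond, dict):  # shorthand {"field": value} means $eq
            cond = {"$eq": cond}
        for op, value in cond.items():
            if op == "$eq":
                ok = meta.get(key) == value
            elif op == "$in":
                ok = meta.get(key) in value
            else:
                raise ValueError(f"operator not supported in this sketch: {op}")
            if not ok:
                return False
    return True


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def query(vector: list[float], top_k: int, namespace: str,
          flt: Optional[dict] = None) -> list[dict]:
    """Search one namespace only, applying the metadata filter before ranking."""
    candidates = [
        r for r in namespaces.get(namespace, [])
        if flt is None or matches(r["metadata"], flt)
    ]
    scored = sorted(((cosine(vector, r["values"]), r) for r in candidates),
                    key=lambda pair: pair[0], reverse=True)
    return [{"id": r["id"], "score": s, "metadata": r["metadata"]}
            for s, r in scored[:top_k]]


upsert([
    {"id": "gdpr#art-17", "values": [1.0, 0.0, 0.0],
     "metadata": {"doc_id": "gdpr", "section": "art-17", "source": "public"}},
    {"id": "gdpr#art-20", "values": [0.8, 0.6, 0.0],
     "metadata": {"doc_id": "gdpr", "section": "art-20", "source": "public"}},
    {"id": "hr#leave", "values": [0.9, 0.1, 0.0],
     "metadata": {"doc_id": "hr", "section": "leave", "source": "internal"}},
], namespace="acme")

# Only the public sections the user ticked in the table of contents.
hits = query([1.0, 0.0, 0.0], top_k=2, namespace="acme",
             flt={"source": "public", "section": {"$in": ["art-17", "art-20"]}})
```

With the Pinecone Python client, the corresponding calls would be `index.upsert(vectors=..., namespace=...)` and `index.query(vector=..., top_k=..., namespace=..., filter=..., include_metadata=True)`; it is worth checking the exact signatures for the client version you install.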
I would greatly appreciate any feedback or suggestions on this approach, particularly on integrating Pinecone with the other tools mentioned. If you have worked on similar projects or foresee potential issues or improvements, please share your thoughts.
Thank you for your time and input!