We're at the point where we need to upgrade our RAG pipeline, and I'm working through Managing Imports → Understanding imports in the Pinecone Docs. A couple of quick questions:
- Does this assume converting every input format to Parquet (i.e., PDF, text, CSV, JSON → Parquet)?
- What are the mechanisms for handling incremental doc updates, e.g., when a new file is dropped in a bucket?
- Is chunking/embedding handled as a config at the index level?
For context: I'm building an agentic workflow on Airflow to continuously process docs into the vector database, so this is a crucial step for keeping our namespaces up to date.
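To make the second question concrete, here's a minimal sketch of the hash-based change detection I'd run per sync: fingerprint each file in the bucket, and only re-chunk/upsert docs whose fingerprint changed. All names here (`chunk_doc`, `plan_upserts`, the deterministic `doc_id#i` chunk-ID scheme) are my own placeholders, not Pinecone or Airflow APIs:

```python
import hashlib

def chunk_doc(text: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunker; the real pipeline would chunk semantically.
    return [text[i:i + size] for i in range(0, len(text), size)]

def doc_fingerprint(text: str) -> str:
    # Content hash used to detect changed files between syncs.
    return hashlib.sha256(text.encode()).hexdigest()

def plan_upserts(bucket_files: dict[str, str], seen: dict[str, str]) -> dict[str, list[str]]:
    """Return {doc_id: chunks} for new or changed docs only.

    bucket_files: doc_id -> current file contents (as listed from the bucket)
    seen:         doc_id -> fingerprint recorded at the last sync (mutated in place)
    """
    plan: dict[str, list[str]] = {}
    for doc_id, text in bucket_files.items():
        fp = doc_fingerprint(text)
        if seen.get(doc_id) != fp:
            # Deterministic chunk IDs (e.g. f"{doc_id}#{i}") would let a
            # re-upsert overwrite stale vectors instead of duplicating them.
            plan[doc_id] = chunk_doc(text)
            seen[doc_id] = fp
    return plan

seen: dict[str, str] = {}
plan = plan_upserts({"a.txt": "hello world"}, seen)
print(sorted(plan))  # ['a.txt'] -- new doc, needs upsert
plan = plan_upserts({"a.txt": "hello world"}, seen)
print(sorted(plan))  # [] -- unchanged, skipped
```

The question is essentially whether Pinecone's import path gives us something like this out of the box, or whether we should keep this dedup/diff step on our side of the Airflow DAG.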