I am having a similar issue, i.e., how to delete a particular document from a namespace through an API.
Here is the scenario:
I am upserting multiple documents into a Pinecone vector DB through Langchain/Flowise. Each document is converted into multiple chunks/records, which are then upserted into a namespace. Deleting a particular document therefore means deleting all the chunks/records associated with that document from the namespace.
If the index is serverless, the only way to delete data from the namespace is by vector ID. In my application, vector IDs are assigned randomly by Langchain/Flowise, and there seems to be no Pinecone API to get the list of IDs belonging to a particular document. One can get the list of all IDs in the namespace, but there is no way to identify which IDs belong to which document.
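For reference, here is roughly what list-and-delete-by-ID looks like with a recent Pinecone Python client (the index and namespace names are placeholders); the hard part, as described above, is knowing which IDs to pass:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")  # placeholder index name

# list() is paginated and yields pages (lists) of vector IDs.
# On a serverless index this enumerates every ID in the namespace,
# but nothing here ties an ID back to its source document.
for id_page in index.list(namespace="my-namespace"):
    print(id_page)

# delete() takes explicit IDs; the question is which IDs
# belong to the document you want to remove.
index.delete(ids=["id-1", "id-2"], namespace="my-namespace")
```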
One way to delete the records would be through a metadata filter (assuming metadata unique to each document is assigned to its chunks/records), but delete-by-metadata is not supported on serverless indexes.
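For completeness, the metadata route would look like this, assuming each chunk had been upserted with a doc_id metadata field (which Flowise does not guarantee); pod-based indexes accept this call, while serverless indexes reject it:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")

# Pod-based indexes only: delete every chunk whose metadata matches
# the filter. A serverless index returns an error for this call.
# "doc_id" is an assumed metadata field, not something Flowise sets.
index.delete(
    filter={"doc_id": {"$eq": "doc-42"}},
    namespace="my-namespace",
)
```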
I would appreciate it if anyone could propose a possible solution.
The issue I am facing is how to identify the IDs for a given document. There are several documents in the namespace, each stored as chunks. To delete a particular document by ID, one needs to know the IDs of all the chunks associated with that document. Since I am using Flowise to upsert documents into Pinecone, it generates a random ID for each chunk.
My question is: how do I determine the chunk IDs associated with a particular document? There is an API that returns the list of IDs in a namespace, but it covers all documents at once.
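Worth noting: on serverless indexes, that same list endpoint also accepts an ID prefix. It does not help while Flowise assigns random IDs, but if the upsert path could be changed to assign IDs like `<doc_id>#<chunk_no>`, one document's chunks could be enumerated and deleted in a few lines. A sketch; the prefix scheme is an assumption, not Flowise's behavior:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")

# Assumed ID scheme: chunks upserted as "<doc_id>#<chunk_no>",
# e.g. "doc-42#0", "doc-42#1". Flowise does not do this by default.
for id_page in index.list(prefix="doc-42#", namespace="my-namespace"):
    index.delete(ids=id_page, namespace="my-namespace")
```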
Yes, I’m also upserting documents using Flowise via API.
A solution could be to attach metadata at upsert time (e.g., a unique document ID) and then use that to delete all the chunks carrying that metadata ID?
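That could work on serverless, with one twist: metadata filters are rejected on deletes but accepted on queries. So you can query with the filter to collect the matching IDs, then delete those IDs. A sketch, assuming each chunk carries a doc_id metadata field and a 1536-dimension index:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")
DIM = 1536  # assumed index dimension

# Query with a dummy vector purely to collect IDs matching the filter.
# top_k is capped at 10,000; repeat the query/delete loop if a
# document has more chunks than that.
res = index.query(
    vector=[0.0] * DIM,
    top_k=1000,
    filter={"doc_id": {"$eq": "doc-42"}},  # assumed metadata field
    namespace="my-namespace",
    include_values=False,
)
ids = [match.id for match in res.matches]
if ids:
    index.delete(ids=ids, namespace="my-namespace")
```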
I wrote to Henry (from Flowise), and he told me that we could do the upsert and delete directly against the document store. Meaning:
Upsert a new PDF = Creates a new document store.
Delete a PDF = Deletes the entire document store.
He’s going to implement a simple way to create new document stores with their document loaders in the next release.