Incremental Upsert

How can I upsert additional documents into the same namespace in the future without overwriting or deleting the records already stored in that namespace?

The issue is that whenever I want to upsert something new into the same namespace, I have to upsert it together with the previous files to avoid overwriting them.

@ZacharyProser kindly look into this and let me know if that’s a possibility or a limitation atm. TIA.

Hi @taimoorqureshi80, and thanks for your question!

I understand your concern about upserting new documents without overwriting existing ones in the same namespace.

  1. Upsert Behavior: Pinecone’s upsert operation only overwrites records with the same ID. If you’re upserting new documents with unique IDs, they will be added to the namespace without affecting existing records.
  2. Generating Unique IDs: Ensure each new document you’re upserting has a unique ID. There are several ways you could achieve this:
    • Using a UUID or a similar unique identifier generator
    • Combining a timestamp with a document identifier
    • Incrementing a counter for each new document
  3. Partial Updates: If you need to change only part of an existing record, consider using the update operation instead of upsert. This lets you modify a record’s values or metadata by ID without re-sending the entire record.
  4. Batching: You don’t need to include all previous documents in the upsert call when upserting new documents. Instead, you can upsert only the new documents in batches. We recommend batches of 100 or fewer records:

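# Upsert only the new records; records stored under other IDs are left untouched.
index.upsert(
  vectors=[
    # "values" holds the embedding vector, "metadata" an optional dict.
    {"id": "new_doc_1", "values": [...], "metadata": {...}},
    {"id": "new_doc_2", "values": [...], "metadata": {...}},
    # ... more new documents ...
  ],
  namespace="your_namespace"
)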
  5. Checking Existing Records: If you’re unsure whether a record already exists, you can use the fetch operation to check before upserting. This allows you to decide whether to update an existing record or insert a new one.
  6. Namespace Management: If your use case allows, consider using different namespaces for different sets of documents. This can help organize your data and make it easier to manage updates.
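
To tie the unique-ID and batching points together, here is a minimal sketch of how an incremental upsert could look. The helper names (`make_records`, `batched`, `upsert_new_only`) and the `embed` callable are hypothetical and not part of the Pinecone client. The only thing assumed about `index` is that it exposes `upsert(vectors=..., namespace=...)`, as in the snippet above.

```python
import uuid
from itertools import islice


def make_records(docs, embed):
    """Build upsert records with fresh UUID-based IDs.

    `docs` is an iterable of (text, metadata) pairs. `embed` is a
    hypothetical callable that turns text into a list of floats.
    """
    return [
        {"id": f"doc-{uuid.uuid4()}", "values": embed(text), "metadata": metadata}
        for text, metadata in docs
    ]


def batched(records, size=100):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def upsert_new_only(index, records, namespace, batch_size=100):
    """Upsert only the new records, one batch at a time.

    `index` is assumed to behave like a Pinecone index handle, i.e. to
    expose `upsert(vectors=..., namespace=...)`. Records already stored
    in the namespace are never re-sent.
    """
    sent = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch, namespace=namespace)
        sent += len(batch)
    return sent
```

Because every record gets a fresh UUID, running this twice on the same document would store it twice. If re-ingesting a document should overwrite its earlier version instead, derive the ID from a stable key (for example a file path or content hash) rather than a random UUID.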

I hope this helps!

Best,
Zack