Incremental Upsert

ZacharyProser · June 25, 2024, 2:09pm

Hi @taimoorqureshi80, and thanks for your question!

I understand your concern about upserting new documents without overwriting existing ones in the same namespace.

Upsert Behavior: Pinecone’s upsert operation overwrites a record only when the incoming record has the same ID as one that already exists in that namespace. If you’re upserting new documents with unique IDs, they will be added to the namespace without affecting existing records.
Generating Unique IDs: Ensure each new document you’re upserting has a unique ID. There are several ways you could achieve this:
- Using a UUID or a similar unique identifier generator
- Combining a timestamp with a document identifier
- Incrementing a counter for each new document
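As a rough illustration of those three options, here is a minimal sketch using only the Python standard library. The function names (`uuid_id`, `timestamped_id`, `make_counter_id`) and the `#` and `-` separators are my own choices for this example, not anything Pinecone requires:

```python
import itertools
import time
import uuid


def uuid_id() -> str:
    """Random UUID4 string; collisions are practically impossible."""
    return str(uuid.uuid4())


def timestamped_id(doc_name: str) -> str:
    """Document identifier combined with a nanosecond timestamp."""
    return f"{doc_name}#{time.time_ns()}"


def make_counter_id(prefix: str = "doc", start: int = 0):
    """Return a function that produces prefix-<n> with n increasing on each call."""
    counter = itertools.count(start)
    return lambda: f"{prefix}-{next(counter)}"
```

One caveat with the counter approach: the counter in this sketch lives in memory, so you would need to persist the last value between runs. Otherwise the IDs start over and the next upsert overwrites your earlier records.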
Partial Updates: If you need to change only part of an existing record, such as a few metadata fields, consider using the update operation instead of upsert. Update modifies the fields you specify and leaves the rest of the record as it is, whereas upsert replaces the whole record.
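A partial update could look something like the sketch below. It assumes `index` is an index object from the Pinecone Python client and that its `update` method accepts `id`, `set_metadata`, and `namespace` keyword arguments, so please check this against the client version you have installed. The wrapper name `set_record_metadata` is hypothetical:

```python
def set_record_metadata(index, record_id, changes, namespace):
    """Change only the given metadata fields of one existing record.

    Assumes `index.update` takes `id`, `set_metadata` and `namespace`.
    """
    # Only the keys in `changes` are sent; the vector values and any
    # other metadata fields are not part of the request.
    return index.update(id=record_id, set_metadata=changes, namespace=namespace)
```

For example, `set_record_metadata(index, "new_doc_1", {"status": "reviewed"}, "your_namespace")` would touch just the `status` field of that record.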
Batching: An upsert call only needs to contain the new documents; you don’t have to resend the records that are already in the namespace. Send the new documents in batches. We recommend batches of 100 or fewer records:

index.upsert(
  vectors=[
    {"id": "new_doc_1", "values": [...], "metadata": {...}},
    {"id": "new_doc_2", "values": [...], "metadata": {...}},
    # ... more new documents ...
  ],
  namespace="your_namespace"
)
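If you have more than 100 new documents, one way to split them up is sketched below. The helper names `batched` and `upsert_in_batches` are hypothetical, and `index` is assumed to be the same index object used in the call above:

```python
from itertools import islice


def batched(records, batch_size=100):
    """Yield successive lists of at most `batch_size` records."""
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch


def upsert_in_batches(index, records, namespace, batch_size=100):
    """Upsert only the given records, one batch per call; return the count sent."""
    total = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch, namespace=namespace)
        total += len(batch)
    return total
```

With 250 new records and the default batch size, this makes three upsert calls of 100, 100, and 50 records.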

Checking Existing Records: If you’re unsure whether a record already exists, you can use the fetch operation to check before upserting. This allows you to decide whether to update an existing record or insert a new one.
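Here is a minimal sketch of that check. It assumes `index.fetch` accepts `ids` and `namespace` and returns a response whose `vectors` attribute is a dict keyed by the IDs that were found, which is how I understand the Python client to behave. The function name `split_new_and_existing` is hypothetical:

```python
def split_new_and_existing(index, records, namespace):
    """Split records into (new, existing) based on which IDs fetch returns.

    Assumes the fetch response exposes a `vectors` dict keyed by record ID.
    """
    ids = [record["id"] for record in records]
    response = index.fetch(ids=ids, namespace=namespace)
    found = set(response.vectors)  # IDs already present in the namespace
    new = [r for r in records if r["id"] not in found]
    existing = [r for r in records if r["id"] in found]
    return new, existing
```

You could then upsert the `new` list as usual and decide separately what to do with `existing`, for example skipping those records or updating them.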
Namespace Management: If your use case allows, consider using different namespaces for different sets of documents. This can help organize your data and make it easier to manage updates.

I hope this helps!

Best,
Zack