Is it possible to update metadata configuration settings after index creation?

dra · March 10, 2023, 1:39pm

I want to add a setting to metadata_config.
Can I update metadata_config without rebuilding the index?

kbutler · March 10, 2023, 1:43pm

Hi dra,
You can apply a PATCH to modify the metadata_config. Warning: pods will need to restart and rebuild the metadata index, so you could briefly experience availability issues during this time. I recommend doing it during off hours and taking a collection ahead of time, just in case.

renrel · March 11, 2023, 8:48am

Any rough estimate of how long the downtime would be? Of course it varies depending on how many indexes and how many vectors you have, and maybe on what pod type you use (just guessing), but is there a rough formula, something like X seconds * num_indexes_per_vector * num_vectors, that we can assume, or example benchmarks we can reference?

Also, it would be nice if metadata_config could be added to the documentation you linked. I tested it (via curl) and it worked, but it’s not listed and none of the client libraries support it.

dr.rank · April 22, 2023, 8:45pm

I am having the same issue. I used the PATCH command to introduce metadata_config and received a 202 confirmation, but upon checking the index info it still reads metadata_config=None while my index is in the ‘ready’ state.

CrudenBoy · May 2, 2023, 7:03am

Hi kbutler, could you please give me some guidance? I have created an index and started to upsert vectors. I now want to add metadata and have tried various methods with no luck. I see "metadata_config": NULL when I run pinecone.describe_index("langchain2").

How do I do this? I see you mention that I need to apply a PATCH, but I cannot find how to do this.

mrlowlevel · September 11, 2023, 9:50am

This does not appear to work.

curl --location --request PATCH 'https://controller.us-west1-gcp.pinecone.io/databases/REDACTED' \
--header 'Api-Key: REDACTED' \
--header 'Content-Type: application/json' \
--data '{
    "metdata_config": {
        "indexed": [
            "some_redacted_field",
            "some_other_redacted_field"
        ]
    }
}'

Issuing the request above, I get a 202 response but nothing happens. @kbutler any idea if PATCH is actually a valid approach? I do recall reading elsewhere that you have to recreate the index to update the metadata config.

kbutler · September 11, 2023, 2:35pm

There are two options depending on your scenario:
1.) New index - create it with a metadata_config parameter to begin with before upserting data.
2.) Existing index - create a collection from the live index, then use the API to re-create the index with the new metadata_config parameter.

I am also attaching a Jupyter notebook that demonstrates the same as well as ‘resizing a mis-sized index’ example.

(Attachment How to adjust metadata_config.zip is missing)
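Since the notebook is not available, here is a minimal sketch of what option 2 might look like with the 2.x Python client. It assumes `pc` is the `pinecone` module after `pinecone.init(...)`, and that `describe_collection(...)` reports a `status` of "Ready" once the snapshot is usable; the function name and the polling parameters are made up for illustration. Other index settings (pod type, pods, replicas) are not carried over here and would need to be passed to `create_index` as well.

```python
import time


def recreate_index_with_metadata_config(pc, index_name, collection_name,
                                        indexed_fields, poll_seconds=5.0,
                                        max_polls=120):
    """Re-create `index_name` from a collection with a new metadata_config.

    `pc` is assumed to expose the 2.x client's module-level functions
    (for example `import pinecone as pc` after `pinecone.init(...)`).
    """
    # Read dimension and metric from the live index so the copy matches it.
    desc = pc.describe_index(index_name)

    # Snapshot the live index into a collection.
    pc.create_collection(collection_name, index_name)

    # Wait until the collection is usable (the status string is an assumption).
    for _ in range(max_polls):
        if pc.describe_collection(collection_name).status == "Ready":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"collection {collection_name!r} never became ready")

    # Downtime starts here: the old index is deleted so its name can be reused.
    pc.delete_index(index_name)

    # Build the replacement from the snapshot with the new indexed fields.
    pc.create_index(
        index_name,
        dimension=desc.dimension,
        metric=desc.metric,
        metadata_config={"indexed": list(indexed_fields)},
        source_collection=collection_name,
    )
```

Anything upserted between `create_collection` and the end of `create_index` is not in the snapshot, so writes during that window have to be held or replayed separately.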

mrlowlevel · September 11, 2023, 2:51pm

We need to index new fields on an existing index. As far as I can tell, there is no way to complete scenario 2 without downtime: we cannot run two indexes concurrently, because it appears an index can only be instantiated from a collection at creation time, rather than importing data from a collection after creation. Ideally the deployment process would look something like:

1.) Create new index with updated metadata_config
2.) Insert new data into both old and new indexes
3.) Take a snapshot / collection from old index
4.) Backfill data in new index from collection (only vectors with ids not already in new index)
5.) Switch over to reading from new index
6.) Some data integrity checks?
7.) Write only to new index
8.) Spin down old index

kbutler · September 11, 2023, 3:08pm

That is the ideal path. Unfortunately, the current limitation is that data cannot be copied from one live index to another.

I have another customer that does something similar, but they capture and hold the upserts while re-creating the new index from the collection. Once the new index is ready, they backfill the upserts that arrived during the time it took to create the collection and build the new index. This is all handled on their side, though, as they built that part of the infrastructure themselves. I hope this helps.

Kevin
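The capture-and-hold approach described above could be sketched roughly as follows. This is not that customer's implementation, just a minimal in-memory illustration: `BufferedWriter` and its method names are hypothetical, and the only thing assumed about the index object is an `upsert(vectors=...)` method like the one on the Python client's `Index`.

```python
class BufferedWriter:
    """Hold upserts while the index is being re-created, then replay them."""

    def __init__(self, index):
        self._index = index   # anything with an upsert(vectors=...) method
        self._held = []       # batches captured while paused, in arrival order
        self._paused = False

    def pause(self):
        """Start holding writes (call this before taking the collection)."""
        self._paused = True

    def upsert(self, vectors):
        """Forward the batch, or keep it for later if writes are paused."""
        if self._paused:
            self._held.append(list(vectors))
        else:
            self._index.upsert(vectors=vectors)

    def resume(self, new_index):
        """Point at the re-created index and replay held batches in order."""
        self._index = new_index
        while self._held:
            self._index.upsert(vectors=self._held.pop(0))
        self._paused = False
```

A real setup would likely want a durable queue rather than a Python list, since anything held here is lost if the process dies, and this sketch does not handle concurrent writers.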

philippe.cailloux · March 4, 2024, 8:29pm

It’s an old post now, but in case anyone wants to use the code snippet from @mrlowlevel above: the issue might be the typo, metdata_config instead of metadata_config.
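Because a misspelled key can apparently still come back as a 202, a small pre-flight check on the request body may save some head-scratching. The sketch below is only an illustration: `suspect_keys` is a hypothetical helper, and the set of expected keys is an assumption, not an official list of what the endpoint accepts.

```python
import difflib


def suspect_keys(payload, expected_keys):
    """Map each unexpected top-level key to its closest expected key (or None)."""
    suspects = {}
    for key in payload:
        if key in expected_keys:
            continue
        # Look for a near-miss spelling among the keys we meant to send.
        close = difflib.get_close_matches(key, sorted(expected_keys), n=1, cutoff=0.8)
        suspects[key] = close[0] if close else None
    return suspects


body = {"metdata_config": {"indexed": ["some_redacted_field"]}}
# The expected key set is an assumption for illustration only.
print(suspect_keys(body, {"metadata_config", "replicas", "pod_type"}))
# prints {'metdata_config': 'metadata_config'}
```

Whether the PATCH endpoint actually applies metadata_config is not settled in this thread; the check only catches the spelling problem.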