Retrieve all vector IDs

Cory_Pinecone · October 17, 2023, 5:26pm

An endpoint to retrieve all of the vector IDs stored in the index without supplying a query vector first.

jyjayr5 · November 7, 2023, 1:26am

should have optional namespace parameter

drainedouticedout · November 17, 2023, 6:03pm

PLEASE add this, including a way to filter by metadata. Currently users are forced into absurd workarounds, like looping over random query vectors in the hope of collecting all of the matching vectors, or just deleting and reindexing.

nsartor · November 30, 2023, 3:41pm

Please guys, my workaround is now taking 10-15 minutes to find all the records in my DB.

gdj0nes · December 1, 2023, 2:25pm

@nsartor thanks for commenting. Would you be able to tell me a bit more about your use case? How large of an index are you exporting, how often are you doing this, and why? Thanks!

nsartor · December 1, 2023, 2:54pm

Sure. Currently in the 100k range, but growing fast, some 5-10k a day.

I have IDs from another database that are stored in pinecone as they are created/edited.

I have a daily batch process that makes sure the two DBs are aligned, as sometimes the one-by-one process fails.

Currently I have the random vector workaround, batching 6 calls in parallel and iterating until all the vectors are found.

For some reason the last hundred or so are always impossible to find, and forcing it to find them all would require more than 100 calls.

Luckily it’s not a time sensitive task, but I’d rather not do all these useless calls.
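For readers unfamiliar with the workaround described above, here is a minimal sketch of the idea, not the poster's actual code. `query_fn` is a hypothetical callable standing in for whatever wraps the index query call (it takes a query vector and a `top_k` and returns the matching IDs); `expected_total`, `max_calls`, and `seed` are likewise names introduced only for illustration.

```python
import random
from typing import Callable, List, Set


def collect_ids_by_random_queries(
    query_fn: Callable[[List[float], int], List[str]],  # hypothetical query wrapper
    dimension: int,
    expected_total: int,
    top_k: int = 1000,
    max_calls: int = 100,
    seed: int = 0,
) -> Set[str]:
    """Query with random vectors until enough distinct IDs have been seen.

    Stops when `expected_total` IDs are collected or `max_calls` is reached,
    so the result may be incomplete.
    """
    rng = random.Random(seed)
    found: Set[str] = set()
    for _ in range(max_calls):
        if len(found) >= expected_total:
            break
        # A fresh random direction each call, hoping to surface unseen neighbours.
        vector = [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
        found.update(query_fn(vector, top_k))
    return found
```

Nothing in this loop guarantees coverage: vectors that are never among the top `top_k` results for any sampled query are simply never returned, which is consistent with the "last hundred or so" being hard to reach.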

gdj0nes · December 1, 2023, 3:48pm

OK, got it, so the goal is to avoid skew between your “source” database and Pinecone. We are trying to simplify keeping Pinecone in sync. What sort of database is upstream (Mongo, MySQL, etc.)? How do you do the syncing? Is there a reason that skew emerges?

nsartor · December 1, 2023, 5:31pm

Salesforce
Errors or temporary exceptions on the functions gathering the info to be saved in pinecone as embedding. And the unreliability of the trigger.

dhruv.anand · February 2, 2024, 1:46pm

I have built a script to carry out the export using a combination of random searches and metadata filtering.
Library: GitHub - AI-Northstar-Tech/vector-io: Use the universal VDF format for vector datasets to easily export and import data from all vector databases
Please try it out using the commands here:
Dhruv Anand on LinkedIn: Quick Migration to Pinecone Serverless I've been working on a library of…

rafal1 · April 16, 2024, 10:20am

Any news on retrieving vector IDs?

Migrating pod → serverless is impossible without it.

And workarounds with random search fail when going above 100k vectors.

perry · August 16, 2024, 8:45am

Migrating from pod->Serverless can be done natively within the Console: Migrate a pod-based index to serverless - Pinecone Docs

And once you’re in Serverless, you can list the vectors by their ID with optional prefix: List vector IDs - Pinecone Docs
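As a rough sketch of how listing IDs could replace the random-query workaround for the sync use case earlier in the thread: the helper below assumes the index object exposes a `list(...)` method that accepts optional `prefix` and `namespace` keyword arguments and yields pages (lists) of ID strings. That matches my understanding of the Pinecone Python client for serverless indexes, but treat it as an assumption and check the linked docs. `fetch_all_ids` and `diff_ids` are hypothetical helper names.

```python
from typing import Iterable, Optional, Set, Tuple


def fetch_all_ids(index, prefix: Optional[str] = None,
                  namespace: Optional[str] = None) -> Set[str]:
    """Collect every ID from an index whose list() yields pages of IDs (assumed)."""
    kwargs = {}
    if prefix is not None:
        kwargs["prefix"] = prefix
    if namespace is not None:
        kwargs["namespace"] = namespace
    ids: Set[str] = set()
    for page in index.list(**kwargs):
        ids.update(page)
    return ids


def diff_ids(source_ids: Iterable[str],
             pinecone_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing from Pinecone, present in Pinecone but not in the source)."""
    source = set(source_ids)
    return source - pinecone_ids, pinecone_ids - source
```

A daily job could then upsert the first set and delete the second, instead of issuing repeated random queries.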