An endpoint to retrieve all of the vector IDs stored in the index without supplying a query vector first.
It should have an optional namespace parameter.
PLEASE add this, including a way to filter by metadata. Currently users are forced into absurd workarounds, like looping over random query vectors in the hope of collecting all of the matching vectors, or just deleting and reindexing.
Please guys, my workaround is now taking 10-15 minutes to find all the records in my DB.
@nsartor thanks for commenting. Would you be able to tell me a bit more about your use case? How large an index are you exporting, how often are you doing this, and why? Thanks!
Sure. Currently in the 100k range, but growing fast, by some 5-10k a day.
I have IDs from another database that are stored in pinecone as they are created/edited.
I have a daily batch process that makes sure the two databases are aligned, as sometimes the one-by-one process fails.
Currently I have the random vector workaround, batching 6 calls in parallel and iterating until all the vectors are found.
For some reason the last hundred or so are always impossible to find, and forcing it to find them all would require more than 100 calls.
Luckily it’s not a time sensitive task, but I’d rather not do all these useless calls.
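For readers who have not seen it, the random-vector workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: query_fn is a hypothetical wrapper (not part of any client library) that sends one query vector to the index and returns the IDs of the matches, and expected_count is the number of IDs you believe the index holds.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set


def collect_ids_by_random_queries(
    query_fn: Callable[[List[float]], List[str]],
    dimension: int,
    expected_count: int,
    parallel_calls: int = 6,
    max_rounds: int = 100,
    seed: int = 0,
) -> Set[str]:
    """Collect IDs by issuing random query vectors until enough IDs are seen.

    query_fn is a hypothetical helper: it takes a query vector and returns
    the IDs of the top matches. Supply your own wrapper around the index.
    """
    rng = random.Random(seed)
    found: Set[str] = set()
    with ThreadPoolExecutor(max_workers=parallel_calls) as pool:
        for _ in range(max_rounds):
            # Stop as soon as we have seen as many IDs as we expect.
            if len(found) >= expected_count:
                break
            # One random vector per parallel call in this round.
            vectors = [
                [rng.uniform(-1.0, 1.0) for _ in range(dimension)]
                for _ in range(parallel_calls)
            ]
            for ids in pool.map(query_fn, vectors):
                found.update(ids)
    return found
```

Nothing guarantees that every ID is reached before max_rounds runs out, which matches the "last hundred or so" problem: vectors that are rarely among the nearest neighbours of a random query may never show up.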
OK, got it, so it is to avoid skew between your “source” database and Pinecone. We are trying to simplify keeping Pinecone in sync. What sort of database is upstream (mongo, mysql, etc.)? How do you do the syncing? Is there a reason that skew emerges?
Salesforce
Errors or temporary exceptions in the functions gathering the info to be saved in Pinecone as embeddings. And the unreliability of the trigger.
I have built a script to carry out the export using a combination of random searches and metadata filtering.
Library: GitHub - AI-Northstar-Tech/vector-io: Use the universal VDF format for vector datasets to easily export and import data from all vector databases
Please try it out using the commands here:
Dhruv Anand on LinkedIn: Quick Migration to Pinecone Serverless I've been working on a library of…
Any news on retrieving vector IDs?
Migrating pod → serverless is impossible without it.
And workarounds with random search fail when going above 100k vectors.
Migrating from pod->Serverless can be done natively within the Console: Migrate a pod-based index to serverless - Pinecone Docs
And once you’re in Serverless, you can list the vectors by their ID with optional prefix: List vector IDs - Pinecone Docs
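With ID listing available, the daily alignment check discussed earlier in the thread could be sketched like this. It is a minimal sketch under assumptions: it expects an index object whose list(prefix=..., namespace=...) method yields batches of ID strings, which is how I understand recent Python clients to expose the serverless list operation, so check it against the client version you run. The helper names list_all_ids and diff_ids are made up for this example.

```python
from typing import Iterable, Optional, Set, Tuple


def list_all_ids(
    index,
    prefix: Optional[str] = None,
    namespace: Optional[str] = None,
) -> Set[str]:
    """Gather every ID from an index-like object.

    Assumption: index.list(...) yields batches (lists) of ID strings.
    """
    kwargs = {}
    if prefix is not None:
        kwargs["prefix"] = prefix
    if namespace is not None:
        kwargs["namespace"] = namespace
    ids: Set[str] = set()
    for batch in index.list(**kwargs):
        ids.update(batch)
    return ids


def diff_ids(
    source_ids: Iterable[str],
    index_ids: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (missing_from_index, stale_in_index)."""
    source, indexed = set(source_ids), set(index_ids)
    return source - indexed, indexed - source
```

The first set holds records that exist upstream but were never embedded; the second holds IDs that should be deleted from the index. Both can then be handled in batches instead of issuing random queries.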