My upload of millions of vectors stopped midway. What’s the easiest way to find the last one that was uploaded and restart from there? Thanks.
The easiest options would be:
- Check your log and see which vector_id (if you are creating your own) or enumerated id was the last one upserted. This only works if you are writing your own log.
- If you are not using multithreading and know how many vectors there were at the beginning, you can work out how many you have already upserted and skip that many.
- Start from the end and work towards the beginning, checking whether each vector you are about to insert already exists in the database.
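A minimal sketch of the log-based option, assuming a single-threaded upload in a fixed order and a log with one line per successful upsert in the form `<position><TAB><vector_id>`. The log format and the function names here are made up for illustration; they are not part of any library.

```python
from pathlib import Path


def last_logged_position(log_path):
    """Return the enumerate() position of the last logged upsert, or -1 if none."""
    path = Path(log_path)
    if not path.exists():
        return -1
    last = -1
    with path.open() as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)
            # Ignore blank lines and a half-written final line from the crash.
            if len(parts) != 2 or not parts[0].isdigit():
                continue
            last = int(parts[0])
    return last


def remaining_items(items, log_path):
    """Yield (position, item) pairs that come after the last logged position."""
    start = last_logged_position(log_path) + 1
    for position, item in enumerate(items):
        if position >= start:
            yield position, item
```

Iterating over `remaining_items(all_items, "upsert.log")` then picks up right after the last logged entry. If you are unsure whether the final logged upsert actually completed, starting a few positions earlier is harmless, since an upsert with an existing id just overwrites it.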
The way I upsert most of my vectors is to:
- keep a log of all vector_ids inserted, together with their enumerate() id
- before each upsert, query with the vector using top_k=1 and return_metadata/values=False, then check the score. If there is a vector with a score >= 1.0, I skip it
I do both of the above, but I think the first one would be enough on its own, since the idea is to NOT calculate the vector again if I already have it in the database.
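The two checks above can be sketched roughly as follows. This is only an illustration under assumptions: `embed`, `already_stored` and `upsert` are hypothetical callables that you would write around your own embedding code and database client (for example, `already_stored` would run the top_k=1 query and compare the score), and the log format `<position><TAB><vector_id>` is made up.

```python
from pathlib import Path


def load_logged_ids(log_path):
    """Return the set of vector_ids already recorded in the log."""
    path = Path(log_path)
    if not path.exists():
        return set()
    ids = set()
    with path.open() as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)
            if len(parts) == 2:  # skip blank or half-written lines
                ids.add(parts[1])
    return ids


def resume_upsert(records, embed, already_stored, upsert, log_path):
    """Upsert records that are not yet stored; return how many were skipped via the log.

    records        : iterable of (vector_id, payload) pairs
    embed          : hypothetical callable, payload -> vector
    already_stored : hypothetical callable, vector -> bool (e.g. top_k=1 score check)
    upsert         : hypothetical callable, (vector_id, vector) -> None
    """
    done = load_logged_ids(log_path)
    skipped = 0
    with open(log_path, "a") as log:
        for position, (vector_id, payload) in enumerate(records):
            if vector_id in done:
                # Check 1: the log says it is done, so the vector is never recomputed.
                skipped += 1
                continue
            vector = embed(payload)
            if not already_stored(vector):
                # Check 2: only write when the database has no identical vector.
                upsert(vector_id, vector)
            log.write(f"{position}\t{vector_id}\n")
            log.flush()  # keep the log current in case of another crash
    return skipped
```

The log check comes first because it costs nothing, while the query check needs the vector to be computed already. If the similarity check is implemented with a score threshold, a value slightly below 1.0 may be safer than exactly 1.0 because of floating-point rounding; that depends on the metric your index uses.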
Hope this helps.
Edit: Also, it probably doesn’t help, but I know the pain; most of us here do. Good luck out there.