How to resume the upload?

My upload of millions of vectors stopped midway. What's the easiest way to find the last one that was uploaded and restart from there? Thanks.

Hi @admin2

The easiest options would be:

  1. Check your log to see which vector_id (if you are creating your own) or enumerated index was the last one upserted. This only works if you are writing your own log.
  2. If you are not using multithreading and know how many vectors there were at the beginning, you can calculate how many you have already upserted and resume from that position.
  3. Start from the end and work towards the beginning, checking whether each vector you are about to insert already exists in the database, until you find one that does.
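Option 3 could be sketched roughly like this. It is a minimal sketch, not tied to any particular client: `find_resume_index` and the `exists` callback are hypothetical names, and `exists` stands in for whatever lookup your database offers (for example a fetch by id). It also assumes the vectors were upserted in order, so everything before the first hit is already stored.

```python
from typing import Callable, Sequence


def find_resume_index(ids: Sequence[str], exists: Callable[[str], bool]) -> int:
    """Return the index of the first vector that still needs to be upserted.

    Walks from the last id towards the first and stops at the first id that
    is already stored. Assumes the ids were upserted in order.
    """
    for i in range(len(ids) - 1, -1, -1):
        if exists(ids[i]):  # hypothetical lookup against the database
            return i + 1
    return 0  # nothing was stored, so start from the beginning


# Stand-in for the database: pretend the first three ids made it in.
already_stored = {"vec-0", "vec-1", "vec-2"}
all_ids = [f"vec-{i}" for i in range(6)]

resume_at = find_resume_index(all_ids, already_stored.__contains__)
print(resume_at)  # 3
```

If the upload was strictly sequential and single-threaded, a binary search over the same `exists` check would find the same position with far fewer lookups than this linear scan.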

The way I upsert most of my vectors is to

  • keep a log of all inserted vector_ids together with their enumerate() index
  • before each upsert, query with the vector using top_k=1 and return_metadata/values=False, then check the score. If a vector comes back with a score >= 1.0, I skip it

I do both of the above, but I think the first one would be enough, since the idea is to NOT calculate the vector again if I already have it in the database.
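A rough sketch of the logging approach, in case it helps. All the names here (`load_done_ids`, `resumable_upsert`, `upsert_fn`, the tab-separated log format) are made up for illustration; `upsert_fn` is a placeholder for your real client call. Each item carries a function that builds the vector, so nothing is calculated for ids that are already in the log.

```python
import os
from typing import Callable, Iterable, List, Set, Tuple

Vector = List[float]


def load_done_ids(log_path: str) -> Set[str]:
    """Read the ids that were logged as upserted in earlier runs."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as f:
        # Each line is "<enumerate index>\t<vector_id>"; skip malformed lines.
        return {line.rstrip("\n").split("\t", 1)[1] for line in f if "\t" in line}


def resumable_upsert(
    items: Iterable[Tuple[str, Callable[[], Vector]]],
    upsert_fn: Callable[[str, Vector], None],  # placeholder for the real upsert
    log_path: str,
) -> int:
    """Upsert every item not yet in the log; return how many were upserted."""
    done = load_done_ids(log_path)
    written = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for i, (vector_id, make_vector) in enumerate(items):
            if vector_id in done:
                continue  # already upserted, so do not calculate it again
            upsert_fn(vector_id, make_vector())
            log.write(f"{i}\t{vector_id}\n")
            log.flush()  # keep the log current in case the run dies here
            written += 1
    return written
```

Rerunning `resumable_upsert` with the same items and the same log file after a crash picks up where the previous run stopped. The log line is written only after the upsert returns, so at worst the one vector that was in flight gets upserted a second time.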

Hope this helps

Edit: Also, it probably doesn't help, but I know the pain; most of us here do. Good luck out there.
