How to count vectors before uploading a document

Hi there, I'm unsure whether the vector count differs between s1, p1 and p2, but I'm using s1. Has anyone found a way to get the vector count of a document before uploading it to the database? Or is it determined by the number of chunks produced with LangChain and the tokens per document? I am specifically talking about the vector count you get when you query the Pinecone stats for namespaces, where each namespace name comes with its vector count.

Hi @AlphaApollo,

Thanks for your interesting question!

You can reproduce the vector count that the Pinecone describe index stats API call returns from the backend by tracking it yourself as follows:

  • Each time you pass your data through your embedding model, note how many vectors you are about to upsert to Pinecone.
  • If the call to Pinecone upsert succeeds, increment your local counter by the number of vectors upserted.
  • If the API call is rejected or fails, do not increment your local counter for the vectors in that attempt.
  • Finally, wait a few seconds or more (you may need to experiment to find the ideal value) before making the Pinecone describe index stats API call.
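The counting steps above could be sketched roughly like this. It is only an illustration: `upsert_and_count` and its `upsert` parameter are hypothetical names, and the sketch assumes your upsert call raises an exception on failure and that every vector has a unique ID (re-upserting an existing ID overwrites it rather than adding a new vector).

```python
from typing import Any, Callable, Iterable, Sequence


def upsert_and_count(
    batches: Iterable[Sequence[Any]],
    upsert: Callable[[Sequence[Any]], Any],
) -> int:
    """Upsert each batch and return how many vectors were accepted.

    `upsert` is whatever function sends one batch to your index
    (hypothetical here); it is assumed to raise on failure.
    """
    confirmed = 0
    for batch in batches:
        try:
            upsert(batch)
        except Exception:
            # Rejected or failed call: do not count this batch.
            continue
        # Successful call: count every vector in the batch.
        confirmed += len(batch)
    return confirmed
```

With the Pinecone Python client you would probably pass something like `lambda batch: index.upsert(vectors=batch)` as `upsert`, but check the client version you are using.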

This last consideration is important because there may be a slight delay before the value Pinecone returns for your index stats matches the upserted vector count you are tracking locally.

This is because Pinecone upsert requests first land in a queue whose size varies depending on which Pod type you select, and in some cases this queue can introduce a slight delay before vectors are successfully written.
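Instead of a single fixed wait, one option is to poll until the reported count catches up with your local counter. The sketch below is an assumption-laden illustration, not an official recipe: `wait_for_count` and `get_count` are hypothetical names, and `get_count` stands for whatever function reads the namespace's vector count from the describe index stats response.

```python
import time
from typing import Callable


def wait_for_count(
    get_count: Callable[[], int],
    expected: int,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_count` until it reaches `expected` or the timeout passes.

    Returns True if the reported count reached `expected`, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Give the upsert queue time to drain before asking again.
        sleep(interval_s)
```

Here `expected` should be your local counter plus any vectors that were already in the namespace. The default timeout and interval are arbitrary starting points, so tune them for your Pod type.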

I hope this is helpful!