Hi there, I’m unsure whether the vector count differs between s1, p1 and p2, but I’m using s1. Has anyone found a way to get the vector count of a document before uploading it to the database? Or is it determined by the number of chunks produced with LangChain and the tokens per document? I am specifically talking about the vector count you get when you query the Pinecone stats for namespaces, which returns a vector count alongside each namespace name.
Hi @AlphaApollo ,
Thanks for your interesting question!
You could reproduce the same vector count we provide on the backend via the Pinecone describe index stats API call by doing the following:
- Every time you pass your data through your embedding model, keep a running counter of the vectors you’re going to attempt to upsert.
- If the call to Pinecone `upsert` is successful, increment the counter by the number of vectors upserted.
- If the API call is rejected or fails, do not increment your local counter by the number of vectors you attempted but failed to upsert.
- Finally, add a reasonable delay of a few seconds or more (you may need to experiment to find the ideal value) before making the Pinecone describe index stats API call.
This last consideration is important because there may be a slight delay before the value Pinecone returns for your index stats matches the upserted vector count you are tracking locally.
This is because Pinecone upsert requests first land in a queue whose size varies with the pod type you select, and in some cases this queue can introduce a slight delay before vectors are successfully written.
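The steps above could be sketched roughly as follows. This is only a minimal illustration, not Pinecone's own code: `upsert_and_count`, `wait_for_count`, `upsert_fn` and `get_remote_count` are hypothetical names, and the two callables stand in for whatever client calls you actually use. It also assumes every vector has a unique ID, since upserting an existing ID overwrites the vector rather than adding to the count.

```python
import time
from typing import Callable, Iterable, Sequence


def upsert_and_count(
    batches: Iterable[Sequence],
    upsert_fn: Callable[[Sequence], object],
) -> int:
    """Upsert each batch and return how many vectors were accepted.

    `upsert_fn` is a hypothetical wrapper around your real upsert call;
    it is assumed to raise an exception when the request is rejected.
    """
    confirmed = 0
    for batch in batches:
        try:
            upsert_fn(batch)
        except Exception:
            # Rejected or failed call: do not count these vectors.
            continue
        # Successful call: count the vectors in this batch.
        confirmed += len(batch)
    return confirmed


def wait_for_count(
    get_remote_count: Callable[[], int],
    expected: int,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> bool:
    """Poll the remote vector count until it matches `expected`.

    `get_remote_count` is a hypothetical function that reads the vector
    count for your namespace from the index stats. Returns False if the
    counts still differ once `timeout_s` has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_remote_count() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Polling with a deadline is one way to handle the delay instead of a single fixed sleep, so you don't have to guess the ideal wait up front. With the Pinecone Python client, `upsert_fn` would presumably wrap `index.upsert(...)` and `get_remote_count` would read the namespace's vector count from `index.describe_index_stats()`, but the exact response shape depends on your client version, so treat that mapping as an assumption to verify.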
I hope this is helpful!