I am planning to use Pinecone as the vector database for our cloud application, queried in real time from a Lambda function. For now I am planning a single replica at pod size x2, and I would like to answer the following questions so I can plan our production go-live architecture.
- How many concurrent queries can we make against the index at one time?
- What happens if query requests arrive faster than the index can handle them?
- How can we automatically scale the number of replicas up or down, within a set limit, based on request volume?
- How can we delete specific vector embeddings from the index without recreating it?
- How can we monitor index performance in real time and get alerted when it degrades?
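For context on the autoscaling and deletion questions above, here is a minimal sketch of the client-side logic I have in mind. The helper names, per-replica throughput figure, and delete batch size are all my own assumptions, not Pinecone values; the chosen replica count would be applied externally with something like `pinecone.configure_index(index_name, replicas=n)` from the Python client, and each ID batch passed to `index.delete(ids=batch)`.

```python
import math


def target_replicas(current_rps: float, rps_per_replica: float,
                    min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Pick a replica count for the observed request rate, clamped to a range.

    rps_per_replica is an assumed per-replica throughput; it would need to be
    measured for the actual pod size in use.
    """
    needed = math.ceil(current_rps / rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def id_batches(ids, batch_size=1000):
    """Split vector IDs into batches for deletion.

    The batch size here is an assumption, not a documented Pinecone limit;
    each batch would be sent via index.delete(ids=batch).
    """
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]


# Example: at 250 QPS with an assumed ~100 QPS per replica, scale to 3 replicas.
print(target_replicas(250, 100))                     # 3
print(len(list(id_batches(list(range(2500))))))      # 3 batches of <= 1000 ids
```

The idea is that a scheduled job (e.g. a CloudWatch-triggered Lambda) would compute the target replica count from observed traffic and call the reconfiguration API when it changes, rather than Pinecone scaling itself.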