Index fine-tuning & real-time monitoring questions

I am planning to use Pinecone for our vector embeddings, queried in real time from a Lambda function in our cloud application. For now, I plan to use pod type X2 with a single replica. I would like to know the following so I can plan my production go-live architecture with Pinecone as the vector database.

  • How many concurrent calls can we make to our index?
  • What happens if more query requests arrive than the index can handle?
  • How can we automatically scale the number of replicas up or down, within a set limit, depending on call volume?
  • How can we delete a few specific vector embeddings from the index without recreating it?
  • How can we monitor index performance in real time and get alerted to any degradation?
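
To make the deletion and replica-scaling questions concrete, here is a minimal sketch of what I have in mind, not a confirmed Pinecone recipe. It assumes the index object exposes `delete(ids=..., namespace=...)`, as the Pinecone Python client's `Index` does in the versions I have seen. The helper names `delete_vectors` and `desired_replicas`, the per-replica throughput figure, and the replica limits are all hypothetical placeholders that would need to be measured or confirmed.

```python
import math


def delete_vectors(index, ids, namespace=None):
    """Delete specific vectors by ID, leaving the rest of the index intact.

    `index` is assumed to expose delete(ids=..., namespace=...).
    Returns the number of IDs submitted for deletion.
    """
    ids = list(ids)
    if not ids:
        # Nothing to delete, so skip the network call entirely.
        return 0
    index.delete(ids=ids, namespace=namespace)
    return len(ids)


def desired_replicas(observed_qps, qps_per_replica, min_replicas=1, max_replicas=4):
    """Replica count needed for the observed load, clamped to [min, max].

    `qps_per_replica` is a hypothetical figure that has to come from
    load-testing the actual index; it is not a published limit.
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    needed = math.ceil(observed_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

The value returned by `desired_replicas` could then be applied from a scheduled job through the client's index-configuration call (I believe `configure_index` accepts a `replicas` argument, but please correct me if that is not the recommended way).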

Hi @amitkayal,

I see that you’re engaged with one of my colleagues on the Support team to address your specific questions. I’ll close this question out so you can get more specific advice for your use case in that channel.

@amitkayal Would be awesome if you could share your findings surrounding these questions.