I created a project in the Singapore availability zone. While testing an index, I found that no matter how many replicas I added, QPS stopped increasing once concurrency went above 3.
Hi @wnch201209. We need more information to know if there was a problem or not.
- What was the pod type of the index you were using?
- What was the max QPS you were seeing?
- How were you running the queries? From where were they being run?
- Did you control for the max RPS your app can handle? What is the upper bound of the number of requests you’re able to send?
- Were you using a single host to test QPS, or was this a distributed test involving more than one test host?
I checked and found an index I think is the one you were using. It looks like you were sending about 166 RPS, which is much lower than the upper bound of QPS that pod configuration can handle. So I suspect the bottleneck was somewhere outside of Pinecone.
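One way to check the client-side ceiling is to run the same load loop against a stub instead of a real index. Here is a minimal sketch, assuming a thread-based load generator; `fake_query` and `measure_qps` are hypothetical names for illustration, not part of any Pinecone client, and the 5 ms sleep is an arbitrary stand-in for a network round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(latency_s=0.005):
    """Hypothetical stand-in for a real query call: sleeps instead of contacting a server."""
    time.sleep(latency_s)


def measure_qps(call, concurrency, duration_s=0.5):
    """Call `call` in a loop from `concurrency` threads for about `duration_s` seconds.

    Returns the number of completed calls per second.
    """
    deadline = time.perf_counter() + duration_s

    def worker():
        done = 0
        while time.perf_counter() < deadline:
            call()
            done += 1
        return done

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total / elapsed


if __name__ == "__main__":
    # Sweep concurrency and watch where throughput stops scaling.
    for c in (1, 2, 4, 8):
        print(f"concurrency={c:2d}  qps={measure_qps(fake_query, c):.0f}")
```

If throughput stops scaling even against the stub, the harness or the host is the limit. Note that the stub is optimistic: `time.sleep` releases the GIL, whereas a real client also spends CPU time on serialization, so a real test may plateau earlier than this sketch suggests.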
- The pod type of the index I used is p2.x1, and the number of replicas was set to 5.
- The max QPS I measured was 181.
- I used the Python gRPC client for load testing, in both multi-threaded and multi-process modes. My client machine is an Alibaba Cloud ECS instance in Singapore, the same region as the Pinecone project.
- I control the request rate by increasing the number of concurrent threads or the number of concurrent processes. When the same script is run against other vector databases, QPS reaches more than 1,000.
- Yes, all the test clients ran on the same host.
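
As a rough sanity check on the numbers in this thread, Little's law relates throughput, concurrency, and latency for a closed-loop test: QPS ≈ requests in flight / average latency. The sketch below assumes the plateau of about 181 QPS was reached with 3 requests in flight, which is an interpretation of the figures above rather than a measured fact; the function names are hypothetical.

```python
def implied_latency_ms(in_flight, qps):
    """Average per-request latency implied by Little's law (in_flight = qps * latency)."""
    return in_flight / qps * 1000.0


def required_in_flight(target_qps, latency_ms):
    """Requests that must be in flight to sustain target_qps at a given average latency."""
    return target_qps * latency_ms / 1000.0


latency = implied_latency_ms(3, 181)       # assumption: plateau reached at 3 in flight
print(round(latency, 1))                   # 16.6 (ms)
print(round(required_in_flight(1000, latency), 1))
```

If latency per request stayed near that value, reaching 1,000 QPS would need roughly 17 requests genuinely in flight at once. A plateau despite adding threads or processes would then suggest either that latency rises with load or that the extra workers are not actually issuing requests concurrently.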