Latency analysis and variance. Cold start issue?

amitkayal · February 20, 2023, 5:52pm

Hello All,

I am doing a latency analysis to understand the impact of filters, using the sample notebook examples/semantic_text_search.ipynb at master · pinecone-io/examples · GitHub. For the code in the “Pinecone Example Usage With Metadata” section, the retrieval time varies considerably: time: 749 ms (started: 2023-02-20 17:51:16 +00:00), time: 246 ms (started: 2023-02-20 17:51:47 +00:00), and time: 250 ms (started: 2023-02-20 17:50:38 +00:00). I don't understand why there is so much variance. Is there also a cold-start issue I need to address if I want very low latency?

I am currently using the Starter option, and my index details are: IndexDescription(name=‘semantic-text-search-demo’, metric=‘cosine’, replicas=1, dimension=384.0, shards=1, pods=1, pod_type=‘p1.x1’, status={‘ready’: True, ‘state’: ‘Ready’}, metadata_config=None, source_collection=‘’)

Thanks
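One way to make the variance question more concrete is to time many repeated calls instead of three single runs, and to look at the first call separately from the warm ones. This is a minimal sketch that assumes only a zero-argument callable which runs the query; measure_latency is a hypothetical helper, not part of the Pinecone client.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(query_fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Run query_fn `runs` times in sequence and summarize latency in ms.

    The first call is reported on its own because it may include one-time
    costs (for example, opening a connection) that a cold-start check
    should look at separately from steady-state calls.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2 (one first call plus warm calls)")

    samples_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()  # hypothetical: a wrapper around the notebook's query call
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    first, warm = samples_ms[0], samples_ms[1:]
    return {
        "first_ms": first,
        "warm_min_ms": min(warm),
        "warm_median_ms": statistics.median(warm),
        "warm_max_ms": max(warm),
        # statistics.stdev needs at least two data points
        "warm_stdev_ms": statistics.stdev(warm) if len(warm) > 1 else 0.0,
    }
```

For example, pass a lambda wrapping the same query call the notebook already makes. If the first call is much slower than the warm median, one-time setup is a plausible contributor; if the warm calls themselves are widely spread, the variance is more likely coming from the network path.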

kbutler · February 20, 2023, 6:00pm

Hi amitkayal,
Query latency on p1 pods should be much lower than what you are seeing. It sounds like you might be running the notebook from a machine (for example, your local system) located far away from the region where the Pinecone index is hosted. The variance is likely the total end-to-end network latency from your application to Pinecone. If you move your notebook to a server in the same region, preferably on the same cloud, you should see dramatically faster speeds.
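To estimate how much of each query's time is network, a rough sketch is to time bare TCP handshakes to the index's endpoint, since establishing a TCP connection costs roughly one network round trip. tcp_connect_rtt_ms is a hypothetical helper; the host is whatever endpoint your index uses (not shown in this thread), and port 443 is an assumption for HTTPS.

```python
import socket
import time
from typing import List


def tcp_connect_rtt_ms(host: str, port: int = 443, samples: int = 5,
                       timeout: float = 3.0) -> List[float]:
    """Time `samples` TCP connects to host:port, in milliseconds.

    A TCP handshake takes about one network round trip, so this approximates
    the network share of latency without any server-side query work.
    Each sample also includes name resolution, which may or may not be cached.
    """
    results: List[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close the connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results
```

If these round-trip times make up a large share of the measured query time, running the client in the same region as the index should help.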

amitkayal · February 20, 2023, 6:11pm

Hi @kbutler, I am running this from Google Colab and am not sure which region it runs in. But then why does the latency vary so much? If Colab is running in a different region, I would expect the latency to be consistently high, unless my local network is the issue here. Thanks

kbutler · February 20, 2023, 6:34pm

Thank you for your input. I understand that ensuring consistency in this context is important and agree that the local machine should not have an impact on network/latency. However, given the complexity of the situation, it may not be appropriate to conduct a conclusive load test using this notebook alone. Therefore, I suggest conducting a load test with a specialized testing tool that performs queries against Pinecone, while specifying the appropriate parameters such as top-k, IncludeMetadata, and IncludeValues, to accurately determine the expected QPS/latency over time.
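A minimal sketch of such a load test, assuming only a zero-argument callable that issues one query with the top-k, metadata, and values settings you plan to use. load_test is a hypothetical helper built on Python's standard ThreadPoolExecutor; it is not a Pinecone tool and is only a stand-in for a dedicated load-testing tool.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def load_test(query_fn: Callable[[], object], total_queries: int = 100,
              concurrency: int = 4) -> Dict[str, float]:
    """Issue total_queries calls to query_fn from `concurrency` threads.

    Returns achieved throughput (QPS) and per-query latency percentiles in ms.
    """
    if total_queries < 1 or concurrency < 1:
        raise ValueError("total_queries and concurrency must be >= 1")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        query_fn()
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_queries)))
    wall_s = time.perf_counter() - wall_start

    def percentile(p: int) -> float:
        # Nearest-rank percentile over the sorted latencies.
        rank = max(1, math.ceil(p * len(latencies) / 100))
        return latencies[rank - 1]

    return {
        "qps": total_queries / wall_s,
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
        "max_ms": latencies[-1],
    }
```

Running it at a few concurrency levels shows how throughput and tail latency (p95/p99) change as load increases, which a single timed notebook cell cannot show.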

amitkayal · February 22, 2023, 6:42pm

Thanks a lot for all your help. I have now retested from my AWS SageMaker Studio instance in us-east-1, where my index is also hosted. The latency now varies between 12 ms and 20 ms.

I have another question regarding top-k. Can I define a threshold value so that anything below it is not even considered in the top-k calculation? My aim is to not get back any vector whose similarity score is < 0.6. I know this can be done on the client side, but I wanted to see whether it can be defined in metadata or at the index level.

Thanks

amitkayal · February 24, 2023, 12:46pm

Hi @Cory_Pinecone, @kbutler, would it be possible for you to guide me on this threshold value? It is an important criterion for us, and I would like to know whether it can be achieved with a parameter in the Pinecone call itself rather than by fetching the data and filtering it afterwards. Thanks

kbutler · February 24, 2023, 2:21pm

Hi Amit,
We currently do not offer this as a feature. However, our product team is aware of this request, as other customers have asked for it too. You are correct that, for now, it has to be done on the client side.
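Client-side filtering can be a short function over the returned matches. This sketch assumes each match exposes a "score" field (adapt the access to your client's response object) and a metric such as cosine, where a higher score means more similar. filter_by_score is a hypothetical helper.

```python
from typing import Any, List, Mapping, Optional, Sequence


def filter_by_score(
    matches: Sequence[Mapping[str, Any]],
    min_score: float = 0.6,
    k: Optional[int] = None,
) -> List[Mapping[str, Any]]:
    """Return matches with score >= min_score, highest score first.

    Assumes higher scores mean more similar (e.g. cosine).
    If k is given, at most k matches are returned.
    """
    kept = [m for m in matches if m["score"] >= min_score]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return kept if k is None else kept[:k]
```

Because the threshold is applied after top-k, you can get back fewer than k results. If you need up to k matches above the threshold, one option is to request a larger top_k from the index and pass the k you actually want here.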