Has anyone experimented with the new Prometheus support? I’m getting data into Grafana, but when I look at the pinecone_request_latency_seconds metric (for queries) the numbers do not match what I see on the client side (I’m using artillery to issue queries at various rates). I think the problem may be the size of the rolling window used to calculate the quantile values, but that’s just a guess.
Prometheus will give you backend latencies while Artillery is measuring round-trip latencies, hence you might be seeing different numbers there.
Also, regarding Artillery, I recommend thinking about how you expect your load to behave IRL. Artillery has the option to use a uniform or a Poisson process arrival distribution, along with adding jitter. Our system would behave differently in each case.
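To make the difference between the two arrival patterns concrete, here is a minimal Python sketch (not Artillery itself) that generates arrival times at the same average rate and compares how bursty they are. The rate, duration, window size, and function names are all hypothetical choices for illustration.

```python
import random

def uniform_arrivals(rate_per_s, duration_s):
    """Evenly spaced arrivals: one request every 1/rate seconds."""
    gap = 1.0 / rate_per_s
    return [i * gap for i in range(int(rate_per_s * duration_s))]

def poisson_arrivals(rate_per_s, duration_s, rng):
    """Poisson process: inter-arrival gaps are exponentially distributed."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)

def max_in_window(times, window_s):
    """Largest number of arrivals in any window of window_s seconds
    that starts at an arrival (times must be sorted)."""
    best, j = 0, 0
    for i, start in enumerate(times):
        while j < len(times) and times[j] < start + window_s:
            j += 1
        best = max(best, j - i)
    return best

rng = random.Random(42)          # fixed seed so the run is repeatable
uniform = uniform_arrivals(50, 60)
poisson = poisson_arrivals(50, 60, rng)

# Same average rate, but the Poisson stream packs more requests
# into its busiest 100 ms window than the uniform stream does.
print("uniform peak:", max_in_window(uniform, 0.1))
print("poisson peak:", max_in_window(poisson, 0.1))
```

With the same mean rate, the Poisson stream produces short bursts, so the backend can see momentary queueing that evenly spaced requests would never trigger.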
Hi @rajat , thanks for your reply and suggestions. I should have given a bit more information in the original posting. Here are the odd things I notice:
artillery actually shows shorter p95 times than Prometheus rather than longer
after artillery stops sending requests the Prometheus values do not drop but seem stuck at the last values, even after several hours
Here’s a Grafana screenshot showing latencies and request rates, the panel queries are:
top panel: pinecone_request_latency_seconds{… request_type="query"}
bottom panel: rate(pinecone_request_count_total{… request_type="query"}[30s])
I just checked our code to be certain. The graph you are seeing is the expected behavior from Prometheus histograms (we record request latencies as histograms).
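As a rough illustration of why a histogram-derived quantile can stay flat after traffic stops, here is a small Python sketch of cumulative buckets with linear interpolation inside the target bucket. The bucket bounds and sample latencies are made up, and this is not Pinecone's exporter code; it only shows the general behavior of cumulative histogram counters.

```python
import bisect

# Hypothetical bucket upper bounds in seconds; the last bucket is +Inf.
BOUNDS = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, float("inf")]

class Histogram:
    def __init__(self):
        self.counts = [0] * len(BOUNDS)  # cumulative count per bucket

    def observe(self, seconds):
        # Increment every bucket whose upper bound is >= the observation.
        first = bisect.bisect_left(BOUNDS, seconds)
        for i in range(first, len(BOUNDS)):
            self.counts[i] += 1

def quantile(q, counts):
    """Estimate a quantile from cumulative bucket counts by linear
    interpolation inside the bucket that contains rank q * total."""
    total = counts[-1]
    if total == 0:
        return float("nan")              # no observations at all
    rank = q * total
    i = next(k for k, c in enumerate(counts) if c >= rank)
    if i == len(counts) - 1:
        return BOUNDS[-2]                # +Inf bucket: highest finite bound
    lo_bound = BOUNDS[i - 1] if i > 0 else 0.0
    lo_count = counts[i - 1] if i > 0 else 0
    if counts[i] == lo_count:
        return BOUNDS[i]
    return lo_bound + (BOUNDS[i] - lo_bound) * (rank - lo_count) / (counts[i] - lo_count)

h = Histogram()
for latency in [0.02, 0.03, 0.04, 0.08, 0.2, 0.4]:
    h.observe(latency)

before = list(h.counts)
p95_cumulative = quantile(0.95, h.counts)

# Traffic stops: a later scrape sees exactly the same counters.
after = list(h.counts)
delta = [a - b for a, b in zip(after, before)]
p95_recent = quantile(0.95, delta)

print(p95_cumulative)  # stays the same however long ago traffic stopped
print(p95_recent)      # nan, because nothing was observed in the interval
```

A quantile computed from the raw cumulative counters never moves once requests stop, while one computed from the increase between two scrapes has no data to report.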
Also, regarding the Artillery differences: we use 10s window sizes when calculating the metrics. Could there be a mismatch there?
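One way a window mismatch could show up is sketched below in Python: the same latencies give a low p95 over the whole run but a much higher p95 inside the one 10s window that contains a slow patch. The traffic shape, latencies, and helper names are hypothetical, and this is only one possible contributor to the difference.

```python
import math

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def windowed_p95(samples, window_s=10):
    """Group (timestamp_s, latency_s) samples into fixed windows
    and compute p95 separately for each window."""
    windows = {}
    for ts, latency in samples:
        windows.setdefault(int(ts // window_s), []).append(latency)
    return {w: p95(v) for w, v in sorted(windows.items())}

# Hypothetical run: 60 s at 10 req/s, 20 ms typical latency,
# with a 200 ms slow patch between t=30s and t=32s.
samples = []
for i in range(600):
    ts = i / 10
    latency = 0.2 if 30 <= ts < 32 else 0.02
    samples.append((ts, latency))

overall = p95([lat for _, lat in samples])
per_window = windowed_p95(samples)

print("whole-run p95:", overall)
print("per-window p95:", per_window)
```

Only about 3% of all requests are slow, so the whole-run p95 ignores them, but they make up 20% of the window they fall in and dominate that window's p95.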
Just checking to see if you still need a discussion or deep-dive on this. We are happy to jump on a call if needed. You can reach out to us via support@pinecone.io.