Inferencing time question

Could you tell us more about inferencing time? Say, for example, we have a database with millions of entries: how fast would a query be? As far as I know, transformer-based semantic embeddings can be quite slow to operate at scale, where we have millions or billions of entries to measure similarity against.


In the past this has been the case: if you search through millions of dense vectors exhaustively (i.e., checking every possible result), it takes a very long time. Fortunately, this is no longer necessary thanks to approximate nearest neighbor (ANN) search. Pinecone makes use of an incredibly fast and scalable ANN algorithm to keep search times low.
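If it helps to see the difference concretely, here's a rough, generic sketch of exhaustive versus approximate search over the same vectors. It uses FAISS purely as an illustration (it is not Pinecone's implementation), and the dimension, vector count, and index parameters are placeholder values:

```python
import numpy as np
import faiss  # assumption: faiss-cpu installed; used only to illustrate exhaustive vs. ANN search

d = 384          # embedding dimension (placeholder, e.g. a small sentence-transformer model)
n = 1_000_000    # number of stored vectors (placeholder)
xb = np.random.random((n, d)).astype("float32")   # the "database" of embeddings
xq = np.random.random((1, d)).astype("float32")   # one query embedding

# Exhaustive (flat) search: compares the query against every stored vector,
# so query time grows linearly with n.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
dists, ids = flat.search(xq, 10)

# Approximate (IVF) search: clusters the vectors, then scans only a few
# clusters per query, trading a little recall for a large speedup at scale.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)  # 1024 clusters (placeholder)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16  # how many clusters to scan per query
dists_ann, ids_ann = ivf.search(xq, 10)
```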

I’d recommend taking a look at the charts in this article: with Pinecone’s P1 pods and 10M indexed vectors, we see p95 query latency of around 100ms, and the latency increase as the number of vectors grows is very minor from there (you can see this in the graphs). You can also use the usage calculator to get an idea for your particular use case.
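Since a query is just a single call against the index, you can also measure round-trip latency on your own data with something like the sketch below. This assumes the Python client (v2-era API names; newer releases differ slightly), the API key, environment, index name, and dimension are placeholders, and the timing includes network overhead on top of the search itself:

```python
import time
import pinecone  # assumption: pinecone-client v2.x; newer versions use a different entry point

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")  # placeholders
index = pinecone.Index("my-index")  # hypothetical index name

query_vector = [0.1] * 384  # in practice, the embedding of your query text

start = time.perf_counter()
result = index.query(vector=query_vector, top_k=10, include_metadata=True)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"top match: {result.matches[0].id}  round-trip: {elapsed_ms:.1f} ms")
```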

I hope that helps!