Which ANN method does Pinecone support?

Hi Pinecone experts.

I am wondering whether we can tweak the ANN method for a Pinecone index based on our specific scenarios. I didn't find any documentation about it.

Hello,
Thanks for your question. What type of tweaking are you looking for/what are you trying to optimize and for what purpose? Just trying to get a better sense of your use case!

Thanks for following up.
Currently, I don't have a specific use case in mind. I'm just aware that there are many ANN algorithms, and different ones make different trade-offs between memory, build speed, query latency, and accuracy, so I'm curious whether there are opportunities to fine-tune this for different cases.
I'm also curious whether, with the same data and config, the similarity search results are exactly the same between a p1 pod and an s1 pod.

Hi. It's a good question. We currently don't allow customers to tweak the core algorithms or indexes. We do expose the k_bits parameter (k_bits=512 is the default; k_bits=1024 consumes more memory but provides even higher recall). We plan to expose more tunability in the future. If you have specific requests that you feel are unmet by Pinecone, we'd love to hear about them. Also, regarding p1 and s1: yes, as long as you use the same k_bits value (512 or 1024), you should expect the same recall.

Thanks @dave for the detailed explanation! I checked the docs again; you are referring to the index config.

"index_config": {
  "k_bits": 512,
  "hybrid": false
}
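
For reference, this is roughly how I'm pulling up that config from the Python client. A minimal sketch only; the API key, environment, and index name are placeholders, and I'm assuming describe_index is the call that returns the index_config block above.

import pinecone

# Placeholder credentials and index name
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# describe_index returns the index metadata, which is where the
# index_config shown above comes from
description = pinecone.describe_index("example-index")
print(description)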

But I cannot find any further documentation about k_bits and hybrid, so here are my follow-up questions.

  1. You mentioned k_bits = 1024 provides very high recall. Do you mean a higher recall rate? And how does the latency change?
  2. What’s the meaning of hybrid here?

Yes! You can store more vectors per pod with k_bits: 512 (the default) than with k_bits: 1024. I don't believe there is a substantial difference in latency; the primary trade-off is memory for recall, and the recall with k_bits: 512 is already quite high. We are exploring how we might offer an even greater recall trade-off.

hybrid here refers to the index type. hybrid: true means the index is stored entirely in RAM (lower latency); this is only available on p1 pods. hybrid: false means the index is partly in RAM and partly on SSD (so you can store more vectors per pod); this is only available on s1 pods. Since the choice of s1 or p1 currently dictates the necessary value for hybrid, it's actually set for you and you don't need to configure it. In the future, we may release different pod types and other such tuning parameters.
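
To make that concrete, here is a minimal sketch of how the pod type is chosen at index creation with the Python client. The index names, dimension, and the pod_type argument are illustrative assumptions, not exact documentation for your plan.

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# p1 pod: index held entirely in RAM (hybrid: true), lower query latency
pinecone.create_index("latency-sensitive-index", dimension=30, pod_type="p1")

# s1 pod: index split between RAM and SSD (hybrid: false), more vectors per pod
pinecone.create_index("storage-optimized-index", dimension=30, pod_type="s1")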

If you had to make more of a trade-off in one direction (memory, latency, recall), what would it be?

@dave thanks for the quick response!

It depends on our scenario. One important case is an online recall service: we need to maintain high throughput (~6k queries/s) and low latency (e.g. p95 < 20 ms) at our typical scale of ~100k vectors with 30 dimensions.
We also want low latency when fetching by ID. What's the typical fetch latency?
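
For context, the fetch path we care about is just a direct lookup by ID, something like the sketch below (assuming the Python client; the index name and IDs are placeholders).

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("example-index")

# Fetch vectors (and any metadata) directly by ID; no similarity search involved
result = index.fetch(ids=["item-123", "item-456"])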

@zhenglaizhang Typical fetching latency is under 5ms.


@zhenglaizhang Fetch latencies can be in the tens of milliseconds, depending on details like metadata use and the number of dimensions.