Pinecone recall rate benchmarks

andrew1 · July 17, 2023, 6:34pm

Hi,
I ran a recall rate benchmark on a single s1 Pinecone instance. I uploaded 5 million vectors drawn from a Gaussian distribution and, for a list of query vectors, kept a max heap of cosine distances to use as ground truth. When I compared the closest results from the index against that ground truth, I was surprised to see recall rates averaging 15-20%. I'm not sure whether the problem is my testing methodology or the setup, but results that low are a bit alarming. I was also wondering whether there are any tuning parameters I could use for the index, but I couldn't find any documentation on that.
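
For reference, here is a minimal sketch of the kind of recall check described above. It is not the actual benchmark code: it assumes NumPy, uses a small random dataset instead of 5 million vectors, and the function names `exact_top_k` and `recall_at_k` are hypothetical. The approximate ids would come from querying the index under test, which is left out here.

```python
import numpy as np

def exact_top_k(vectors, queries, k):
    """Brute-force ground truth: ids of the k most cosine-similar vectors per query."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ v.T  # shape (n_queries, n_vectors)
    # Sort by descending similarity and keep the k best ids for each query.
    return np.argsort(-sims, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    """Mean fraction of the true top-k ids that also appear in the approximate results."""
    hits = [len(set(a) & set(e)) / len(e) for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits))

# Small synthetic stand-in for the real dataset (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
vectors = rng.standard_normal((2000, 32))
queries = rng.standard_normal((10, 32))

exact = exact_top_k(vectors, queries, k=10)

# approx_ids would be the ids returned by the index for the same queries and the same k.
# Using the ground truth itself here only confirms that the metric is wired up correctly.
print(recall_at_k(exact, exact))
```

With a check like this, two things are worth confirming before trusting a low number: that the ids stored in the index map to the same row positions used for the ground truth, and that the index was created with the same metric (cosine) as the brute-force comparison.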