In the last two charts in this article, the latency of p2 is plotted for many possible values of recall. Does this mean that one can choose the desired recall and control the latency-recall trade-off? How is that done?
The graphs you are referring to are two benchmark results. My guess is that the benchmark was set up so the developers could decide where to place their cutoff. But to answer your question: no, there is no way to choose a desired recall.
I think this is by design, as they advocate the “it works” approach: they provide a managed service that you do not need to control or tweak. If you want control over these parameters, I would advise setting up an in-house vector database and managing it yourself. It is more work, and I don’t think the cost-performance is better, but that way you can control more or less everything (and it is kinda fun if that’s what you are into).
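To give a feel for what "controlling the trade-off" looks like when you run the index yourself, here is a toy sketch in plain Python. It is not how p2 works internally (I don't know that), and every name in it (`build_index`, `search`, `n_probe`, the data sizes) is made up for illustration. The idea is a simple cluster-based index: vectors are bucketed by nearest centroid, and at query time you scan only the `n_probe` closest buckets. Scanning more buckets costs more time but finds more of the true neighbours, which is the same kind of curve those benchmark charts show.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force(data, q, k):
    # Exact top-k neighbours: the ground truth for measuring recall.
    return sorted(range(len(data)), key=lambda i: dist2(data[i], q))[:k]

def build_index(data, n_lists, rng):
    # Toy assumption: random data points serve as centroids (no k-means).
    centroids = rng.sample(data, n_lists)
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(data):
        nearest = min(range(n_lists), key=lambda j: dist2(v, centroids[j]))
        lists[nearest].append(i)
    return centroids, lists

def search(data, centroids, lists, q, k, n_probe):
    # Scan only the n_probe buckets whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda j: dist2(q, centroids[j]))
    candidates = [i for j in order[:n_probe] for i in lists[j]]
    return sorted(candidates, key=lambda i: dist2(data[i], q))[:k]

def recall_at_k(data, centroids, lists, queries, k, n_probe):
    # Fraction of true top-k neighbours that the approximate search returns.
    hits = 0
    for q in queries:
        truth = set(brute_force(data, q, k))
        found = set(search(data, centroids, lists, q, k, n_probe))
        hits += len(truth & found)
    return hits / (k * len(queries))

rng = random.Random(0)  # fixed seed so runs are repeatable
data = [[rng.random() for _ in range(8)] for _ in range(500)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
centroids, lists = build_index(data, 10, rng)

recalls = {}
for n_probe in (1, 2, 5, 10):
    recalls[n_probe] = recall_at_k(data, centroids, lists, queries, 5, n_probe)
    print(f"n_probe={n_probe:2d}  recall@5={recalls[n_probe]:.2f}")
```

With `n_probe` equal to the number of buckets the search becomes exhaustive, so recall reaches 1.0 at the highest cost; smaller values scan fewer vectors and give up some recall. A self-hosted setup lets you pick that point yourself, whereas the managed service picks it for you.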
Hope this helps