I am building an application where I want to limit the results not by an absolute number but instead by their similarity to the query.
From the documentation it seems that the only option is top_k, which returns a fixed number of results rather than all results above (or below) a similarity threshold. I am wondering how that similarity score is calculated, and whether it can change as more vectors are upserted into the index.
In other words, suppose I had an index with lots of statements about industrial safety, and I determined for my application that a score < 0.1 is needed for a result to be useful. If I then added a bunch of statements about acrylic painting, would the scores of the existing results shift, such that I would have to adjust my in-app cut-off from < 0.1 to, say, < 0.15?
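To make my intent concrete, here is roughly the client-side filtering I have in mind (the data here is stand-in; in my app the matches would come from `index.query(...)` in the Pinecone client, whose response includes a `score` per match; the 0.1 cutoff and the ids are hypothetical):

```python
def filter_by_score(matches, cutoff=0.1):
    """Keep only matches whose score is below my app's cutoff.

    `matches` mimics the shape of the `matches` list in a Pinecone
    query response: a list of dicts each carrying an id and a score.
    """
    return [m for m in matches if m["score"] < cutoff]


# Stand-in data illustrating the scenario in my question:
matches = [
    {"id": "safety-001", "score": 0.04},   # useful result
    {"id": "safety-002", "score": 0.09},   # useful result
    {"id": "painting-001", "score": 0.32}, # off-topic, filtered out
]

kept = filter_by_score(matches)
print([m["id"] for m in kept])  # → ['safety-001', 'safety-002']
```

My worry is whether the cutoff value itself stays meaningful as the index grows and changes.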
Assume that I am getting embeddings from OpenAI, using the same model version each time. (The embedding model isn't changing; I am only asking about how Pinecone calculates the query result scores.)