How do you design the solution when the data the reducer is fit on changes every month? Any case studies worth sharing?
Coming from a quant-bio background I've wondered about this too, but I haven't seen it used for vector search. In genomic analysis, carrying PCA/UMAP projections over to unseen data/samples is common, but the projection is not re-calculated on the unseen data; the trained projection matrix is just reused.
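To make the "reuse the trained projection matrix" idea concrete, here is a minimal NumPy-only sketch (the data and dimensions are hypothetical): PCA is fit once on a reference batch, and the stored mean and projection matrix are then applied to later, unseen samples without refitting.

```python
import numpy as np

# Hypothetical setup: a reference batch we fit on once, and an "unseen"
# batch that arrives later (e.g. next month's data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 50))  # 200 samples, 50 dims
unseen = rng.normal(size=(30, 50))      # 30 new samples

# Fit: center the reference data and take the top-k right singular
# vectors as the PCA projection matrix.
mean = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
k = 10
projection = vt[:k].T                   # (50, 10) projection matrix

# Reuse: apply the *stored* mean and projection to the unseen samples,
# without re-fitting PCA on them.
reduced = (unseen - mean) @ projection
print(reduced.shape)                    # (30, 10)
```

The same pattern applies with `sklearn.decomposition.PCA`: call `fit` once and keep calling `transform` on new data.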
@nlpguy This article might be helpful: PCA using Python (scikit-learn) | by Michael Galarnyk | Towards Data Science
Nandan Thakur has a finished paper on this (currently in review on my desk; we will publish it next month).
PCA worked rather badly in our experiments for semantic search: performance dropped quite a bit.
The question is what you want from smaller dimensions: faster retrieval speed? Saving memory for the index? Two things that work quite well:
- Storing floats as FP16 or FP8 - Reduces the embedding matrix size by a factor of 2 or 4 without sacrificing performance. Whether the multiplications are also faster depends on your hardware and whether it supports FP16.
- Using PQ (product quantization) to compress your embeddings - Can reduce the embedding matrix size significantly and leads to faster search (depending on your ANN index), but also reduces performance.
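A quick sketch of the FP16 point above, with a hypothetical embedding matrix: casting the stored floats to half precision halves the memory footprint, and the round-trip error is small for typical embedding magnitudes.

```python
import numpy as np

# Hypothetical embedding matrix: 1000 vectors of dimension 384 in FP32.
rng = np.random.default_rng(1)
emb = rng.normal(size=(1000, 384)).astype(np.float32)

# Store as FP16: half the bytes.
emb_fp16 = emb.astype(np.float16)
print(emb.nbytes // emb_fp16.nbytes)  # 2

# Round-trip error is small relative to unit-scale embedding values.
max_err = np.abs(emb - emb_fp16.astype(np.float32)).max()
print(max_err < 1e-2)                 # True
```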
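And a minimal, NumPy-only sketch of the PQ idea (real systems would use a library like faiss; the sizes and codebook parameters here are made up for illustration): each vector is split into m subvectors, a small codebook is learned per subspace with a few k-means steps, and only one uint8 code per subspace is stored.

```python
import numpy as np

# Hypothetical embeddings: 500 vectors of dimension 64 in FP32.
rng = np.random.default_rng(2)
emb = rng.normal(size=(500, 64)).astype(np.float32)

m, ks = 8, 16                      # 8 subspaces, 16 centroids per subspace
sub = emb.reshape(500, m, 64 // m) # split each vector into m subvectors

codebooks, code_cols = [], []
for j in range(m):
    x = sub[:, j, :]
    # Initialize centroids from random data points, then run a few
    # plain k-means iterations in this subspace.
    cent = x[rng.choice(len(x), ks, replace=False)].copy()
    for _ in range(10):
        d = ((x[:, None, :] - cent[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(ks):
            pts = x[assign == c]
            if len(pts):
                cent[c] = pts.mean(0)
    codebooks.append(cent)
    code_cols.append(assign.astype(np.uint8))

codes = np.stack(code_cols, axis=1)  # (500, 8) uint8 codes per vector
print(emb.nbytes // codes.nbytes)    # 32x smaller (ignoring the codebooks)
```

At search time, distances are approximated from the codes and codebooks rather than the full vectors, which is where the speed and memory win comes from (and where the accuracy loss mentioned above comes in).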
Hey @greg! My question is more about the performance and engineering aspects of using it for search.
Thanks @NilsReimers for the heads-up! My motivation for this came from your insights on the SBERT site.