In an application where users constantly upload documents, generating embeddings would be one of the biggest costs if we used a paid API such as OpenAI's embeddings. Open-source embeddings would be free, but I'm wondering about the best way to deploy Hugging Face embeddings without degrading the user experience too much. Any recommendations are appreciated.
A smaller model that can run on a CPU may be an option. SentenceTransformers maintains a list of models including their inference speed, and intfloat/e5-small-v2 on Hugging Face is a newer small model. For deployment, one option is HF's Inference Endpoints.
Hope this helps!