What are you using for inference?

Hi all, @greg invited me as I’ve recently launched “Mighty Inference Server” (https://max.io). I’ve been part of the ANN search community for several years now, and I built Mighty for teams using Pinecone and similar vector search engines.

I’d really like to hear what you all are currently using for inference, and what problems you have. Nothing solves every problem, but we focus specifically on fast, self-hosted inference for teams that don’t want to spend months building expensive GPU deployments and that want unlimited inference (compared to costly Inference-as-a-Service APIs like OpenAI and Cohere). Infinity also gave lots of folks sticker shock (estimated at $20k per year), which is fine for enterprise but out of reach for most small to medium-sized teams.

It’s nice to meet you all, and thanks for the warm welcome, Greg :wave:.

Welcome Max!

The pricing is definitely appealing, at $99/month. How is it so much lower than the alternatives? I wonder if it’s because the other options bake GPU costs into the pricing?

Thanks!

Yes, the infrastructure costs are built into the API pricing. Providers also need to cover egress bandwidth, which is very expensive when serving millions of high-dimensional vectors. Hosted APIs are also shared, and your neighbors are probably very noisy ;). Hosting your own models in the same pod or cluster doesn’t have that overhead, which is why many teams roll their own.

With Mighty you are responsible for the servers/instances, but those can be rented very cheaply when you don’t need a GPU, and teams can easily add appropriately sized compute machines to fit their needs.