As part of the Pinecone team, I get to hear from a lot of folks working on real-time vector search. A few common themes keep coming up: managing query throughput, handling frequent index updates, and optimizing memory usage. I’ve also heard people experimenting with things like sharding strategies and embedding compression to help.
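For concreteness, one flavor of embedding compression that comes up a lot is scalar quantization: storing each float32 dimension as an int8 code for roughly a 4x memory reduction, at some cost in precision. Here's a minimal NumPy sketch of the idea (illustrative only, not how Pinecone implements it internally):

```python
import numpy as np

def quantize(embeddings: np.ndarray):
    """Per-dimension min-max scalar quantization: float32 -> int8 codes."""
    lo = embeddings.min(axis=0)
    hi = embeddings.max(axis=0)
    scale = (hi - lo) / 255.0
    scale[scale == 0] = 1.0  # guard against constant dimensions
    codes = np.round((embeddings - lo) / scale - 128).astype(np.int8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray):
    """Approximate reconstruction of the original float32 vectors."""
    return (codes.astype(np.float32) + 128) * scale + lo

# Example: 1,000 random 128-dim embeddings
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 128)).astype(np.float32)

codes, lo, scale = quantize(emb)
recon = dequantize(codes, lo, scale)

print(emb.nbytes // codes.nbytes)  # → 4 (int8 codes use 1/4 the memory)
```

The trade-off is recall: quantized distances are approximate, so many setups rerank the top candidates against full-precision vectors. Curious whether others have found that trade-off worth it in practice.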
I’d love to open this up to the community: what challenges have you run into when scaling vector search for real-time use cases? What’s worked, and what hasn’t?