10x Slower Query Performance From Lambda Function

I think I see what the issue is.

I checked your index. The p50 query response time is about 140 ms, but the p99 is considerably higher, almost a full second. Very few queries are being run against the index, and I suspect that low query volume is the underlying issue.
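To see why a low query volume makes the tail look so bad, here is a minimal sketch of a nearest-rank percentile calculation. The latency values are hypothetical, not measurements from your index, and this is not necessarily how our dashboards compute percentiles. With only a handful of samples, the p99 is effectively the slowest single query, so one cold start is enough to push it toward a second.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies (ms) for 10 queries: nine warm ones and one cold start.
latencies = [120, 135, 138, 140, 140, 142, 150, 155, 160, 950]
print("p50:", percentile(latencies, 50))
print("p99:", percentile(latencies, 99))
```

With 10 samples, the p99 rank rounds up to the 10th value, i.e. the maximum.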

Your index runs on our serverless platform, which keeps the clusters of vectors you query most often in a cache. If you don’t run queries often, that cached data can become stale and be evicted, so the next query has to do a cold start and reload the vectors.

The more frequently you query your index, the more likely the cached data is to stay warm, which results in much faster responses.
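The effect of query frequency on cold starts can be illustrated with a toy model. This is a sketch under stated assumptions, not how the serverless cache is actually implemented: it assumes a simple idle timeout (TTL_S) after which a cluster is evicted, and fixed warm and cold latencies. The constants and the cluster name are hypothetical.

```python
from dataclasses import dataclass, field

COLD_MS = 900  # assumed latency when vectors must be reloaded (illustrative)
WARM_MS = 140  # assumed latency when the cluster is already cached (illustrative)
TTL_S = 300    # assumed idle time before a cached cluster is evicted (illustrative)


@dataclass
class ToyCache:
    ttl_s: float
    last_access: dict = field(default_factory=dict)  # cluster name -> last query time (s)

    def query(self, cluster: str, now_s: float) -> int:
        """Return the simulated latency (ms) for querying `cluster` at time `now_s`."""
        last = self.last_access.get(cluster)
        warm = last is not None and now_s - last <= self.ttl_s
        # Every query loads (or keeps) the cluster in cache and resets its idle timer.
        self.last_access[cluster] = now_s
        return WARM_MS if warm else COLD_MS


def simulate(interval_s: float, n_queries: int) -> list:
    """Issue n_queries against one hypothetical cluster, spaced interval_s seconds apart."""
    cache = ToyCache(ttl_s=TTL_S)
    return [cache.query("cluster-a", i * interval_s) for i in range(n_queries)]


frequent = simulate(interval_s=60, n_queries=5)     # one query per minute
infrequent = simulate(interval_s=600, n_queries=5)  # one query every 10 minutes
print(frequent)
print(infrequent)
```

In this model, querying once a minute pays the cold-start cost only on the first query, while querying every 10 minutes pays it on every query because each gap exceeds the assumed timeout.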

We’re improving how the cache operates and will likely release a higher-performance version of serverless later this year. In the meantime, if you need consistently fast responses to infrequent queries, you’ll be better served by an index built on either p1 or p2 pods.

Also, keep in mind that serverless is currently in public preview and is not considered suitable for production workloads in most cases.