10x Slower Query Performance From Lambda Function

My company is using Pinecone at the end of an API Gateway Lambda function call to run a query that returns relevant nearby metadata.

When tested on a local PC, the query takes 400-700 milliseconds.

When that same code is uploaded to the Lambda function, the operation takes anywhere between 3000 and 11000 milliseconds, regardless of cold-start status.

For logging, we have been timing only the await index.query({}) call.

Is there something we are missing about the use of Pinecone with AWS that can circumvent this bottleneck?

We are using layers to import the Pinecone client, and the client has to be established each time the function is invoked, but I can’t see how that would specifically impact the .query call.
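For reference, a common Lambda pattern is to build the client once at module scope so that warm invocations reuse it instead of constructing a new one on every call. The following is only a minimal sketch of that pattern, and the thread does not establish that it accounts for the slowdown. The lazySingleton helper, the stand-in index object, and the PINECONE_API_KEY variable name are all hypothetical and not part of the Pinecone SDK.

```javascript
// Hypothetical helper (not part of the Pinecone SDK): wraps a factory so the
// object is built on first use and then reused by later calls.
function lazySingleton(factory) {
  let instance;
  let created = false;
  return function get() {
    if (!created) {
      instance = factory();
      created = true;
    }
    return instance;
  };
}

// Module scope: this code runs once per Lambda execution environment, not
// once per invocation. In the real function the factory might look like
//   () => new Pinecone({ apiKey: process.env.PINECONE_API_KEY }).index(INDEX_NAME)
// (the environment variable name is an assumption).
let factoryCalls = 0;
const getIndex = lazySingleton(() => {
  factoryCalls += 1;
  // Stand-in object so the sketch runs without the SDK installed.
  return { query: async (request) => ({ matches: [], request }) };
});

// Handler sketch: every invocation reuses the same index object.
async function handler(event) {
  const index = getIndex();
  return index.query({ topK: 10, vector: event.vector, includeMetadata: true });
}
```

With this shape, only the first invocation in a fresh execution environment pays for client construction; later warm invocations go straight to the query.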

Code for both sets is as follows:

import { Pinecone } from "@pinecone-database/pinecone";

const data = {
  vectors: [
    /* Some Vector Data */
  ],
};
const API_KEY = "A valid API Key";
const INDEX_NAME = "A valid Index Name";

async function main() {
  // Create the client and get a handle to the index
  const pc = new Pinecone({
    apiKey: API_KEY,
  });
  const index = pc.index(INDEX_NAME);

  // Time only the query call itself
  let now = Date.now();
  const returnedData = await index.query({
    topK: 10,
    vector: data.vectors[0],
    includeMetadata: true,
  });
  console.log(`Query Took: ${Date.now() - now} milliseconds`);
}


Hi @justin.clark. Which region are you hosting your Lambda function in?


Hi Cory, thanks for replying.

Our Lambda functions are hosted in N. California (us-west-1). Timing the Lambda function shows that the bulk of the call’s time is spent specifically on the index query.

Metadata for the vectors is 2 key:value pairs of strings, so the data load isn’t large.

Our paid Pinecone index is a serverless index hosted on AWS in Oregon (us-west-2).

I think I see what the issue is.

I checked your index, and the p50 query response time is about 140ms. The p99 is considerably higher, almost a full second. But there aren’t many queries being run, which is what I suspect is the underlying issue.
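To make the p50 and p99 figures concrete, here is a small sketch of how such percentiles can be computed from a list of query latencies using the nearest-rank method. The sample latencies are made up for illustration and are not taken from the index in this thread, and Pinecone's own metrics may use a different percentile definition.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds: mostly fast, with one slow outlier.
const latenciesMs = [120, 130, 135, 140, 140, 145, 150, 155, 160, 980];

console.log("p50:", percentile(latenciesMs, 50)); // p50: 140
console.log("p99:", percentile(latenciesMs, 99)); // p99: 980
```

A single slow query barely moves the p50 but dominates the p99, which is why a low query volume with occasional cold starts shows up as a large gap between the two.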

Your index is using our serverless platform; one of its features is to keep the clusters of vectors you query most available in a cache. But if you don’t run queries often, the cached data can go stale and be evicted. So, the next time you run a query, it has to do a cold start and reload the vectors again.

The more frequently you query your index, the more likely it will be to maintain the cached data, resulting in much faster responses.
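One way to check whether this cache behavior is what you are seeing is to time several back-to-back queries and compare the first against the rest. The sketch below is only an illustration: timeCalls and makeFakeQuery are hypothetical helpers, and the fake query simply simulates a slow first call so the example runs without the SDK.

```javascript
// Times n sequential calls of an async function and returns each duration
// in milliseconds, in call order.
async function timeCalls(fn, n) {
  const durations = [];
  for (let i = 0; i < n; i++) {
    const start = Date.now();
    await fn();
    durations.push(Date.now() - start);
  }
  return durations;
}

// Stand-in for index.query(...): waits coldMs on the first call and warmMs
// on every later call. These delays are simulated, not measured.
function makeFakeQuery(coldMs, warmMs) {
  let calls = 0;
  return () => {
    calls += 1;
    const delay = calls === 1 ? coldMs : warmMs;
    return new Promise((resolve) => setTimeout(resolve, delay));
  };
}

// Against a real index, the call would look something like:
//   const durations = await timeCalls(
//     () => index.query({ topK: 10, vector, includeMetadata: true }),
//     5
//   );
```

If the first duration is much larger than the following ones, the slow responses are consistent with a cold cache rather than with the Lambda environment itself.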

We’re improving how the cache operates and will likely release a higher-performance serverless version later this year. But in the meantime, if you need very fast response times for infrequent queries, you are better served by an index built with either p1 or p2 pods.

Also, keep in mind that serverless is currently in public preview and is not considered suitable for production workloads in most cases.