Pinecone serverless appears to cache old results: queries return old records even after all records are deleted

I’m using Pinecone serverless with Node. If my index has records, and I delete all of them in the Pinecone UI and then query that index locally, the query still returns the old results, as if they were cached. The cached results don’t seem to refresh at all. I should see no records coming back.

The only time it retrieves the correct results is when I change one of my query parameters to a new value I wasn’t using before. If I then change it back, it retrieves the old results again.

Code:

export const hybridScale = (
  sparse: SparseValues,
  dense: number[],
  alpha: number = 0.5,
): [SparseValues, number[]] => {
  if (alpha < 0 || alpha > 1) {
    throw new Error("Alpha must be between 0 and 1");
  }

  // Scale sparse and dense vectors to create hybrid search vectors
  const hsparse: SparseValues = {
    indices: sparse.indices,
    values: sparse.values.map(v => v * (1 - alpha)),
  };

  const hdense = dense.map(v => v * alpha);

  return [hsparse, hdense];
};

  const pc = new Pinecone();

  const { embeddings } = await embedMany({
    model: openai.embedding(EmbeddingModel.OPEN_AI_TEXT_EMBEDDING_3_SMALL),
    values: [query],
  });

  // Perform hybrid search with sparse vectors
  const index = pc.Index(indexName);
  const space = namespace ? index.namespace(namespace) : index;

  const [sparseValues, denseVector] = hybridScale(await encodeQueries(query), embeddings[0], alpha);

  const hybridResults = await space.query({
    vector: denseVector,
    sparseVector: sparseValues,
    topK: topK,
    includeMetadata: true,
    includeValues: true,
  });

If I change topK in the query above, I get accurate results. But when I revert to my original topK value, it retrieves the old cached values again. Why is this happening?

Hi @rachel. Welcome to the Pinecone community!

I’m not sure why you’re seeing that behavior, but it is definitely not expected in Pinecone. Would you mind sharing the full flow of your code so I can look into this further?

Best wishes,
Jesse

Sure, thank you!

Here is an example of one of my functions, which simply deletes any records that match a prefix. It looks up all vector records with that prefix and then deletes them:

export const deletePineconeRecords = async ({
  indexName,
  namespace,
  prefixes,
}: {
  indexName: string;
  namespace?: string;
  prefixes?: string[];
}): Promise<void> => {
  const pc = new Pinecone();
  const index = pc.Index(indexName);
  const space = namespace ? index.namespace(namespace) : index;

  if (prefixes?.length) {
    for (const prefix of prefixes) {
      const allVectorIds: string[] = [];
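      // Explicit type avoids an implicit-any error, since the token is read from the same call it feeds
      let nextPageId: string | undefined = undefined;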
      do {
        const results = await space.listPaginated({
          prefix: `${prefix}#`,
          limit: 100,
          paginationToken: nextPageId,
        });

        if (results.vectors) {
          const pageVectorIds = results.vectors?.map(vector => vector.id);
          allVectorIds.push(...pageVectorIds.filter(id => id != null));
        }
        nextPageId = results.pagination?.next;
      } while (nextPageId);

      console.log("allVectorIds (BEFORE): ", allVectorIds);
      if (allVectorIds.length > 0) {
        await space.deleteMany(allVectorIds);
      }

      const verificationResults = await space.listPaginated({
        prefix: `${prefix}#`,
        limit: 100,
      });
      console.log(
        `Remaining vectors after deletion for prefix ${prefix}: ${verificationResults.vectors?.length ?? 0}`,
      );
      console.log("verificationResults: ", verificationResults);
    }
  }
};

As you can see from the terminal screenshot below, it still retrieves those old vector records:

However, in the Pinecone UI, that record no longer exists, so the delete did succeed. But for some reason I still get my cached results when I run it locally (at least I think that’s what’s happening)?

Now, when I simply change one of the listPaginated limits to a different number (for example 5), it registers that the vectors no longer exist:

Then, if I run it again with both limits changed back to 100, I get the same results as the first time, where it says those 2 records (industry3# and industry1#) exist. So it seems like it just keeps retrieving cached results.

Let me know if I need to provide any other details here! Thanks in advance for your help
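
One check that might help narrow this down is to repeat the verification read a few times after the delete instead of once. This is only a sketch, on the assumption that the stale reads could come from eventual consistency rather than a client-side cache, which nothing in this thread confirms. The helper name waitUntilEmpty and its parameters are hypothetical and not part of the Pinecone SDK. It takes any function that reports how many records are still listed.

```typescript
// Hypothetical helper (not part of the Pinecone SDK): polls a caller-supplied
// counter until it reports zero records or the attempts run out.
type CountFn = () => Promise<number>;

async function waitUntilEmpty(
  countRemaining: CountFn,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const remaining = await countRemaining();
    console.log(`attempt ${attempt}: ${remaining} record(s) still listed`);
    if (remaining === 0) {
      return true; // the read has caught up with the delete
    }
    if (attempt < maxAttempts) {
      // Wait before the next read so a lagging read path has time to settle
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false; // still listing records after every attempt
}

// Possible use inside deletePineconeRecords, reusing the same listPaginated call:
// const gone = await waitUntilEmpty(async () => {
//   const page = await space.listPaginated({ prefix: `${prefix}#`, limit: 100 });
//   return page.vectors?.length ?? 0;
// });
```

If the count drops to zero after a few attempts, the old records were probably a lagging read. If it never does, even with an unchanged limit, that points more toward something being cached.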

So sorry for the delay, @rachel. These details are very helpful. I’ve shared them internally and will get back to you when I have some insight.

Best,
Jesse