I’m using Pinecone serverless with Node. If I have results, then delete all the records in my index in the Pinecone UI, and then query that index locally, the query still returns the old results, as if they were cached. The results never seem to refresh. I should see no records coming back.
The only time it retrieves the correct results is if I change my query parameters to some new value I wasn’t using before. If I then change it again, it again retrieves the old results.
Code:
export const hybridScale = (
  sparse: SparseValues,
  dense: number[],
  alpha: number = 0.5,
): [SparseValues, number[]] => {
  if (alpha < 0 || alpha > 1) {
    throw new Error("Alpha must be between 0 and 1");
  }
  // Scale sparse and dense vectors to create hybrid search vectors
  const hsparse: SparseValues = {
    indices: sparse.indices,
    values: sparse.values.map(v => v * (1 - alpha)),
  };
  const hdense = dense.map(v => v * alpha);
  return [hsparse, hdense];
};

const pc = new Pinecone();
const { embeddings } = await embedMany({
  model: openai.embedding(EmbeddingModel.OPEN_AI_TEXT_EMBEDDING_3_SMALL),
  values: [query],
});
// Perform hybrid search with sparse vectors
const index = pc.Index(indexName);
const space = namespace ? index.namespace(namespace) : index;
const [sparseValues, denseVector] = hybridScale(await encodeQueries(query), embeddings[0], alpha);
const hybridResults = await space.query({
  vector: denseVector,
  sparseVector: sparseValues,
  topK: topK,
  includeMetadata: true,
  includeValues: true,
});
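To rule out the scaling step as a source of confusion, the hybridScale helper above can be run on its own with small inputs. This is a minimal sketch: the SparseValues type is a local stand-in whose shape is assumed from how the post uses it, and the input numbers are made up and chosen so the arithmetic is exact.

```typescript
// Local stand-in for the SparseValues type used in the post (assumed shape).
type SparseValues = { indices: number[]; values: number[] };

// Same logic as the helper in the post: dense is weighted by alpha,
// sparse by (1 - alpha).
const hybridScale = (
  sparse: SparseValues,
  dense: number[],
  alpha: number = 0.5,
): [SparseValues, number[]] => {
  if (alpha < 0 || alpha > 1) {
    throw new Error("Alpha must be between 0 and 1");
  }
  const hsparse: SparseValues = {
    indices: sparse.indices,
    values: sparse.values.map(v => v * (1 - alpha)),
  };
  const hdense = dense.map(v => v * alpha);
  return [hsparse, hdense];
};

// Made-up inputs for illustration only.
const demoSparse: SparseValues = { indices: [3, 17], values: [2, 4] };
const demoDense = [1, 2, 4];

const [scaledSparse, scaledDense] = hybridScale(demoSparse, demoDense, 0.25);
console.log(scaledSparse.values); // [ 1.5, 3 ]
console.log(scaledDense); // [ 0.25, 0.5, 1 ]
```

With alpha = 1 the sparse values all become 0 (pure dense search), and with alpha = 0 the dense values all become 0 (pure sparse search). The helper is a pure function of its inputs, so it does not hold on to anything between calls.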
If I change topK in the code above, I get accurate results. But when I revert to my original topK value, it retrieves the old cached values again. Why is this happening?
I’m not sure why you’re seeing that behavior, but it is definitely not expected in Pinecone. Would you mind sharing the full flow of your code so I can look into this further?
Here is an example of one of my functions, which simply deletes any records that match a prefix. It looks up all vector records with that prefix and then deletes them:
export const deletePineconeRecords = async ({
  indexName,
  namespace,
  prefixes,
}: {
  indexName: string;
  namespace?: string;
  prefixes?: string[];
}): Promise<void> => {
  const pc = new Pinecone();
  const index = pc.Index(indexName);
  const space = namespace ? index.namespace(namespace) : index;
  if (prefixes?.length) {
    for (const prefix of prefixes) {
      const allVectorIds: string[] = [];
      let nextPageId = undefined;
      do {
        const results = await space.listPaginated({
          prefix: `${prefix}#`,
          limit: 100,
          paginationToken: nextPageId,
        });
        if (results.vectors) {
          const pageVectorIds = results.vectors?.map(vector => vector.id);
          allVectorIds.push(...pageVectorIds.filter(id => id != null));
        }
        nextPageId = results.pagination?.next;
      } while (nextPageId);
      console.log("allVectorIds (BEFORE): ", allVectorIds);
      if (allVectorIds.length > 0) {
        await space.deleteMany(allVectorIds);
      }
      const verificationResults = await space.listPaginated({
        prefix: `${prefix}#`,
        limit: 100,
      });
      console.log(
        `Remaining vectors after deletion for prefix ${prefix}: ${verificationResults.vectors?.length ?? 0}`,
      );
      console.log("verificationResults: ", verificationResults);
    }
  }
};
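The function above lists the prefix again immediately after deleteMany returns. One possible explanation for stale reads, not confirmed in this thread, is that Pinecone documents its indexes as eventually consistent, so a read issued right after a write may not reflect it yet. Under that assumption, the verification step could be retried with a short delay instead of running once. The sketch below uses a hypothetical waitUntil helper and a fake listing function in place of the real client, so it runs without any Pinecone credentials.

```typescript
// Hypothetical helper (not part of any SDK): re-run `check` until it
// returns true or the attempts run out, sleeping between attempts.
async function waitUntil(
  check: () => Promise<boolean>,
  { attempts = 10, delayMs = 500 }: { attempts?: number; delayMs?: number } = {},
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
    if (i < attempts - 1) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Stand-in for a listing call: returns a made-up stale ID for the first
// two reads, then an empty list, imitating a delete that takes a moment
// to become visible.
let reads = 0;
const fakeListIds = async (): Promise<string[]> => {
  reads += 1;
  return reads < 3 ? ["industry1#a"] : [];
};

waitUntil(async () => (await fakeListIds()).length === 0, {
  attempts: 5,
  delayMs: 10,
}).then(cleared => {
  console.log(`cleared=${cleared} after ${reads} reads`); // cleared=true after 3 reads
});
```

In the function above, the check passed to waitUntil would be the existing listPaginated call for the prefix, tested for an empty vectors array. If the stale IDs are still listed after a generous number of retries, delayed visibility is probably not the cause and something else is going on.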
As you can see from the terminal screenshot below, it still retrieves those old vector records:
However, in the Pinecone UI that record no longer exists, so the delete did succeed. But for some reason my local run still retrieves the old results (at least I think that’s what’s happening).
Then if I run it again and change both limits back to 100, I get the same results as the first time, where it says those 2 records (industry3# and industry1#) exist. So it seems like it just keeps retrieving cached results.
Let me know if I need to provide any other details here! Thanks in advance for your help.