Pre-filtering by IDs before Query

I searched the forum and found other people with the same problem, but nothing was conclusive for me.
I want to run my query but constrain the results to a set of known vectors. I know their IDs from a pre-filter I run against my relational DB.
How do I include this information in the query?
Since I couldn’t find a proper solution, what I’m doing now is fetching the vectors by their IDs, calculating the similarity manually, and then ranking them. It solves my problem for now, but it seems like the wrong way of doing it, and I’m still in disbelief that this isn’t implemented and that I have to do it manually. It feels like I’m doing something wrong.
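For anyone reading along, the manual workaround described above can be sketched roughly as follows. This is only an illustration: it assumes the vectors have already been fetched by ID into an array of `{ id, values }` objects (a hypothetical shape), and it assumes cosine similarity, which may not match the metric your index is configured with. The function names are made up for this example.

```javascript
// Cosine similarity between two equal-length numeric arrays.
// Returns 0 if either vector has zero length (norm of 0).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every pre-fetched record against the query vector and
// returns the topK highest-scoring records, best first.
// `records` is assumed to look like [{ id: 'a', values: [0.1, 0.2] }, ...].
function rankBySimilarity(queryVector, records, topK) {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(queryVector, r.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

This keeps all the ranking on the client, so it only stays practical while the pre-filtered set of IDs is small enough to fetch and score in one go.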

My current function in Node.js is:
export async function pineconeReturnItems(query, topK, preFilterIds) { // topK is the number of results to return
  console.log('pineconeReturnItems preFilterIds', preFilterIds);
  const queryObj = {
    vector: query,
    topK: topK,
  };
  if (preFilterIds && preFilterIds.length > 0) {
    queryObj.filter = {
      id: { $in: preFilterIds },
    };
  }

  try {
    const response = await sbertNamespace.query(queryObj);
    return response;
  } catch (error) {
    console.error('Error during pineconeReturnItems:', error);
  }
}

It returns nothing every time I include anything in preFilterIds. It works as expected if this array is empty (but that is not my use case).

No one? Bumping it up

Hi @hetnon.freitas, are you storing your pre-filtered IDs as metadata on the records in your index?

Alternatively, would it be possible to add the logic you are using to filter in your relational database as metadata to your Pinecone records so that you can use the Pinecone filter at query time?

I’m not carrying my IDs in the metadata. I could include them, but it seems counterintuitive to me that I have to replicate the ID, which is already the ID of each vector, in the vector’s metadata just so I can filter.
Filtering by ID seems like the most obvious thing a relational database does, and I’m a bit surprised I can’t do this here. So the answer is no, there is no way to filter by the ID itself?
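To make the duplication being discussed concrete, here is a minimal sketch of copying each record’s ID into its metadata before upserting, and of building the matching `$in` filter for the query. As far as I understand, the query `filter` is evaluated against metadata fields only, which would explain why a filter on `id` matches nothing when no metadata field with that name exists. The field name `recordId` and both helper names are hypothetical, chosen only for this example.

```javascript
// Hypothetical metadata field that mirrors the record ID.
const ID_FIELD = 'recordId';

// Returns copies of the records with their own ID added to metadata,
// leaving any existing metadata fields in place.
function withIdInMetadata(records) {
  return records.map((r) => ({
    ...r,
    metadata: { ...(r.metadata || {}), [ID_FIELD]: r.id },
  }));
}

// Builds a metadata filter that keeps only records whose mirrored ID
// is in the given list.
function buildIdFilter(ids) {
  return { [ID_FIELD]: { $in: ids } };
}
```

The output of `withIdInMetadata` would be what gets upserted, and `buildIdFilter(preFilterIds)` would replace the `id: {$in: preFilterIds}` object in the function above. Very long ID lists may run into request size limits, so this is worth testing with realistic list sizes.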

As for adding all the fields from my relational database as metadata on the Pinecone records so I can filter at query time, I’m not sure it’s that simple. There are many fields and many rules, and I’m just more comfortable using the relational database for that. Things like ‘less than’, ‘greater than’, intersection, etc., all at the same time using Pinecone metadata, seem like they would be a nightmare.
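For comparison, combining several such rules in a single metadata filter might look like the sketch below. It uses the `$gt`, `$lt`, `$in`, and `$and` filter operators; the field names `price` and `category` and the helper name `buildFilter` are hypothetical, and this only shows the shape of the filter object, not whether it is a good fit for a large rule set.

```javascript
// Builds a metadata filter from optional range and membership rules.
// Returns undefined when no rule is supplied, so the caller can omit
// the filter entirely.
function buildFilter({ minPrice, maxPrice, categories } = {}) {
  const clauses = [];
  if (minPrice !== undefined) clauses.push({ price: { $gt: minPrice } });
  if (maxPrice !== undefined) clauses.push({ price: { $lt: maxPrice } });
  if (categories && categories.length > 0) {
    clauses.push({ category: { $in: categories } });
  }
  if (clauses.length === 0) return undefined;
  // A single rule needs no wrapper; several rules are ANDed together.
  return clauses.length === 1 ? clauses[0] : { $and: clauses };
}
```

For example, `buildFilter({ minPrice: 10, maxPrice: 50, categories: ['books'] })` produces one `$and` filter with three clauses, which would go in the `filter` field of the query object.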
