I am trying to observe the behaviour of exclude_id or filter. However, the argument does not seem to be working at all! It keeps giving me the same query result even when I exclude the previous query's result IDs in the next query (every time). Can anyone help me see what is going on?
Attached is my query and a screenshot of the output:
import pinecone
from collections import defaultdict
Hi @bobziyangding. Pinecone doesn’t have an “exclude_ids” option for queries. You can use metadata filtering to filter out those vectors, but they would have to have a common piece of metadata to do that.
You could do something like this first:
for id in consumed_id:
    update_response = index.update(
        id=id,
        set_metadata={'excluded': True},
        namespace='read-card'
    )
Then you would filter for those in your original query like so:
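A minimal sketch of what that filtered query might look like, assuming the `excluded` flag and the `read-card` namespace from the update above. The helper name `build_unseen_query` is hypothetical; it only builds the keyword arguments, so it can be checked without a live index. Whether vectors that have no `excluded` field at all are matched by `$ne` is worth confirming against the Pinecone filtering docs before relying on it.

```python
def build_unseen_query(vector, top_k=10, namespace="read-card"):
    """Build keyword arguments for index.query that skip flagged vectors.

    Hypothetical helper: the field name 'excluded' and the namespace
    are taken from the update snippet in this thread.
    """
    return {
        "vector": vector,
        "top_k": top_k,
        "namespace": namespace,
        # $ne is one of Pinecone's metadata filter operators; this keeps
        # only vectors whose 'excluded' value is not True.
        "filter": {"excluded": {"$ne": True}},
        "include_metadata": True,
    }


# Usage, assuming `index` is the same Pinecone index as above and
# `query_vector` is your embedding:
# results = index.query(**build_unseen_query(query_vector, top_k=5))
```

Note that this flag is global to the vector, not per user, so on its own it cannot express "already seen by this particular user".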
Keep in mind that the metadata will remain the same after all of this, so if you want to reset it you’ll need to set those to False, using the same update operation as before and just changing the boolean value.
Hi @Cory_Pinecone, thank you so much for the help! However, I still need some sort of ID restriction. Essentially, I need to supply a list of content IDs that a user has already consumed, so that the query only returns results that have not been seen by that specific user.
In this way, each user's query is basically searching a different subset of all the samples in the database, and that subset has to be specified in some way. I was thinking about using the ID.
Given this situation, are you aware of any solution? I know that you can put the ID in a metadata field, but that will immensely increase the cardinality and make the performance really bad…
Is there any alternative way to solve this? I am okay with approximate, non-exact subset query solutions.
I have integrated a similar structure on my current platform. I’ve found that relying solely on Pinecone’s metadata for exclusion criteria isn’t the most efficient method. To address this, I’ve integrated Redis into the workflow for handling exclusions more effectively.
Our process begins by querying our Redis database to determine which items the user has already seen. Once we have this information, we then proceed to interact with Pinecone, using it to specifically exclude the IDs of items already viewed by the user. This approach is facilitated by two specific metadata filters within Pinecone, which are tailored to support this exclusion process.
This hybrid method of using Redis for initial exclusion checks, followed by Pinecone queries with refined exclusion criteria, has proven to be a more optimal solution in our use case.
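For anyone wanting a starting point, here is a rough sketch of a "look up seen IDs first, then query Pinecone" flow. It is not the exact setup described above: the two metadata filters are not specified in this thread, so this sketch swaps them for a simpler over-fetch and client-side post-filter. The names `query_unseen` and `get_seen_ids` are hypothetical, and the response is assumed to expose `matches` with an `id` on each entry via dict-style access.

```python
def query_unseen(index, get_seen_ids, user_id, vector, top_k=10,
                 namespace="read-card"):
    """Return up to top_k matches the user has not seen yet.

    index        -- object with a Pinecone-style query(...) method
    get_seen_ids -- hypothetical callable returning the IDs this user
                    has already consumed (e.g. backed by a Redis set)
    """
    seen = set(get_seen_ids(user_id))

    # Over-fetch by the number of seen IDs so that, even if every seen
    # item ranks at the top, top_k unseen items can still remain.
    # Pinecone caps top_k, so this only works for modest seen lists.
    response = index.query(
        vector=vector,
        top_k=top_k + len(seen),
        namespace=namespace,
    )

    # Drop anything the user has already consumed, keeping rank order.
    unseen = [m for m in response["matches"] if m["id"] not in seen]
    return unseen[:top_k]


# Usage, assuming redis-py and a set stored under a hypothetical key
# "seen:<user_id>" (smembers returns bytes unless decode_responses=True):
# r = redis.Redis(decode_responses=True)
# hits = query_unseen(index, lambda uid: r.smembers(f"seen:{uid}"),
#                     "user-42", query_vector, top_k=5)
```

The trade-off is that the over-fetch grows with each user's history, so for long histories you would probably want to combine it with a coarser metadata filter or cap the seen list to recent items.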
@bobziyangding while I’m obviously reticent about telling Pinecone users to add another database to their stack, I think @s170559 may have the optimal solution for you here. Would this be feasible for you to implement?