Can you apply filtering on metadata without having to also input a query?

I would like to filter on metadata only, to select a subset of my vectorstore and get a list of the relevant results (and that’s really it for now). So I was wondering whether one can perform a pure metadata filter without also performing a semantic search on the vector values. So far I have only seen examples like this:

query = "asking something about documentaries"
embedded_query = embeddings.embed_query(query)
# metadata filter passed alongside the embedded query
filter = {
    "genre": {"$eq": "documentary"},
    "year": 2019,
}

where you pass an embedded query (a dense vector) and the filters you want. I know you can pass an empty query and it still works, like query = " ", which I’m guessing does not produce much of a semantic search, so at that point only the filters are really doing the job. I suppose I can set a very high top_k here to make sure I get all the results matching the filters. But is this the only way?
How can I get back a list of results just based on metadata filtering?

Yeah, passing an n-long vector of zeros in place of the query vector is the best way to get the top_k results based on metadata alone.

However, I have no idea if doing this is deterministic or if you will just get back random results on each run. So if you are using OpenAI's text-embedding-ada-002, you would just pass a 1536-long array of zeros.

OK, thanks! You mean passing something like this: vector=[0.0 for _ in range(1536)]?

Do you think passing that would be better than passing an embedded empty query? I ask because the result of embedding an empty string ("") is a vector with nonzero values.

Other than that, I think the latency is the same, and the final results seem to be the same too. The order of relevance and the scores are not important at the moment; I just need to make sure I get all the results matching those filters.
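
For anyone landing here later, the zero-vector approach discussed in this thread could be sketched roughly as below. This is only an illustration under assumptions: the helper name filter_only_query_args is made up, and the keyword names vector, filter and top_k are taken from the snippets in this thread, so check them against your client's query method. The 1536 dimension matches text-embedding-ada-002; use your own index dimension otherwise.

```python
from typing import Any


def filter_only_query_args(
    dimension: int, metadata_filter: dict[str, Any], top_k: int
) -> dict[str, Any]:
    """Build query keyword arguments that rely on the metadata filter alone.

    Hypothetical helper: an all-zero placeholder vector stands in for the
    embedded query, so only the filter narrows the results.
    """
    if dimension <= 0:
        raise ValueError("dimension must be positive")
    return {
        "vector": [0.0] * dimension,  # placeholder, carries no semantic signal
        "filter": metadata_filter,
        "top_k": top_k,
    }


args = filter_only_query_args(
    dimension=1536,
    metadata_filter={"genre": {"$eq": "documentary"}, "year": 2019},
    top_k=1000,
)

# Assumed usage, where `index` is an already-connected index object whose
# query method accepts these keyword names:
# results = index.query(**args)
```

Note that this only returns every matching record if top_k is at least as large as the number of records that pass the filter, and whether the order of the results is stable between runs is still an open question in this thread.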
