I have an index with almost 10,000 vectors and I am trying to run queries that filter out only a few of them. Basically, I pass a vector of zeros so no semantic search is performed, as I’m only interested in filtering, and the filter is sometimes quite soft. The result is that I am getting back most of my vectorstore. Here is an example where I have many movie reviews and I want to get back all of them except horror movies:
This type of query is not really a semantic search and would be better handled by a more traditional DB. If this is core to your use case and you also require semantic search, I’d consider leveraging multiple types of DBs (Vector + SQL, or Vector + Key/Value, etc.) to best support the query patterns you have.
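A minimal sketch of the SQL half of such a split, assuming the ids and metadata are mirrored into a relational table alongside the vector index. The `reviews` table, its columns, the sample rows, and the `ids_excluding_genre` helper are all hypothetical and only illustrate the idea; SQLite is used here just because it ships with Python:

```python
import sqlite3

# Hypothetical schema: one row per vector, keyed by the same id used in the vector index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id TEXT PRIMARY KEY, title TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [
        ("r1", "Review A", "horror"),
        ("r2", "Review B", "romance"),
        ("r3", "Review C", "crime"),
    ],
)

def ids_excluding_genre(conn, genre):
    """Return the ids of every review whose genre is not `genre`, sorted by id."""
    rows = conn.execute(
        "SELECT id FROM reviews WHERE genre != ? ORDER BY id", (genre,)
    )
    return [row[0] for row in rows]

# Pure metadata filtering: no query vector and no topK limit involved.
non_horror_ids = ids_excluding_genre(conn, "horror")
print(non_horror_ids)
```

If the embeddings themselves are needed afterwards, the returned ids could be passed to the vector store's fetch-by-id call in batches, provided it offers one.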
This is because you are hitting the limit on the response size for topK when data is returned with the results. As @silas says, if you are attempting to do this, it would be easier to track ids + data in a relational DB and do a SQL query.
However, if you must loop over the entire index without hitting rate limits, this works, and it is how we do it when syncing vector databases for VectorAdmin