Is it possible to search metadata directly?

Hello everyone. I have a RAG project and I use metadata for my documents. One of the fields is the file name, “file_name”, which I added for every file. I want to get all the file names so I can check for duplicates. Can you help me with this, or do you have any ideas?

Hi @Ernecna,

First, thanks for joining our forums.

Now, to your question. Strictly speaking, there isn’t a way to search just by metadata today. Each query presumes that you are looking for similar/nearby vectors, so it requires a vector to be included in the search. Metadata is only used to filter the vectors returned by that search.

There are some ways to approximate this, though.

The first is if you used the URI for the file as part of the vector ID. You could then use a List operation to show the vectors that match that URI.
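As a rough sketch of that first option: the helper below assumes your vector IDs look like `<file URI>#<chunk number>` (that ID scheme is my assumption, not something from your post) and that `index` is an index object whose `list(prefix=...)` call yields pages of matching IDs. The function name `list_ids_for_file` is made up for illustration.

```python
from typing import Any, List, Optional


def list_ids_for_file(index: Any, file_uri_prefix: str,
                      namespace: Optional[str] = None) -> List[str]:
    """Collect every vector ID that starts with the given file URI prefix.

    Assumes `index.list(prefix=...)` yields pages (lists) of ID strings.
    """
    kwargs = {"prefix": file_uri_prefix}
    if namespace is not None:
        # Only pass a namespace when the caller asked for one.
        kwargs["namespace"] = namespace

    ids: List[str] = []
    for page in index.list(**kwargs):
        ids.extend(page)
    return ids
```

If the returned list is non-empty, that file has already been upserted, which gives you a simple duplicate check before ingesting it again.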

If you didn’t use the URI for vector IDs, there’s another option. This one requires you to have used an index with integrated embedding. You would be able to search for vectors that have the filename as part of their text. Note that this approach would not be perfect, as it presumes that the vectors with a given file name have that name as part of their vector embedding, which may not be the case. It could also return unrelated vectors (files that reference the one you’re looking for, for example), but you can handle that with a metadata filter.

The last option I can think of would be to use a query with a notional vector (a random vector, or all zeros) and a top_k large enough to capture all of the vectors you expect to have that file name in their metadata. Then include a metadata filter so you’re only returning those vectors. The only reason I hesitate to mention this is the off chance that you have more than 1000 vectors for a given file, in which case this approach wouldn’t capture all of them due to top_k limits.
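Here is a minimal sketch of that last option. It assumes `index` exposes a `query(vector=..., top_k=..., filter=..., include_metadata=..., include_values=...)` call that returns a dict-like response with a `matches` list, and that the metadata field is called `file_name` as in your post. The helper name and the `dimension` parameter are hypothetical. I used a random vector rather than all zeros, since a zero vector may not play well with every similarity metric.

```python
import random
from typing import Any, List, Tuple

MAX_TOP_K = 1000  # the per-query limit mentioned above


def ids_with_file_name(index: Any, file_name: str, dimension: int,
                       top_k: int = MAX_TOP_K) -> Tuple[List[str], bool]:
    """Return (ids, maybe_truncated) for vectors whose metadata file_name matches.

    The query vector is a throwaway: only the metadata filter matters here.
    """
    dummy_vector = [random.uniform(-1.0, 1.0) for _ in range(dimension)]
    response = index.query(
        vector=dummy_vector,
        top_k=top_k,
        filter={"file_name": {"$eq": file_name}},
        include_metadata=False,
        include_values=False,
    )
    ids = [match["id"] for match in response["matches"]]
    # Hitting top_k exactly means there may be more matches we did not see.
    maybe_truncated = len(ids) >= top_k
    return ids, maybe_truncated
```

For a duplicate check, a non-empty `ids` list means the file name is already in the index. If `maybe_truncated` comes back true, treat the list as incomplete.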

Cory