How to query based on the source (file name) stored in metadata

1. I want to create embeddings for many loan case .txt files, so before creating and storing embeddings in Pinecone I need to check whether embeddings already exist for a particular file.
2. When querying, I need to restrict the query to a specific file name, because all the files contain similar kinds of data. For example, if I ask “What is the bank name?”, I get bank details from the embeddings of all the files, but I want the bank name from one specific file.
How can we achieve both of these?

Hi again @anushagudipati64,

You would want to use a combination of index IDs and metadata. Depending on your application you might also use namespaces, to segment your vectors based on bank (if you won’t have to query across multiple banks at once, for example).

Index IDs can be whatever you like, as long as they are < 512 bytes in length and use the ASCII portion of UTF-8 (so no emojis or non-English letters). We generally recommend that each source file have its own ID prefix based on the name of the file, and then an integer to further identify the chunk of that file that a particular embedding represents.
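As a rough illustration of that ID scheme, here is a minimal Python sketch. The helper names (`make_chunk_id`, `make_records`), the `#` separator, and the `source` / `chunk` / `text` metadata keys are my own assumptions for the example, not anything Pinecone requires. The resulting list of dicts is in the shape the Python client's `upsert` accepts.

```python
from typing import Dict, List, Sequence


def make_chunk_id(file_name: str, chunk_no: int, sep: str = "#") -> str:
    """Build an ID like 'loan_case_001.txt#3' (file-name prefix + chunk number)."""
    chunk_id = f"{file_name}{sep}{chunk_no}"
    # IDs must stay within the ASCII portion of UTF-8.
    if not chunk_id.isascii():
        raise ValueError(f"ID contains non-ASCII characters: {chunk_id!r}")
    # IDs must be shorter than 512 bytes.
    if len(chunk_id.encode("utf-8")) >= 512:
        raise ValueError("ID must be shorter than 512 bytes")
    return chunk_id


def make_records(
    file_name: str,
    chunks: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> List[Dict]:
    """Pair each text chunk with its embedding, a prefixed ID, and metadata."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    return [
        {
            "id": make_chunk_id(file_name, i),
            "values": list(vector),
            # 'source' is an assumed metadata key holding the file name.
            "metadata": {"source": file_name, "chunk": i, "text": text},
        }
        for i, (text, vector) in enumerate(zip(chunks, embeddings))
    ]
```

With a real index object you would then pass the result to something like `index.upsert(vectors=make_records(...))`.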

Note that while this technique is supported on both pod-based and serverless indexes, making full use of prefix IDs via List is only supported on serverless. See this part of the prefix ID page for more details.
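For your first question (checking whether a file has already been embedded), a sketch along these lines might work on a serverless index. It assumes `index` is an Index object from the Pinecone Python client, that its `list(prefix=...)` method yields pages (lists) of matching IDs as in recent client versions, and that IDs were written as the file name followed by `#` and a chunk number. The function name `file_already_embedded` is hypothetical.

```python
from typing import Optional


def file_already_embedded(
    index,
    file_name: str,
    namespace: Optional[str] = None,
    sep: str = "#",
) -> bool:
    """Return True if at least one vector ID starts with '<file_name><sep>'.

    Assumes index.list(prefix=...) yields pages (lists) of IDs, which is
    only available on serverless indexes.
    """
    kwargs = {"prefix": f"{file_name}{sep}"}
    if namespace is not None:
        kwargs["namespace"] = namespace
    for id_page in index.list(**kwargs):
        if id_page:  # a non-empty page means the file has stored chunks
            return True
    return False
```

You would call this before embedding each file and skip the file when it returns True. On a pod-based index, one alternative is to `fetch` the ID of the first chunk and check whether anything comes back.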

Metadata filtering is a little more flexible, in that you can set whatever metadata keys you like and then store a variety of data types. See supported metadata types for the exact ones. But in this case, you could use a string to identify the source bank of a given file, and then filter your query to only match against those vectors. There’s a quick version of how to make use of metadata filtering in our docs, but for a fuller discussion of metadata filtering, I recommend this article.
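For your second question, here is a hedged sketch of a query restricted to a single file. It assumes each vector was upserted with a `source` metadata key holding the file name (that key name is my assumption, use whatever you stored), that `index` is a Pinecone Index object, and that `query_vector` is the embedding of the question, produced by the same model you used for the documents. The helper names are hypothetical.

```python
from typing import Dict, Sequence


def build_source_filter(file_name: str) -> Dict:
    """Metadata filter that matches only vectors whose 'source' equals file_name."""
    return {"source": {"$eq": file_name}}


def query_single_file(
    index,
    query_vector: Sequence[float],
    file_name: str,
    top_k: int = 5,
):
    """Run a similarity query limited to the chunks of one source file."""
    return index.query(
        vector=list(query_vector),
        top_k=top_k,
        filter=build_source_filter(file_name),
        include_metadata=True,  # return stored metadata with each match
    )
```

Asking “What is the bank name?” with `file_name="loan_case_001.txt"` (an example name) should then return matches from that file's chunks only, rather than from every loan case.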

Does that answer your question?