Querying with metadata filtering over an array of strings returns no data

I am using the Pinecone TypeScript client and was not able to filter a query using metadata.
So I tried the same set of queries using curl, and I think some of the filter syntax is not working.
Or am I doing something wrong?

metadata: {
 tags: ["Creator", "Consumer"]
}

After storing the vectors, I am trying to query the data:

"vector": [.....],
"filter": { "tags": { "$in": ["Creator", "creator"] } },
"namespace": "valid-namespace",
"topK": 4,
"includeMetadata": true

I get results if I use:

"filter": { "tags.0": "Creator"}, 

With the other forms, I get no results:

  "filter": { "tags": { "$in": ["Creator", "creator"] } }, 
  "filter": { "tags": { "$eq" :"Creator"} }, 
  "filter": { "tags": "Creator"},

How do I fix it?
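
For reference, here is a minimal local sketch of how these filters would be expected to behave, assuming (as the Pinecone metadata filtering docs describe) that a filter on a list-valued field matches when any element of the list matches. `matches_filter` is a hypothetical helper, not part of any Pinecone client, and it only covers plain equality, `$eq`, and `$in`. The `flattened` dictionary is an assumed failure mode in which the list ended up stored under indexed keys; it is one possible explanation for why only `"tags.0"` returns results, not a confirmed one.

```python
def matches_filter(metadata, flt):
    """Return True if `metadata` satisfies `flt` (equality, $eq, $in only)."""
    for field, condition in flt.items():
        if field not in metadata:
            return False
        stored = metadata[field]
        # Treat a scalar as a one-element list so both cases share one path.
        values = stored if isinstance(stored, list) else [stored]
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$eq":
                    ok = operand in values
                elif op == "$in":
                    ok = any(v in operand for v in values)
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif condition not in values:
            return False
    return True


# Metadata stored as a real list of strings.
as_list = {"tags": ["Creator", "Consumer"]}
# Assumed failure mode: the list was flattened into indexed keys on upsert.
flattened = {"tags.0": "Creator", "tags.1": "Consumer"}

filters = [
    {"tags": {"$in": ["Creator", "creator"]}},
    {"tags": {"$eq": "Creator"}},
    {"tags": "Creator"},
    {"tags.0": "Creator"},
]
for f in filters:
    # Columns: filter, result against the real list, result against flattened keys.
    print(f, matches_filter(as_list, f), matches_filter(flattened, f))
```

If the stored metadata looks like `flattened`, the first three filters have no `tags` field to match against, which would be consistent with the behaviour described above.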

Did you ever figure this out? I am having a similar issue.

I have upserted into Pinecone in the following way:

for doc in tqdm(docs_restaurant):
    chunks = text_splitter.split_text(str(doc.metadata))
    for i, chunk in enumerate(chunks):
        documents.append({
            'id': f'{doc.page_content}_{i}',
            'text': chunk,
            'metadata': {
                'file': 'test'
            }
        })

for i in tqdm(range(0, len(documents), batch_size)):
    i_end = min(len(documents), i+batch_size)
    res = s.post(
        f"{endpoint_url}/upsert",
        headers=headers,
        json={
            "documents": documents[i:i_end]
        }
    )

The data is in Pinecone. I can query it successfully with no filter, like this:

query_response = self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter=None,
    include_metadata=True,
)

However, when I try this:

self.index.query(
    top_k=query.top_k,
    vector=query.embedding,
    filter={'file': {'$eq': 'test'}},
    include_metadata=True,
)

It returns nothing. Any guidance would be much appreciated.
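
One way to narrow this down is to run the unfiltered query (which works) with `include_metadata=True` and check whether the returned matches actually carry the `file` key. If none do, the filter has nothing to match, and the problem is on the upsert side rather than in the filter syntax. The sketch below is only a diagnostic aid: `count_with_metadata` is a hypothetical helper, and it assumes each match exposes a `metadata` attribute that is either a dict or `None`. The sample matches are stand-ins built for illustration.

```python
from types import SimpleNamespace


def count_with_metadata(matches, key, value):
    """Count matches whose metadata has `key` equal to `value`."""
    count = 0
    for match in matches:
        # Assumption: each match exposes `.metadata` (a dict or None).
        metadata = getattr(match, "metadata", None) or {}
        if metadata.get(key) == value:
            count += 1
    return count


# Stand-in matches for illustration; real ones would come from
# query_response.matches of the unfiltered query above.
sample = [
    SimpleNamespace(id="a_0", metadata={"file": "test"}),
    SimpleNamespace(id="b_0", metadata=None),
    SimpleNamespace(id="c_0", metadata={"source": "file"}),
]
print(count_with_metadata(sample, "file", "test"))
```

With the real response, the call would be `count_with_metadata(query_response.matches, "file", "test")`. A result of zero would suggest the `file` key never reached the index.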

I am facing the same problem. Does anyone know what is going on?

Hello,
I tried it and everything worked fine, so I still don’t understand why it’s not working for you.
Here is what I ran with the TypeScript client:

import { PineconeClient } from '@pinecone-database/pinecone';

async function main() {
    const client = new PineconeClient();

    await client.init({
        apiKey: "YOUR_API_KEY",
        environment: "YOUR_ENVIRONMENT",
    });

    const index = client.Index("test-index");

    const queryRequest = {
        vector: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
        topK: 10,
        includeValues: true,
        includeMetadata: true,
        filter: {
            file: { $eq: "restaurants" },
        },
        namespace: "",
    };
    const queryResponse = await index.query({ queryRequest });

    if (queryResponse.matches && queryResponse.matches.length > 0) {
        const match = queryResponse.matches[0];
        for (const key in match) {
            console.log(`key: ${key}, value: ${match[key]}`);
        }
    }
}

main().catch(error => {
    console.error(error.message);
    console.error(error.stack);
});

And the same query with the Python client:

import pinecone

def main():
    pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
    
    index = pinecone.Index("test-index")

    query_response = index.query(
        namespace="",
        top_k=10,
        include_values=True,
        include_metadata=True,
        vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
        filter={
            "file": {"$eq": "restaurants"}
        }
    )

    print(query_response.matches[0])

try:
    main()
except Exception as error:
    print(f"An error occurred: {error}")

Quick update on the problem I was facing.

My problem was with filtering on metadata that holds a list of strings. I was using LangchainJS with the TypeScript Pinecone client and could not get it to work as expected. The cause was that the lists of strings were not being stored correctly.

I solved it by upgrading Langchain from version 0.0.84 to 0.0.92 (using Pinecone version 0.0.10). Now everything is working smoothly.
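
For anyone who wants to check whether they are hitting the same thing, here is a small sketch that looks for list values stored under indexed keys such as `tags.0` and `tags.1`. That key shape is an assumption inferred from the first post in this thread, where only the `"tags.0"` filter returned results; it is not a confirmed description of what the older Langchain version wrote. `find_flattened_lists` is a hypothetical helper that takes the metadata dict of one stored vector.

```python
import re

# Matches keys of the form "<field>.<integer>", e.g. "tags.0".
_INDEXED_KEY = re.compile(r"^(?P<field>.+)\.(?P<index>\d+)$")


def find_flattened_lists(metadata):
    """Group keys like "tags.0", "tags.1" by base field, ordered by index."""
    grouped = {}
    for key, value in metadata.items():
        m = _INDEXED_KEY.match(key)
        if m:
            grouped.setdefault(m.group("field"), []).append(
                (int(m.group("index")), value)
            )
    return {
        field: [v for _, v in sorted(items, key=lambda pair: pair[0])]
        for field, items in grouped.items()
    }


# Illustrative metadata only; real values would come from a stored vector.
print(find_flattened_lists({"tags.1": "Consumer", "tags.0": "Creator", "file": "test"}))
print(find_flattened_lists({"tags": ["Creator", "Consumer"]}))
```

A non-empty result means the list is not stored as a single `tags` field, so filters such as `{"tags": {"$in": [...]}}` have nothing to match against until the data is upserted again as a real list.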