Filtering dynamically on the query ID's own metadata

All of the metadata filtering examples I can find filter at query time with known values. For example:

index.query(
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    filter={
        "genre": {"$eq": "documentary"},
        "year": 2019
    },
    top_k=1,
    include_metadata=True
)

You have to know that year is 2019 and genre is documentary. Is it possible to filter on whatever metadata is currently associated with the query vector? Something like this:

index.query(
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    filter={
        "genre": <whatever genre the query vector has stored>,
        "year": <whatever year the query vector has stored>
    },
    top_k=1,
    include_metadata=True
)

I can obviously query once to get the metadata values associated with the vector and then query again, but a dynamic single query would be nice.

To clarify, do you mean querying by ID rather than with a vector? More like:

index.query(
    id="id-a",
    filter={
        "genre": <whatever genre the record with this ID has stored>,
        "year": <whatever year the record with this ID has stored>
    },
    top_k=1,
    include_metadata=True
)

Thanks for sharing your feedback!

Pinecone

Yes, by ID. Or by vector. Metadata filtering applies either way, right?

Yes, you can use filters when querying by ID or by vector.

Right, but what if I don’t know the filter value at query time? Can Pinecone determine the value associated with an ID? An example:

# I upload a few vectors with a metadata field called "genre"; some have the genre "comedy"
# and some have "horror". One of the "comedy" vectors has id '123', but I don't know its genre.
# Wished-for syntax (the filter value is a placeholder, not real syntax):
index.query(id='123', top_k=5, filter={'genre': 'use whatever Pinecone has stored here'})

# as opposed to querying twice:
result = index.query(id='123', top_k=1, include_metadata=True)
genre = result['matches'][0]['metadata']['genre']
index.query(id='123', top_k=5, filter={'genre': {'$eq': genre}})

I’m trying to avoid querying twice.

How do I query by ID without a metadata filter?

@regutonlabs the metadata filter is always optional. If you leave out the filter parameter but include the vector ID to use for the query, you’ll be doing exactly what you described.

Thank you! I think I had fetching in mind when I asked the question. I’ve found a workaround that way.
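The fetch-based workaround can be sketched roughly as follows. It still makes two requests, but the first is a direct lookup by ID rather than a similarity search. This is a minimal sketch, not an official helper: query_like_id and StubIndex are hypothetical names, and it assumes a recent pinecone Python client where index.fetch(ids=..., namespace=...) returns an object whose vectors attribute maps each ID to a record with a metadata attribute. The stub stands in for a real index so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def query_like_id(index: Any, vector_id: str, fields: Iterable[str],
                  top_k: int = 5, namespace: Optional[str] = None) -> Any:
    """Query neighbours of vector_id that share its values for `fields`.

    Hypothetical helper: fetch the record by ID to read its metadata,
    then query by the same ID with an exact-match filter built from it.
    """
    fetched = index.fetch(ids=[vector_id], namespace=namespace)
    record = fetched.vectors.get(vector_id)
    if record is None:
        raise KeyError(f"vector {vector_id!r} not found")
    metadata = record.metadata or {}
    # One $eq condition per requested field; top-level keys are ANDed.
    # Fields missing from the record's metadata are skipped.
    metadata_filter = {f: {"$eq": metadata[f]} for f in fields if f in metadata}
    return index.query(id=vector_id, top_k=top_k,
                       filter=metadata_filter or None,
                       include_metadata=True, namespace=namespace)


class StubIndex:
    """Stand-in for a Pinecone index (illustration only, no network)."""

    def __init__(self, records):
        self.records = records

    def fetch(self, ids, namespace=None):
        vectors = {i: SimpleNamespace(metadata=self.records[i])
                   for i in ids if i in self.records}
        return SimpleNamespace(vectors=vectors)

    def query(self, **kwargs):
        # Echo the arguments back so the built filter can be inspected.
        return kwargs


stub = StubIndex({"123": {"genre": "comedy", "year": 2019}})
sent = query_like_id(stub, "123", ["genre"])
print(sent["filter"])  # {'genre': {'$eq': 'comedy'}}
```

With a real index you would pass the object returned by your client instead of StubIndex, and the return value would be the query response rather than the echoed arguments.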

Hey @Cory_Pinecone, I’m hitting this error, which hints that there must be a limit on the number of IDs per fetch. To be exact, I tried with 775 IDs and got the error.

I found this in your docs, under "Fetches and deletes":

  • Max vectors per fetch or delete request is 1,000.

So I’m not sure why I’m getting the error. It would help to know whether there’s a fixed hard limit, or some way to check beforehand so I can avoid this exception. Worth mentioning: this is on a serverless index.
Thank you in advance!

/usr/local/lib/python3.10/dist-packages/pinecone/core/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    258                 raise ServiceException(http_resp=r)
    259 
--> 260             raise PineconeApiException(http_resp=r)
    261 
    262         return r


PineconeApiException: (414)
Reason: Request-URI Too Large
HTTP response headers: HTTPHeaderDict({'Server': 'awselb/2.0', 'Date': 'Mon, 05 Feb 2024 11:38:03 GMT', 'Content-Type': 'text/html', 'Content-Length': '142', 'Connection': 'close'})
HTTP response body: <html>
<head><title>414 Request-URI Too Large</title></head>
<body>
<center><h1>414 Request-URI Too Large</h1></center>
</body>
</html>

There’s a limit of 1,000 IDs in a single fetch operation, but I don’t think that’s the issue here. A 414 Request-URI Too Large status specifically means the request URL is too long. Fetch sends the IDs as query parameters in the URL, so with enough IDs (or long enough ones) the URL can exceed what the load balancer accepts, even below the 1,000-ID limit. That’s an HTTP limit rather than a Pinecone-specific one.
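A 414 status is about the length of the request URL. As a rough illustration, assuming fetch sends each ID as a repeated ids query parameter, the query string grows linearly with both the number and the length of the IDs. The sketch below uses Python's standard urllib.parse.urlencode; fetch_query_string is a hypothetical helper and the ID format is made up, so substitute your own IDs to estimate your case. The real URL also carries the index host and path.

```python
from urllib.parse import urlencode


def fetch_query_string(ids, namespace=None):
    """Build a query string with one repeated "ids" parameter per ID.

    Assumption: this mirrors how fetch IDs travel in the request URL.
    """
    params = [("ids", vector_id) for vector_id in ids]
    if namespace:
        params.append(("namespace", namespace))
    return urlencode(params)


# Hypothetical ID format, 17 characters each.
ids = [f"doc-{i:05d}-chunk-{i % 7}" for i in range(775)]
query_string = fetch_query_string(ids)
print(len(query_string))  # 17049 characters for 775 IDs
```

Longer IDs, such as 36-character UUIDs, make the same 775 IDs produce a proportionally longer URL.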

Try fetching in smaller batches, for example 100 IDs per request, to keep each request under the HTTP limit. Or only return vectors or metadata but not both, if that fits your use case.
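Batching the fetch might look roughly like the sketch below. It assumes a recent pinecone Python client where index.fetch(ids=..., namespace=...) returns an object with a vectors dict keyed by ID. fetch_in_batches, chunked and StubIndex are hypothetical helpers for illustration, and 100 is just the batch size suggested in the reply, not a documented limit.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterator, List, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(index: Any, ids: Sequence[str], batch_size: int = 100,
                     namespace: Optional[str] = None) -> Dict[str, Any]:
    """Fetch `ids` batch by batch and merge the per-batch `vectors` dicts."""
    merged: Dict[str, Any] = {}
    for batch in chunked(ids, batch_size):
        response = index.fetch(ids=batch, namespace=namespace)
        merged.update(response.vectors)
    return merged


class StubIndex:
    """Stand-in for a Pinecone index that records each batch size."""

    def __init__(self):
        self.batch_sizes = []

    def fetch(self, ids, namespace=None):
        self.batch_sizes.append(len(ids))
        return SimpleNamespace(vectors={i: {"id": i} for i in ids})


stub = StubIndex()
vectors = fetch_in_batches(stub, [f"id-{n}" for n in range(775)])
print(len(vectors), stub.batch_sizes)  # 775 [100, 100, 100, 100, 100, 100, 100, 75]
```

If your IDs are long, a smaller batch_size keeps each request smaller; the documented 1,000-ID cap still applies as an upper bound.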

Thanks for your prompt input!

Is that even possible when fetching records? I thought only query accepts includeValues and includeMetadata as body parameters.
Thanks in advance!

You’re right: omitting metadata or values is only an option for queries, not fetches. Sorry for the confusion.