you have to know the year is 2019 and the genre is documentary. Is it possible to filter on whatever metadata is currently stored with the query vector? Something like this:
index.query(
vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
filter={
"genre": <whatever genre query vector has stored>,
"year": <whatever year query vector has stored>
},
top_k=1,
include_metadata=True
)
I can obviously query once to get the metadata values associated with the vector and then query again, but a dynamic single query would be nice.
Right, but what if I don’t know the filter value at query time? Can Pinecone determine the value associated with an ID? An example:
# I upload a few vectors with metadata called "genre" and some have the genre "comedy" and some
# have "horror". One of the "comedy" vectors has id 123, but I don't know that its genre is 'comedy'
pinecone.query(id='123', top_k=5, filter={'genre': 'use whatever pinecone has stored here'})
# as opposed to
result = pinecone.query(id='123', top_k=1, include_metadata=True)
pinecone.query(id='123', top_k=5, filter={'genre': result['matches'][0]['metadata']['genre']})
@regutonlabs the metadata filter is always optional. If you don’t include a filter parameter but do include the vector ID to use for the query, you’ll be doing exactly as you described.
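For anyone who does want the stored value applied as a filter, the two-step lookup mentioned above can be wrapped in a small helper. This is only a sketch: `query_with_stored_genre` is a made-up name, it assumes `index` is an already-connected Pinecone index object from the Python client, and it assumes every vector has a "genre" metadata field. It is still two round trips, not a single dynamic query.

```python
def query_with_stored_genre(index, vector_id, top_k=5):
    """Hypothetical helper: filter a query by the genre stored on `vector_id`."""
    # Step 1: fetch the vector so we can read the metadata stored with it.
    fetched = index.fetch(ids=[vector_id])
    genre = fetched.vectors[vector_id].metadata["genre"]

    # Step 2: query by ID, restricting matches to the same genre.
    return index.query(
        id=vector_id,
        top_k=top_k,
        filter={"genre": {"$eq": genre}},
        include_metadata=True,
    )
```

Called as `query_with_stored_genre(index, '123')`, this would return the nearest neighbours of vector 123 that share its genre, without the caller needing to know the genre up front.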
Hey @Cory_Pinecone, I’m hitting this error, which hints that there must be a limit on the number of IDs to fetch. To be exact, I tried with 775 IDs and got the error.
So I’m not sure why I’m getting the error. It’d be helpful to know whether there’s a fixed/hard limit, or some way to check beforehand, so that I can avoid this exception. Worth mentioning: this is being performed on a Serverless index.
Thank you in advance!
/usr/local/lib/python3.10/dist-packages/pinecone/core/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
258 raise ServiceException(http_resp=r)
259
--> 260 raise PineconeApiException(http_resp=r)
261
262 return r
PineconeApiException: (414)
Reason: Request-URI Too Large
HTTP response headers: HTTPHeaderDict({'Server': 'awselb/2.0', 'Date': 'Mon, 05 Feb 2024 11:38:03 GMT', 'Content-Type': 'text/html', 'Content-Length': '142', 'Connection': 'close'})
HTTP response body: <html>
<head><title>414 Request-URI Too Large</title></head>
<body>
<center><h1>414 Request-URI Too Large</h1></center>
</body>
</html>
There’s a limit of 1000 IDs in a single fetch operation, but I don’t think that’s the issue here. If you’re returning all of the values and metadata for these vectors you’re probably running up against the limits in a single HTTP request. Since that’s more about how much data is being returned it’s not so much a Pinecone limit as a protocol limit.
Try using smaller iterative batches of 100, to stay under the HTTP transfer size limit. Or only return vectors or metadata but not both if that fits your use case.
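A rough sketch of the batching idea, in case it helps. The helper names `chunked` and `fetch_in_batches` are made up for illustration, and the batch size of 100 is just the suggestion above, not a documented limit. The fetch call is passed in as a function so the batching logic does not depend on any particular client version.

```python
from typing import Callable, Dict, Iterator, List


def chunked(ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def fetch_in_batches(
    fetch_fn: Callable[[List[str]], Dict[str, object]],
    ids: List[str],
    batch_size: int = 100,
) -> Dict[str, object]:
    """Call `fetch_fn` once per batch and merge the per-batch results."""
    merged: Dict[str, object] = {}
    for batch in chunked(ids, batch_size):
        merged.update(fetch_fn(batch))
    return merged
```

With the Python client this might be used as `fetch_in_batches(lambda batch: index.fetch(ids=batch).vectors, all_ids)`, assuming `index` is a connected index and `all_ids` is your list of 775 IDs.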
Is that even possible when fetching records? I thought only Query allows includeValues and includeMetadata as body params.
Thanks in advance!