Guidance on making asynchronous queries

jltparc · February 2, 2023, 8:43pm

Hello,

I was wondering if there is documentation on how to run asynchronous queries (like the example for upserts in parallel). My index is relatively small (20k vectors), but I need to query millions of ‘unseen’ vectors against the index to find similar vectors.

Thanks,

Cory_Pinecone · February 8, 2023, 3:04pm

Hi @jltparc,

First, welcome to the Pinecone forums!

Queries are run in the order they’re received, and are non-blocking read operations. So you shouldn’t have to include any asynchronous logic when running them.

Can you share more details about the types of queries you’re running? If you’re running hundreds at a time we don’t have a mechanism for batched reads like with upserts; there’s a deprecated method in the docs to query multiple vectors simultaneously but we don’t recommend using that.

You might be able to average your vectors in groups, run a single read on the resulting vector, then only query the constituent vectors if that one meets a certain threshold. This may take some experimenting to get right (if your outliers are too close to your corpus of expected values they may get drowned out if they’re averaged with too many expected values).
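A minimal sketch of the group-averaging idea above, in plain Python. It assumes a hypothetical callable `top_score` that runs one query and returns the best match's similarity score; that callable, `group_size`, and `threshold` are illustrative names, not part of the Pinecone SDK.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def grouped_prefilter(
    vectors: Sequence[Vector],
    group_size: int,
    top_score: Callable[[Vector], float],  # hypothetical: one query, best score
    threshold: float,
) -> Dict[int, float]:
    """Query each group's average first; query members only if it passes.

    Returns {position in `vectors`: best score} for the vectors that were
    queried individually. Vectors in skipped groups are absent.
    """
    results: Dict[int, float] = {}
    for start in range(0, len(vectors), group_size):
        group = vectors[start:start + group_size]
        # One read for the whole group.
        if top_score(mean_vector(group)) < threshold:
            continue  # the averaged vector is not close enough; skip the group
        # The group looks promising, so query each constituent vector.
        for offset, vec in enumerate(group):
            results[start + offset] = top_score(vec)
    return results
```

As the post notes, the group size and threshold would need experimenting: with large groups, a single interesting vector can be averaged away and its group skipped.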

daniel1 · June 9, 2023, 6:37pm

Hi @Cory_Pinecone – thanks for the response. To clarify non-blocking read operations, it seems that index.query is a blocking operation in the current version of the Python SDK, and (as you mention) the batched query system seems to be deprecated.

As such, in a use case where a large number of queries need to be made at once, the naive approach does not allow any parallelization. Is the guidance to use the built-in Python primitives for parallelism (asyncio, thread pools, etc.), or are there any Pinecone-specific recommendations? I’m not able to find much in the documentation and SDK source code, so I imagine the Python builtins are my best bet, but wanted to check.

Thanks,
Daniel
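For reference, the "Python builtins" route mentioned above could look roughly like this: a thread pool that fans a blocking query call out over many vectors. This is a sketch, not an official recommendation; `query_fn` is a placeholder for whatever blocking call is used, and `max_workers=16` is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

Vector = List[float]


def query_in_parallel(
    query_fn: Callable[[Vector], Any],  # placeholder for a blocking query call
    vectors: Sequence[Vector],
    max_workers: int = 16,              # arbitrary; tune for your workload
) -> List[Any]:
    """Run a blocking query function over many vectors using threads.

    Results come back in the same order as `vectors`. Threads suit this
    case because each call spends most of its time waiting on the network.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order and re-raises worker exceptions.
        return list(pool.map(query_fn, vectors))
```

With the Python SDK, `query_fn` might be something like `lambda v: index.query(vector=v, top_k=10)`, reusing a single `index` object across all calls.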

shono · July 3, 2023, 7:58am

Were you able to find any best practices here?

vossen.w · July 5, 2023, 7:50pm

I would also want to be able to await an index.query. Making it blocking greatly reduces performance when lots of sequential queries are necessary.

david.herman · August 11, 2023, 2:09pm

I am querying against over 3 million input embeddings, so I need this to be as fast as possible. Parallel querying (e.g. multiple pods or replicas?) would be very helpful, and best-practices code in Python would be very useful.

scott · September 10, 2023, 5:02pm

I made changes to support async querying. It turns out the underlying HTTP library does support this (through a multiprocessing pool) if you pass async_req=True to query. However, the pinecone library’s Index::query method doesn’t handle the AsyncResult that comes back: it calls parse_query_response on it, assuming it’s actually a response instead of an AsyncResult. So I subclassed Index and made my own class.

You can see the approach here:

https://gist.github.com/sshumaker/4972aa0f8a35eb9ef6cf9d9b05614a4b

asyncindex.py

from typing import Optional, List, Union, Dict, Tuple, Generic, Callable, TypeVar
from pinecone.index import Index, parse_query_response, fix_tuple_length, Iterable, _OPENAPI_ENDPOINT_PARAMS # type: ignore
from pinecone.core.client.models import QueryVector, QueryResponse, SparseValues, QueryRequest  # type: ignore
from pinecone.core.utils.error_handling import validate_and_convert_errors  # type: ignore


T = TypeVar('T')

# This class is an async wrapper for multiprocessing.AsyncResult functions that don't use regular async/await
# Pinecone is unfortunately one of those libraries.

(This preview is truncated; the full file is in the gist linked above.)

It returns an ‘AsyncHandle’ (wrapper class in the gist), so you can call ‘get’ when you want to wait for the results. I didn’t have an easier way to chain this on short notice.

To use it, instantiate AsyncIndex with num_threads set to something like 30. Remember that you want to keep the index object around and reuse it for all of your queries.
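To illustrate the general pattern described here (this is not the code from the gist, which is truncated above), a handle around a `multiprocessing` `AsyncResult` could be sketched as follows. The `AsyncHandle` below and its `transform` argument are written from scratch for illustration and only assume the standard library.

```python
from multiprocessing.pool import AsyncResult, ThreadPool
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AsyncHandle(Generic[T]):
    """Illustrative wrapper: defer parsing until the caller asks for the result."""

    def __init__(self, result: AsyncResult, transform: Callable[[Any], T]) -> None:
        self._result = result        # pending result from the pool
        self._transform = transform  # e.g. a response-parsing function

    def ready(self) -> bool:
        """True once the underlying call has finished."""
        return self._result.ready()

    def get(self, timeout: Optional[float] = None) -> T:
        """Block until the call finishes, then parse and return its result."""
        return self._transform(self._result.get(timeout))


if __name__ == "__main__":
    def fake_query(vec):
        # Stand-in for a network call that returns a raw response.
        return {"raw_score": sum(vec)}

    pool = ThreadPool(4)  # keep one pool alive and reuse it
    handles = [
        AsyncHandle(pool.apply_async(fake_query, (v,)), lambda r: r["raw_score"])
        for v in ([1.0, 2.0], [3.0, 4.0])
    ]
    print([h.get(timeout=5) for h in handles])
    pool.close()
    pool.join()
```

The point of the handle is that all requests are submitted before any `get` call blocks, so the waits overlap instead of running back to back.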

ana.w · February 19, 2025, 6:02pm

If anyone is still looking for guidance on making asynchronous requests, we now support asyncio in v6 of our Python SDK! You can learn more here: Python SDK - Pinecone Docs
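For anyone wiring this up, a generic asyncio pattern for running many queries with bounded concurrency might look like the sketch below. It deliberately does not use the SDK: `query_one` is a placeholder coroutine to be replaced with the SDK's async query call (see the linked docs for the actual API), and `max_concurrency=32` is an arbitrary value.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence

Vector = List[float]


async def query_all(
    query_one: Callable[[Vector], Awaitable[Any]],  # placeholder async query
    vectors: Sequence[Vector],
    max_concurrency: int = 32,                      # arbitrary; tune as needed
) -> List[Any]:
    """Await `query_one` for every vector, at most `max_concurrency` at a time.

    Results are returned in the same order as `vectors`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(vec: Vector) -> Any:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await query_one(vec)

    return await asyncio.gather(*(bounded(v) for v in vectors))


if __name__ == "__main__":
    async def fake_query(vec: Vector) -> float:
        await asyncio.sleep(0.01)  # stand-in for network latency
        return sum(vec)

    print(asyncio.run(query_all(fake_query, [[1.0, 2.0], [3.0]], max_concurrency=2)))
```

For millions of vectors, it would probably make sense to call `query_all` on chunks (say, a few thousand vectors at a time) so that not every task is created up front.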