Vectorstore search gives error: Response not transcoded because the transcoder's internal buffer size exceeds the configured limit

I have an index with almost 10,000 vectors and I am trying to run queries that only filter out a few of them. Basically, I pass a vector of zeros so no semantic search is performed, as I'm only interested in filtering, and the filter is sometimes quite soft. The result is that I get back most of my vectorstore. Here is an example where I have many movie reviews and I want to get back all of them except horror movies:

res = index.query(
    vector=[0.0 for _ in range(1536)],
    filter={"genre": {"$ne": "horror"}},
    top_k=10000,
    include_values=False,
    include_metadata=True,
)

I set top_k to the total number of chunks I have, to make sure I don't limit the response. The problem is that I run into this error:

ServiceException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'content-length': '99', 'content-type': 'text/plain', 'date': 'Fri, 17 Nov 2023 08:07:50 GMT', 'server': 'envoy'})
HTTP response body: Response not transcoded because the transcoder's internal buffer size exceeds the configured limit.

I can only get a response if I limit top_k to 3500 in my case.
Do you know what causes this, and how I can increase the buffer size?

Hi @giacomo.fonseca

there are some limitations in Pinecone; check them out here: Limits

You cannot increase the buffer size. You can remove the metadata from your results and only get IDs, which you can then fetch if needed.
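For example, here is a minimal sketch of the IDs-only approach. It assumes the pinecone-client v2 style (pinecone.init / pinecone.Index); the index name "movies" and the batch size are placeholders, not values from this thread:

import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("movies")  # hypothetical index name

# Keep the response small: ask only for IDs, with no values and no metadata.
res = index.query(
    vector=[0.0 for _ in range(1536)],
    filter={"genre": {"$ne": "horror"}},
    top_k=10000,
    include_values=False,
    include_metadata=False,
)
ids = [match.id for match in res.matches]

# Fetch metadata later, in small batches, only for the IDs you actually need.
fetched = index.fetch(ids=ids[:100])
for vec_id, record in fetched.vectors.items():
    print(vec_id, record.metadata)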

Hope this helps

This type of query is not really a semantic search and would be better handled by a more traditional DB. If this is core to your use case and you also require semantic search, I'd consider leveraging multiple types of DBs (Vector + SQL, or Vector + Key/Value, etc.) to best support the query patterns you have.
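A hedged sketch of that hybrid pattern, assuming you mirror each vector's ID and its filterable fields in a local SQLite table (the "reviews.db" file, the "reviews" table and its columns, and the "movies" index name are all hypothetical):

import sqlite3
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("movies")  # hypothetical index name

# Run the pure filter in SQL, where large result sets are cheap to return.
conn = sqlite3.connect("reviews.db")  # hypothetical local metadata mirror
rows = conn.execute("SELECT id FROM reviews WHERE genre != ?", ("horror",)).fetchall()
ids = [row[0] for row in rows]

# Only go to Pinecone when you actually need the vectors or full metadata.
records = index.fetch(ids=ids[:100]).vectors

Pinecone stays the source of truth for embeddings and similarity search, while the relational mirror handles the "give me everything except X" style of query.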

This is because you are hitting the limit on the return request size for top_k with data. As @silas says, if you are attempting to do this, it would be easier to track IDs + data in a relational DB and run a SQL query.

However, if you must loop over the entire index without hitting rate limits, this works and is how we do it when syncing vector databases for VectorAdmin.
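Here is a minimal sketch of one way to do that, assuming you already have the full list of IDs (for example from the IDs-only query above, or from your own bookkeeping); the batch size and sleep interval are arbitrary choices, not documented limits:

import time
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("movies")  # hypothetical index name

def iter_records(index, ids, batch_size=100, pause=0.2):
    """Yield (id, record) pairs by fetching the index in small batches."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for vec_id, record in index.fetch(ids=batch).vectors.items():
            yield vec_id, record
        time.sleep(pause)  # crude pacing to stay under rate limits

# Usage: walk every matching record without one oversized response.
# for vec_id, record in iter_records(index, ids):
#     process(record.metadata)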
