I’ve been using Pinecone on a small-scale project for a while and have recently started having problems with connection resets during upsert operations. I’m using the free tier for the moment.
My code is Python with a small FastAPI-based service. I’m mostly doing single-vector upserts from an interactive UI that accumulates knowledge over time, rather than large batch upserts of existing materials. The UI calls my FastAPI service, which then calls Pinecone.
Now, single upserts spaced out over time work fine.
However, if I have, say, 10 upserts to do in quick succession (a small interactive batch), I find that Pinecone frequently fails with a “connection reset by peer” error after five or so. I’m guessing that this is some kind of rate limiting kicking in.
So my questions are:
1/ Is this in fact how Pinecone behaves when rate limiting?
2/ Is it the upsert that is being rate limited, or concurrent connections that are being constrained? (see below)
3/ What is the limit I should manage to?
4/ How can I best mitigate it, other than simply backing off and retrying? There doesn’t seem to be any back pressure or other signal available beyond the hard connection failure (no 429 with ‘Retry-After’ style feedback).
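To make the last question concrete, here is a minimal sketch of the "back off and retry" workaround I mean. It is not Pinecone-specific: `upsert_fn` is a hypothetical stand-in for whatever performs the actual upsert, the delay values are guesses rather than documented limits, and I'm assuming the reset surfaces as a `ConnectionError` subclass (the real client may wrap it in a different exception type, in which case `retry_on` would need adjusting).

```python
import random
import time


def upsert_with_retry(upsert_fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call upsert_fn(), retrying with exponential backoff plus jitter.

    upsert_fn is a hypothetical zero-argument callable wrapping the upsert.
    retry_on is an assumption: ConnectionResetError is a ConnectionError,
    but the client library may raise something else instead.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upsert_fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # 0.5s, 1s, 2s, ... capped at max_delay, plus up to 50% jitter
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

This works around the symptom, but without knowing the real limit I'm only guessing at delays, which is why I'd prefer a proper signal to manage against.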
Concurrency:
FastAPI has an interesting concurrency model, given that the Pinecone Python client presents a synchronous API. Each concurrent HTTP request is initially handled within FastAPI by the main thread, but then handed off to a pooled thread to do the actual “work”. My code runs within the pooled thread. This allows multiple synchronous execution paths to be handled in parallel. But it also means I end up with more than one Pinecone connection, since I establish the connection within my code.
If in fact the issue isn’t upsert rate limiting but is rather a limit on concurrent connections, I’ll need to change the way I’m creating Pinecone connections so that they are shared somehow … perhaps by wrapping my code with an async declaration, creating a sharable Pinecone connection in that first layer and then invoking the main body of my code within a manually constructed thread. That assumes, though, that a Pinecone connection is thread-safe for concurrent usage.
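A simpler variant of the sharing idea, sketched below, would be to create the connection once, lazily, and reuse it from every pooled thread, with a semaphore capping how many upserts are in flight at a time. Everything here is an assumption on my part: `factory` is a hypothetical callable that builds the Pinecone index object, the cap of 2 is an arbitrary guess, and the whole approach is only valid if the shared object really is thread-safe.

```python
import threading

_client = None
_client_lock = threading.Lock()
# Arbitrary guess at a safe number of concurrent upserts, not a documented limit.
_upsert_slots = threading.BoundedSemaphore(2)


def get_shared_client(factory):
    """Create the client once and hand the same object to every thread.

    factory is a hypothetical zero-argument callable that returns the
    connected index object.
    """
    global _client
    if _client is None:
        with _client_lock:
            if _client is None:  # re-check now that we hold the lock
                _client = factory()
    return _client


def guarded_upsert(factory, vectors):
    """Upsert through the shared client, limiting concurrent calls."""
    client = get_shared_client(factory)
    with _upsert_slots:
        return client.upsert(vectors=vectors)
```

Each request handler would call `guarded_upsert(...)` instead of opening its own connection, so ten quick requests would share one connection and at most two would talk to Pinecone at once.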
Answers and suggestions appreciated.