Connection terminated when fetching indexes

rafal1 · April 7, 2024, 3:30am

I’m migrating from a pod-based index to a serverless one, and it seems like the connection gets terminated by the Pinecone server.

The approach:

1. Migrate namespace by namespace
2. Get all record IDs from each namespace (through random search)
3. Batch fetch all records (vectors & metadata) by ID
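
A minimal sketch of the batching in the last step, assuming the IDs are already collected in an array. The names `chunk`, `fetchAll`, and `fetchBatch` are hypothetical helpers for illustration, not part of the Pinecone client; `fetchBatch` stands in for whatever call actually fetches one batch of records by ID.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical callback: fetches one batch and returns records keyed by ID.
type FetchBatch<R> = (ids: string[]) => Promise<Record<string, R>>;

// Fetch all records sequentially, one batch at a time, and merge the results.
async function fetchAll<R>(
  ids: readonly string[],
  batchSize: number,
  fetchBatch: FetchBatch<R>,
): Promise<Record<string, R>> {
  const all: Record<string, R> = {};
  for (const batch of chunk(ids, batchSize)) {
    Object.assign(all, await fetchBatch(batch));
  }
  return all;
}
```

Keeping `batchSize` a parameter makes it easy to experiment with smaller batches when large responses turn out to be unreliable.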

I’m stuck on step 3 where the Pinecone server randomly terminates the connection without returning any specific error. Here’s an excerpt from the logs:

>>> Request: GET https://[index].svc.us-central1-gcp.pinecone.io/vectors/fetch?ids=ba9839f5-9359-449c-b4c9-4d55b74cf617&[... lots of ids here]&namespace=f36d5fc8-341c-4656-9313-d607ab8f9686
>>> Headers: {"User-Agent":"@pinecone-database/pinecone v2.1.1; lang=typescript; node v21.4.0","Api-Key":"***REDACTED***"}

<<< Status: 200
Failed migrating namespace: f36d5fc8-341c-4656-9313-d607ab8f9686 TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:11014:53)
    at Fetch.emit (node:events:519:28)
    at Fetch.emit (node:domain:488:12)
    at Fetch.terminate (node:internal/deps/undici/undici:10200:14)
    at Object.onError (node:internal/deps/undici/undici:11132:38)
    at _Request.onError (node:internal/deps/undici/undici:7303:31)
    at errorRequest (node:internal/deps/undici/undici:9863:17)
    at TLSSocket.onSocketClose (node:internal/deps/undici/undici:9018:9)
    at TLSSocket.emit (node:events:531:35)
    at TLSSocket.emit (node:domain:488:12) {
  [cause]: ResponseContentLengthMismatchError: Response body length does not match content-length header
      at Parser.onMessageComplete (node:internal/deps/undici/undici:8915:32)
      at TLSSocket.onSocketEnd (node:internal/deps/undici/undici:8990:18)
      at TLSSocket.emit (node:events:531:35)
      at TLSSocket.emit (node:domain:488:12)
      at endReadableNT (node:internal/streams/readable:1696:12)
      at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

The first part of the log is the output of PINECONE_DEBUG, and the second part looks like an unhandled error from fetch.

This already happens in relatively small namespaces with about 10k records. I am aware of the limit of 1000 vectors per fetch; are there any other limits in place that may cause the connection to be terminated? Does the Pinecone team have a recommended approach for exporting all vectors from a pod-based index?

rafal1 · April 9, 2024, 10:36am

Update: The error is still happening regularly, but after adding a retry mechanism I am able to fetch all the records successfully.

Lowering the number of IDs per fetch to below 100 also seems to help with the issue (but slows down the migration considerably).
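
For reference, a rough sketch of the kind of retry wrapper described above, assuming exponential backoff between attempts. `withRetry`, `sleep`, and the default values are hypothetical choices for illustration, not something provided by the Pinecone client.

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying on any thrown error with exponential backoff.
// Rethrows the last error once `maxAttempts` attempts have failed.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,     // assumed default, tune as needed
  baseDelayMs = 500,   // assumed default, doubled after each failure
): Promise<T> {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Each batch fetch can then be wrapped individually, so a terminated connection only repeats that one batch rather than the whole namespace.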

patrick1 · April 10, 2024, 5:08pm

Hello @rafal1

Can you tell us a little more about where you’re running this? Is it on your laptop or on a cloud service?

If you were hitting a limit, an error message should be returned. It feels like something else is going on here.

rafal1 · April 11, 2024, 8:05am

Hi Patrick!
The script is running on my local machine. I’m thinking this may be a network issue (e.g. lost packets or unreliable WiFi), especially considering the large payloads.

Maybe this issue would be better reported to the team maintaining the @pinecone-database/pinecone NPM package?