I am trying to upsert embeddings to my pod, and it always stops right when it hits 2000 records. How can I get around this? Is there a workaround? This is driving me mad. I’ve got about 200000 rows to go. Any help is appreciated!
Here’s the code I am currently using:
from tqdm.auto import tqdm
import http
import openai

MODEL = "text-embedding-ada-002"
count = 0  # we'll use the count to create unique IDs
batch_size = 250  # process everything in batches of 250
for i in tqdm(range(0, len(trec['Content']), batch_size)):
    # set end position of batch
    i_end = min(i+batch_size, len(trec['Content']))
    # get batch of lines and IDs
    lines_batch = trec['Content'][i: i+i_end]
    #print(lines_batch)
    ids_batch = [str(n) for n in range(i+3600, i_end+3600)]
    # create embeddings
    res = openai.Embedding.create(input=list(lines_batch), engine=MODEL)
    embeds = [record['embedding'] for record in res['data']]
    # prep metadata and upsert batch
    meta = [{'Content': line} for line in lines_batch]
    to_upsert = zip(ids_batch, embeds, meta)
    # upsert to Pinecone
    index.upsert(vectors=list(to_upsert))
Anyone know how I can get it to continuously keep uploading?
Do you get any error messages when the process stops? Or does the process just hang? If the latter, can you add some print() statements so we can see where it’s hanging? It could be the OpenAI call, it could be the upserts, it could be something else entirely. Without knowing where it’s getting blocked, it’s hard to say how to fix it.
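As a rough sketch of the print() idea, each step of the loop can be wrapped so it reports when it starts, finishes, or raises. `run_stage` is a hypothetical helper written for this illustration; it is not part of the OpenAI or Pinecone clients.

```python
import time

def run_stage(name, fn, *args, **kwargs):
    """Run one step of the loop, printing when it starts, finishes, or fails."""
    print(f"[{name}] start", flush=True)
    started = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    except Exception as exc:
        # Report which stage failed, then re-raise so the full traceback is kept.
        print(f"[{name}] failed: {type(exc).__name__}: {exc}", flush=True)
        raise
    elapsed = time.perf_counter() - started
    print(f"[{name}] done in {elapsed:.2f}s", flush=True)
    return result
```

Inside the loop from the question this would look like `res = run_stage("embed", openai.Embedding.create, input=list(lines_batch), engine=MODEL)` and `run_stage("upsert", index.upsert, vectors=list(to_upsert))`. If the last line printed is an `[embed] start` with nothing after it, the OpenAI call is the one hanging; the same reading applies to the upsert.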
We also recommend using batch sizes of 100 when performing upserts. I doubt it has anything to do with the issue you’re running into, but I wanted to call it out as just a good habit to be in.
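One way to keep the 250-row embedding batches but still send upserts of 100 is to split the prepared vectors before upserting. This is a minimal sketch; `chunked` is a hypothetical helper name, and the default of 100 simply mirrors the recommendation above.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In the loop from the question, the final line would become `for chunk in chunked(list(to_upsert), 100): index.upsert(vectors=chunk)`, so a 250-row batch goes out as upserts of 100, 100, and 50 vectors.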
15% 8/54 [00:25<02:38, 3.45s/it]
---------------------------------------------------------------------------
InvalidRequestError Traceback (most recent call last)
<ipython-input-98-80e716b39ad7> in <module>
15 # create embeddings
16
---> 17 res = openai.Embedding.create(input=list(lines_batch), engine=MODEL)
18 embeds = [record['embedding'] for record in res['data']]
19 # prep metadata and upsert batch
4 frames
/usr/local/lib/python3.8/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
677 stream_error = stream and "error" in resp.data
678 if stream_error or not 200 <= rcode < 300:
--> 679 raise self.handle_error_response(
680 rbody, rcode, resp.data, rheaders, stream_error=stream_error
681 )
InvalidRequestError: ['In the 2007 season, mcgowdu01 had a record of 12 wins and 10 losses, with an earned run average (ERA) of 4.08. He was the starting pitcher in 27 of the 27 games in which he pitched. He recorded 1 shutouts, 0 saves, and finished 0 games. He walked 61 batters, and struck out 144. He allowed 14 home runs, was responsible for 2 hit by pitches, and threw 13 wild pitches. He had 2 complete games.', 'In the 2007 season, mcleama01 had a record of 0 wins and 0 losses, with an earned run average (ERA) of 8.22. He was the starting pitcher in 0 of the 4 games in which he pitched. He recorded 0 shutouts, 0 saves, and finished 0 games. He walked 2 batters, and struck out 5. He allowed 4 home runs, was responsible for 0 hit by pitches, and threw 0 wild pitches. He had 0 complete games.', 'In the 2007 season, mclemma02 had a record of 3 wins and 0 losses, with an earned run average (ERA) of 3.86. He was the starting pitcher in 0 of the 29 games in which he pitched. He recorded 0 shutouts, 0 saves, and finished 9 games. He walked 18 batters, and struck out 35. He allowed 5 home runs, was responsible for 1 hit by pitches, and threw 2 wild pitches. He had 0 complete games.', 'In the 2007 season, mechegi01 had a record of 9 wins and 13 losses, with an earned run average (ERA) of 3.67. He was the starting pitcher in 34 of the 34 games in which he pitched. He recorded 0 shutouts, 0 saves, and finished 0 games. He walked 62 batters, and struck out 156. He allowe...
That looks like an OpenAI error. I hate to punt on customer problems, but I’m far from an OpenAI expert and fear I could give you the wrong advice if I tried to debug this. I recommend you crosspost on their community: https://community.openai.com/
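One detail in the code from the question may be worth checking before crossposting. The slice `trec['Content'][i: i+i_end]` ends at `i + i_end` rather than `i_end`, so each batch is longer than the one before. With `batch_size = 250`, the batch starting at row 2000 would contain 2250 rows. As far as I know, the OpenAI embeddings endpoint documents a limit of 2048 items in the `input` array, which would fit the failure after 8 of 54 batches. This is a guess from the traceback, not a confirmed diagnosis. The sketch below uses stand-in data and a hypothetical helper, `iter_batches`, to compare the two slices:

```python
def iter_batches(rows, batch_size=250):
    """Yield (start, end, batch) where batch is rows[start:end]."""
    for start in range(0, len(rows), batch_size):
        end = min(start + batch_size, len(rows))
        yield start, end, rows[start:end]

# Stand-in data; the real dataset is not shown in this thread.
rows = [f"row {n}" for n in range(13_500)]

# Compare the two slices for the batch that starts at row 2000.
i, i_end = 2000, 2250
buggy_len = len(rows[i:i + i_end])  # slice as written in the question
fixed_len = len(rows[i:i_end])      # slice that stops at i_end
print(buggy_len, fixed_len)         # prints: 2250 250
```

If that is the cause, changing the slice to `trec['Content'][i:i_end]` keeps every batch at 250 rows or fewer. It also keeps `lines_batch` the same length as `ids_batch`, so `zip` no longer silently drops the extra rows.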