Error during upsert with Langchain

Hello @clyde.hunter1984 and thank you for posting.

The error message RuntimeError: can't start new thread is a low-level error: it means the Python runtime could not create a new thread. This normally happens when a program has exhausted an operating-system resource, for example by keeping a large number of threads or file handles open.

Looking at the code, I see that inside the loop you're creating a new vectorstore on every iteration, each with 10 threads. I suspect that these vectorstores are not getting cleaned up and are holding their file handles (connections) open.
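To illustrate the suspected mechanism, here is a minimal sketch using only the standard library. FakeStore is a hypothetical stand-in, not the real PineconeVectorStore; it assumes a client that starts 10 worker threads when it is constructed and keeps them alive until it is explicitly closed. The stores list is also an assumption: it simulates clients that are never cleaned up.

```python
import threading


class FakeStore:
    """Hypothetical client that owns a pool of idle worker threads."""

    def __init__(self, pool_threads=10):
        self._stop = threading.Event()
        # Each worker simply waits until the store is closed.
        self._workers = [
            threading.Thread(target=self._stop.wait, daemon=True)
            for _ in range(pool_threads)
        ]
        for worker in self._workers:
            worker.start()

    def add_documents(self, docs):
        # A real client would hand work to its pool here; omitted in this sketch.
        pass

    def close(self):
        # Release the workers and wait for them to exit.
        self._stop.set()
        for worker in self._workers:
            worker.join()


def extra_threads(files, reuse_store):
    """Return how many threads are alive beyond the baseline after the loop."""
    baseline = threading.active_count()
    stores = []  # holds references, simulating clients that are never cleaned up
    shared = FakeStore() if reuse_store else None
    if shared is not None:
        stores.append(shared)
    for f in files:
        if reuse_store:
            store = shared
        else:
            store = FakeStore()  # a new 10-thread pool on every iteration
            stores.append(store)
        store.add_documents([f])
    extra = threading.active_count() - baseline
    for store in stores:
        store.close()
    return extra


if __name__ == "__main__":
    print(extra_threads(range(5), reuse_store=False))
    print(extra_threads(range(5), reuse_store=True))
```

In this sketch the thread count grows by 10 for every file when a store is created inside the loop, but stays at 10 in total when a single store is shared. With enough files, the first pattern will eventually hit the operating system's limit.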

The correct approach would be to create the vectorstore outside the loop.

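import sys

# Imports assume the current split LangChain packages; adjust to your versions.
from langchain_community.document_loaders import BSHTMLLoader
from langchain_core.documents import Document
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create the vectorstore once and reuse it for every file.
# index_name, embeddings and globFile are defined as in your original code.
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)
for f in globFile:
    if f.name.split(".")[0].isnumeric():
        loader = BSHTMLLoader(f)
        document = loader.load()
        if sys.getsizeof(document[0].page_content) > 40000:
            # Large page: split it into chunks and upsert each chunk.
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
            split_html = text_splitter.split_documents(document)
            for h in split_html:
                document_split = [Document(page_content=h.page_content, metadata={"source": h.metadata["source"], "title": h.metadata["title"]})]
                vectorstore.add_documents(document_split)
        else:
            # Small page: upsert the whole document as loaded.
            vectorstore.add_documents(document)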

I recommend reading our LangChain guide.