Hello @clyde.hunter1984 and thank you for posting.
The error `RuntimeError: can't start new thread` is a low-level error: it means the Python runtime was unable to create a new operating-system thread. This normally happens when a program has accumulated a large number of threads or open file handles and runs into a process resource limit.
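To confirm that threads are piling up, you can log the live threads on each pass of your loop. This is a minimal debugging sketch using only the standard `threading` module; the helper name `thread_summary` is hypothetical and not part of LangChain or Pinecone.

```python
import threading
from collections import Counter


def thread_summary(label: str) -> Counter:
    """Print how many threads are alive, grouped by the first part of their names.

    Hypothetical debugging helper; call it once per loop iteration.
    """
    groups = Counter(t.name.split("-")[0] for t in threading.enumerate())
    print(f"{label}: {threading.active_count()} live threads {dict(groups)}")
    return groups
```

For example, calling `thread_summary(f.name)` at the top of your `for f in globFile:` loop should print a roughly constant count if nothing leaks. A count that climbs by about the same amount on every file points at objects that keep their worker threads alive.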
Looking at the code, I see that inside the loop you're creating a new `vectorstore` for every file, each with 10 threads. I suspect that these `vectorstore` objects are not getting cleaned up and are holding their file handles (connections) open.
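The effect can be reproduced without Pinecone. The sketch below uses a hypothetical stand-in, not the actual `PineconeVectorStore` internals: `PooledClient` owns a 10-worker `concurrent.futures.ThreadPoolExecutor` (the 10 mirrors the thread count mentioned above), and the demo keeps a reference to every client it creates, mimicking objects that never get cleaned up. Each `upsert` makes its batches wait on a `threading.Barrier` so the pool is forced to start all of its workers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WORKERS = 10  # assumed pool size, mirroring the 10 threads per vectorstore


class PooledClient:
    """Hypothetical stand-in for an object that owns its own worker pool."""

    def __init__(self, workers: int = WORKERS) -> None:
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def upsert(self, n_batches: int = WORKERS) -> None:
        # Every batch blocks until all batches are running, so the pool
        # has to start its full set of worker threads.
        barrier = threading.Barrier(n_batches)
        futures = [self.pool.submit(barrier.wait, 5) for _ in range(n_batches)]
        for fut in futures:
            fut.result()


def extra_threads(create_per_iteration: bool, iterations: int = 5) -> int:
    """Count the threads still alive after the loop, then clean them up."""
    before = threading.active_count()
    shared = PooledClient()  # threads start lazily, so an unused client costs none
    created = [shared]
    for _ in range(iterations):
        if create_per_iteration:
            client = PooledClient()  # a new pool on every pass, like the original loop
            created.append(client)   # still referenced, so its threads stay alive
        else:
            client = shared  # one pool reused for every file
        client.upsert()
    extra = threading.active_count() - before
    for c in created:
        c.pool.shutdown(wait=True)  # release the worker threads
    return extra
```

With five iterations, `extra_threads(create_per_iteration=True)` returns 50, while `extra_threads(create_per_iteration=False)` returns 10: the shared client never grows past its own pool, no matter how many files you process.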
The correct approach is to create the `vectorstore` once, outside the loop:
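```python
import sys

from langchain_community.document_loaders import BSHTMLLoader
from langchain_core.documents import Document
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# index_name, embeddings and globFile come from the rest of your script.
# Create the vector store and the splitter once, outside the loop, so they are reused.
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)

for f in globFile:
    if f.name.split(".")[0].isnumeric():
        document = BSHTMLLoader(f).load()
        if sys.getsizeof(document[0].page_content) > 40000:
            # Large page: split it and keep only the source/title metadata on each chunk.
            document_split = [
                Document(page_content=h.page_content,
                         metadata={"source": h.metadata["source"], "title": h.metadata["title"]})
                for h in text_splitter.split_documents(document)
            ]
            vectorstore.add_documents(document_split)
        else:
            # Small page: add the loaded document as-is.
            vectorstore.add_documents(document)
```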
I recommend reading our LangChain guide.