Hi everyone, I have a question about thread usage when embedding data into Pinecone. When I call the add_documents
method, seven new threads are spawned, which seems expected. However, after the operation completes, those threads remain alive, and each subsequent request creates additional threads that are also kept alive. The result is a steady accumulation of threads and growing RAM usage, which eventually crashes my service in production. I've already ensured that both the Pinecone vector store and the embedding instances follow a singleton pattern. Any suggestions or insights would be greatly appreciated!
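For context, here is a minimal, self-contained sketch (no Pinecone or LangChain involved) of the general failure mode I suspect: a thread pool created per request keeps its worker threads alive until it is explicitly shut down, so repeated calls accumulate idle threads exactly like my logs show. The `spawn_without_shutdown` helper is hypothetical, just for illustration:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def spawn_without_shutdown(n):
    # Simulate n requests that each create a fresh pool and never
    # shut it down; each pool's worker thread stays alive, parked
    # on its work queue, until shutdown() or process exit.
    pools = []
    for _ in range(n):
        ex = ThreadPoolExecutor(max_workers=7)
        ex.submit(lambda: None).result()  # force creation of one worker
        pools.append(ex)
    return pools

before = threading.active_count()
pools = spawn_without_shutdown(3)
leaked = threading.active_count() - before   # one leaked worker per "request"
for ex in pools:
    ex.shutdown(wait=True)                   # joins the workers
after = threading.active_count()             # back to the baseline
```

If the client library builds a pool like this per call and never shuts it down, that would explain the accumulation.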
import threading  # needed for the thread inspection below

print(f"Active threads before: {[t.name for t in threading.enumerate()]}")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
print(f"Active threads 1: {[t.name for t in threading.enumerate()]}")
pages = text_splitter.create_documents([text_input])
print(f"Active threads 2: {[t.name for t in threading.enumerate()]}")

combined_training_text = ""
for doc in pages:
    combined_training_text += doc.page_content + " "  # space between chunks
characters_count = count_characters(combined_training_text)
print(f"Active threads 3: {[t.name for t in threading.enumerate()]}")

pinecone_index = select_pinecone_manager(llm)
print(f"Active threads 4: {[t.name for t in threading.enumerate()]}")
vectorstore = pinecone_index.get_vectorstore(namespace_id)
print(f"Active threads 5: {[t.name for t in threading.enumerate()]}")
vector_ids = vectorstore.add_documents(documents=pages)  # this is where the new threads appear
print(f"Active threads 6: {[t.name for t in threading.enumerate()]}")

return vector_ids, characters_count, combined_training_text[:1000]
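In case it helps anyone reproduce this, the repeated prints above could be replaced with a small snapshot-diff helper that reports exactly which threads appeared between two points. This is a generic sketch (not tied to Pinecone); the demo thread at the bottom is only there to show the usage:

```python
import threading

def thread_snapshot():
    # Names of all currently live threads.
    return {t.name for t in threading.enumerate()}

def new_threads(before):
    # Threads that have appeared since an earlier snapshot.
    return sorted(thread_snapshot() - before)

# Usage sketch: take a snapshot, run the suspect call (here a demo
# thread held open by an Event), then diff.
before = thread_snapshot()
evt = threading.Event()
t = threading.Thread(target=evt.wait, name="demo-worker")
t.start()
appeared = new_threads(before)  # the demo thread shows up here
evt.set()
t.join()
```

Wrapping `add_documents` between `thread_snapshot()` and `new_threads()` would show whether the lingering threads belong to the Pinecone client, the embedding backend, or something else, based on their names.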