Hello everyone,
I am trying to build a RAG system with hybrid search for my application. In the application, users will upload their documents and later chat with them. I can store the dense and sparse vectors in a Pinecone index, so far so good. But I also have a BM25 encoder that I fit on the corpus and then use to encode queries for hybrid search. Where should I save this fitted encoder? I am aware that Pinecone offers a sparse model called pinecone-sparse-english-v0, but as the name suggests, I think that model only supports English.
I could save the encoder to an AWS S3 bucket, but that feels like overkill.
So, if anyone knows a good approach, please let me know.
from pinecone_text.sparse import BM25Encoder

bm25_encoder = BM25Encoder()  # where should I save this encoder after fitting it?
bm25_encoder.fit([chunk.page_content for chunk in all_chunks])  # fit on the uploaded chunks