Hey guys!
I am working on a multimodal RAG pipeline for complex PDFs, but I am running into a problem with irrelevant images being returned at retrieval time.
Problem: when I ask a query that does not require any image in the answer, the model sometimes returns random images (from the uploaded PDF) anyway. Looking at the LangSmith traces, this happens whenever documents with images are retrieved from the Pinecone vectorstore: the model does not ignore that part of the context and displays the images regardless.
This happens even for a simple query such as “Hello”. For that query I expect only “Hello! How can I assist you today?” as the answer, but it also returns some images from the uploaded documents along with it.
Architecture:
For text and tables: summaries of the textual and table content are embedded and stored in the vectorstore; the original chunks are stored in MongoDBStore. The two are linked using a doc_id.
For images: summaries are embedded and stored in the vectorstore; the original images (base64-encoded) are stored in MongoDBStore, also linked via doc_id.
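For reference, the indexing step looks roughly like the sketch below (image_summaries and image_chunks here are placeholder names for my own variables; the text and table chunks are indexed the same way):

import uuid
from langchain_core.documents import Document

# one shared id per image, used to link the summary (Pinecone) to the original (MongoDB)
doc_ids = [str(uuid.uuid4()) for _ in image_chunks]

summary_docs = [
    Document(page_content=summary, metadata={id_key: doc_ids[i]})
    for i, summary in enumerate(image_summaries)
]

vectorStore.add_documents(summary_docs)           # summary embeddings -> Pinecone
docstore.mset(list(zip(doc_ids, image_chunks)))   # original base64 images -> MongoDBStore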
I’m using LangChain’s MultiVectorRetriever:
from langchain.retrievers.multi_vector import MultiVectorRetriever

retriever = MultiVectorRetriever(
    vectorstore=vectorStore,  # Pinecone vectorstore holding the summary embeddings
    docstore=docstore,        # MongoDBStore holding the original chunks / base64 images
    id_key=id_key,            # metadata key linking a summary to its original
)
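A quick way to see what the retriever hands back for a query like “Hello” (just a minimal check, assuming the retriever above is already populated):

# inspect what actually comes back for a chit-chat query
docs = retriever.invoke("Hello")
for d in docs:
    preview = d.page_content[:80] if hasattr(d, "page_content") else str(d)[:80]
    print(type(d).__name__, preview)

Even for “Hello”, this returns the top-k nearest summaries and their linked originals (including base64 images), since plain similarity search has no relevance cutoff by default.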