Bad RAG Performance

Hello everyone,

A few weeks ago I managed to complete my first RAG chatbot. After that, I added conversation memory and a Gradio interface. Unfortunately, from that point on everything went downhill: the chatbot can no longer process the provided information correctly, and many of its answers are factually incorrect.

I've included my entire code below, so maybe someone can spot and fix the problem.

My code after adding memory and Gradio:

import gradio as gr
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA, ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory
import os
import glob

# Load documents from the directory
loader = DirectoryLoader('portfolio', glob='**/*.txt')
docs = loader.load()

# Set environment variables for API keys
os.environ['OPENAI_API_KEY'] = "secret"
os.environ['PINECONE_API_KEY'] = "secret"

# Initialize embeddings and the vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
index_name = "pinecone-chatbot"

# Split documents into sections
text_splitter = RecursiveCharacterTextSplitter()
split_docs = text_splitter.split_documents(docs)

# Create a vector store from the document sections
vectorstore = PineconeVectorStore.from_documents(split_docs, embeddings, index_name=index_name)

# Define the search query
query = "Was ist die Studierendenanzahl an der Hochschule?"

# Run a similarity search against the vector store
similar_docs = vectorstore.similarity_search(query)

# Create an instance of ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever())

# Initialize the ConversationChain with memory
memory = ConversationBufferMemory()
conversation_chain = ConversationChain(llm=llm, memory=memory)

# Define the function that Gradio uses to interact with the chatbot
def chatbot_response(user_input):
    if isinstance(user_input, str):
        response = conversation_chain.invoke(user_input.strip())
        if isinstance(response, dict) and 'response' in response:
            # Return only the response part to the Gradio interface
            return response['response']
        elif isinstance(response, str):
            return response
        else:
            return "Irgendwas ist schiefgelaufen. Bitte versuche es erneut."
    return "Ungültiger Eingabetyp."

# Start the Gradio interface
def main():
    interface = gr.Interface(
        fn=chatbot_response,
        inputs="text",
        outputs="text",
        title="Virtueller Assistent",
        description="Hallo! Ich bin ein virtueller Assistent."
    )
    interface.launch(share=True)

if __name__ == "__main__":
    main()

My code before the changes:

from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
import glob

loader = DirectoryLoader('portfolio', glob='**/*.txt')
docs = loader.load()

# NOT RECOMMENDED: hard-coding API keys
OPENAI_API_KEY = "secret"
PINECONE_API_KEY = "secret"

os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

index_name = "pinecone-chatbot"

# Split our documents into chunks
text_splitter = RecursiveCharacterTextSplitter()
split_docs = text_splitter.split_documents(docs)

vectorstore = PineconeVectorStore.from_documents(split_docs, embeddings, index_name=index_name)

# Define the search query
query = "Was ist die Studierendenanzahl an der Hochschule?"

# Run a similarity search against the vector store
similar_docs = vectorstore.similarity_search(query)

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# Create a ChatOpenAI instance with the GPT-4o model
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0
)

# Create a RetrievalQA instance that uses the vector store as retriever
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Call the RetrievalQA instance with a query
def main():
    print("Hallo! Ich bin ein virtueller Assistent. Frage mich etwas oder tippe 'exit' zum Beenden.")

    while True:
        user_input = input("Du: ")
        if user_input.lower() == 'exit':
            print("Chatbot: Auf Wiedersehen!")
            break

        # Use the chatbot to answer the question
        response = qa.invoke(user_input)

        # Check whether the response is a dictionary containing a 'result' key
        if isinstance(response, dict) and 'result' in response:
            print(f"Chatbot: {response['result']}")
        else:
            print(f"Chatbot: {response}")

if __name__ == "__main__":
    main()

Thank you for your help!

Hi @CodingNoob, and thanks for your question!

Here are a couple of thoughts off the top of my head:

  1. What are you supplying in terms of data? How many files, what size on average, and what information do your files include?
  2. If you simplify and start with a simpler LangChain setup, such as our Quickstart guide, using your same index, do you get the same performance/results? (There is a rough sketch of such a check after this list.)
  3. Most of the time, the quality and amount of your data matter most for RAG pipeline performance, assuming all the other components are functioning properly.
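
In case it helps with point 2, here is a rough sketch of such a check, not a drop-in fix: it assumes the same 'portfolio' folder and 'pinecone-chatbot' index from your post, that both API keys are already set in the environment, and the chunk_size/chunk_overlap values are only placeholders to experiment with. It first prints some basic stats about your corpus (point 1), then queries the existing index with a bare RetrievalQA chain, with no memory and no Gradio, and prints the retrieved sources so you can see whether retrieval itself returns the right passages.

from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain.chains import RetrievalQA

# Assumes OPENAI_API_KEY and PINECONE_API_KEY are already set in the environment

# 1) How much data goes in? Count files, chunks, and average chunk size.
docs = DirectoryLoader('portfolio', glob='**/*.txt').load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)  # placeholder values
chunks = splitter.split_documents(docs)
sizes = [len(c.page_content) for c in chunks]
print(f"{len(docs)} files -> {len(chunks)} chunks, avg {sum(sizes) / len(sizes):.0f} characters per chunk")

# 2) Query the existing index with a bare RetrievalQA chain and show the retrieved sources
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = PineconeVectorStore(index_name="pinecone-chatbot", embedding=embeddings)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
result = qa.invoke("Was ist die Studierendenanzahl an der Hochschule?")
print("Answer:", result["result"])
for doc in result["source_documents"]:
    print("Source:", doc.page_content[:120])

If the answers from this stripped-down version look good, the problem most likely sits in how the memory/Gradio version handles the question; if they are still bad, that points back at the data or the chunking.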

Hope this helps!

Best,
Zack