Creating a chatbot with Pinecone, ada002 and the OpenAI API in Python

CodingNoob · May 8, 2024, 5:56pm

Hello everybody,

before I start to describe my issue, let me describe my starting point and my coding project. I started coding some weeks ago and this is my first bigger project. The aim of it is to create a chatbot with my own data (txt files) that can be used on a website for support reasons.

So before starting with the actual issue, let me share my recent code with you:

import os
import openai
from openai import OpenAI
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from pinecone import Index

API-Keys setzen

os.environ[“OPENAI_API_KEY”] = “secret”
os.environ[“PINECONE_API_KEY”] = “secret”
os.environ[“PINECONE_ENVIRONMENT”] = “us-east-1”

OpenAI Client initialisieren

client = OpenAI(api_key=os.environ[“OPENAI_API_KEY”])

Pinecone Index initialisieren

index = Index(
index_name=“secret”,
host=“secret”,
api_key=os.getenv(“PINECONE_API_KEY”),
environment=os.getenv(“PINECONE_ENVIRONMENT”)
)

Dokumente laden und aufteilen

loader = TextLoader(“C:/Users/secret/OneDrive/secret.txt”)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

Funktionen definieren

def embed_and_index_documents():
“”“Dokumente in Embeddings umwandeln und in Pinecone indexieren.”“”
doc_id = 1 # Starten mit einer ID von 1
for doc in docs:
# Text aus dem Attribut page_content verwenden
response = client.embeddings.create(
model=“text-embedding-ada-002”,
input=[doc.page_content] # Liste von Strings für die Batch-Verarbeitung
)
embedding = response.data[0].embedding
# Vektoren und Metadaten korrekt an Pinecone senden
index.upsert(vectors=[(str(doc_id), embedding, {“text”: doc.page_content})])
doc_id += 1 # Erhöhen der doc_id für das nächste Dokument

def ask_question(question):
“”“Eine Frage beantworten, indem zuerst der Pinecone-Index abgefragt wird und dann eine Antwort generiert wird.”“”
response = client.embeddings.create(
model=“text-embedding-ada-002”,
input=[question]
)
question_embedding = response.data[0].embedding
results = index.query(positive_vectors=[question_embedding], top_k=3)
document_ids = [result.id for result in results[“matches”]]
documents = [index.fetch(ids=[doc_id]).get(doc_id).metadata[“text”] for doc_id in document_ids]

# Das Foundation Model nutzen, um eine Antwort zu generieren
context = " ".join(documents)
response = openai.Completion.create(
    model="text-davinci-002",
    prompt=f"Answer the question based on the text: {context}\n\nQuestion: {question}",
    max_tokens=150
)
return response.choices[0].text

Dokumente indexieren

embed_and_index_documents()

Frage stellen und Antwort erhalten

response = ask_question(“Why are dolphins so funny?”)
print(“Antwort:”, response)

So, whenever I try to execute the code, the following error message appears:

“C:\Users\secret\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\core\client\rest.py”, line 261, in request raise PineconeApiException(http_resp=r) pinecone.core.client.exceptions.PineconeApiException: (400) Reason: Bad Request HTTP response headers: HTTPHeaderDict({‘Date’: ‘Tue, 07 May 2024 21:14:31 GMT’, ‘Content-Type’: ‘application/json’, ‘Content-Length’: ‘90’, ‘Connection’: ‘keep-alive’, ‘x-pinecone-request-latency-ms’: ‘44’, ‘x-pinecone-request-id’: ‘7683065097316603496’, ‘x-envoy-upstream-service-time’: ‘32’, ‘server’: ‘envoy’}) HTTP response body: {“code”:3,“message”:“Cannot provide both ‘ID’ and ‘vector’ at the same time”,“details”:}

As I already mentioned, I am quite new to coding, so your help would be very much appreciated. If you recognize any other weakpoints in my code that could cause any issues, please let me know. I am also looking for people who are familiar with creating chatbots, so if you would like to connect and maybe work on the project together, don’t hesitate to reach out to me.

Thank you very much in advance for your help! I wish y’all a nice day!

ZacharyProser · May 8, 2024, 6:52pm

Hi @CodingNoob,

Welcome to the Pinecone forums and thanks for your question!

Your index.upsert statement looks a little suspect to me. Please have a look at our Quickstart guide which shows what the upsert call should look like:

index = pc.Index(index_name)

index.upsert(
    vectors=[
        {"id": "vec1", "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]},
        {"id": "vec2", "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]},
        {"id": "vec3", "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
        {"id": "vec4", "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}
    ],
    namespace="ns1"
)

index.upsert(
    vectors=[
        {"id": "vec5", "values": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]},
        {"id": "vec6", "values": [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]},
        {"id": "vec7", "values": [0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]},
        {"id": "vec8", "values": [0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]}
    ],
    namespace="ns2"
)

So the ID is a string and the values is a list of floating point numbers (vectors) that you get back from OpenAI’s ada embedding model.

In your code, it looks like you’re trying to upsert a single list:

index.upsert(vectors=[(str(doc_id), embedding, {“text”: doc.page_content})])

Since you mentioned you’re new to coding, you may have luck looking through our example Jupyter notebooks, which include examples of chatbots, here: examples/learn at master · pinecone-io/examples · GitHub

We’ve also got complete tutorials you could follow to build your chatbot with Pinecone and LangChain:

I’d recommend going through the notebooks and then this tutorial end to end so you can see it working and digest each step more easily. Once that’s done, you could modify your LangChain tutorial code to use your own data.

That may be an easier path to success.

Hope this helps!

Best,
Zack