Hello everybody,
Before I describe my issue, let me explain my starting point and my project. I started coding a few weeks ago, and this is my first larger project. The aim is to create a chatbot based on my own data (txt files) that can be used on a website for support purposes.
So before getting to the actual issue, here is my current code:
import os
import openai
from openai import OpenAI
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from pinecone import Index
# Set API keys
os.environ["OPENAI_API_KEY"] = "secret"
os.environ["PINECONE_API_KEY"] = "secret"
os.environ["PINECONE_ENVIRONMENT"] = "us-east-1"
# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Initialize the Pinecone index
index = Index(
    index_name="secret",
    host="secret",
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENVIRONMENT")
)
# Load the documents and split them into chunks
loader = TextLoader("C:/Users/secret/OneDrive/secret.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
# Define functions
def embed_and_index_documents():
    """Convert the documents to embeddings and index them in Pinecone."""
    doc_id = 1  # Start with an ID of 1
    for doc in docs:
        # Use the text from the page_content attribute
        response = client.embeddings.create(
            model="text-embedding-ada-002",
            input=[doc.page_content]  # List of strings for batch processing
        )
        embedding = response.data[0].embedding
        # Send the vector and its metadata to Pinecone
        index.upsert(vectors=[(str(doc_id), embedding, {"text": doc.page_content})])
        doc_id += 1  # Increment doc_id for the next document
def ask_question(question):
    """Answer a question by first querying the Pinecone index and then generating an answer."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=[question]
    )
    question_embedding = response.data[0].embedding
    results = index.query(positive_vectors=[question_embedding], top_k=3)
    document_ids = [result.id for result in results["matches"]]
    documents = [index.fetch(ids=[doc_id]).get(doc_id).metadata["text"] for doc_id in document_ids]
    # Use the foundation model to generate an answer
    context = " ".join(documents)
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=f"Answer the question based on the text: {context}\n\nQuestion: {question}",
        max_tokens=150
    )
    return response.choices[0].text
# Index the documents
embed_and_index_documents()
# Ask a question and print the answer
response = ask_question("Why are dolphins so funny?")
print("Answer:", response)
So, whenever I try to execute the code, the following error message appears:
"C:\Users\secret\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pinecone\core\client\rest.py", line 261, in request
    raise PineconeApiException(http_resp=r)
pinecone.core.client.exceptions.PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 07 May 2024 21:14:31 GMT', 'Content-Type': 'application/json', 'Content-Length': '90', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '44', 'x-pinecone-request-id': '7683065097316603496', 'x-envoy-upstream-service-time': '32', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Cannot provide both 'ID' and 'vector' at the same time","details":[]}
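Since the message mentions 'ID' and 'vector', my guess is that the query step is involved, but I have not confirmed that. Below is a minimal sketch of how I think that step might look instead. It rests on assumptions I could not verify against my real index: that `index.query` accepts the keyword arguments `vector`, `top_k` and `include_metadata`, and that each match can be read like a dictionary. `retrieve_context` and `FakeIndex` are hypothetical names I made up so the snippet runs offline without any API keys.

```python
from typing import Any, Sequence


def retrieve_context(index: Any, question_embedding: Sequence[float], top_k: int = 3) -> str:
    """Query the index with one vector and join the texts stored as metadata."""
    # Assumption: the client accepts vector=, top_k= and include_metadata=.
    # Only the vector is passed, no id, so the request cannot contain both.
    results = index.query(
        vector=list(question_embedding),
        top_k=top_k,
        include_metadata=True,
    )
    # Assumption: each match exposes its metadata via dictionary-style access.
    return " ".join(match["metadata"]["text"] for match in results["matches"])


class FakeIndex:
    """Hypothetical stand-in for the Pinecone index, used only for this sketch."""

    def __init__(self):
        self.last_query = None

    def query(self, **kwargs):
        # Record the arguments so they can be inspected afterwards.
        self.last_query = kwargs
        return {
            "matches": [
                {"id": "1", "score": 0.9, "metadata": {"text": "Dolphins are playful."}},
                {"id": "2", "score": 0.8, "metadata": {"text": "They live in groups."}},
            ]
        }


fake_index = FakeIndex()
context = retrieve_context(fake_index, [0.1, 0.2, 0.3])
print(context)
```

If this is roughly right, the separate `index.fetch` call per ID would no longer be needed, because the text comes back with the query results. Please correct me if the real client behaves differently.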
As I already mentioned, I am quite new to coding, so your help would be very much appreciated. If you notice any other weak points in my code that could cause issues, please let me know. I am also looking for people who are familiar with building chatbots, so if you would like to connect and maybe work on the project together, don't hesitate to reach out to me.
Thank you very much in advance for your help! I wish y’all a nice day!