I am working with Pinecone, where I have already created the embeddings for my data. The code below is meant to help my chatbot refine user questions and retrieve the answers. Initially I could not create the embeddings because of a mismatch between the dimension of my Pinecone index (1536) and the model I was using. I have since switched to text-embedding-ada-002 which, according to my research, should match, since it also produces 1536-dimensional vectors. But now when I run my chatbot, I get this new error:
“Vector dimension 8 does not match the dimension of the index 1536”
I don’t understand where it comes from and I haven’t been able to resolve it. Any ideas?
from sentence_transformers import SentenceTransformer
from transformers import GPT2TokenizerFast
import pinecone
import openai
import streamlit as st
openai.api_key = "API_KEY"
model = GPT2TokenizerFast.from_pretrained('Xenova/text-embedding-ada-002')
pinecone.init(api_key='API_KEY', environment='gcp-starter')
index = pinecone.Index('test')
def find_match(input):
    input_em = list(model.encode(input))
    result = index.query(input_em, top_k=2, includeMetadata=True)
    return result['matches'][0]['metadata']['text'] + "\n" + result['matches'][1]['metadata']['text']
def query_refiner(conversation, query):
    response = openai.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=f"Given the following user query and conversation log, formulate a question that would be the most relevant to provide the user with an answer from a knowledge base.\n\nCONVERSATION LOG: \n{conversation}\n\nQuery: {query}\n\nRefined Query:",
        temperature=0.7,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].text
def get_conversation_string():
    conversation_string = ""
    for i in range(len(st.session_state['responses'])-1):
        conversation_string += "Human: " + st.session_state['requests'][i] + "\n"
        conversation_string += "Bot: " + st.session_state['responses'][i+1] + "\n"
    return conversation_string
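For context on where the 8 might come from: `GPT2TokenizerFast.encode` returns token IDs (one integer per token of the input text), not an embedding, so a query that tokenizes into 8 tokens would yield a length-8 list, and Pinecone compares the length of whatever list it receives against the index dimension. Below is a minimal, dependency-free sketch illustrating that pattern; `EXPECTED_DIM` and `check_query_vector` are hypothetical names for illustration, not part of the Pinecone client.

```python
EXPECTED_DIM = 1536  # dimension of the index / of text-embedding-ada-002 vectors

def check_query_vector(vec):
    """Hypothetical pre-flight check: reject anything that is not a
    1536-dimensional float vector before it is sent to the index."""
    if len(vec) != EXPECTED_DIM:
        raise ValueError(
            f"Vector dimension {len(vec)} does not match "
            f"the dimension of the index {EXPECTED_DIM}"
        )
    if not all(isinstance(x, float) for x in vec):
        raise TypeError("Expected floats (an embedding), got integers (token IDs?)")
    return vec

# A tokenizer's .encode() output looks like this: one int per token.
token_ids = [40, 716, 257, 1332, 12405, 329, 262, 8537]  # 8 ints
try:
    check_query_vector(token_ids)
except ValueError as e:
    print(e)  # Vector dimension 8 does not match the dimension of the index 1536

# An actual embedding would be a fixed-size list of floats:
check_query_vector([0.0] * EXPECTED_DIM)  # passes
```

If this diagnosis is right, the fix would be to produce the query vector with an embedding model (e.g. the OpenAI embeddings endpoint with model `text-embedding-ada-002`) rather than with a tokenizer.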