Dear Colleagues,
I read in a number of documents from a hotel describing its services. I created the chunks and used OpenAIEmbeddings. Then I initialized Pinecone:
# initialize pinecone
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV,
)

index_name = "hotel"
docsearch = Pinecone.from_texts(
    [t.page_content for t in texts], embeddings, index_name=index_name
)
My query now looks like this:
import os

from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])
chain = load_qa_chain(llm, chain_type="stuff")

query = "What are the room categories of the hotel?"
docs = docsearch.similarity_search(query, include_metadata=True)
chain.run(input_documents=docs, question=query)
and the results are something like "The hotel offers Komfort Doppelzimmer, Turmstudio, and Wohnstudio rooms."
My questions:
- How can I choose the model that generates the results? I would like to use GPT-4.
- Am I right that I am currently rebuilding the index every time I run this? If I want to deploy it and only search in Pinecone without rebuilding the index, how would I do that?
- Finally, I would like a simple Streamlit app. What is needed so that I don't rebuild the index, and instead only embed the user's input, search for similar vectors, and get better results using GPT-4?
Many questions… I know… Thank you!