How do I get the answer to a question more quickly?

Why does it take me a full minute to generate an answer using the following code?

import os

import openai
import pinecone
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone
from langchain.chains.question_answering import load_qa_chain


def answer_with_pinecone(query, documents, embeddings):
    # Initialize the Pinecone client
    pinecone.init(api_key=os.environ.get('PINECONE_API_KEY_REVENUED'),
                  environment=os.environ.get('PINECONE_ENVIRONMENT'))
    # Embed the documents and upsert them into the Pinecone index
    docsearch = Pinecone.from_texts(
        texts=[t.page_content for t in documents],
        embedding=embeddings, index_name=os.environ.get('index_name'))
    # Retrieve the documents most similar to the query
    docs = docsearch.similarity_search(query)
    # Answer the question with a map_reduce QA chain over the retrieved docs
    llm = OpenAI(temperature=0, openai_api_key=openai.api_key)
    chain = load_qa_chain(llm, chain_type="map_reduce")
    answer = chain.run(input_documents=docs, question=query).lstrip()
    return answer

I would much rather it take one second, especially since this is for a production use case.

I am very happy to pay for a premium subscription or a membership to your service if it means I get faster response times. I am working for a client who has the money to pay for this kind of thing. My main concern is just that I don’t want it to take a minute or half a minute to get an answer to a question; I’d prefer it to take less than a second.

Hi @yishai.rasowsky,

Thanks for your question, and I’m sorry you’re encountering this issue.

A couple of thoughts:

  • Try moving the call to pinecone.init outside of your function - it only needs to happen once - and then re-use the client for subsequent calls. That saves you repeated API calls and will speed up your answer_with_pinecone function.
  • I see you posted this in May - but we recently improved the speed of upserts within LangChain with Pinecone by 5x: LangChain's Pinecone upsert speed increased by 5X | Pinecone
  • Same idea with your call to OpenAI - can you move that out of your function and re-use the same llm object? A rough sketch combining both changes is below.
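
Here’s what that refactor could look like, keeping your function and environment variable names and assuming documents and embeddings are already available at startup (exact imports may vary with your LangChain version, so treat this as a sketch rather than a drop-in):

import os

import openai
import pinecone
from langchain.llms import OpenAI
from langchain.vectorstores import Pinecone
from langchain.chains.question_answering import load_qa_chain

# One-time setup at startup, not on every question
pinecone.init(api_key=os.environ.get('PINECONE_API_KEY_REVENUED'),
              environment=os.environ.get('PINECONE_ENVIRONMENT'))
llm = OpenAI(temperature=0, openai_api_key=openai.api_key)
chain = load_qa_chain(llm, chain_type="map_reduce")

# Embed and upsert the documents once, then re-use the vector store
docsearch = Pinecone.from_texts(
    texts=[t.page_content for t in documents],
    embedding=embeddings, index_name=os.environ.get('index_name'))


def answer_with_pinecone(query):
    # Per-question work is now just a similarity search plus one chain call
    docs = docsearch.similarity_search(query)
    return chain.run(input_documents=docs, question=query).lstrip()

With the client, llm, chain, and vector store created once, each question only pays for the similarity search and the chain call.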

I hope those suggestions are helpful, and I’d recommend re-trying this with the latest LangChain version to see if any of the performance enhancements we’ve made since then improve your overall latency.

Feel free also to check out our many examples in GitHub - pinecone-io/examples: Jupyter Notebooks to help you get hands-on with Pinecone vector databases. These are all open-source Jupyter Notebooks that you can run in Google Colab for free, they cover lots of different scenarios, and some of them do not use LangChain. See whether any of those patterns, or using Pinecone directly without LangChain, reduces your overall runtime.
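
As a rough illustration of the "without LangChain" route, here’s a minimal sketch of querying an existing index directly with the Pinecone and OpenAI clients. It assumes the older pinecone-client 2.x and openai 0.x interfaces that match your snippet, that the index was built with text-embedding-ada-002 (substitute whatever model you actually embedded with), and that the original text is stored under the "text" metadata key (LangChain’s default); the retrieve helper is purely illustrative:

import os

import openai
import pinecone

# One-time setup
pinecone.init(api_key=os.environ.get('PINECONE_API_KEY_REVENUED'),
              environment=os.environ.get('PINECONE_ENVIRONMENT'))
index = pinecone.Index(os.environ.get('index_name'))


def retrieve(query, top_k=4):
    # Embed the query with the same model that was used to build the index
    vector = openai.Embedding.create(
        input=[query], model="text-embedding-ada-002")["data"][0]["embedding"]
    # Query the existing index directly; nothing is re-embedded or upserted
    results = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return [match["metadata"].get("text", "") for match in results["matches"]]

From there you can build whatever prompt you like around the retrieved text and make a single LLM call.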

I hope that’s helpful!