I'm iterating through a dozen or so prompts with LangChain's similarity_search_with_score function on Pinecone, but getting the same documents and cosine scores for each. Why?

abemdxb · October 31, 2023, 4:57pm

Hi all, I’m going through the Pinecone and Arize RAG tutorial in a Colab notebook and making some minor modifications, and I seem to be hitting an issue: when iterating through a DataFrame of prompts and retrieving the top 2 relevant documents for each, I get the same “documents” and prompt embeddings across different prompts.
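For reference, a minimal sketch of the loop described above. It assumes `store` is a LangChain vector store (such as the Pinecone wrapper) whose `similarity_search_with_score(query, k=...)` returns (document, score) pairs; the helper name `top_k_per_prompt` is hypothetical, not part of either library.

```python
from typing import Any, Iterable, List, Tuple


def top_k_per_prompt(store: Any, prompts: Iterable[str], k: int = 2) -> List[Tuple[str, list]]:
    """Run one similarity search per prompt and keep each prompt next to its hits.

    `store` is assumed to be a LangChain vector store (e.g. the Pinecone wrapper)
    whose similarity_search_with_score(query, k=...) returns (Document, score) pairs.
    """
    results = []
    for prompt in prompts:
        # Pass the *current* prompt on every iteration; a query computed once
        # outside the loop would return the same hits for every prompt.
        hits = store.similarity_search_with_score(prompt, k=k)
        results.append((prompt, hits))
    return results
```

With a DataFrame, the prompt column can be passed as a list, e.g. `top_k_per_prompt(vectorstore, df["prompt"].tolist())`, where `vectorstore` and the column name `prompt` are placeholders for whatever the notebook actually uses.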

Wondering whether this is a LangChain or Pinecone issue, or just how the notebook is set up? I’m on the free tier for both.

I tried replicating the code in two separate cells with two separate prompts and got the same documents and cosine values, but when I waited a few minutes between executing the cells, it seemed to work.

tim · October 31, 2023, 5:07pm

If you are on the gcp-starter tier, it can take “up to 10 seconds” for new documents to be indexed before they show up in a query. Realistically, it’s more like 45 to 60 seconds.
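If indexing lag is the suspect, one option is to poll until the index reports the expected number of vectors before running the queries. A minimal sketch, assuming you can supply a `get_count` callable that returns the current vector count; `wait_for_vector_count` is a hypothetical helper and the timeout/interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_vector_count(get_count: Callable[[], int], expected: int,
                          timeout: float = 90.0, interval: float = 5.0) -> bool:
    """Poll until the index reports at least `expected` vectors or `timeout` elapses.

    Returns True if the count was reached, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the Pinecone client, `get_count` might be something like `lambda: index.describe_index_stats().total_vector_count`; treat that call as an assumption and check it against the client version installed in the notebook. A matching count is only a rough readiness signal, not a guarantee that every vector is queryable.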

If the data was upserted a while ago and you are still getting the same result, it sounds like an object isn’t being re-instantiated and is reusing the same parameters on every request.
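To illustrate the kind of bug described here, a hypothetical sketch: `CachedQuery`, `FreshQuery` and `toy_embed` are made up for illustration and are not LangChain or Pinecone APIs. An object that embeds the prompt once at construction time keeps answering with that vector, while one that embeds the prompt on each call does not.

```python
from typing import Callable, List

Embed = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    # Stand-in embedding (character count, vowel count); not a real model.
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]


class CachedQuery:
    """Buggy pattern: the prompt is embedded once, in __init__."""

    def __init__(self, embed: Embed, prompt: str) -> None:
        self.vector = embed(prompt)

    def run(self, prompt: str) -> List[float]:
        # BUG: `prompt` is ignored; the construction-time vector is reused.
        return self.vector


class FreshQuery:
    """Fixed pattern: embed whatever prompt is passed on each call."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed

    def run(self, prompt: str) -> List[float]:
        return self.embed(prompt)


prompts = ["what is rag", "how does pinecone index"]
stale = CachedQuery(toy_embed, prompts[0])
fresh = FreshQuery(toy_embed)
print([stale.run(p) for p in prompts])  # the same vector twice
print([fresh.run(p) for p in prompts])  # a different vector per prompt
```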

Getting the same documents is possible, but getting the same cosine similarity is nearly impossible unless the query vectors are 100% identical, which possibly indicates an issue with your code.
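A quick way to check whether the query side is the problem is to embed the prompts yourself (for instance with the `embed_query` method of the LangChain embeddings object the notebook uses) and look for identical vectors before searching. A minimal pure-Python sketch; `cosine` and `duplicate_query_pairs` are illustrative helpers, not library functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def duplicate_query_pairs(vectors: List[Sequence[float]], tol: float = 1e-9) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose vectors match element-wise within `tol`."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if all(abs(x - y) <= tol for x, y in zip(vectors[i], vectors[j])):
                pairs.append((i, j))
    return pairs


doc = [1.0, 0.0, 0.0]
queries = [[1.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
# Different query vectors give different scores against the same document...
print([round(cosine(doc, q), 3) for q in queries])  # [0.707, 0.894]
# ...so identical scores across prompts suggest identical query vectors.
print(duplicate_query_pairs(queries))  # []
```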

abemdxb · October 31, 2023, 5:15pm

Thanks, that was helpful, and I have a couple of experiments to chase up.

abemdxb · November 2, 2023, 6:22pm

Solved it.

It was because I wasn’t deleting and recreating the index at the beginning of the notebook, so I was reloading the same chunks every time I ran the notebook!
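A toy illustration of one way leftover chunks from earlier runs can skew top-k results. It assumes each notebook run upserts the chunks under fresh IDs, so re-running ingestion adds duplicate copies rather than overwriting; the thread doesn’t show the tutorial’s ID scheme, so treat that as an assumption. `ToyIndex` and `ingest` are hypothetical in-memory stand-ins, not the Pinecone API.

```python
import math
import uuid
from typing import Dict, List, Tuple

Chunk = Tuple[str, List[float]]


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyIndex:
    """Hypothetical in-memory stand-in for a vector index."""

    def __init__(self) -> None:
        self.vectors: Dict[str, Chunk] = {}

    def upsert(self, chunks: List[Chunk]) -> None:
        # New random ID per chunk on every call, so repeated runs add copies.
        for text, vec in chunks:
            self.vectors[str(uuid.uuid4())] = (text, vec)

    def query(self, vec: List[float], k: int = 2) -> List[Tuple[str, float]]:
        scored = [(text, _cosine(vec, v)) for text, v in self.vectors.values()]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


def ingest(index: ToyIndex, chunks: List[Chunk], reset: bool) -> None:
    if reset:
        # Plays the role of deleting and recreating the index.
        index.vectors.clear()
    index.upsert(chunks)


chunks = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0])]

stale, fresh = ToyIndex(), ToyIndex()
for _ in range(2):  # the notebook is run twice
    ingest(stale, chunks, reset=False)
    ingest(fresh, chunks, reset=True)

print(stale.query([1.0, 0.2]))  # "chunk A" twice, with identical scores
print(fresh.query([1.0, 0.2]))  # "chunk A", then "chunk B"
```

In the real notebook, the reset corresponds to deleting and recreating the Pinecone index before upserting (the client has `delete_index` and `create_index` calls, though their exact form depends on the client version). Upserting with deterministic IDs would be another way to keep re-runs from adding copies, since an upsert with an existing ID overwrites that vector.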

system · November 16, 2023, 6:22pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.