Hi all, I’m going through the Pinecone and Arize RAG tutorial in a Colab notebook with some minor modifications, and I seem to be hitting an issue: when iterating through a dataframe of prompts and getting the top 2 relevant documents for each, I get the same “documents” and prompt embeddings across different prompts.
Wondering if this is a LangChain or Pinecone issue, or just how the notebook is set up? I’m on the free tier for both.
I tried replicating the code in two separate cells with two separate prompts and got the same documents and cosine values, but when I waited a few minutes between executing the cells it seemed to work.
If you are on the gcp-starter tier it can take “up to 10 seconds” for new documents to index before showing up in a query. Realistically, it’s more like 45-60 seconds.
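If indexing lag is the suspect, one option is to poll the index's vector count before querying instead of waiting a fixed amount of time. This is a minimal sketch: `wait_for_vector_count` and its parameters are hypothetical names, and `get_count` is any callable you supply that returns the current number of vectors in the index.

```python
import time


def wait_for_vector_count(get_count, expected, timeout_s=120.0, poll_s=5.0,
                          sleep=time.sleep):
    """Poll get_count() until it reports at least `expected` vectors.

    Returns True once the count is reached, False if `timeout_s` elapses first.
    `get_count` is supplied by the caller (hypothetical hook, not a Pinecone API).
    """
    waited = 0.0
    while True:
        if get_count() >= expected:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

With the Pinecone client, `get_count` could wrap `index.describe_index_stats()` and read the total vector count from the response. How you read that field depends on your client version, so treat that part as an assumption.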
If this data was upserted a while ago and you are still getting the same result, it sounds like a class instance is not being re-created and it’s reusing the same params on every request.
Getting the same documents would be possible, but the same cosine similarity is nearly impossible unless the queried vectors are 100% the same, which possibly indicates an issue with your code.
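A quick way to test this is to collect the query embedding for each prompt and check whether any of them are numerically identical. This is a rough sketch in plain Python: `find_identical_embeddings` and `cosine_similarity` are hypothetical helpers, and the embeddings are assumed to be lists of floats from whatever embedding model the notebook uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_identical_embeddings(embeddings, tol=1e-9):
    """Return (i, j) index pairs whose embeddings match element-wise within tol."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            same = all(abs(x - y) <= tol
                       for x, y in zip(embeddings[i], embeddings[j]))
            if same:
                pairs.append((i, j))
    return pairs
```

If different prompts show up as a pair here, the problem is upstream of Pinecone: the same vector is being sent for every query.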
It was because I wasn’t deleting and recreating the index at the beginning of the notebook, so I was reloading the same chunks every time I ran the notebook!
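For anyone hitting the same thing: an alternative to deleting and recreating the index is to give each chunk a deterministic ID, since a Pinecone upsert overwrites an existing vector that has the same ID. The sketch below only simulates an index with a plain dict. `chunk_id` and `upsert` are hypothetical helpers, not part of Pinecone or LangChain.

```python
import hashlib
import uuid


def chunk_id(text):
    """Derive a stable ID from the chunk's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def upsert(index, chunks, make_id):
    """Simulated upsert: writing to an existing ID overwrites the entry."""
    for text in chunks:
        index[make_id(text)] = text


chunks = ["chunk about pricing", "chunk about indexing"]

random_index = {}  # a fresh random ID on every run
stable_index = {}  # content-derived ID

for _ in range(3):  # simulate running the notebook three times
    upsert(random_index, chunks, lambda text: str(uuid.uuid4()))
    upsert(stable_index, chunks, chunk_id)

print(len(random_index))  # 6: every rerun adds duplicate chunks
print(len(stable_index))  # 2: reruns overwrite instead of duplicating
```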