Firstly, thank you so much in advance for your support!
I am building AI chatbots using Flowise. I have set up a free index on Pinecone as a test. I scraped data from a site using the Cheerio Web Scraper and upserted it into Pinecone with an upsert document flow. This stage appears to have worked, since the vector count is 2423.
I then attempted to load this index, but the new chatbot keeps returning “I don’t know” (my default answer when it cannot find the information).
There is no error code, so it appears to have linked up correctly.
Do you know why this might be? Is it because I am only on a free plan?
It’s hard to guess what could be going wrong without knowing more, but your best bet is to break the app down and verify what is happening at each step. In particular, since you suspect the data isn’t being found in Pinecone, I would suggest starting there.
Verify that the data you think you’re indexing is really being indexed. Debug and step through the code, or add print statements, to see what is actually being embedded and inserted into Pinecone.
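For example, you can check the index directly with the Pinecone Python client, completely outside of Flowise. This is just a minimal sketch; the API key, index name, and namespace layout are placeholders you'd swap for whatever your flow actually uses:

```python
from pinecone import Pinecone

# Connect straight to the index, bypassing Flowise, to see what is really stored.
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("your-index-name")  # placeholder: the index your flow writes to

# Confirm the vector count and which namespace the vectors landed in.
# A common surprise is that data was upserted into a namespace the retriever never queries.
stats = index.describe_index_stats()
print(stats)
# You should see your ~2423 vectors listed under the namespace you expect.
```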
At query time, isolate what is happening with the query specifically. What is the query string? Is the query being embedded with the same model? Do search results come back for the query? Get as close to the actual Pinecone API/SDK as you can and inspect things there. I’ve seen cases where wrapping frameworks like LangChain apply their own additional filtering to the query results, so what looks like a vector DB / semantic search problem is actually a framework problem.
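Here is a rough sketch of what “getting close to the SDK” can look like. It assumes the documents were embedded with OpenAI’s text-embedding-ada-002; swap in whichever embedding model your Flowise embedding node is actually configured with, because a model mismatch between upsert and query is a very common cause of bad results:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("your-index-name")  # placeholder

query = "the exact question you typed into the chatbot"

# Embed the query with the SAME model used at upsert time.
emb = openai_client.embeddings.create(
    model="text-embedding-ada-002",  # assumption: match this to your flow's embedding node
    input=query,
).data[0].embedding

# Query Pinecone directly, bypassing Flowise/LangChain entirely.
results = index.query(vector=emb, top_k=5, include_metadata=True)

for match in results.matches:
    print(match.id, match.score)
    print(match.metadata)
```

If nothing sensible comes back here, the problem is in indexing or embedding; if good matches come back, the problem is further downstream in the chain.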
If you’re getting search results, then look at what’s going into your LLM prompt. Is it formatted correctly? Does it contain the search results and all the relevant instructions?
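A low-tech way to check this is to assemble and print the final prompt yourself before anything is sent to the model. Continuing the sketch above (the "text" metadata key and the prompt template here are assumptions, not what Flowise necessarily produces):

```python
# Build the context block the way your chain is supposed to, then eyeball it:
# does it actually contain the scraped content?
context = "\n\n".join(
    m.metadata.get("text", "") for m in results.matches if m.metadata
)

prompt = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say \"I don't know\".\n\n"
    f"Context:\n{context}\n\nQuestion: {query}"
)
print(prompt)  # if the context section is empty, the LLM can only say "I don't know"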
Have you spot-checked your metadata to confirm it has an attached text key? When using tools like Flowise + LangChain/LlamaIndex, most people don’t realize that all they do is vectorize each chunk and then attach the full human-readable text as a text key in the metadata.
Then during retrieval, you fetch similar documents by vector, but the part being passed into the LLM is the text key, not the vector. By default, Flowise does not attach any metadata unless it is configured in the flow.
So you are likely getting similarity matches, but those matches have no text key, and therefore your LLM is just as clueless about your question as before.
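You can spot-check this by fetching a few vectors by ID and looking at their metadata keys. The IDs below are placeholders; grab real ones from the query results in the earlier snippet:

```python
# Fetch a couple of vectors and check whether a human-readable "text" key exists.
fetched = index.fetch(ids=["some-id-1", "some-id-2"])  # placeholder IDs
for vec_id, vec in fetched.vectors.items():
    keys = list(vec.metadata.keys()) if vec.metadata else []
    print(vec_id, "metadata keys:", keys)
    # If "text" (or whatever key your retriever expects) is missing,
    # the match is found but nothing readable ever reaches the LLM.
```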