Missing file, or my lack of understanding, in the Beyond Search webinar

I’ve been following the codebase from the Beyond Search webinar (examples/integrations/openai/beyond_search_webinar at master · pinecone-io/examples · GitHub).

I’ve successfully upserted embeddings to my index in the vector DB, but the code depends on a mapping.json file that I can’t find any reference to.

Line 27 here: examples/app.py at master · pinecone-io/examples · GitHub

Apologies, as this might be a dumb question, but I’m not sure where this file is supposed to come from.

The mappings.json file is a JSON file containing mappings from record IDs to the plaintext. It was used to store the text locally: when querying Pinecone we’d retrieve the most relevant IDs, then map them back to the original text using mappings.json. You can also store the text directly in Pinecone as metadata, as long as it isn’t too long (and doesn’t exceed the metadata limits).
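As a rough illustration of that ID-to-text lookup, here is a minimal sketch. It is not the webinar's actual code: the record contents, the helper names (`save_mapping`, `load_mapping`, `ids_to_text`), and the flat `{id: text}` layout of the file are all assumptions, and the `matches` list is a hand-written stand-in for the matches a Pinecone query would return.

```python
import json
from pathlib import Path

# Hypothetical records: in a real pipeline these would be the text chunks
# that were embedded and upserted, keyed by the same IDs used in the upsert.
records = [
    {"id": "0", "text": "Pinecone is a vector database."},
    {"id": "1", "text": "Embeddings map text to vectors."},
    {"id": "2", "text": "Metadata can be stored alongside each vector."},
]

def save_mapping(records, path):
    """Write an {id: text} lookup table to disk as JSON (assumed layout)."""
    mapping = {r["id"]: r["text"] for r in records}
    Path(path).write_text(json.dumps(mapping), encoding="utf-8")
    return mapping

def load_mapping(path):
    """Read the {id: text} lookup table back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def ids_to_text(matches, mapping):
    """Resolve matches to their text, skipping IDs missing from the mapping."""
    return [mapping[m["id"]] for m in matches if m["id"] in mapping]

# Create the file once, at the same time the vectors are upserted.
save_mapping(records, "mapping.json")
mapping = load_mapping("mapping.json")

# Stand-in for the matches of a query response, most relevant first.
matches = [{"id": "2", "score": 0.91}, {"id": "0", "score": 0.87}]
contexts = ids_to_text(matches, mapping)
print(contexts)
```

The key point is that the file has to be generated by you during indexing, from the same IDs and text chunks you upsert; it does not ship with the repo.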

We have a more up-to-date version that will work better (using more up-to-date models and simpler code). Links for it are:

Thanks James! I’ll look into this.

@jamesbriggs I was able to get up and running, very well done tutorial, thanks for setting this up.

After getting my data into Pinecone and running some searches, I have a few follow-up questions.

I’ve scraped data from my company’s website so that users get a more interactive experience instead of having to hunt for the answers they want. I have about 1k vectors, which covers far less than the content of the whole website, since this is just a POC.

In my experience, if I ask a direct question whose answer is clearly present in the vectors, the response seems to take its own path and is only partially correct.

  1. Could this be because, when asked to answer in a “paragraph” format, the model tries to fill space instead of focusing on accuracy?
  2. I’m not using metadata, so could the model be mixing my vectors with its base knowledge, so that the answers blend my website content with unrelated data?

I should note that “conservative Q&A” always returns “I don’t know”, so I assume something in my setup is falling short.