Sources that are easily integrated with Pinecone?

Are any sources (Wikipedia for example) easily integrated into Pinecone for ODQA (open domain question answering)?


It depends on the source. For Wikipedia specifically, it's a case of either doing some web scraping or starting from an existing dataset built on Wikipedia text, like SQuAD or SQuAD v2, then encoding the text with something like a sentence-transformers model and storing those vectors in Pinecone. Pinecone has no utilities for obtaining data or encoding it, but there are many utilities out there that make this process simpler.
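As a rough illustration of that flow (load passages, encode, upsert), here is a minimal sketch rather than an official recipe. The helper names `build_records`, `upsert_in_batches` and `run_example` are made up for this example, and it assumes the `datasets`, `sentence-transformers` and `pinecone` packages are installed, that the `all-MiniLM-L6-v2` model (384-dimensional output) suits your data, and that a Pinecone index with a matching dimension already exists.

```python
from typing import Callable, List, Sequence


def build_records(passages: Sequence[str],
                  encode: Callable[[List[str]], List[List[float]]]) -> List[dict]:
    """Pair each unique, non-empty passage with its embedding.

    Records use the dict shape accepted by Pinecone's upsert:
    {"id": ..., "values": ..., "metadata": ...}.
    """
    # SQuAD repeats the same context for many questions, so de-duplicate
    # while keeping the original order.
    unique = list(dict.fromkeys(p.strip() for p in passages if p.strip()))
    vectors = encode(unique)
    return [
        {"id": f"passage-{i}", "values": list(vec), "metadata": {"text": text}}
        for i, (text, vec) in enumerate(zip(unique, vectors))
    ]


def upsert_in_batches(index, records: List[dict], batch_size: int = 100) -> int:
    """Send records to the index in small batches; return how many were sent."""
    for start in range(0, len(records), batch_size):
        index.upsert(vectors=records[start:start + batch_size])
    return len(records)


def run_example(api_key: str, index_name: str) -> int:
    """End-to-end run against real services (needs network and an API key)."""
    # Imported here so the helpers above work without these packages.
    from datasets import load_dataset
    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    # Assumption: the first 1000 SQuAD rows are enough for a trial run.
    contexts = load_dataset("squad", split="train[:1000]")["context"]
    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = Pinecone(api_key=api_key).Index(index_name)

    records = build_records(contexts, lambda texts: model.encode(texts).tolist())
    return upsert_in_batches(index, records)
```

Calling `run_example("YOUR_API_KEY", "your-index-name")` would load the data, encode it, and upsert it. At query time you would encode the question with the same model and pass that vector to the index's query call. Keeping the passage text in metadata is just one convenient choice for small passages; for larger documents you may prefer to store only an ID and look the text up elsewhere.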