Embedding tables?

tytung2020 · July 24, 2023, 4:14pm

I have looked at the examples in the LangChain AI handbook pages, but all of them show queries made over ordinary text such as Wikipedia.
What if I want to run a semantic search with my query string over many different tables (xlsx, csv, etc.)? Is the embedding process the same for plain text and for tables?

ZacharyProser · October 17, 2023, 5:46pm

Hi @tytung2020,

Thanks for your question! At a high level, the process for embedding text is essentially the same regardless of the source of that text:

1. You clean and prepare your data.
2. You assemble your data into chunks (so that no single fragment is too short or too long).
3. You pass your text data into an embedding model such as OpenAI’s text-embedding-ada-002.
4. You upsert the vectors that come out of your embedding model into the Pinecone database so you can query them later using query vectors.
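
For tables specifically, one common way to handle the first two steps is to serialize each row into a short "column: value" string, so the embedding model sees the column names alongside the values. Below is a minimal sketch of that idea using only Python's standard library. The function name rows_to_text, the sample data, and the record layout (id, text, metadata) are hypothetical choices for illustration, not part of any Pinecone or LangChain API. The embedding and upsert calls are left out on purpose, since their exact form depends on which client versions you use.

```python
import csv
import io


def rows_to_text(csv_text, table_name):
    """Turn each CSV row into one 'column: value' string ready for embedding.

    Hypothetical helper: one record per row, empty cells skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for i, row in enumerate(reader):
        # Keep the column name next to each value so the text is self-describing.
        text = "; ".join(f"{col}: {val}" for col, val in row.items() if val)
        records.append(
            {
                "id": f"{table_name}-{i}",
                "text": f"{table_name} | {text}",
                # Metadata lets you trace a search hit back to its table and row.
                "metadata": {"table": table_name, "row": i},
            }
        )
    return records


# Made-up sample table, for illustration only.
sample = "product,price,stock\nWidget,9.99,120\nGadget,24.50,\n"
records = rows_to_text(sample, "inventory")
for record in records:
    print(record["id"], "->", record["text"])
```

Each record's text would then go to your embedding model, and the resulting vector would be upserted together with its id and metadata. One row per record is only one possible chunking choice; for very narrow tables, grouping several rows into one chunk may work better.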

I would recommend taking a look at our many example Jupyter Notebooks in the pinecone-io/examples repository on GitHub. If you click into the learn subdirectory, we have a guide and a YouTube video on how to get started with them, and each Notebook covers a different use case for vectorizing text or other data and upserting it into Pinecone.

You can run the Notebooks for free using Google Colab and the provided Run in Colab buttons on all our Notebooks.

I hope this is helpful!