Automatically chunk large documents

Hello,

I hope you are well. I’m building an AI chatbot that I want to search within specific files (e.g. books). I understand that I should use a vector database, but I first need to segment the book into smaller pieces and transform them into embeddings. I was told Pinecone can help me do this automatically. However, I could not find out how to do it.

Could someone point me in the right direction, please?

Thank you

Hi @david_swiss24, and welcome to the Pinecone community forums!

Thank you for your question.

This is precisely what Pinecone Assistant does!

Pinecone Assistant resources:

  1. Pinecone Assistant launch announcement + features
  2. Pinecone Assistant getting started Jupyter Notebook
  3. Pinecone Assistant Sample App (shown in the demo video above; this is the piece that exposes your assistant to the world so others can use it)

Hope this helps, and let me know how you do.

Best,
Zack

Hi Zachary,

Many thanks for this. I will definitely try this later today and keep you posted.
David

@david_swiss24 Terrific - I’m glad it was useful.

Please do let us know about your experience when you can, as we’re eager to keep improving this service.

Thanks so much,
Zack

Hi Zack,
Following up on this one: how can I access the data within the assistant, i.e. the segmented version of the document that Pinecone stores in embedded form (the numeric vectors)? I’d like to check how it is segmented inside.
Thank you
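
For anyone following along, here is a minimal sketch of what "segmenting" a document can look like in general. It is not Pinecone Assistant's actual chunking logic, which this thread does not describe. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical, chosen only for illustration. It splits text into word-based chunks that overlap slightly, so a sentence cut at a boundary still appears whole in one of the neighbouring chunks.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Illustrative only: `chunk_size` and `overlap` are assumed parameters,
    not settings taken from Pinecone Assistant.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(10))
    for i, chunk in enumerate(chunk_text(sample, chunk_size=4, overlap=1)):
        print(i, chunk)
```

Each chunk produced this way would then be passed to an embedding model to obtain the numeric vectors; that step is left out here because it depends on the model and service you use.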