Automatically chunk a large document

david_swiss24 · August 20, 2024, 2:55pm

Hello,

I hope you are well. I’m building an AI chatbot that should search within specific files (e.g. books). I understand that I need a vector database, but that I first have to segment the book into smaller pieces and transform them into embeddings. I was told Pinecone can help me do this automatically. However, I could not find out how to do it.

Could someone guide me there please?

Thank you
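
To make the manual version of that pipeline concrete, here is a minimal sketch of the segmentation step only, assuming a simple fixed-size window over words with some overlap between neighbouring pieces. The function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical and chosen for illustration; this is not how Pinecone chunks documents internally, and the embedding step is left out because it depends on whichever embedding model you pick.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    boundary still appears whole in at least one chunk.
    (Hypothetical helper for illustration, not a Pinecone API.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so no trailing chunk is made up purely of overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    # Tiny stand-in for a book: ten placeholder words.
    book = " ".join(f"w{i}" for i in range(10))
    for i, chunk in enumerate(chunk_text(book, chunk_size=4, overlap=1)):
        print(i, chunk)
```

Each returned chunk would then be passed to an embedding model and the resulting vector stored in the vector database alongside the chunk text. Word-based windows are only one option; splitting on paragraphs or sentences often gives more coherent pieces.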

ZacharyProser · August 20, 2024, 3:00pm

Hi @david_swiss24, and welcome to the Pinecone community forums!

Thank you for your question.

This is precisely what Pinecone Assistant does!

Pinecone Assistant resources:

Pinecone Assistant launch announcement + features
Pinecone Assistant getting started Jupyter Notebook
Pinecone Assistant Sample App (shown in demo video above - this is the piece that exposes your assistant to the world so others can use it)

Hope this helps, and let me know how you do.

Best,
Zack

david_swiss24 · August 20, 2024, 3:07pm

Hi Zachary,

Many thanks for this. I will definitely try this later today and keep you posted.
David

ZacharyProser · August 21, 2024, 1:35pm

@david_swiss24 Terrific - I’m glad it was useful.

Please do let us know how it goes when you can, as we’re eager to keep improving this service.

Thanks so much,
Zack

david_swiss24 · October 3, 2024, 6:36am

Hi Zack,
Following up on this one: how can I access the data within the assistant, i.e. the segmented version of the document that is stored in Pinecone in embedded form (the vectors of numbers)? I’d like to check how the document is segmented internally.
Thank you