Chunking Strategies for LLM Applications

In the context of building LLM-related applications, chunking is the process of breaking down large pieces of text into smaller segments. It’s an essential technique that helps optimize the relevance of the content we get back from a vector database once that content has been embedded. In this blog post, we’ll explore whether and how chunking helps improve efficiency and accuracy in LLM-related applications.

As we know, any content that we index in Pinecone needs to be embedded first. The main reason for chunking is to ensure that each piece of content we embed carries as little noise as possible while remaining semantically coherent.
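To make the idea concrete, here is a minimal sketch of chunking before embedding. The function name, the character-based sizing, and the overlap value are illustrative assumptions, not part of the original post; real pipelines typically chunk by tokens or sentences rather than raw characters.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character-based chunks.

    Overlap keeps context that straddles a boundary present in both
    neighboring chunks. chunk_size/overlap are illustrative values.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# A stand-in for a large document (500 characters).
doc = "word " * 100
chunks = chunk_text(doc, chunk_size=120, overlap=20)
```

Each resulting chunk would then be embedded and upserted to the index individually, so a query matches against small, focused pieces rather than the whole document.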

This is a companion discussion topic for the original entry at

I’ve built a “talk with your books” chatbot using this chunking approach and Pinecone. It’s a fork of mayooear/gpt4-pdf-chatbot-langchain on GitHub (a GPT-4 & LangChain chatbot for large PDF docs).

A big problem here is that the bot cannot summarize a book or any large document, because it only sees 1-2 chunks at a time. It can answer a question as long as the answer is fully contained inside a single chunk, but not beyond that. Are there any approaches to chunking that would allow a model to “digest” a large document?


Hi, one way is to make smaller chunks and concatenate the top-k retrieved results before feeding them to the LLM. Even this approach is limited by the context length of the LLM, though. If the document is very large, I suppose currently the only method is to fine-tune the LLM on it.


Thanks, both of those approaches sound good!

How do you think your model would perform with summarization instead of chunking?