Handling Full-Document Statistics in RAG Architectures

Hi everyone, I have a question about RAG system design.

I understand that semantic search is typically used to retrieve the top-k relevant chunks. However, in cases where we need to analyze the entire document to compute statistics, retrieving just k chunks isn’t sufficient.

How should a RAG system be designed to handle this scenario?

Hi there! I’m the product manager for our Inference, Assistant, and context products. I want to make sure I answer you as accurately as possible – would you mind giving me a better understanding of your use case?


This is a common scenario: things run smoothly in the initial stages, but once you move to production, more complexities arise. The issue you’re describing is often one of the first challenges you’ll face in this kind of setup.

To address this, the key approach is to introduce a query classification step. When a query comes in, classify it to determine whether it requires full-document understanding or is a pinpoint query, i.e., one that standard top-k retrieval can answer. This classification lets you route the query to the right process and ensures you’re using resources efficiently.
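As a rough illustration of that routing step, here is a minimal sketch. The keyword heuristic in `classify_query` is only a placeholder for a real classifier, and both pipeline functions are stubs – all names here are hypothetical, not from any specific library:

```python
def run_full_document_pipeline(query: str) -> str:
    # Stub: in a real system this might be a map-reduce summarisation
    # or statistics pass over every chunk of the document.
    return "full-document pipeline"

def run_rag_pipeline(query: str) -> str:
    # Stub: standard top-k semantic retrieval followed by generation.
    return "RAG pipeline"

def classify_query(query: str) -> str:
    """Placeholder heuristic; in practice this would be an LLM call
    or a trained classifier (see the options below)."""
    full_doc_cues = ("summarize", "summarise", "how many",
                     "average", "overall", "entire document")
    q = query.lower()
    return "full_document" if any(cue in q for cue in full_doc_cues) else "pinpoint"

def answer(query: str) -> str:
    # Route each incoming query to the appropriate process.
    if classify_query(query) == "full_document":
        return run_full_document_pipeline(query)
    return run_rag_pipeline(query)
```

The important part is the routing structure, not the toy heuristic: whatever classifier you choose simply has to emit one of these route labels.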

To implement this, you can build a dedicated query classifier. You have a couple of options here:

  1. Using an LLM as a classifier, which offers flexibility and strong performance on complex queries.

  2. Using simpler classifiers such as an SVM, a fine-tuned BERT model, or SetFit. These models are well suited to this task and are often simpler to deploy while still giving good results. (SetFit worked well for us; it can be trained with relatively few labelled datapoints.)
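For option 1, a sketch of an LLM-as-classifier with the actual model call stubbed out – the prompt wording, labels, and `call_llm` are placeholders, not a specific provider’s API:

```python
LABELS = ("full_document", "pinpoint")

PROMPT_TEMPLATE = (
    "Classify the user query into exactly one label.\n"
    "full_document: requires reading or aggregating the whole document "
    "(summaries, counts, statistics).\n"
    "pinpoint: answerable from a few retrieved chunks.\n\n"
    "Query: {query}\nLabel:"
)

def call_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call returning raw model text.
    raise NotImplementedError

def classify_with_llm(query: str, llm=call_llm) -> str:
    raw = llm(PROMPT_TEMPLATE.format(query=query)).strip().lower()
    # If the model returns an unexpected label, fall back to the
    # full-document route, which is the safer (if costlier) default.
    return raw if raw in LABELS else "full_document"
```

Constraining the model to a fixed label set and validating its output is the main design point here; it keeps a malformed completion from silently sending a query down the wrong path.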

By incorporating this classifier, you can ensure that each query is processed appropriately based on its nature, leading to better performance in production.
