When should RAG be used vs fine-tuning?

(This question was submitted in a recent workshop.)

Abhishek G. asked: When should RAG be used vs fine-tuning? There still isn’t a good framework around this. A few examples would be helpful.

Tarek A. asked: Do you think using a fine-tuned LLM for RAG makes sense, and which use cases do you think work best with a fine-tuned LLM rather than an off-the-shelf LLM (GPT, Anthropic's Claude, etc.)?


FWIW: most people think they want a fine-tune, but they actually only need RAG.

Fine-tuning: Choose it when you do not need perfect recall, you want the model to act or behave a certain way or have "knowledge" of a concept baked into its general knowledge, you do not need citations, and the information is static.

Pro: The knowledge can be "inherent" to the model. Costs fewer tokens per query.
Con: 99% of people don't have enough data to fine-tune, the underlying data is often too dynamic to tune on, and doing it right can be expensive.

RAG: Choose it when you need to update documents on the fly and the data is flexible and dynamic.

Pro: Can cite the exact text. Cheaper to implement. Highly flexible.
Con: Higher token cost per query, since retrieved passages are stuffed into every prompt.
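To make the RAG side concrete, here is a minimal sketch of the retrieve-then-prompt loop. It uses a toy term-frequency embedding in place of a real embedding model, and every name in it is illustrative, not from any particular library:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: a term-frequency vector. A real system would call
    # an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    # Stuff the retrieved passages into the prompt, numbered so the
    # model can cite the exact text.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday through Friday.",
    "Shipping is free on orders over $50.",
]
top = retrieve("how long do refunds take", docs, k=1)
print(build_prompt("how long do refunds take", top))
```

Updating the knowledge base is just editing `docs`; nothing is retrained, which is exactly the flexibility being traded for the per-query token cost.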

Ideal: You have a fine-tuned model trained on conversational data geared toward your most popular document retrievals/answers and your specific use case, and you run RAG on top of that fine-tune. The model's inherent ability is then augmented by dynamic data.
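A rough sketch of that layering, with `generate` as a hypothetical stand-in for your fine-tuned model's endpoint and a naive keyword-overlap retriever where a vector store would normally sit (all names here are made up for illustration):

```python
def retrieve(query, docs, k=2):
    # Naive keyword-overlap retriever; swap in a real vector store in practice.
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def generate(prompt):
    # Hypothetical stand-in for calling your fine-tuned model's endpoint.
    return f"(answer grounded in {prompt.count('[')} retrieved passages)"

def answer(query, docs):
    # RAG supplies the fresh, citable facts; the fine-tune supplies the
    # conversational tone and domain behavior.
    passages = retrieve(query, docs)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")

docs = ["Plan A costs $10/month.", "Plan B costs $25/month.", "Support is 24/7."]
print(answer("what does plan A cost", docs))
```

The division of labor is the point: when a document changes you only touch `docs`, and when the desired behavior changes you retrain the model behind `generate`.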

In summary: it depends on your use case :joy: