RAG
Retrieval-augmented generation: a method that fetches relevant documents at query time and feeds them to the model, so answers are grounded in real sources.
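The fetch-then-feed flow can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `DOCS` corpus, the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and word-count cosine similarity stands in for the embedding search a real system would more likely use.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory corpus; a real system would query a document store.
DOCS = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is thousands of kilometres long.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


print(build_prompt("When did the Eiffel Tower open?", DOCS))
```

The string returned by `build_prompt` is what would be sent to the model; the model call itself is left out because it depends on the provider.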
We are working on a detailed page for RAG, covering why it matters, how it works, related terms, and the tools that use it.
Frequently asked questions
What is the difference between RAG and fine-tuning?
RAG retrieves external data at query time, so answers stay current without retraining. Fine-tuning bakes knowledge into the model weights, where it stays fixed and goes stale as the world changes until the model is retrained.
Does RAG work with any LLM?
Yes. RAG is an architectural pattern, not a model-specific feature. You can pair it with GPT-4, Claude, Llama, or any model that accepts retrieved context in the prompt.
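One way to picture the pattern being independent of the model is a function that takes the retriever and the model as plain callables. The sketch below is an assumption-laden illustration: `answer_with_rag`, `fake_retrieve`, and `echo_model` are hypothetical names, and the stand-in model simply returns its prompt so the flow can be inspected without calling any real API.

```python
from typing import Callable, List


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context, put it in the prompt, and hand the prompt to any model."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


# Hypothetical stand-ins: swap in a real retriever and a real model client.
def fake_retrieve(question: str) -> List[str]:
    return ["RAG stands for retrieval-augmented generation."]


def echo_model(prompt: str) -> str:
    # Returns the prompt unchanged so the assembled input is visible.
    return prompt


print(answer_with_rag("What is RAG?", fake_retrieve, echo_model))
```

Replacing `echo_model` with a wrapper around a hosted or local model leaves `answer_with_rag` unchanged, because the only requirement is a function that maps a prompt string to a response string.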
Is RAG expensive to run?
The main costs are the vector database and the extra tokens that retrieved chunks add to each prompt. In many production use cases this is cheaper than fine-tuning, especially when the underlying data changes often.
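The token part of that cost is simple arithmetic, sketched below. The function name `extra_prompt_cost` and every number in the example (chunk count, chunk size, query volume, and the price per million input tokens) are illustrative assumptions, not real pricing; vector database costs are not modelled.

```python
def extra_prompt_cost(
    chunks_per_query: int,
    tokens_per_chunk: int,
    queries: int,
    price_per_million_input_tokens: float,
) -> float:
    """Cost of the extra input tokens that retrieved chunks add to prompts."""
    extra_tokens = chunks_per_query * tokens_per_chunk * queries
    return extra_tokens * price_per_million_input_tokens / 1_000_000


# Hypothetical workload: 5 chunks of 300 tokens on each of 10,000 queries,
# at an assumed price of 2.00 per million input tokens.
cost = extra_prompt_cost(5, 300, 10_000, 2.0)
print(cost)  # 15,000,000 extra tokens at 2.00 per million -> 30.0
```

Retrieving fewer or shorter chunks lowers this figure in direct proportion, which is why chunk size and the number of retrieved chunks are common tuning knobs.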