AI Glossary · Last reviewed May 2026

RAG · Retrieval-Augmented Generation

Hand-written by a real person. Reviewed against current practice in May 2026.

Definition

A method that fetches relevant documents at query time and feeds them to the model, so answers are grounded in real sources.
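
To make the definition concrete, here is a minimal sketch of the fetch-then-feed step. It is a toy, not a reference implementation: it scores documents with simple word-overlap cosine similarity instead of a real embedding model, and the names `retrieve` and `build_prompt` are made up for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query.

    Assumption: word overlap stands in for the embedding search
    a production system would use.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Fetch the top-k documents and place them in the prompt as numbered sources."""
    sources = retrieve(query, documents, k)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(sources))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    docs = [
        "The refund window is 30 days from delivery.",
        "Our office is closed on public holidays.",
        "Shipping takes three to five business days.",
    ]
    print(build_prompt("How long is the refund window?", docs, k=1))
```

The string returned by `build_prompt` is what would be sent to the model. The model itself is left out here because the retrieval step does not depend on which one you use.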

Frequently asked questions

What is the difference between RAG and fine-tuning?

RAG retrieves external data at query time, so answers stay current without retraining, as long as the underlying document index is kept up to date. Fine-tuning bakes knowledge into the model weights, and that knowledge goes stale as the world changes until the model is retrained.

Does RAG work with any LLM?

Yes. RAG is an architectural pattern, not a model-specific feature. You can pair it with GPT-4, Claude, Llama, or any model that accepts retrieved context in the prompt.
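
One way to picture "pattern, not feature" is to treat the model as a plain function that takes a prompt and returns text. The sketch below assumes exactly that. `answer_with_rag` and `echo_model` are hypothetical names, and `echo_model` is a stand-in stub rather than a call to any real provider.

```python
from typing import Callable, Sequence


def answer_with_rag(
    question: str,
    retrieved_chunks: Sequence[str],
    generate: Callable[[str], str],
) -> str:
    """Put the retrieved chunks into the prompt and hand it to any model.

    `generate` is whatever function sends a prompt to your model of choice
    and returns its reply. Nothing else here is model-specific.
    """
    context = "\n\n".join(retrieved_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


def echo_model(prompt: str) -> str:
    """Stub model for illustration only: reports how much text it received."""
    return f"(stub reply to a prompt of {len(prompt)} characters)"


if __name__ == "__main__":
    chunks = ["Paris is the capital of France."]
    print(answer_with_rag("What is the capital of France?", chunks, echo_model))
```

Swapping models then means swapping the `generate` function. The retrieval and prompt-building code stays the same.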

Is RAG expensive to run?

The main costs are the vector database and the extra tokens from retrieved chunks added to each prompt. For most production use cases this is cheaper than fine-tuning, though the per-query cost grows with traffic and with the number of chunks you retrieve.
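
The token part of that cost is simple arithmetic, sketched below. Every number in the example is a made-up placeholder, not a real price or a measured workload. Plug in your own chunk sizes, traffic, and your provider's current input-token rate.

```python
def extra_tokens_per_query(chunks_per_query: int, tokens_per_chunk: int) -> int:
    """Tokens that retrieval adds to each prompt."""
    return chunks_per_query * tokens_per_chunk


def extra_prompt_cost(
    chunks_per_query: int,
    tokens_per_chunk: int,
    queries: int,
    price_per_million_tokens: float,
) -> float:
    """Cost of the extra input tokens over a number of queries.

    Covers only the added prompt tokens. Vector database hosting and
    embedding costs are separate and not modelled here.
    """
    total_tokens = extra_tokens_per_query(chunks_per_query, tokens_per_chunk) * queries
    return total_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # Hypothetical inputs: 4 chunks of 500 tokens, 10,000 queries,
    # and an assumed rate of 1.00 per million input tokens.
    cost = extra_prompt_cost(
        chunks_per_query=4,
        tokens_per_chunk=500,
        queries=10_000,
        price_per_million_tokens=1.0,
    )
    print(f"Extra prompt cost: {cost:.2f}")
```

With those placeholder inputs, retrieval adds 2,000 tokens per query, or 20 million tokens across 10,000 queries.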
