RAG (Retrieval-Augmented Generation) is a method in which, before answering, a language model retrieves the most relevant passages from the organization’s own document set (contracts, procedures, technical docs, archives) and generates its answer grounded in those sources. The model answers from the organization’s actual knowledge rather than relying only on what it learned during training, and it can cite its sources.
How it works
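- Documents are split into chunks, each chunk is converted into an embedding vector, and the vectors are stored in a vector database for semantic search.
- When a user asks a question, the question is embedded the same way and the chunks whose vectors are most similar to it are retrieved.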
- Those chunks are given to the model as context.
- The model produces an answer grounded in that context and cites the documents the answer draws on.
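The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a production design: term-count vectors stand in for a real embedding model, an in-memory list stands in for the vector database, the 50-word chunk size is arbitrary, and the names `embed`, `chunk_text`, `InMemoryIndex`, and `build_prompt` are hypothetical. The call to the language model itself is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str      # source document this chunk came from (enables citation)
    text: str
    vector: Counter  # stand-in "embedding": term counts


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_text(text: str, size: int = 50) -> list[str]:
    # Assumed chunking rule: consecutive windows of `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class InMemoryIndex:
    """Toy stand-in for a vector database."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add_document(self, doc_id: str, text: str) -> None:
        # Step 1: split into chunks and index each chunk's vector.
        for piece in chunk_text(text):
            self.chunks.append(Chunk(doc_id, piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[Chunk]:
        # Step 2: embed the question and return the k most similar chunks.
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c.vector), reverse=True)
        return ranked[:k]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    # Step 3: hand the retrieved chunks to the model, tagged with their source IDs
    # so that (step 4) the answer can cite them.
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in chunks)
    return (
        "Answer using only the sources below and cite them by their [id].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


index = InMemoryIndex()
index.add_document("leave-policy", "Employees accrue 20 days of paid leave per year.")
index.add_document("expense-policy", "Travel expenses must be submitted within 30 days.")

question = "How many days of paid leave do employees get?"
hits = index.search(question, k=1)
print(hits[0].doc_id)  # leave-policy
print(build_prompt(question, hits))
```

Replacing `embed` with a real embedding model and `InMemoryIndex` with an actual vector database would leave the flow unchanged. Keeping a `doc_id` on every chunk is what lets the final answer point back to the document it came from.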
Why it matters
- Freshness: new documents can be added without retraining the model.
- Verifiability: because each answer is tied to specific sources, it can be checked against them; grounding the model in retrieved text reduces hallucination by design, though it does not eliminate it.
- Privacy: in an on-premise deployment, documents never leave the organization’s boundary.
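One simple form of the checking described under Verifiability is to confirm that every source the answer cites was actually among the retrieved chunks. The sketch below assumes, hypothetically, that the model was instructed to cite sources as `[doc-id]`; the function names are illustrative. It only flags citations to sources that were never retrieved. It does not confirm that the cited text really supports the claim, which still needs a human or a separate check.

```python
import re


def cited_sources(answer: str) -> set[str]:
    # Assumes citations appear as [doc-id] in the answer text.
    return set(re.findall(r"\[([^\[\]]+)\]", answer))


def verify_citations(answer: str, retrieved_ids: set[str]) -> dict:
    cited = cited_sources(answer)
    return {
        "cited": cited,
        "unsupported": cited - retrieved_ids,  # cited, but never retrieved
        "has_citation": bool(cited),           # an uncited answer cannot be traced
    }


report = verify_citations(
    "Employees get 20 days of paid leave [leave-policy].",
    {"leave-policy", "expense-policy"},
)
print(report["unsupported"])  # set()
```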
RAG is the layer that combines a large language model with the organization’s knowledge base.