What is RAG?
RAG, or Retrieval-augmented generation, is the technique agents use to fetch relevant information into their context window before executing a task. When an agent receives a request, a retriever searches a knowledge index, returns the most relevant passages, and passes them into the prompt alongside the user’s query. The model then grounds its answer in both its training and that freshly retrieved context.
RAG exists because language models have two hard limits. Their training data has a cutoff date, and their context windows can only hold a finite amount of information.
RAG addresses the first by allowing agents to pull in current, private, or company-specific data at runtime. It addresses the second by selecting only the most relevant chunks of knowledge, small enough to fit in an agent’s context window.
A standard RAG pipeline has five stages:
- Document ingestion: Source documents are collected from internal systems, external APIs, or connected knowledge stores.
- Chunking: Documents are split into smaller overlapping passages (chunks) that are easy to search and retrieve.
- Embedding: Each chunk is converted into a vector representation, so the system can match by meaning rather than keywords.
- Retrieval: At query time, the retriever returns the chunks most semantically relevant to the agent’s request.
- Generation: The retrieved chunks are passed to the model, which grounds its response in them.
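The five stages above can be sketched end to end. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, and the documents, chunk sizes, and helper names are all invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector.
    A real pipeline would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(doc: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Chunking: split a document into overlapping word-window passages."""
    words = doc.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Ingestion + chunking + embedding build the index once, ahead of time.
docs = [
    "Revenue churn measures lost recurring revenue from existing customers each quarter.",
    "Support tickets are triaged by severity and routed to the on-call engineer.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

# Retrieval happens at query time; in the Generation stage these
# chunks would be injected into the model's prompt.
context = retrieve("how is churn measured", index)
```

Everything before the query runs offline; only `retrieve` (and the model call it feeds) runs per request.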
RAG has become the default method for giving agents access to information beyond their training data. It powers customer support agents, internal research assistants, document Q&A tools, and most domain-specific agent workflows in production today.
Its scope, though, is narrow by design. RAG is a runtime retrieval pipeline. It operates on whatever index it has been pointed at. It does not decide what belongs in that index, whether the underlying documents are current, or whether two systems define the same metric differently. The quality of its output is capped by the quality of the source it retrieves from, and RAG itself does nothing to fix that upstream.
What is context engineering?
Context engineering is the discipline of deciding what information an agent sees, where that information comes from, and how it stays accurate over time. Anthropic defines it as “the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference.”
Context engineering operates on two layers:
Application layer: How an agent manages the limited information inside its context window while executing a task. Context windows are finite, so agents and their developers have to decide, turn by turn, which information to pull in, which to hold onto, which to summarize, and which to offload. The application layer uses four context engineering strategies:
- Write: Saving what the agent has learned or decided so far (notes, plans, intermediate findings) outside the context window, so it can pull them back in later.
- Select: Pulling the specific documents, tool outputs, or past notes the agent needs for executing the task at hand.
- Compress: Summarizing earlier conversations, reasoning traces, or tool outputs so they take up fewer tokens without losing the essentials.
- Isolate: Splitting a big task across multiple sub-agents, each with its own smaller context window focused on one piece of the job.
Of the four, RAG is part of the Select strategy. It’s one of the dominant methods agents use to pull in the specific information they need to execute a task.
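To make the four strategies concrete, here is a toy context manager that exercises each one. Everything in it is illustrative: the class, the word-count token accounting, the keyword-match `select` (a real agent would point this step at a RAG index), and the truncating `compress` (a real agent would summarize with a model call).

```python
class ContextManager:
    """Toy illustration of the four application-layer strategies.
    Token counts are approximated by word counts."""

    def __init__(self, budget: int = 50):
        self.budget = budget             # max "tokens" the window can hold
        self.window: list[str] = []      # what the model actually sees
        self.scratchpad: list[str] = []  # storage outside the window

    def tokens(self) -> int:
        return sum(len(item.split()) for item in self.window)

    def write(self, note: str) -> None:
        """Write: persist a finding outside the context window."""
        self.scratchpad.append(note)

    def select(self, query: str) -> None:
        """Select: pull back only the notes relevant to the task.
        RAG is this step, pointed at an external index instead."""
        words = set(query.lower().split())
        for note in self.scratchpad:
            if words & set(note.lower().split()):
                self.window.append(note)

    def compress(self, keep: int = 5) -> None:
        """Compress: shrink window items when over budget.
        A real agent would summarize with a model call."""
        if self.tokens() > self.budget:
            self.window = [" ".join(i.split()[:keep]) for i in self.window]

    def isolate(self, subtasks: int) -> list["ContextManager"]:
        """Isolate: give each subtask its own smaller window."""
        return [ContextManager(budget=self.budget // subtasks) for _ in range(subtasks)]
```

The point of the sketch is the division of labor: `write` and `select` move information across the window boundary, `compress` trades fidelity for space inside it, and `isolate` avoids the problem by splitting it.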
Infrastructure layer: The organizational work of building and governing the knowledge agents actually draw from. This covers modeling business entities, defining metrics, tracking lineage, enforcing access policies, and keeping the whole thing up to date.
Both layers are relevant, but they answer different questions. The application layer asks, “Given a knowledge source, how do I get the right piece of it into the agent’s context window?” The infrastructure layer asks, “Is that knowledge source worth drawing from in the first place?”
How does RAG fit inside context engineering?
RAG is one technique within the broader context engineering discipline. It lives within the Select strategy, one of four strategies instrumental in determining what goes into an agent’s context window.
But the quality of a RAG pipeline is capped by the quality of the source it points to. Take a simple query: “Identify customers who have churned in the last quarter.” The agent searches for “churn” across connected systems and pulls back multiple chunks, each using the word differently:
- Finance tracks revenue churn.
- Customer success tracks logo churn.
- Product tracks account deactivations.
In this case, a RAG pipeline has no way to know which “churn” the query actually meant. It retrieves whichever chunk looks most semantically similar, passes it to the model, and produces a confidently wrong answer. Inference is fast. Retrieval is technically correct. But the answer is contextually wrong, because “churn” was never defined in a way the agent could trust.
This is where context engineering plays a crucial role. It performs the upstream tasks that RAG does not: determining the canonical definition of “churn,” assigning ownership, tracking provenance, and ensuring the definition stays current. Once this work is established, RAG can access information that the business can reliably trust.
The comparison below shows how the two relate across the dimensions that matter in production.
| Dimension | RAG | Context engineering |
|---|---|---|
| Layer | Application (runs at inference) | Infrastructure (runs before inference) |
| Scope | Retrieval pipeline for a given agent | Knowledge that every agent shares |
| Primary question | “How do I retrieve relevant context?” | “How do I build context worth retrieving?” |
| Failure mode | Poor retrieval accuracy | Perfect retrieval from an ungoverned source |
| Fix | Tune chunking, re-ranking, and embeddings | Govern definitions, lineage, freshness, and ownership |
| Temporal pattern | Runs on request | Runs continuously |
| Output | Relevant chunks injected into a prompt | Trustworthy, machine-readable knowledge layer ready for inference |
Enterprises have already started reshaping RAG to overcome its limitations. GraphRAG goes beyond semantic similarity by retrieving information from a knowledge graph of entities and their relationships. But GraphRAG still has to query something, and increasingly that “something” is a context graph: a governed layer consisting of definitions, lineage, policies, and decision history that agents can reason over.
Why does treating context engineering as “better RAG” fail at enterprise scale?
Teams that frame their AI roadmap as “better RAG” invest in tuning their retrieval pipeline — adjusting chunk sizes, optimizing ranking algorithms, improving hybrid search. But none of that addresses the underlying problem: the information being retrieved lacks business context.
Here are the most common reasons a RAG pipeline, however well-tuned, fails at enterprise scale:
- Stale definitions served confidently: When a company’s business glossary is stale, the RAG pipeline pulls in outdated information, leading the agent to make incorrect assumptions and reach the wrong conclusions.
- Conflicting definitions across systems: When your Snowflake and Tableau instances compute ARR differently and both are indexed, the RAG pipeline picks an ARR definition based on ranking, not business context. Whether the agent executes with the right definition is effectively a coin flip.
- Missing governance signals: Information fetched by a RAG pipeline doesn’t include metadata such as access controls, retention rules, governance policies, and sensitivity tags. This leads the agent to execute tasks without adhering to governance rules.
- Broken lineage: When lineage isn’t captured, the agent can’t tell whether a definition the RAG pipeline fetched is correct. Every answer inherits that uncertainty.
- No decision context: The reasoning behind past decisions isn’t in the corpus the RAG pipeline retrieves from. Every new agent builds its own decision logic from scratch, producing inconsistent outputs for the same scenario.
MIT’s 2025 study found 95% of enterprise GenAI pilots deliver little to no measurable impact on P&L. LangChain’s State of AI Agents 2025 report found 32% of teams cite quality as the top barrier to deployment, ahead of cost or latency. The fix starts with a reliable context layer.
What does reliable context engineering look like in practice?
Reliable context engineering starts with the knowledge layer, not the pipeline. Once that foundation is in place, RAG works. Without it, no retrieval tuning will rescue you.
Five things have to be true of the context layer before retrieval strategies become dependable:
- Canonical definitions: One authoritative answer to “what is ARR?” One owner. One last review date. When agents query “customer,” they get one definition, not three.
- Lineage and provenance: Every definition is traceable to its source. Every retrieval is auditable. When an answer is wrong, you can find where the context came from and fix it at the source.
- Freshness signals: Continuous monitoring of context drift, including schema changes, stale glossary entries, and broken lineage. Not quarterly audits. Drift is detected before the agent serves an incorrect answer.
- Access and policy embedding: Governance rules travel with the context, not beside it. An agent querying a customer record knows what it’s allowed to do with that record without a separate permissions check.
- Feedback loops: Agent interactions flow back into the context layer as institutional learning. Corrections get reviewed and certified before they are committed. Every agent that runs makes the next one smarter.
How does Atlan’s context layer make RAG reliable?
Most enterprises don’t just need a better retrieval pipeline. They also need a governed context layer beneath it. That’s the gap Atlan fills.
Atlan operates as the enterprise context layer: the metadata lakehouse that turns scattered metadata (data definitions, business logic, lineage, governance rules) into a machine-readable knowledge repository that any agent can read at runtime. Four components complement the RAG pipeline:
- Enterprise Data Graph: Unified map of every data asset with lineage, SQL usage, and quality signals. Structured so retrieval returns consistent results across systems.
- Active Ontology: A living model of business entities and their relationships, so “customer” means one thing across the organization.
- Enterprise Memory: Memory accumulated from agent interactions, including corrections, evaluations, and decision traces, that compound over time.
- Context Repos: Portable, versioned, policy-embedded units of context any agent can consume via MCP or API.
Real stories: Context engineering strategies in production
"All of the work that we did to get to a shared language amongst people at Workday can be leveraged by AI via Atlan's MCP server. We are co-building the semantic layers that AI needs with new constructs like context products."
Joe DosSantos
VP Enterprise Data & Analytics, Workday
"We have moved from privacy by design to data by design to now context by design. Atlan's metadata lakehouse is configurable across all tools and flexible enough to get us to a future state where AI agents can access lineage context through the Model Context Protocol."
Andrew Reiskind
Chief Data Officer, Mastercard
Wrapping up
RAG is a real, useful technique. It’s also a narrow one: a retrieval pipeline that runs at inference time and returns exactly what you ask it to from whatever index you’ve built. That’s the ceiling.
Context engineering is the discipline of raising that ceiling. It operates a layer below the pipeline, on the governed knowledge layer that RAG (and every other retrieval method) draws from. Teams that treat context engineering as “better RAG” will keep tuning chunk sizes while their agents cite stale definitions. Teams that treat it as infrastructure will invest in the knowledge layer first, then find that retrieval starts working.
Before the next quarter of retrieval tuning, audit what your pipeline is retrieving from. That’s the real work of context engineering.
FAQs on RAG and context engineering
1. Is RAG a type of context engineering?
Yes. RAG is a technique within the broader context engineering discipline, specifically the dominant implementation of the Select strategy, which involves pulling relevant information into the context window at inference time. Context engineering includes RAG, along with other application-layer strategies such as Write, Compress, and Isolate. It also includes the infrastructure work of building and governing the knowledge layer that those strategies draw from, which RAG itself does not address.
2. Can I build a good RAG pipeline without a context layer?
You can build a technically sound RAG pipeline without a context layer, but the output will hit a quality ceiling set by whatever you’re retrieving from. If your business glossary is stale or your metric definitions conflict across systems, even a perfectly tuned RAG pipeline will serve confident wrong answers. The retrieval layer is only as reliable as its underlying context, and context engineering is the discipline that makes that context reliable.
3. Does context engineering replace RAG?
No. Context engineering and RAG operate at different layers and solve different problems. RAG is the inference-time retrieval technique that brings information into the context window. Context engineering is the infrastructure discipline that ensures what RAG retrieves is accurate, governed, and up to date. Enterprises need both, but the infrastructure must come first for retrieval to produce reliable results.
4. What’s the difference between RAG and context engineering in one line?
RAG is what you do at inference time. Context engineering is what you build before inference ever runs. RAG is a pipeline pattern that runs per request, while context engineering is a continuous, organization-wide discipline of building and maintaining the knowledge layer that all retrieval methods draw from.
5. Why do enterprise RAG systems often produce wrong answers even when retrieval works?
Because retrieval accuracy and context accuracy are different problems. A RAG pipeline can return exactly what you asked for from the exact right document, but if the document itself is outdated, conflicts with another source, or lacks the governance context needed to interpret it correctly, the agent still produces an incorrect answer. It sounds authoritative because the retrieval was precise, which is what makes the failure especially dangerous at enterprise scale. The fix is to build better context, not better retrieval.