What context engineering is — and what it replaced
In June 2025, Andrej Karpathy described context engineering as “the delicate art and science of filling the context window with just the right information for the next step.” The framing was deliberate: context engineering is the evolution beyond prompt engineering. It covers not just what you tell the model, but everything you put in front of it and how you manage it.
Prompt engineering is a craft: you optimize the instruction, the framing, the examples, the persona. Context engineering is infrastructure: you design and operate the systems that determine what the model sees before it generates anything.
| Dimension | Prompt engineering | Context engineering |
|---|---|---|
| Focus | What you tell the model | Everything the model sees |
| Scope | System prompt and user message | Retrieval, structure, freshness, governance, delivery |
| Bottleneck | Phrasing and instruction quality | Data quality and governance |
| Skill type | Craft — language and psychology | Infrastructure — data engineering |
| Scale | Per-prompt optimization | Per-system architecture |
| Who does it | Prompt engineers, product teams | Data engineers, data architects |
The shift matters because the failure mode changed. Prompt engineering fails when the model doesn’t understand the task. Context engineering fails when the data the model sees is wrong, stale, or ungoverned. For enterprise AI agents working with real data, the second failure mode is far more common — and far harder to debug.
| Era | Approach | Where it fails |
|---|---|---|
| Prompt engineering era | Optimize the instruction | Model doesn’t understand the task |
| Context engineering era | Optimize what the model sees | Data is wrong, stale, or ungoverned |
The three layers of context engineering
Context engineering for agents has three functional layers, plus a horizontal governance layer. Most teams build the first two, retrieval and structure, without freshness management or the governance that underpins all three. This is why enterprise AI works in demos and breaks in production.
Governance is the horizontal layer that determines whether retrieval, structure, and freshness work reliably.
Layer 1: Retrieval
Retrieval is about getting the right information from the right source. The key questions: what context is relevant to this agent’s task, where does that context live, and what retrieval strategy matches the context type?
Not all context retrieves well with the same strategy:
- Catalog metadata (asset descriptions, certification status, ownership) → structured query — filter by domain, asset type, certification status
- Documentation and policy (runbooks, governance policies, business context) → semantic search — vector embeddings and similarity ranking
- Lineage (upstream sources, downstream dependents, transformation paths) → graph traversal — hop-by-hop from asset to related nodes
- Business glossary (domain-specific term definitions) → exact match with semantic fallback
The retrieval layer fails when teams apply one strategy to all context types — typically semantic search everywhere, because vector stores are well-documented and familiar. A vector store of lineage nodes doesn’t enable graph traversal; a semantic search over structured metadata returns conceptually similar but not structurally relevant results.
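To make the routing concrete, here is a minimal sketch of per-type retrieval dispatch. The backend names (`metadata_db`, `vector_index`, `lineage_graph`, `glossary`) and their methods are hypothetical stand-ins, not a specific product API:

```python
# A minimal sketch of per-type retrieval routing. All backend objects and
# their methods are illustrative assumptions, not a real library.
from typing import Any

def retrieve(context_type: str, query: str, backends: dict[str, Any]) -> list[dict]:
    if context_type == "catalog_metadata":
        # Structured query: filter on attributes, don't embed.
        return backends["metadata_db"].query(
            "SELECT * FROM assets WHERE domain = ? AND certified = 1", [query]
        )
    if context_type == "documentation":
        # Semantic search: embeddings and similarity ranking.
        return backends["vector_index"].search(query, top_k=5)
    if context_type == "lineage":
        # Graph traversal: hop outward from the asset node.
        return backends["lineage_graph"].neighbors(query, direction="upstream", hops=2)
    if context_type == "glossary":
        # Exact match first, semantic fallback second.
        exact = backends["glossary"].lookup(query)
        return exact if exact else backends["vector_index"].search(query, top_k=3)
    raise ValueError(f"unknown context type: {context_type}")
```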
Layer 2: Structure
Retrieved context is raw material. The structure layer organizes it into something an agent can reason over efficiently.
Structuring involves:
- Ranking retrieved items by relevance and governance signal (certified and current items rank higher than provisional or stale ones)
- Deduplicating overlapping context from multiple sources
- Compressing large documents to their most relevant sections
- Formatting each context type in the schema the agent expects: JSON for catalog metadata, natural language for documentation, annotated tables for data assets
The structure layer also carries governance signals: certification status, staleness flags, confidence scores, and access-control metadata flow through the structure layer into the context window, where the agent can use them to modulate its confidence.
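A compressed sketch of that assembly step, under the assumption that each retrieved item already carries `certified` and `stale` flags from the governance and freshness layers (all field names are illustrative):

```python
# Illustrative context assembly: rank by governance signal, dedupe,
# and trim to a token budget. Field names are assumptions, not a real schema.
def assemble_context(items: list[dict], token_budget: int = 4000) -> list[dict]:
    def score(item: dict) -> float:
        s = item.get("relevance", 0.0)
        s += 1.0 if item.get("certified") else 0.0  # certified outranks provisional
        s -= 0.5 if item.get("stale") else 0.0      # stale items sink in the ranking
        return s

    # Deduplicate on a stable identity key, keeping the best-scored copy.
    best: dict[str, dict] = {}
    for item in sorted(items, key=score, reverse=True):
        best.setdefault(item["asset_id"], item)

    # Greedily fill the token budget, highest score first.
    selected, used = [], 0
    for item in sorted(best.values(), key=score, reverse=True):
        cost = item.get("token_count", 200)
        if used + cost > token_budget:
            continue
        selected.append(item)
        used += cost
    return selected
```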
Layer 3: Freshness
Freshness is the layer teams most consistently fail to build — and the one that determines whether the context system works after month one.
Freshness management requires:
- TTL policies: each context type has a validity window after which it must be refreshed
- Change detection: subscribe to catalog and data source events so refreshes trigger on change, not just on schedule
- Staleness alerts: tell the agent when context it’s using is past its TTL
- Re-indexing triggers: pipeline completions, schema changes, and certification updates automatically kick off context refresh
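As a sketch of the TTL mechanics, assuming per-type validity windows (the durations below are placeholders, not recommendations):

```python
# Minimal TTL/staleness check. The validity windows are illustrative placeholders.
from datetime import datetime, timedelta, timezone

TTL = {
    "catalog_metadata": timedelta(hours=24),
    "documentation": timedelta(days=7),
    "lineage": timedelta(hours=6),
    "glossary": timedelta(days=30),
}

def is_stale(context_type: str, indexed_at: datetime) -> bool:
    return datetime.now(timezone.utc) - indexed_at > TTL[context_type]

def annotate_freshness(item: dict) -> dict:
    # Stamp a staleness flag so the structure layer can down-rank the item
    # and the agent can soften its confidence.
    item["stale"] = is_stale(item["context_type"], item["indexed_at"])
    return item
```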
Without freshness management, context that was accurate at launch drifts silently. Agents don’t know their context is stale — they operate on it with the same confidence as on current data. By month six of a production deployment, teams typically find that the performance degradation they’re seeing is not a model problem — it’s a staleness problem.
Why governance is the fourth — and most critical — layer
Retrieval, structure, and freshness all fail without governance. Governance is the layer that answers the questions the other three assume away: is this context trustworthy? Who certified it? Who is allowed to see it? When does it expire?
Without governance, an agent with excellent retrieval, well-structured context, and fresh data can still:
- Retrieve context from an uncertified asset and present it as authoritative
- Surface context to a user who isn’t authorized to see it
- Use lineage from a source where the certification has since been revoked
- Have no way to indicate to downstream consumers that this answer is based on provisional, not verified, data
Governance adds four properties to the context pipeline:
Certification: Not all context is equally trustworthy. Certified assets have been validated by a named owner. Provisional assets haven’t. Agents should know the difference — and the context they receive should carry that signal explicitly.
RBAC: Context access should mirror data access. An agent serving a Finance analyst should see the same catalog metadata that analyst could see manually. When RBAC isn’t applied to context access, agents surface sensitive data to unauthorized requestors — a governance failure at the speed of AI.
Audit trails: In regulated environments, knowing what context was used to generate an output is a compliance requirement. Every agent decision should be traceable back to the specific context it used — which assets, which lineage nodes, which glossary terms, with what certification status at the time of the query.
Confidence signaling: When context quality is low — provisional source, past TTL, low quality score — the agent should know. Governance injects those signals into the context window so the agent can modulate confidence rather than output false certainty.
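One way those four properties can surface in the pipeline, sketched with assumed field names (`allowed_roles`, `certified_by`, and the `trust` label are all illustrative):

```python
# Illustrative governance wrapper around a retrieved context item.
# All field names are assumptions made for this sketch.
import logging

log = logging.getLogger("context_audit")

def govern(item: dict, requester_roles: set[str]) -> dict | None:
    # RBAC: context access mirrors data access.
    if not requester_roles & set(item.get("allowed_roles", [])):
        return None  # never surfaces to an unauthorized requester

    # Audit trail: record exactly what context backed the answer.
    log.info("context_used asset=%s certified=%s certified_by=%s",
             item["asset_id"], item.get("certified"), item.get("certified_by"))

    # Certification + confidence signaling: pass the trust signal through
    # so the agent can hedge instead of asserting false certainty.
    item["trust"] = "certified" if item.get("certified") else "provisional"
    if item.get("stale"):
        item["trust"] = "stale"
    return item
```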
The governance failure pattern plays out predictably: a team builds excellent retrieval and structure, skips governance, deploys, and months later finds that agents are giving confident wrong answers from uncertified sources. The debugging is hard because the failure is upstream — in the data, not the model.
Context engineering for data agents specifically
Data agents have context requirements that are fundamentally different from document-based AI assistants. Where a customer support agent might need product documentation and ticket history, a data agent needs: asset metadata (schema, description, certification status, SLA), lineage (upstream sources, downstream dependents, transformation paths), business glossary (domain-specific term definitions), quality scores (confidence signals on trustworthiness), and ownership records (who to contact, who is accountable).
This context is not static documentation — it changes continuously as the data estate evolves. A table’s certification can be revoked. An asset’s owner can change. A lineage connection can be added by a new pipeline. A quality score can drop after a bad data load.
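To make that concrete, here is a hypothetical context payload for a single table; every name and value is invented for illustration:

```python
# A hypothetical data-agent context payload for one table (all values invented).
table_context = {
    "asset": {
        "name": "finance.revenue_daily",
        "schema": {"order_id": "string", "amount": "decimal", "booked_at": "date"},
        "certification": "certified",
        "sla": "refreshed daily by 06:00 UTC",
    },
    "lineage": {
        "upstream": ["raw.orders", "raw.refunds"],
        "downstream": ["finance.revenue_monthly", "exec_dashboard.revenue"],
    },
    "glossary": {"revenue": "Recognized revenue net of refunds, per Finance policy"},
    "quality_score": 0.97,
    "owner": {"team": "Finance Data", "contact": "finance-data@example.com"},
}
```

Any of these fields can change between one agent request and the next, which is why the payload has to be assembled from live, governed sources rather than cached documentation.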
This is why context engineering for data agents is a data governance problem as much as a technical one. The enterprise context layer is the infrastructure answer — it governs what context exists, keeps it current, and determines who can access it.
The differences from document RAG:
| Dimension | Document RAG | Data agent context |
|---|---|---|
| Context type | Static documents | Live, structured metadata |
| Change frequency | Low — documents change slowly | High — catalog changes continuously |
| Governance | Optional | Required for compliance |
| Relationships | Mostly independent | Rich cross-asset relationships (lineage, glossary) |
| Retrieval strategy | Semantic search | Semantic + structured + graph |
How to build context engineering infrastructure for enterprise agents
The build sequence matters more than the components. Teams that start with vector stores and add governance later spend months retrofitting — and the governance layer never quite fits because the retrieval and structure layers weren’t designed with it in mind.
The right sequence:
1. Inventory context sources. What data is relevant to the agent’s domain? What’s the quality of each source? Which assets are certified? This is a data governance assessment, not a technical step. Output: a context source inventory with quality scores and staleness risk ratings.
2. Govern the sources. Certify the assets you want in context. Assign ownership. Apply RBAC. Define staleness policies. This step happens in your data catalog — Atlan, DataHub, or equivalent. Output: governed context sources with access controls and certification status.
3. Build retrieval infrastructure. Choose the right retrieval strategy for each context type. Implement vector store for semantic retrieval, structured API for metadata, graph traversal for lineage. Include governance metadata in your retrieval schema. Output: queryable, structured context store.
4. Assemble the context pipeline. Build the logic that retrieves, ranks, compresses, and injects governance signals into context for each agent request. Test with real queries — are the top-5 retrieved items what you’d want the agent to see? Output: context assembly pipeline.
5. Deliver via MCP (or API). Expose the context pipeline as agent-callable tools. MCP is the emerging standard; it decouples the agent from the delivery mechanism and makes the pipeline composable (a minimal server sketch follows this list). Output: MCP server or equivalent delivery interface.
6. Build the maintenance layer. Subscribe to change events. Implement TTL-based refresh. Add staleness monitoring and alerting. Build feedback loops from agent errors back to context quality issues. Output: living context layer that stays accurate as the data estate evolves.
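A minimal sketch of step 5 using the MCP Python SDK’s FastMCP helper; the tool body is a placeholder for whatever context pipeline steps 3 and 4 produced:

```python
# Minimal MCP delivery sketch using the MCP Python SDK. The tool body is a
# placeholder for the context assembly pipeline built in steps 3-4.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("context-layer")

@mcp.tool()
def get_asset_context(asset_name: str) -> dict:
    """Return governed context for a data asset: metadata, lineage, glossary."""
    # Placeholder: call the retrieval + assembly pipeline here.
    return {
        "asset": asset_name,
        "certification": "provisional",  # governance signal travels with context
        "note": "wire this to the context assembly pipeline",
    }

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```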
Common context engineering failures
The six failure patterns that account for most enterprise context engineering problems:
The demo problem: RAG on polished documentation works. Production data is messy, stale, and ungoverned. The gap between demo performance and production performance is almost always a context quality problem, not a model problem.
Context contamination: Multiple domains in one context store means an agent querying Finance context retrieves HR definitions alongside Finance ones. Precision degrades. Bounded context spaces — domain-isolated context environments — are the architectural solution.
The staleness cliff: Context is accurate at launch, deteriorates by month three as data estates evolve without triggering context refresh, and by month six the team blames the model for problems caused by stale data.
Confidence overflow: Agents don’t know which context comes from a certified source and which doesn’t. They present all retrieved context with the same authority. Users lose trust when the confident wrong answer eventually surfaces.
RBAC gap: Agent serves a sensitive asset to a user who wasn’t authorized to see it. The data governance team had the right policies in the catalog; they were never extended to the context pipeline.
Missing lineage in context: Agents can answer “what is this table?” but not “where does this table’s data come from?” or “what breaks if this column changes?” — because lineage was never included in the context pipeline.
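For the lineage gap specifically, the missing capability is hop-by-hop traversal. A small sketch over a toy adjacency list (the edges are invented) shows what “where does this data come from?” requires:

```python
# Hop-by-hop upstream lineage traversal over a toy adjacency list (invented edges).
from collections import deque

UPSTREAM = {
    "finance.revenue_daily": ["raw.orders", "raw.refunds"],
    "raw.orders": ["src.order_events"],
}

def upstream_sources(asset: str, max_hops: int = 2) -> set[str]:
    """Answer 'where does this table's data come from?' via breadth-first search."""
    seen, queue = set(), deque([(asset, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for parent in UPSTREAM.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, hops + 1))
    return seen

print(upstream_sources("finance.revenue_daily"))
# {'raw.orders', 'raw.refunds', 'src.order_events'}
```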
How Atlan powers context engineering for data agents
Atlan’s governed catalog is the data layer that context engineering draws from. Certified assets, business glossary terms, active lineage, quality scores, and ownership records are the context ingredients. The MCP server is the delivery mechanism — structured, governed context exposed as agent-callable tools.
Context Studio is Atlan’s tooling for building context products: reusable, governed bundles of context for a specific domain or use case. A Finance context product packages all Finance-relevant metadata, lineage, and glossary terms into a governed context source that any Finance agent can query within the appropriate RBAC boundaries.
Atlan AI Labs research shows that governed context — certified assets, domain-scoped glossary, active lineage — produces significantly better agent accuracy compared to unstructured retrieval from raw data. The difference isn’t in the model. It’s in what the model sees.
The AI context stack guide walks through the four-layer architecture Atlan recommends for enterprise context engineering — from metadata foundation to agent orchestration.
Real stories from real customers: Context engineering at Workday and DigiKey
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP of Enterprise Data & Analytics, Workday
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
Context engineering is infrastructure — and infrastructure requires governance
What Workday and DigiKey describe is the same insight from different angles: the value of AI agents in a data stack depends on the quality and structure of the context they receive. Workday built the shared language first — the semantic layer, the business glossary, the governed vocabulary — and then activated it for AI. DigiKey built the metadata operating system and activated it as MCP context for models. Both arrived at the same place: agents that work because the context layer works.
Context engineering is infrastructure in the same way that the data warehouse was infrastructure — it’s not visible in the output, it doesn’t get credit when things work, but it’s the reason things work. And like any infrastructure, it requires ongoing maintenance, governance, and ownership. The teams that treat it as a one-time setup discover the staleness cliff. The teams that build it as an actively maintained layer discover that their agents stay reliable as the data estate evolves.
The how to implement an enterprise context layer guide is the practical starting point for teams building this infrastructure — from the first catalog assessment through to MCP delivery and freshness management.
FAQs
1. What is context engineering?
Context engineering is the discipline of designing, building, and maintaining the context that AI agents receive — covering retrieval (what to fetch), structure (how to organize it), freshness (how to keep it current), and governance (who can access it and whether it’s trustworthy).
2. What is the difference between prompt engineering and context engineering?
Prompt engineering optimizes what you tell the model — the instruction and framing. Context engineering optimizes everything the model sees — the retrieved documents, metadata, examples, tool outputs, and system instructions that fill its context window. Context engineering is an infrastructure discipline; prompt engineering is a craft.
3. What goes into the context window of an AI agent?
Depending on the agent, the context window may contain: system instructions, retrieved documents, catalog metadata, lineage graphs, business glossary terms, quality scores, tool outputs from previous steps, conversation history, and examples. Context engineering determines what goes in, in what order, and how much.
4. How do you keep agent context fresh at enterprise scale?
Freshness requires: change detection (subscribe to catalog and data source change events), TTL policies (each context type has an expiry after which it must be refreshed), staleness alerts (agents are notified when context they’re using is past its TTL), and re-indexing triggers (pipeline completions, schema changes, certification updates).
5. What is a context layer for AI?
A context layer is the governed data infrastructure that AI agents draw from — a combination of a data catalog, business glossary, lineage graph, and access control system that provides structured, trustworthy context. It’s the layer between the agent and the raw data estate.
6. What tools are used for context engineering?
Context engineering tools span multiple layers: data catalogs (Atlan, DataHub) for governed metadata; vector databases (Pinecone, Weaviate) for semantic retrieval; orchestration frameworks (LangChain, LangGraph) for context assembly; and delivery protocols (MCP) for agent-friendly access.
7. How does governance affect context engineering?
Governance determines which context is trustworthy (certification), who can access it (RBAC), where it came from (lineage), and when it needs refreshing (staleness policy). Without governance, retrieval may be accurate but agents can’t distinguish certified context from provisional context.
8. What is a context product?
A context product is a reusable, governed bundle of context for a specific domain or use case — all relevant metadata, lineage, and glossary terms packaged and maintained as a product that multiple agents can consume. It applies data product thinking to context management.
Sources
- Andrej Karpathy, “Software Is Changing (Again)”, Y Combinator / X, June 2025
- Workday Context as Culture, Atlan Regovern
- DigiKey Context Readiness, Atlan Regovern
- Atlan AI Labs Ebook — The 5x Accuracy Factor
- Model Context Protocol Specification, Anthropic
- Enterprise Context Layer for AI, Atlan
- AI Context Stack Guide, Atlan
- How to Implement Enterprise Context Layer, Atlan
- Bounded Context Spaces for AI Agents, Atlan
- Simon Willison: Context Engineering, simonwillison.net