What is agent engineering?
Agent engineering is the practice of designing, building, and continuously improving the full system that governs how an AI agent behaves in production. The model is just one part of that system. Agent engineering covers everything else: the instructions that shape the agent’s reasoning, the mechanisms that verify its outputs, the pipelines that supply it with relevant context, and the orchestration layer that coordinates its actions across tools and other agents.
The discipline crystallized as a named field in early 2026. The trigger was an OpenAI publication on internal agent infrastructure that documented how the team built systems around Codex to make it reliable at enterprise scale. Mitchell Hashimoto, creator of Terraform and Ghostty, distilled the paper’s core insight into a formula that engineers immediately adopted:
Agent = Model + Harness
The harness is everything you build around the model: system prompts, tool definitions, context policies, sandboxes, subagent routing, feedback loops, and recovery paths. OpenAI’s paper stated the conclusion plainly: a decent model with a great harness beats a great model with a poor harness.
Hashimoto articulated the governing principle of the discipline: “Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.” Agent engineering is the systematic application of that principle across every component of the system.
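The formula can be made concrete in a few lines. Below is a minimal sketch, not OpenAI's implementation; every name (`run_agent`, `guide`, `sensor`, `fetch_context`) is hypothetical. The point is structural: the harness wraps the model call with instructions and context on the way in, and verification with a feedback path on the way out.

```python
# Illustrative sketch of Agent = Model + Harness. The "model" is any
# callable; the harness supplies a guide and context, then verifies the
# output with a sensor before accepting it. All names are hypothetical.

def run_agent(model, guide, fetch_context, sensor, task, max_retries=2):
    """Run one task through the harness: guide + context in, verified output out."""
    context = fetch_context(task)
    prompt = f"{guide}\n\nContext:\n{context}\n\nTask: {task}"
    for attempt in range(max_retries + 1):
        output = model(prompt)
        ok, reason = sensor(task, output)
        if ok:
            return output
        # Feed the rejection reason back so the next attempt can correct it.
        prompt += f"\n\nPrevious attempt rejected: {reason}"
    raise RuntimeError("harness: output failed verification after retries")
```

Everything that distinguishes a production agent from a raw model call lives in the three injected callables, which is why a decent model with a great harness can outperform the reverse.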
How agent engineering differs from building a chatbot
That definition has a practical implication. A chatbot, even a sophisticated one with memory and tool access, handles discrete conversational turns. An agent receives a goal and takes a sequence of actions across many steps, calling tools, retrieving information, and adapting based on intermediate results. The engineering challenge shifts from “what is the right prompt?” to “what is the right system?”
That system must handle partial information, multi-step reasoning, tool failures, policy constraints, and runtime uncertainty. A chatbot’s failure mode is a bad response. An agent’s failure mode is an incorrect action taken autonomously, at scale, with consequences that propagate across systems. Building agents reliably is the work of agent engineering.
Agent engineering vs. prompt engineering vs. context engineering
These three disciplines are related but address different layers of the AI stack. Understanding the distinction is essential before choosing where to invest engineering effort.
Prompt engineering
Prompt engineering focuses on the instruction itself: how you phrase a question, structure a system message, or format an output request for a single model inference. It operates at the level of one call to one model. Skilled prompt engineers do incorporate context (through few-shot examples, chained prompts, and retrieval augmentation), but the practice centers on the instruction layer. As a standalone discipline applied to production agent systems that run thousands of inferences across changing data, prompt engineering is necessary but insufficient.
Context engineering
Context engineering determines what the agent knows when it reasons. It covers the design of pipelines that retrieve, filter, rank, and format information from external sources before it reaches the model. This includes semantic layers, metadata graphs, RAG systems, and ontologies.
Context engineering is the layer beneath prompt engineering. You can write the perfect prompt, but if the agent is reasoning over stale, incomplete, or miscategorized data, the output will be wrong. A Snowflake experiment demonstrated this concretely: adding an ontology layer to an agent’s context improved accuracy by 20% and reduced unnecessary tool calls by 39%, compared to a prompt-only baseline.
An MIT study reached the same conclusion from the enterprise side. 95% of GenAI pilots delivered no measurable P&L impact. The bottleneck was not model capability. It was missing business context: the agent did not know the organization’s definitions, policies, and data relationships.
Agent engineering
Agent engineering is the complete system layer. It encompasses prompt engineering and context engineering, and adds everything else required to make an agent reliable in production: guides, sensors, orchestration, evaluation, governance, and continuous improvement processes.
The three disciplines form a hierarchy. Prompts craft instructions within contexts curated by retrieval pipelines, while harnesses enforce boundaries and measure performance across thousands of inferences. None replaces the others. The question is which layer needs more investment at a given stage of maturity. For most enterprise teams in 2026, prompt engineering is already in place; context infrastructure and harness governance are the gaps that keep agents from reaching production. 82% of IT and data leaders agree that prompt engineering alone is no longer sufficient to power AI at scale, according to the 2026 State of Context Management Report.
Core components of agent engineering
The harness is not a single artifact. It is an ensemble of components, each addressing a different failure mode. Here is the standard architecture that agent engineering teams build and maintain.
Guides
Guides are the documents and structures that direct agent behavior at runtime. They include system prompts, AGENTS.md files (the open standard that emerged from collaboration across OpenAI, Google, Cursor, and others in August 2025), constraint documents, and role definitions for multi-agent systems.
Good guides do two things: they define what the agent should do in normal operating conditions, and they define what the agent should do when something unexpected happens. Guides are versioned artifacts, not set-and-forget configurations.
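Both halves of that job can live in one guide document. The sketch below is a hypothetical AGENTS.md; the sections and rules are illustrative, not taken from the specification, but they show the shape of a guide that covers normal operation and the unexpected case:

```markdown
# AGENTS.md (illustrative example)

## Build and test
- Install dependencies with `npm install`; run `npm test` before marking any task done.

## Constraints
- Never commit directly to `main`; open a branch and a pull request.
- Do not hand-edit files under `generated/`.

## When something unexpected happens
- If a migration would drop a table, stop and ask for confirmation.
- If tests fail for reasons unrelated to your change, report it rather than patching the tests.
```

Because the file is plain markdown checked into the repository, it can be versioned, reviewed, and diffed like any other artifact, which is what keeps it from becoming a set-and-forget configuration.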
Sensors
Sensors are the mechanisms that verify what the agent produces. They include evaluation suites, output parsers, validation loops, consistency checks, and confidence scoring systems. Without sensors, you have no signal about whether the agent is working correctly. With sensors, every failure becomes a data point that improves the system.
Sensors are the implementation of Hashimoto’s principle: when the agent makes a mistake, you build a sensor that catches that mistake class, then use it to verify the fix holds across future deployments.
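A minimal sketch of one such sensor, with hypothetical names and a deliberately crude regex: it checks that agent-generated SQL never references a table outside an allowlist, and it records every violation so that the mistake class can be replayed as a regression check after each fix.

```python
# Illustrative sensor for one mistake class: agent-generated SQL that
# touches tables outside an approved allowlist. Failures are recorded so
# they can be replayed as regression tests in future deployments.
import re

class TableAllowlistSensor:
    def __init__(self, allowed_tables):
        self.allowed = set(allowed_tables)
        self.failures = []  # each entry becomes a regression test case

    def check(self, sql):
        """Return True if every referenced table is on the allowlist."""
        # Crude extraction for illustration; a real sensor would parse the SQL.
        referenced = set(re.findall(r"(?:from|join)\s+(\w+)", sql, re.IGNORECASE))
        violations = referenced - self.allowed
        if violations:
            self.failures.append({"sql": sql, "violations": sorted(violations)})
            return False
        return True
```

A production system would accumulate many such sensors, one per observed mistake class, and run the recorded failures as a suite before every deployment.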
Context pipelines
Context pipelines are the retrieval and packaging systems that supply the agent with information at runtime. They pull from databases, data catalogs, knowledge graphs, semantic layers, and external APIs. They filter, rank, and format that information so it fits within the model’s context window and is relevant to the current task.
Context architecture for AI agents is a distinct engineering problem. The agent must receive not just data, but governed data: information tagged with ownership, lineage, quality scores, and access policies. An agent operating on ungoverned context is an agent operating on assumptions.
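The filter-rank-pack step can be sketched in a few lines. Everything here is hypothetical (the snippet schema, the `required_role` policy field, the relevance scores), and character count stands in for a real tokenizer, but the structure matches the pipeline described above: enforce access policy first, then rank, then pack to a budget.

```python
# Illustrative context pipeline: filter candidate snippets by access
# policy, rank by a (hypothetical) relevance score, and pack as many as
# fit into a fixed token budget, most relevant first.

def build_context(snippets, user_roles, token_budget, count_tokens=len):
    # Governance before relevance: drop anything the caller may not see.
    allowed = [s for s in snippets
               if s.get("required_role") is None or s["required_role"] in user_roles]
    ranked = sorted(allowed, key=lambda s: s["relevance"], reverse=True)
    packed, used = [], 0
    for s in ranked:
        cost = count_tokens(s["text"])  # len() is a crude stand-in for tokens
        if used + cost > token_budget:
            continue
        packed.append(s["text"])
        used += cost
    return "\n---\n".join(packed)
```

The ordering of the steps is the design choice that matters: filtering by policy before ranking means a highly relevant but unauthorized snippet can never reach the model.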
Orchestration
Orchestration covers how the agent routes between tools, how multi-agent systems delegate subtasks, and how the system recovers from partial failures. It includes the logic that decides when to call an external API, when to invoke a sub-agent, when to ask for clarification, and when to abort with an error rather than hallucinate an answer.
Orchestration is where most enterprise agent engineering complexity lives. A single agent calling a single tool is tractable. An agent that coordinates a pipeline of specialized sub-agents, each with its own tools and context requirements, is an engineering system that demands explicit design.
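The core routing decision described above can be sketched as a single function. The thresholds, action names, and confidence score are all hypothetical; the point is that "abort rather than hallucinate" is an explicit, testable branch, not an emergent behavior.

```python
# Illustrative orchestration step: given the agent's proposed next action
# and a confidence score, decide whether to execute a tool, delegate to a
# sub-agent, ask the user, or abort rather than guess.

def route(action, confidence, tools, subagents,
          clarify_threshold=0.5, abort_threshold=0.2):
    if confidence < abort_threshold:
        return ("abort", "confidence too low to act safely")
    if confidence < clarify_threshold:
        return ("clarify", f"unsure how to proceed with {action!r}")
    if action in tools:
        return ("call_tool", action)
    if action in subagents:
        return ("delegate", action)
    # An action the system has never heard of is an error, not a guess.
    return ("abort", f"unknown action {action!r}")
```

In a multi-agent pipeline, each sub-agent would carry its own `tools` set and thresholds, which is exactly where the coordination complexity accumulates.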
Why agent engineering matters in 2026
Enterprise AI investment is accelerating, but production deployments are not keeping pace. The numbers make the gap clear.
88% of AI agent projects fail to reach production. Analysis of enterprise deployments across 2024 and 2025 found that fewer than 1 in 8 agent initiatives successfully reach sustained production operation. 78% of enterprises have at least one agent pilot running, but only 14% have scaled an agent to organization-wide use.
The average cost of a failed AI agent project reaches $340,000 in direct expenses alone, before accounting for the opportunity cost of delayed automation benefits.
Five failure modes account for 89% of scaling failures:
- Integration complexity with legacy systems
- Inconsistent output quality at volume
- Absence of monitoring tooling
- Unclear organizational ownership of agent behavior
- Insufficient domain-specific training context
The LangChain State of Agent Engineering survey, covering 1,340 industry professionals, found that 32% cite output quality and hallucinations as the top barrier to production. Among organizations with 10,000 or more employees, context engineering and managing context at scale ranked as the top ongoing challenge.
These failure modes reflect a mix of model and system challenges, but they cluster around a common theme: the infrastructure surrounding the model is not production-grade. Models do hallucinate, and model improvements are real and ongoing. But the failures that prevent 88% of projects from reaching production (integration complexity, missing monitoring, unclear ownership, insufficient domain context) are harness failures. They require engineering solutions, not just better models.
Agent engineering is the response to this reality. It treats AI agent deployment as an engineering discipline that must address context quality, evaluation, orchestration, and governance, not just model selection. The five failure modes map directly to harness components: integration complexity is an orchestration and context pipeline problem; inconsistent output quality requires sensors and eval suites; absent monitoring is an observability gap; unclear ownership is a governance gap; insufficient domain context is a context layer gap.
Addressing all five simultaneously is the work of agent engineering.
How the context layer powers agent engineering
The deepest challenge in agent engineering is not orchestration or evaluation. It is context. What does the agent know? Where did that knowledge come from? Is it current? Is it accurate? Is it authorized?
These questions define the context layer for enterprise AI: the governed metadata infrastructure that makes agent reasoning reliable. The context layer is not a feature of the model. It is the infrastructure the harness draws from.
The context cold start problem
Enterprise agents fail when they hit the organizational cold start problem: the agent does not know what the organization knows. Tribal knowledge (the shared understanding of what terms mean, what data sources are authoritative, and what policies apply to which systems) lives in people’s heads, not in queryable infrastructure.
Encoding that knowledge into a governed, machine-readable context layer is the foundational work of enterprise agent engineering. Teams attempt this in different ways: fine-tuning models on domain-specific data, maintaining prompt libraries, or building custom RAG pipelines. Each approach can help. The challenge at enterprise scale is that tribal knowledge spans thousands of concepts, changes continuously, and must be governed across multiple systems simultaneously. A unified context layer (queryable, versioned, and policy-aware) is the most durable architecture for making that knowledge accessible to agents at runtime.
Governed context as runtime infrastructure
Context engineering resolves this by treating metadata as runtime infrastructure, not documentation. The agent does not consult a glossary passively. It queries a live context graph that carries semantic definitions, data lineage, quality scores, governance policies, and ownership information for every asset it touches.
Gartner predicts that by 2028, more than 50% of AI agent systems will rely on context graphs for guardrails, observability, and audit. Context graphs provide the “why” and “how” behind every agent action: the decision traces that systems of record always miss. This makes them the foundation of the four pillars Gartner identifies for trustworthy agentic systems: guardrailing, observability, evaluation, and self-learning.
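A toy version of that lookup makes the difference from passive documentation visible. The graph schema, field names, and thresholds below are all hypothetical; the structural point is that the agent receives governed metadata (or an explicit refusal) rather than assuming the asset is safe to use.

```python
# Illustrative context-graph lookup: the agent resolves an asset's
# governed metadata before using it. Schema and field names are
# hypothetical stand-ins for a real metadata platform.

CONTEXT_GRAPH = {
    "sales.revenue": {
        "definition": "Recognized revenue, net of refunds",
        "lineage": ["raw.stripe_payments", "staging.payments_clean"],
        "quality_score": 0.97,
        "policy": "finance-approved",
        "owner": "finance-data-team",
    },
}

def resolve_asset(graph, asset, min_quality=0.9):
    """Return governed metadata for an asset, or an explicit refusal."""
    meta = graph.get(asset)
    if meta is None:
        return {"ok": False, "reason": f"unknown asset {asset!r}"}
    if meta["quality_score"] < min_quality:
        return {"ok": False, "reason": "quality below threshold"}
    return {"ok": True, **meta}
```

Because the refusal carries a reason, it can flow into the orchestration layer's clarify-or-abort logic instead of silently degrading the agent's answer.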
Context that improves continuously
A well-engineered context layer does not stay static. Every agent interaction produces a signal: a correction, a successful retrieval, a policy boundary hit. Those signals become context updates. Correct answers become regression tests in the eval suite. The context layer gets better every week without changing the model.
This is the compounding advantage of investing in context infrastructure. Organizations that treat context as one-time setup discover the agent regresses as data evolves. Organizations that build continuous context lifecycle management discover the agent improves as the organization learns.
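The lifecycle can be sketched in a few lines. The class and field names are hypothetical; the mechanism is the one described above: a single correction updates the context layer and simultaneously becomes a regression case, so the fix is both applied and permanently verified.

```python
# Illustrative context lifecycle: each correction becomes both a context
# update and a regression case in the eval suite, so the same question is
# answered correctly next time without retraining the model.

class ContextLayer:
    def __init__(self):
        self.definitions = {}   # term -> current governed definition
        self.eval_cases = []    # regression tests accumulated from corrections

    def record_correction(self, term, wrong_answer, corrected_definition):
        self.definitions[term] = corrected_definition
        self.eval_cases.append({
            "term": term,
            "must_not_equal": wrong_answer,
            "expected": corrected_definition,
        })

    def lookup(self, term):
        return self.definitions.get(term)

    def run_evals(self):
        """Verify every past correction still holds against the live layer."""
        return all(self.lookup(c["term"]) == c["expected"]
                   for c in self.eval_cases)
```

Running `run_evals()` on every context update is what turns one-time fixes into a compounding asset: a later edit that silently reintroduces an old mistake fails the suite immediately.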
Tools and frameworks for agent engineering
Agent engineering requires a stack of tools: some for orchestration, some for context management, and some for evaluation and governance.
Orchestration frameworks
LangGraph is the leading enterprise-grade orchestration framework as of 2026, having surpassed CrewAI in GitHub stars on the strength of enterprise adoption. It models agents as nodes in a directed graph, with state flowing through edges and conditional logic determining routing. LangGraph maps cleanly to production requirements: audit trails, rollback points, and deterministic state management. It is best suited for cyclical tasks with feedback loops and for systems where observability is a hard requirement.
CrewAI uses role-based agent teams with intuitive task delegation. It is the fastest path from idea to working demo (typically 2-3 engineer-days). It is best suited for linear task pipelines (A to B to C), though its debugging experience for complex cycles is weaker than LangGraph's.
Google ADK (released April 2025) provides a hierarchical agent tree where a root agent delegates to sub-agents, integrating tightly with Vertex AI, Gemini models, and Google Cloud infrastructure.
Microsoft AutoGen is effectively in maintenance mode as of 2026. Microsoft shifted strategic focus to its broader Agent Framework. Bug fixes and security patches continue, but major feature development has stopped.
Context and protocol standards
Permalink to “Context and protocol standards”MCP (Model Context Protocol) is the open standard for exposing context layer infrastructure to external agent frameworks. It allows agent frameworks like Claude, LangGraph, and Vertex to query governed metadata through a consistent interface. Atlan’s MCP server exposes the enterprise context layer to any MCP-compatible agent.
AGENTS.md is the open specification for declaring an agent’s behavioral constraints in a structured document. It emerged from collaboration across OpenAI, Google, Cursor, and Factory in August 2025 and is now a standard component of harness design.
Evaluation and governance tooling
Permalink to “Evaluation and governance tooling”Sensors require tooling. Evaluation frameworks that run automated test cases against agent outputs, observability platforms that trace individual agent steps and tool calls, and context versioning systems that track which version of a context graph an agent ran against. These are the governance layer of agent engineering.
89% of enterprise organizations have implemented some form of agent observability, according to LangChain’s State of Agent Engineering report. 62% have detailed tracing that allows inspection of individual agent steps.
The tools are mature enough. The gap in most enterprise programs is not the orchestration framework or the eval library. It is the governed context infrastructure those tools draw from. The following examples show what changes when organizations close that gap.
Real stories from real customers: Context layer in production agent engineering
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP of Enterprise Data & Analytics, Workday
Workday’s data and analytics team built a shared semantic language across the organization over many years. The challenge was making that institutional knowledge accessible to AI at runtime. By connecting Atlan’s MCP server to their agent infrastructure, the semantic layer Workday already invested in became the context layer their agents query at every step. The AI benefits from the same shared language the organization uses, without any additional prompt engineering.
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
DigiKey’s Chief Data and Analytics Officer describes Atlan as a context operating system. That framing captures the shift from data catalog to agent engineering infrastructure: the metadata layer is no longer passive documentation. It is an active, queryable system that delivers governed context to AI models through MCP, data quality signals to governance tools, and discovery capabilities to the data marketplace.
Agent engineering is a data infrastructure discipline
Agent engineering arrived as a name in 2026, but the underlying problem is older: enterprise systems fail when they cannot share reliable context across boundaries. AI agents expose this problem more acutely than any previous technology because they act on context autonomously, at scale, and with consequences that propagate across systems.
The engineering challenge is clear. Build the harness: the guides, sensors, context pipelines, and orchestration that make a model production-ready. Invest in context infrastructure: the governed metadata layer that makes the harness reliable. Instrument everything: the eval suites, observability tooling, and governance processes that turn each failure into a system improvement.
The roughly 12% of organizations that successfully scale AI agents to production operate across a range of model providers and frameworks. What they share is not superior model access. They tend to have invested earlier in governance, evaluation infrastructure, and context pipelines: the components that the other 88% treat as future work. Agent engineering is the discipline that makes those investments systematic.
Frequently asked questions about agent engineering
What is agent engineering?
Agent engineering is the discipline of designing, building, and governing the complete system around an AI model in production. This system, called the harness, includes guides that direct agent behavior, sensors that verify outputs, context pipelines that supply relevant information at runtime, and orchestration that coordinates agent actions. The core formula is Agent = Model + Harness.
Who coined the term “agent engineering” and when?
The practice crystallized as a named discipline in early 2026. Mitchell Hashimoto, creator of Terraform and Ghostty, distilled OpenAI’s internal agent infrastructure work into the formula Agent = Model + Harness. The OpenAI publication “Harness engineering: leveraging Codex in an agent-first world” (February 11, 2026) provided the foundational framework that Hashimoto synthesized.
How is agent engineering different from prompt engineering?
Prompt engineering focuses on crafting the instruction for a single model inference. Agent engineering addresses the entire system governing how the agent behaves across thousands of inferences in production, including guides, sensors, context pipelines, and orchestration. Prompt engineering is one input to a harness; agent engineering is the practice of building and maintaining the harness itself.
Why do most AI agent projects fail?
88% of AI agent projects fail to reach production. The five dominant failure modes are: integration complexity with legacy systems, inconsistent output quality at volume, absence of monitoring tooling, unclear organizational ownership, and insufficient domain-specific context. These are harness failures, not model failures. The model produces plausible outputs; the system around it cannot verify or constrain those outputs reliably.
What is the relationship between context engineering and agent engineering?
Context engineering is the foundation of agent engineering. It designs the pipelines that supply the agent with accurate, governed information at runtime: semantic layers, metadata graphs, RAG systems, and ontologies. Agent engineering builds the complete harness on top of that context infrastructure. You can have excellent orchestration and strong sensors, but if the agent is reasoning over poor-quality context, the system will produce unreliable outputs.
What frameworks do agent engineering teams use?
The leading orchestration frameworks as of 2026 are LangGraph (enterprise-preferred, graph-based, strong audit trail support), CrewAI (role-based, fastest to prototype), and Google ADK (tightly integrated with Vertex AI and Gemini). For context exposure, MCP (Model Context Protocol) is the emerging open standard that allows agent frameworks to query governed metadata infrastructure through a consistent interface.
How does a context layer improve agent engineering outcomes?
A governed context layer provides the agent with accurate, versioned, policy-compliant information from every system it needs to operate across. Snowflake’s research found that adding an ontology layer improved agent accuracy by 20% and reduced unnecessary tool calls by 39%, compared to a prompt-only baseline. Context layers also enable continuous improvement: each agent interaction produces signals that refine the context graph over time.
What does Gartner predict for agent engineering infrastructure?
Gartner predicts that by 2028, more than 50% of AI agent systems will rely on context graphs for guardrails, observability, and audit. Gartner identifies four pillars of trustworthy agentic systems (guardrailing, observability, evaluation, and self-learning), all grounded in context graph infrastructure. Gartner also predicts that 40% of agentic AI projects will be canceled by end of 2027, driven largely by governance and context infrastructure gaps.
Sources
- Harness engineering: leveraging Codex in an agent-first world, OpenAI
- My AI Adoption Journey, Mitchell Hashimoto
- Harness engineering for coding agent users, Martin Fowler
- State of Agent Engineering, LangChain
- Why 88% of AI Agents Never Make It to Production, Hypersense Software
- Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027, Gartner
- The Agent Context Layer for Trustworthy Data Agents, Snowflake
- Context Engineering vs Prompt Engineering, Atlan
- AI Agent Landscape 2025-2026: A Technical Deep Dive, Medium
- 10 AI Agent Frameworks You Should Know in 2026, Medium