Prompt engineering got AI teams off the ground. Context engineering made agents smarter. But neither solved production reliability, and that is where harness engineering enters.
The three-tier evolution maps to a progression in where failure lives:
- Prompt engineering. Failure mode: poorly worded instructions produced inconsistent outputs. Solution: craft better instructions.
- Context engineering. Failure mode: agents confidently reasoned over the wrong information. Solution: architect what goes into the context window.
- Harness engineering. Failure mode: well-instructed agents with good context still failed unpredictably at scale. Solution: engineer the entire operational environment.
Each discipline contains the one before it. The shift at each boundary is not a replacement; it is an expansion of what counts as the controllable surface.
Below, we explore the core distinctions, define each of the three disciplines, compare them head-to-head, show how they work together, and explain how Atlan addresses the data layer.
Quick comparison: prompt engineering vs context engineering vs harness engineering
| Dimension | Prompt Engineering | Context Engineering | Harness Engineering |
|---|---|---|---|
| What it controls | Single instructions | Context window content | Full agent environment |
| Level of abstraction | Message-level | Session-level | System-level |
| Era | 2022-2024 | 2025 | 2026+ |
| Primary failure mode | Poorly worded instructions | Wrong or incomplete context | Data layer failures |
| Tools needed | None beyond LLM | RAG, retrieval systems | Orchestration + data governance |
| Best for | Simple queries | Knowledge-grounded answers | Autonomous, multi-step agents |
| Who coined it | Community/informal | Karpathy, Lutke (2025) | Martin Fowler, Ryan Lopopolo (2026) |
| Key practitioner quote | “Use better words” | “Context is RAM, model is CPU” | “Agents aren’t hard; the Harness is hard” |
Context engineering is defined in the full guide on this site.
Prompt engineering vs context engineering vs harness engineering: what’s the difference?
The three disciplines are nested layers, not competing methodologies. Think of it as a computing stack: the model is the CPU, context is the RAM, and the harness is the operating system. Each layer assumes the previous one is in place.
Prompt engineering operates at the message level. It controls what you type, how you structure instructions, and how you frame a question to elicit a useful response from a large language model. It is the smallest unit of control.
Context engineering operates at the session level. It controls what information the model sees across an entire task: retrieved documents, memory fragments, tool call outputs, conversation history, and schema definitions. Andrej Karpathy defined it as “the delicate art and science of filling the context window with just the right information for the next step.”
Harness engineering operates at the system level. Martin Fowler’s formulation is the clearest: “Agent = Model + Harness.” The harness is everything in an agent except the model itself: guides, sensors, toolchain management, memory systems, and lifecycle management.
Historical context
Between 2022 and 2024, prompt engineering was the primary lever for improving LLM output quality. It worked well for single-turn, bounded tasks.
In 2025, Karpathy’s definition crystallized context engineering as a distinct discipline. Tobi Lutke (Shopify) described it as “the art of providing all the context for the task to be plausibly solvable by the LLM.” Simon Willison captured why the rename mattered: prompt engineering had been “redefined to mean typing prompts full of stupid hacks into a chatbot.” The term was diluted by overuse, but the practice of carefully crafting instructions was not diminished.
In 2026, harness engineering emerged as agents operating at scale revealed problems that neither prompt design nor context architecture alone could solve. Ryan Lopopolo of the OpenAI Codex team named the shift precisely: “Agents aren’t hard; the Harness is hard.”
Why confusion persists
The three disciplines are nested, not competing, which means practitioners solving a harness problem are also doing context engineering and prompt engineering simultaneously. Martin Fowler’s framing complicates this further: he positions harness engineering as a specific form of context engineering for coding agents, not above it. That definitional instability is real.
The more consequential confusion: practitioners who fix the harness architecture but not the data it reads are solving the wrong 20% of the problem. Understanding what an agent harness actually contains helps clarify which failure mode you are actually facing.
What is prompt engineering?
Prompt engineering is the discipline of designing instructions that elicit accurate, useful responses from large language models. It operates at the message level: the user’s input, system prompt, and turn-by-turn instructions within a single interaction.
The primary techniques are well established:
- Zero-shot prompting: asking the model directly, without examples, relying on pre-trained knowledge
- Few-shot examples: providing sample inputs and outputs to calibrate model behavior without fine-tuning
- Chain-of-thought reasoning: prompts that elicit step-by-step reasoning to improve accuracy on complex tasks
- Role assignment: instructions that define the model’s context, expertise, and behavioral constraints for a session
- Output format directives: instructions that control response structure (JSON, markdown, bullet lists) for downstream parsing
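Several of these techniques compose naturally in a single request. Here is a minimal sketch combining role assignment, few-shot examples, and an output format directive; the `build_classification_prompt` helper and the ticket-triage scenario are hypothetical, and you would pass the resulting message list to whatever LLM SDK you use:

```python
def build_classification_prompt(ticket: str) -> list[dict]:
    """Assemble a message list using role assignment, few-shot
    examples, and an output format directive."""
    system = (
        "You are a support-ticket triage assistant. "  # role assignment
        'Respond ONLY with JSON: {"category": str, "urgency": int}.'  # format directive
    )
    # Few-shot examples calibrate behavior without fine-tuning.
    few_shot = [
        {"role": "user", "content": "My invoice total is wrong."},
        {"role": "assistant", "content": '{"category": "billing", "urgency": 2}'},
    ]
    return [
        {"role": "system", "content": system},
        *few_shot,
        {"role": "user", "content": ticket},
    ]

messages = build_classification_prompt("The app crashes on login.")
print(len(messages))  # system + two few-shot turns + the new ticket
```

The message-list shape (`role`/`content` dicts) follows the common chat-completion convention; the point is that all of these controls live inside a single interaction, which is exactly the boundary of the discipline.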
Why it mattered, and why it reached its limits
Between 2022 and 2024, prompt engineering was the fastest way to improve LLM output quality. For single-turn or short-session tasks (writing, classification, extraction) it delivered meaningful gains quickly.
The gains were capped at the instruction layer. Prompt engineering for data retrieval helped teams extract structured output from raw data, but teams could not prompt their way to reliable multi-step agent behavior. The failure mode was not poor wording. It was that the unit of control was too small. Once agents needed to reason over retrieved information, manage state across steps, and call external tools, a better question was not sufficient.
Where it still applies
Prompt engineering is not dead. It is a contained, necessary discipline inside every context engineering and harness engineering system. Every context-building pipeline still produces prompts; every harness still issues instructions. The difference is that prompt engineering is no longer the primary lever; it is a component inside a larger architecture.
Teams that treat prompt tuning as the answer to multi-step agent failures are applying a message-level fix to a system-level problem.
What is context engineering?
Context engineering is the discipline of deciding what information populates the context window at inference time. The unit of control shifts from the individual message to the entire session: all inputs the model sees in a single task, including retrieved documents, memory fragments, tool outputs, conversation history, and schema definitions.
Karpathy’s definition is the most-cited: “the delicate art and science of filling the context window with just the right information for the next step.” The key word is “right”: not just any information, but information that is relevant, timely, and trustworthy.
Core components
Context engineering involves five interconnected systems:
- Retrieval pipeline: the mechanism (RAG, vector search, keyword search) that selects which documents or data chunks enter the context window
- Memory system: short-term conversation history, long-term episodic memory, and semantic memory that persists across sessions
- Context compression: techniques (summarization, filtering, prioritization) that fit more relevant information into a fixed token budget
- Dynamic tool result injection: real-time injection of tool call outputs (API results, database queries, search snippets) into the active context
- System context layer: metadata, schema definitions, business glossary terms, and ontology anchors that give the model semantic grounding beyond raw text
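The retrieval and compression components above reduce to one recurring operation: rank candidate chunks by relevance and pack the best ones into a fixed token budget. The sketch below is illustrative only; the scores are assumed to come from an upstream retriever, and whitespace splitting stands in for a real tokenizer:

```python
def assemble_context(chunks: list[dict], budget_tokens: int) -> list[str]:
    """Pick the highest-scoring chunks that fit the token budget
    (compression by prioritization)."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = len(chunk["text"].split())  # crude token estimate
        if used + cost <= budget_tokens:
            selected.append(chunk["text"])
            used += cost
    return selected

# Hypothetical retriever output: relevance scores attached to each chunk.
chunks = [
    {"text": "Q3 revenue by product line rose 8% quarter over quarter", "score": 0.92},
    {"text": "Office relocation memo for the Austin site", "score": 0.11},
    {"text": "Schema: fact_revenue(product_id, net_revenue, period)", "score": 0.88},
]
print(assemble_context(chunks, budget_tokens=14))
```

With a 14-token budget, the revenue summary and the schema definition fit and the irrelevant memo is dropped, which is the "just the right information" selection Karpathy describes in miniature.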
Why context engineering is the dominant discipline in 2025-2026
The shift from prompt to context engineering coincided with the rise of autonomous agents: systems that run multi-step tasks without human input at each step. Jeff Huber, CEO of Chroma, framed it directly: “Context engineering is the job today. If you’re looking to build a production system, that is the job.”
Gartner identified context engineering as a critical skill for successful AI-enabled processes. Enterprises are now hiring context designers alongside ML engineers, reflecting how central the discipline has become to production AI deployment.
What context engineering alone cannot solve
In practice, most context engineering implementations operate without adequate data governance; they treat retrieved information as trustworthy by default. In enterprise environments, this assumption fails constantly.
Stale lineage, uncertified tables, schema drift, and conflicting business definitions enter the context window and produce reliable-sounding hallucinations. The agent is not making a reasoning error. It is reasoning correctly over incorrect inputs. That is a data problem, not a model problem. A rigorous context engineering system would include data certification as part of its information selection criteria, but most implementations skip this step entirely.
What is harness engineering?
Harness engineering is the discipline of designing and managing the entire environment in which an AI agent operates. That means everything except the model itself. Martin Fowler’s formulation is clean: “Agent = Model + Harness.” The harness is the operational wrapper that determines whether the model’s capabilities translate to reliable real-world behavior.
The term crystallized in early 2026 as agent deployments at scale revealed that orchestration problems could not be solved at the prompt or context layer alone. Ryan Lopopolo of the OpenAI Codex team put it plainly: “Agents aren’t hard; the Harness is hard.”
Core components
Fowler’s framework distinguishes two control mechanisms:
- Guides (feedforward controls): AGENTS.md files, constraint documents, system-level instructions that shape agent behavior before the task begins
- Sensors (feedback controls): validation loops, output checkers, and monitoring systems that evaluate agent actions in real time and trigger corrections
Beyond these controls, the harness includes:
- Toolchain management: the set of tools, APIs, and data sources the agent can call, including permission scopes and rate limits
- Memory and state systems: persistent storage of agent decisions, task history, and context across multi-step and multi-session workflows
- Lifecycle management: deployment, versioning, rollback, and observability for agents running in production environments
What the harness changes in practice
The performance evidence is striking. In one documented case, the same model, same data, and same prompt saw its task success rate climb from 42% to 78% purely through improved harness configuration. Independent research found harness configurations can improve agent solve rates by 64% compared to basic setups.
At scale, the OpenAI Codex team used harness engineering to produce approximately 1 million lines of code and 1,500 pull requests over five months, with zero human-written code. Stripe’s “Minions” system merged 1,300 AI pull requests per week without human oversight. These numbers are not model achievements; they are harness achievements.
The unclaimed failure mode
Most practitioner discussion on harness engineering focuses on the wiring: orchestration architecture, constraint files, AGENTS.md docs, feedback loops. This is important work. But the most common failure point is what flows through those controls, not the controls themselves.
In enterprise deployments, what flows through the harness is data. The quality, certification status, and semantic richness of that data determines whether a correctly built harness produces reliable agents or sophisticated hallucination machines. Data quality for AI agent harnesses addresses this failure mode directly.
Head-to-head comparison
The most important divergence between the three disciplines is not what they control; it is where their failures live. All three share a root cause in enterprise deployments: the data layer.
| Dimension | Prompt Engineering | Context Engineering | Harness Engineering |
|---|---|---|---|
| Primary focus | Instruction quality at message level | Information selection and session architecture | Full agent operating environment |
| Key stakeholder | Any LLM user | AI/ML engineers, data engineers | Senior AI engineers, platform teams |
| Measurement approach | Output quality per prompt | Context relevance, retrieval precision | Agent task success rate, error recovery rate |
| Implementation scope | Single prompt or conversation | Retrieval pipeline, memory systems | Orchestration + constraints + toolchain + data governance |
| Time to value | Immediate | Days to weeks | Weeks to months |
| Tooling requirements | LLM API, prompt IDE | RAG framework, vector DB, memory store | Orchestration layer + data catalog + quality monitoring |
| Organizational impact | Individual productivity | Team-level agent quality | Enterprise AI reliability |
| Failure mode | Poorly worded instructions; ambiguous outputs | Wrong, stale, or semantically ambiguous retrieved content | Data layer failures: uncertified tables, schema drift, stale lineage |
| Maturity indicators | Consistent output quality on known tasks | Reliable grounding across diverse queries | Production agents with self-healing behavior and explainable decisions |
| Containment relationship | Subset of context engineering | Subset of harness engineering | Contains both |
Real-world example: an enterprise financial analyst agent
A team builds an agent to answer questions about quarterly revenue by product line. They craft careful prompts (prompt engineering). They build a RAG pipeline retrieving from their data warehouse (context engineering). They add an orchestration layer with validation loops and constraint files (harness engineering).
The agent still returns stale numbers. The revenue table in the warehouse has a last_certified timestamp 90 days old. Schema drift introduced a new adjusted_revenue column that the agent’s retrieval logic does not know about. The lineage for net_revenue traces back to a deprecated ETL job.
The harness is correctly built. The data it reads is not trustworthy. A complete harness would include data certification checks in its sensor layer, but this is precisely the gap most harness implementations leave open.
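A certification check of the kind this example is missing can be sketched as one more sensor: before a table's rows enter the context window, verify that its certification timestamp is fresh and its lineage is not deprecated. The metadata fields (last_certified, upstream_job_status) and the 30-day threshold are illustrative assumptions, not any particular catalog's schema:

```python
from datetime import datetime, timedelta, timezone

MAX_CERT_AGE = timedelta(days=30)  # assumed freshness policy

def is_context_safe(table_meta: dict) -> bool:
    """Reject stale or deprecated tables before retrieval injects them."""
    age = datetime.now(timezone.utc) - table_meta["last_certified"]
    return age <= MAX_CERT_AGE and table_meta["upstream_job_status"] != "deprecated"

# The revenue table from the example: certified 90 days ago, deprecated ETL.
revenue_meta = {
    "last_certified": datetime.now(timezone.utc) - timedelta(days=90),
    "upstream_job_status": "deprecated",
}
print(is_context_safe(revenue_meta))  # the stale, deprecated table is rejected
```

Wired into the harness's sensor layer, a check like this turns the silent stale-data failure into a visible, actionable one.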
How prompt engineering, context engineering, and harness engineering work together
The three disciplines are nested, not competing. Harness engineering contains context engineering, which contains prompt engineering. Each layer contributes something the others cannot provide alone.
Integration pattern 1: an autonomous research agent
A research agent uses prompt engineering to structure each search query, context engineering to select and rank retrieved documents into the context window, and harness engineering to manage tool calls, validate source quality, and track task state across multi-step research sessions.
- Prompt engineering contributes: Query formulation, result structuring directives
- Context engineering contributes: Document retrieval, relevance ranking, context compression
- Harness engineering contributes: Tool orchestration, session state, output validation
- Combined outcome: An agent that produces cited, structured research summaries without hallucinated sources
Integration pattern 2: a data validation loop for an analytics agent
An analytics agent pulls data via SQL (harness toolchain), filters it into the context window (context engineering), then uses prompted reasoning to generate insights (prompt engineering). Harness sensors validate that data assets used in each query carry a certified: true flag before populating the context.
- Prompt engineering contributes: Insight framing, output format specification
- Context engineering contributes: SQL result injection, schema context, column definitions
- Harness engineering contributes: Data certification checks, query sandboxing, anomaly detection sensors
- Combined outcome: An analytics agent that only reasons over certified, lineage-tracked data, reducing hallucination risk at the source
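The certification gate in this pattern can be sketched as a guard in front of context population: SQL results only reach the model if every asset the query touched carries the certified flag. The asset metadata shape is an illustrative assumption:

```python
def assets_certified(query_assets: list[dict]) -> bool:
    """Harness sensor: every asset touched by the query must be certified."""
    return all(a.get("certified") is True for a in query_assets)

def inject_results(rows: list[dict], query_assets: list[dict], context: list) -> bool:
    """Gate context population on certification; return whether injection happened."""
    if not assets_certified(query_assets):
        return False  # sensor blocks uncertified data from reaching the model
    context.extend(rows)
    return True

context: list = []
ok = inject_results(
    rows=[{"product": "A", "net_revenue": 1200}],
    query_assets=[{"name": "fact_revenue", "certified": True}],
    context=context,
)
print(ok, len(context))
```

The design choice worth noting: the gate sits in the harness (where the tool call happens), but what it inspects is context engineering's concern (what information the model sees), which is exactly how the two layers compose.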
Integration pattern 3: a multi-agent orchestration system
A lead orchestrator dispatches tasks to specialized sub-agents. Each sub-agent has its own prompt layer and context retrieval logic. The harness manages inter-agent communication, prevents context bleed between tasks, and enforces a shared constraint file that all agents read before acting.
- Prompt engineering contributes: Per-agent role definitions and task instructions
- Context engineering contributes: Task-specific retrieval, memory routing between agents
- Harness engineering contributes: Orchestration, constraint enforcement, inter-agent communication governance
- Combined outcome: Agent systems that can merge thousands of PRs weekly without human oversight
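The orchestration skeleton in this pattern is simple to sketch: the orchestrator prepends the shared constraint file to every sub-agent's instructions and builds each agent a fresh prompt, so contexts never mix. Sub-agent execution is stubbed; the constraint text and roles are hypothetical:

```python
SHARED_CONSTRAINTS = "Never modify files outside your assigned directory."

def run_subagent(role: str, task: str) -> str:
    # Each sub-agent gets a fresh prompt: shared constraints + its own role
    # and task, and nothing from any sibling's context.
    prompt = f"{SHARED_CONSTRAINTS}\nRole: {role}\nTask: {task}"
    return f"[{role}] done: {task}"  # placeholder for a real agent invocation

def orchestrate(tasks: dict[str, str]) -> dict[str, str]:
    """Dispatch each task to its specialist; contexts never bleed between agents."""
    return {role: run_subagent(role, task) for role, task in tasks.items()}

results = orchestrate({"tests": "add unit tests", "docs": "update README"})
print(results["tests"])
```

A real system would add the inter-agent messaging and result-merging the pattern describes, but the isolation property (per-agent prompt construction from a shared constraint source) is the core of it.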
When to prioritize each
Start with prompt engineering when you are using LLMs for single-turn tasks, your use case is conversational or task-isolated, and the failure mode is consistently poor or inconsistent output quality.
Invest in context engineering when you are building agents that need to reason over retrieved information, your RAG pipeline is returning irrelevant results, or your agents are confidently grounding on incorrect facts.
Invest in harness engineering when you are running autonomous, multi-step agents in production; your context engineering is solid but agents still fail unpredictably; or you are operating at scale where manual intervention is not viable.
Govern the data layer when all three layers are solid but agents still hallucinate or make wrong decisions. That is almost always a data quality problem, not an architecture problem.
How Atlan addresses the data layer that makes harness engineering work
The practitioner consensus is clear: harness engineering is hard. But the conversation stops at the orchestration layer. It focuses on constraint files, feedback loops, and toolchains, and ignores the data those components read from.
Gartner predicts that 60% of AI projects will be abandoned without AI-ready data. A separate finding: 63% of organizations lack proper data management practices for AI. Both describe the same gap: teams engineering the harness while leaving the data layer unmanaged.
When the harness reads from stale lineage, uncertified tables, or semantically ambiguous columns, the result is not a harness failure. It is a data failure that looks like a model failure. Prukalpa Sankar, Atlan’s co-CEO, frames it directly: “every wrong AI answer is a context bug, not a model bug; treat it the same way you treat a broken data pipeline.”
Atlan is the governed data layer that the harness reads from. Its capabilities map directly to the failure modes harness engineering leaves unsolved:
- Context Engineering Studio builds, tests, and deploys AI context from governed metadata, so the context window is populated with certified, lineage-tracked information
- Metadata Lakehouse provides column-level lineage, certified table status, business glossary, and usage patterns: the semantic grounding that prevents schema drift failures
- MCP Server exposes governed context directly to AI agents via Model Context Protocol, plugging Atlan into the harness toolchain natively
- Data Quality Studio runs business-driven quality checks that flow directly into the context layer
- Enterprise Data Graph and Agent Context Layer combine semantic layer, ontology, operational playbooks, lineage, and decision memory into the governed runtime environment harness engineers need
Real stories from real customers: governed data makes the harness reliable
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server..."
-- Joe DosSantos, VP of Enterprise Data & Analytics, Workday
When Workday’s agents read from Atlan’s governed metadata (certified tables, agreed business definitions, verified lineage) the harness stops hallucinating because the data feeding it is trustworthy. The years of work building shared data language become AI-ready context, rather than technical debt that undermines agent reliability.
"Atlan is much more than a catalog of catalogs. It's more of a context operating system..."
-- Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
DigiKey’s experience points to the same shift: the data catalog is no longer just an inventory system. When it becomes a context operating system, a governed layer that agents read from, the harness gains the semantic grounding it needs to operate reliably at scale.
Bottom line: why the data layer is the missing piece in harness engineering
Prompt engineering, context engineering, and harness engineering are not competing methodologies. They are nested layers of the same discipline, each addressing a failure mode the previous layer left exposed. Prompt engineering gave you better instructions. Context engineering gave you better information architecture. Harness engineering gave you a reliable operational environment.
But all three share a root cause when they fail in enterprise deployments: the data layer. Stale lineage, uncertified tables, schema drift, and semantically ambiguous columns defeat even the most carefully engineered harness. The organizations pulling ahead on autonomous AI are not just engineering better harnesses; they are governing the data those harnesses read from. That is the discipline that connects all three layers, and it is the one most teams have not yet built.
FAQs about harness engineering vs prompt engineering vs context engineering
Permalink to “FAQs about harness engineering vs prompt engineering context engineering”1. What is the difference between prompt engineering and context engineering?
Prompt engineering designs individual instructions (what you ask the model). Context engineering designs what information the model sees: retrieved documents, memory, schema definitions, and tool outputs. Karpathy defined context engineering as “filling the context window with just the right information.” Prompt engineering is a subset of context engineering. Every context engineering system still produces prompts, but the unit of control shifted from the message to the session.
2. What is harness engineering in AI?
Harness engineering is the discipline of designing the full operational environment for an AI agent, covering everything except the model itself. Martin Fowler defined the relationship as “Agent = Model + Harness.” The harness includes guides (feedforward controls like AGENTS.md files), sensors (feedback controls like validation loops), toolchain management, memory systems, and lifecycle management. It operates at the system level, not the message or session level.
3. How does harness engineering differ from context engineering?
Context engineering focuses on what information goes into the context window, specifically the content of what the model sees. Harness engineering focuses on how the entire agent environment operates: tools, constraints, feedback loops, memory, and lifecycle management. Context engineering is a component inside the harness; the harness contains and orchestrates the context engineering layer alongside all other agent subsystems.
4. Is prompt engineering dead in 2026?
No. Prompt engineering is a contained, necessary discipline inside every context engineering and harness engineering system. It did not die; it was reclassified. Between 2022 and 2024 it was treated as the primary lever for LLM improvement. As agents became more complex, the unit of control shifted to the session level (context) and then to the system level (harness). Prompts still exist inside both.
5. Who coined the term context engineering?
The term was popularized in June 2025, primarily by Andrej Karpathy, who defined it as “the delicate art and science of filling the context window with just the right information for the next step.” Tobi Lutke (Shopify CEO) and Simon Willison also shaped early usage. The term emerged to distinguish serious context architecture work from casual prompt crafting, which had diluted “prompt engineering” as a term.
6. What are the three phases of AI engineering evolution?
The three phases are: prompt engineering (2022-2024), which focused on message-level instruction design; context engineering (2025), which focused on architecting what information the model reasons over; and harness engineering (2026+), which focuses on designing the full operational environment for autonomous agents. Each phase exposed a deeper failure mode: from poorly worded instructions, to wrong context, to data layer failures in the underlying information the harness reads from.
Sources
- Simon Willison (simonwillison.net) — “Context Engineering”: https://simonwillison.net/2025/jun/27/context-engineering/
- Birgitta Böckeler (martinfowler.com) — “Harness Engineering for Coding Agent Users”: https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html
- Epsilla Team — “The Third Evolution: Why Harness Engineering Replaced Prompting in 2026”: https://www.epsilla.com/blogs/harness-engineering-evolution-prompt-context-autonomous-agents
- MadPlay — “Beyond Prompts and Context: Harness Engineering for AI Agents”: https://madplay.github.io/en/post/harness-engineering
- Hugo Bowne-Anderson (Substack) — “Harness Engineering: Why Agent Context Isn’t Enough”: https://hugobowne.substack.com/p/harness-engineering-why-agent-context
- Andrej Karpathy (X/Twitter) — “Andrej Karpathy on context engineering”: https://x.com/karpathy/status/1937902205765607626
- arXiv — “Context Engineering: A Methodology for Structured Human-AI Collaboration”: https://arxiv.org/html/2604.04258v1