Quick facts
| Fact | Detail |
|---|---|
| What the article covers | The five standard layers of an AI agent stack, the two missing layers required for enterprise, and the failure modes that expose the gap. |
| Primary architectural point | Model capability is commoditizing. Context infrastructure is the decisive differentiator for enterprise agent reliability. |
| Standard stack layers | Foundation model, orchestration, tools, memory, observability. |
| Missing enterprise layers | Enterprise context layer (governed business knowledge) and AI control plane (governance and access enforcement). |
| Production failure signal | 60% of enterprises report little or no measurable return on AI investment (BCG, 2025). |
| Governance readiness gap | Only 21% of companies have a mature governance model for autonomous AI agents (Deloitte, 2026). |
The paradox of the modern AI agent stack
The AI agent stack most enterprises are building looks complete on paper and fails in production. The reason is rarely the model. It is what the model does not know about the business it is supposed to serve. Many engineering teams have placed their bets on a false premise: that the right combination of base models and orchestration frameworks is enough to turn AI ambition into enterprise value, without a governed context layer underneath.
The empirical evidence suggests otherwise. BCG’s 2025 Build for the Future study, surveying 1,250 senior executives across 25 sectors, found that only 5% of companies qualify as “AI future-built” while 60% report little or no measurable return on their AI investments despite sustained spending. The gap is not a compute problem. It is a context problem. Menlo Ventures’ 2025 State of Generative AI in the Enterprise report, drawing on 495 U.S. enterprise decision-makers, shows enterprise AI investment tripling to $37 billion in a single year, with 47% of deals now reaching production. The money and the models are real. What determines whether those deployments hold up in production is what the agent knows about the business it is supposed to serve.
The gap between highly curated demonstrations and what survives contact with actual production data is the defining structural problem of enterprise AI today.
A better model does not fix a context failure. It amplifies it.
Poor context has a measurable financial cost. According to Gartner, poor data quality costs organizations an average of $12.9 million annually. In practice, many of those losses trace back not to missing data but to missing context: ambiguous definitions, conflicting business rules, and the absence of a single, governed understanding of how the organization works.
Stronger models make this worse, not better. When a weak model produces a wrong answer, the output is usually obvious enough to catch and discard. When a strong model produces a wrong answer, it does so with full reasoning, structured formatting, and apparent confidence. The output sounds right. It gets acted on. The errors are more subtle, more convincing, and more consequential. Atlan’s research on AI agent hallucination documents how this plays out across production deployments.
Smarter agents operating on flawed context do not fail less often. They fail more elaborately. The agent stack you see in most diagrams is the assembly: model, framework, tools, memory. The context layer is the foundation. You can assemble a perfect agent on top of weak context, and it will still fail in production.
What does the standard AI agent stack cover?
Every ‘AI agent stack’ article covers roughly the same ground. Here is what the standard diagram includes, what each layer does, and where each one stops.
What does the foundation model layer actually do?
The reasoning core. This is the LLM that interprets queries, generates plans, and produces outputs. GPT-4o, Claude 3.7, Gemini 1.5, Llama 3 — the model choice matters far less than most teams assume. Models are commoditizing fast, and the performance gap between frontier and near-frontier options narrows with each release cycle. Model selection is increasingly the least important architectural decision in the enterprise AI stack.
What problem does orchestration solve?
The coordination layer. This is what manages multi-step agent workflows, handles task decomposition, routes between agents, and manages the execution loop. LangChain, LangGraph, CrewAI, and AutoGen are common choices, each with different tradeoffs around flexibility and production readiness. Orchestration solves a real problem: how to coordinate complex, multi-step agent behavior. It does not solve the question of what the agent knows about the business it is coordinating work for.
What do tool integrations give an agent?
The action layer. Tools connect agents to external systems — querying databases, calling APIs, reading documents, triggering workflows. Tool use is what separates an agent from a chatbot: the ability to act on the world, not just respond to it. Tools give agents reach. Context tells agents what to do with that reach.
How is memory different from context?
The state layer. Most frameworks distinguish between in-context memory (what fits in the current prompt window), short-term session memory (what persists within a conversation), and longer-term storage (vector databases, knowledge bases). Tools like Mem0 and Zep address parts of this. Session memory solves an important problem: it tracks what was said in a conversation. It does not store organizational knowledge — the canonical definitions, business rules, and institutional logic that agents need to answer questions correctly across sessions, users, and use cases. The distinction between session memory and the broader types of AI agent memory an enterprise actually needs is the single most common architectural confusion in agent design today.
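The distinction also shows up directly in code. Below is a minimal sketch with hypothetical class names: session memory is scoped to one conversation and discarded, while organizational context persists across sessions, users, and agents.

```python
from dataclasses import dataclass, field

@dataclass
class SessionMemory:
    """Scoped to one conversation: rebuilt per session, gone when it ends."""
    turns: list[str] = field(default_factory=list)

    def remember(self, utterance: str) -> None:
        self.turns.append(utterance)

@dataclass
class OrganizationalContext:
    """Persists across sessions, users, and agents: governed definitions."""
    definitions: dict[str, str] = field(default_factory=dict)

    def define(self, term: str) -> str:
        # Every agent, in every session, resolves the same governed meaning.
        return self.definitions.get(term, "UNDEFINED: context gap")

# Illustrative usage: the session store and the context layer answer different questions.
org = OrganizationalContext(
    definitions={"revenue": "Recognized revenue, net of refunds, from finance.fct_revenue"}
)
session = SessionMemory()
session.remember("User asked about Q3 revenue")
print(org.define("revenue"))
```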
What does observability actually tell you?
The monitoring layer. Logging, tracing, evaluation, alerting. LangSmith, Arize, and similar tools provide visibility into what agents are doing. Observability tells you what happened after the fact. It does not prevent the wrong thing from happening in the first place.
Five failure modes that expose the gap
When agents fail in production, the cause almost always falls into one of five categories. Each is a context problem, not a model problem.
| Failure mode | What breaks | What the agent does | Business impact |
|---|---|---|---|
| Missing context | Required information is absent | Hallucinates or fabricates gaps | Incorrect decisions, eroded trust |
| Stale context | Outdated data or policies | Uses obsolete logic as current truth | Compliance risk, bad recommendations |
| Conflicting context | Multiple definitions exist | Defaults to one interpretation | Inconsistent answers across teams |
| Irrelevant context | Too much unfiltered data in window | Struggles to prioritize signal | Lower accuracy, higher latency |
| Permission-violated context | Access controls not enforced | Surfaces restricted data | Security breach, governance failure |
These are not edge cases but the default outcome when agents operate without a governed context layer. A compliance agent using last quarter’s policy is stale context. An agent that surfaces rows from a table a user should not see is permission-violated context. Each failure mode is distinct, each has different consequences, and none of them is fixed by upgrading the model.
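To make the taxonomy concrete, here is a minimal sketch of a validation gate that checks a retrieved context record against the five failure modes before it is served to an agent. The record fields, thresholds, and checks are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ContextRecord:
    term: str
    definition: str | None         # None models the "missing context" case
    last_reviewed: datetime        # timezone-aware review timestamp
    conflicting_definitions: int   # how many competing definitions exist
    relevance_score: float         # retrieval score in [0, 1]
    allowed_roles: set[str]

def validate(record: ContextRecord, user_role: str,
             max_age: timedelta = timedelta(days=90)) -> list[str]:
    """Return the failure modes this record would trigger; empty means safe to serve."""
    failures = []
    if record.definition is None:
        failures.append("missing context")
    if datetime.now(timezone.utc) - record.last_reviewed > max_age:
        failures.append("stale context")
    if record.conflicting_definitions > 1:
        failures.append("conflicting context")
    if record.relevance_score < 0.5:
        failures.append("irrelevant context")
    if user_role not in record.allowed_roles:
        failures.append("permission-violated context")
    return failures
```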
The two missing layers: context and control
To address these failures structurally, the enterprise AI agent stack needs two additional layers that most stack diagrams omit entirely.
What belongs in the enterprise context layer?
The enterprise context layer is the infrastructure that stores and serves governed business knowledge to agents at inference time. It is not a data catalog. It is not session memory. It is the persistent, cross-system layer that answers the question every agent needs answered before it can work reliably: what does this organization’s data actually mean?
A well-built context layer includes four interconnected components:
Enterprise Data Graph. A unified map of all data assets across the organization: their lineage, relationships, usage patterns, quality signals, and ownership. When an agent queries a metric, the data graph tells it which table is authoritative, how that table was derived, and whether the underlying data has quality issues.
Active Ontology. A living, continuously updated model of how the enterprise works: its entities, relationships, canonical business definitions, and decision logic. This is where ‘revenue’ gets a single, governed definition. Where ‘customer’ is disambiguated from ‘prospect’ and ‘account.’ Where business rules are encoded in a form agents can consume.
Enterprise Memory. The accumulated learning from agent interactions: every correction made by a human reviewer, every evaluation that flags a wrong answer, every annotation added by a domain expert. Enterprise memory is what makes the tenth agent substantially better than the first. Not because the model improved, but because the context did.
Enterprise Skills. Codified, reusable agent capabilities that encode how the organization’s agents should approach recurring task types, built once and available to all agents that need them.
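As a sketch of what a single governed entry in such a layer might carry, assuming illustrative field names: the definition travels with its authoritative source, lineage, ownership, and the corrections that enterprise memory accumulates.

```python
from dataclasses import dataclass, field

@dataclass
class GovernedDefinition:
    term: str                      # e.g. "revenue"
    definition: str                # the single canonical business meaning
    authoritative_table: str       # the asset the data graph marks as source of truth
    lineage: list[str]             # upstream assets the table was derived from
    owner: str                     # domain expert accountable for the meaning
    quality_flags: list[str] = field(default_factory=list)
    corrections: list[str] = field(default_factory=list)  # enterprise memory accrues here

revenue = GovernedDefinition(
    term="revenue",
    definition="Recognized revenue, net of refunds, per the finance close calendar",
    authoritative_table="finance.fct_revenue",
    lineage=["raw.billing_events", "staging.stg_invoices"],
    owner="finance-data@company.example",
)
# Reviewer corrections flow back in, so the tenth agent inherits what the first learned.
revenue.corrections.append("2025-11: exclude intercompany transfers (reviewer: FP&A)")
```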
How is the AI control plane different from the context layer?
The AI control plane is frequently confused with the context layer. They are different, and the distinction matters operationally.
The context layer answers: What should the agent know?
The control plane answers: What is the agent allowed to do?
The control plane enforces guardrails, manages role-based access controls, routes queries to appropriate models, runs evaluation pipelines, and generates audit logs. According to the IBM Global AI Adoption Index, data privacy concerns and a fundamental lack of governance tools remain the primary inhibitors preventing organizations from moving AI out of isolated pilot phases and into actual production environments.
When an agent attempts to access data a user does not have permission to see, the control plane stops it. When an agent’s output deviates from expected behavior, the control plane flags it.
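A minimal sketch of that enforcement point, with hypothetical policies: the control plane sits between the agent and the data, denies requests the user's role does not permit, and writes an audit entry either way.

```python
class PermissionDenied(Exception):
    pass

# Hypothetical role-based policy: which roles may read which assets.
POLICIES = {
    "finance.fct_revenue": {"finance_analyst", "cfo"},
    "hr.salaries": {"hr_admin"},
}

def fetch_for_agent(asset: str, user_role: str, audit_log: list[str]) -> str:
    """Control-plane gate: enforce access and leave an audit trail before serving data."""
    allowed = POLICIES.get(asset, set())
    if user_role not in allowed:
        audit_log.append(f"DENY {user_role} -> {asset}")
        raise PermissionDenied(f"{user_role} may not read {asset}")
    audit_log.append(f"ALLOW {user_role} -> {asset}")
    return f"rows from {asset}"  # placeholder for the actual query

audit: list[str] = []
fetch_for_agent("finance.fct_revenue", "finance_analyst", audit)  # allowed, logged
# fetch_for_agent("hr.salaries", "finance_analyst", audit)        # raises PermissionDenied
```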
This is also where AI agent memory governance lives: the rules and access controls that determine not just what each agent knows, but what each agent is allowed to retain, recall, or expose across sessions.
Deloitte’s 2026 State of AI in the Enterprise report found that only 21% of companies currently have a mature governance model for autonomous AI agents — even as nearly three-quarters plan to deploy agentic AI within two years. That gap is structural: organizations are building agents without the control plane infrastructure that makes autonomous action auditable and safe at scale.
When the context layer and the control plane are conflated, what you get is a system governed by whatever context happens to exist. Governance becomes reactive rather than structural.
Agents do not fail because they cannot reason. They fail because they do not know what your business means — and because no system exists to tell them authoritatively.
The complete enterprise AI agent stack
The full architecture, properly layered, looks like this. Layers 6 and 7 are highlighted because they are what every enterprise stack is currently missing.
| Layer | Function | What it handles | Example tools | Enterprise-critical? |
|---|---|---|---|---|
| 1. Foundation model | Core reasoning | Interpreting queries, generating outputs | GPT-4o, Claude 3.7, Gemini, Llama 3 | Table stakes |
| 2. Orchestration | Workflow coordination | Task decomposition, multi-step execution, routing | LangChain, LangGraph, CrewAI, AutoGen | Yes |
| 3. Tools and integrations | System connections and actions | APIs, database queries, file access, triggers | MCP connectors, REST APIs, SQL engines | Yes |
| 4. Memory and storage | Session state, vector retrieval | Short-term conversation context, embeddings | Mem0, Zep, pgvector, Pinecone | Partial |
| 5. Observability | Monitoring and tracing | Logging, alerting, tracing agent actions | LangSmith, Arize, Datadog | Yes |
| 6. Enterprise context layer | Governed business knowledge | Definitions, lineage, policies, decision traces | Atlan Context Engineering Studio | Critical – missing from most stacks |
| 7. AI control plane | Governance and access enforcement | Guardrails, RBAC, evals, audit logs, model routing | Access policies, audit trails, model gateway | Critical – missing from most stacks |
The context layer and the control plane are not optional enhancements for mature teams. They are the architectural prerequisites for production-grade agents. Without them, you cannot systematically improve agent quality, you cannot share context across multiple agents, and you cannot audit why an agent said what it said.
Why the context layer must be model-agnostic
Most large enterprises are already running multiple agents: often from different vendors, on different platforms, built by different teams. When each agent builds its own context store in isolation, you recreate the data silo problem at the agent layer. Revenue means one thing to the Sales Agent and something different to the Finance Agent. Two agents, same source data, different answers.
The fix is a shared context layer that all agents consume from a single source of governed truth. This requires the context layer to be interoperable — built on open standards that any authorized agent can connect to, regardless of which model or framework it uses.
The Model Context Protocol (MCP) is the emerging standard for this. It enables the context layer to serve governed, policy-embedded context to any authorized consumer, whether that is a Snowflake Cortex agent, an AWS Bedrock workflow, or a custom internal framework. Context portability is what makes the architecture durable as the agent ecosystem evolves.
When organizational context is locked into a single vendor’s proprietary system, it becomes a liability. When it is portable, open, and governed, it becomes a compounding advantage. Every definition refined, every annotation added, every edge case documented makes the shared context richer for all agents that consume it.
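As an illustration, here is a minimal context-serving MCP server using the official MCP Python SDK's FastMCP helper. The server name, tool, and glossary contents are illustrative stand-ins for a governed context store.

```python
# A minimal MCP server exposing governed definitions to any authorized agent.
# Uses the official MCP Python SDK (pip install mcp); glossary contents are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-context-layer")

GLOSSARY = {  # stand-in for the governed context store
    "revenue": "Recognized revenue, net of refunds, per finance.fct_revenue",
}

@mcp.tool()
def get_definition(term: str) -> str:
    """Return the single governed definition for a business term."""
    return GLOSSARY.get(term.lower(), f"No governed definition for '{term}'")

if __name__ == "__main__":
    mcp.run()  # any MCP-capable agent or framework can now consume this context
```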
What agents actually need from a context layer
It helps to be specific about what the context layer must contain to be useful. AI agents in enterprise environments need five distinct types of context to function reliably. Most stacks provide none of them in governed form. A sketch of how the five types compose follows the list.
- Data context. Which columns to use, how views are aggregated, which table is the authoritative source. Lives in data warehouses like Snowflake, Databricks, and BigQuery — but rarely in governed, machine-readable form.
- Knowledge context. How the business actually works: editorial policies, escalation rules, one-off exceptions, leadership decisions. Lives in Confluence, Notion, Slack threads, and people’s heads. This is the hardest type to capture and the most often absent.
- Meaning context (semantic). What business terms actually mean: how ‘Top 10’ is defined (by views? by ratings? by watch time?), what ‘customer’ includes, how ‘revenue’ is calculated. Lives in BI tools, semantic layers, and business glossaries.
- User context. Who is asking and what decision they are trying to make. Marketing optimizes for views; editorial optimizes for watch time. The same question has a different correct answer depending on who is asking.
- Operational context. What is happening right now: active experiments, seasonal adjustments, real-time quality signals, ongoing incidents. Lives in content management systems, product telemetry, and Slack channels.
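Here is that sketch: a request-time bundle composing the five types, with illustrative field values. Each field would be resolved from a different governed source system.

```python
from dataclasses import dataclass

@dataclass
class ContextBundle:
    """The five context types an agent needs, assembled per request. Illustrative fields."""
    data: dict         # authoritative tables, column choices, aggregation rules
    knowledge: str     # policies, exceptions, institutional decisions
    meaning: dict      # governed definitions of the terms in the question
    user: dict         # who is asking and what decision they are driving
    operational: list  # active experiments, incidents, seasonal adjustments

def assemble(question: str, user_id: str) -> ContextBundle:
    # In practice each field is resolved from a different governed source system.
    return ContextBundle(
        data={"authoritative_table": "finance.fct_revenue"},
        knowledge="Q4 revenue excludes the acquired subsidiary until close completes",
        meaning={"revenue": "Recognized revenue, net of refunds"},
        user={"id": user_id, "team": "editorial", "optimizes_for": "watch time"},
        operational=["experiment: new ranking model at 10% traffic"],
    )
```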
Why context engineering is not prompt engineering
A common architectural mistake is treating context as a prompt problem. Teams invest in prompt templates, few-shot examples, and system instructions — then wonder why agents behave inconsistently across different users, queries, and edge cases. The distinction between context engineering and prompt engineering is the difference between shaping an individual interaction and building infrastructure every interaction depends on.
Prompt engineering operates at the model layer. It shapes individual interactions. Context engineering operates at the infrastructure layer. It ensures that every interaction, regardless of who initiates it or how it is phrased, draws on the same governed, accurate, current understanding of the business.
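The difference shows up directly in code. In this hedged sketch, the prompt-layer approach hardcodes meaning into one template, while the infrastructure-layer approach resolves meaning from a governed store that every agent and every prompt shares. Both the template and the store are illustrative.

```python
# Prompt engineering: the definition lives inside one prompt template.
PROMPT_TEMPLATE = (
    "You are a finance assistant. Revenue means recognized revenue net of refunds.\n"
    "Question: {question}"
)

# Context engineering: the definition lives in shared infrastructure,
# and every prompt, from any agent, is grounded in the same governed store.
GOVERNED_STORE = {"revenue": "Recognized revenue, net of refunds, per finance.fct_revenue"}

def build_grounded_prompt(question: str, terms: list[str]) -> str:
    grounding = "\n".join(f"{t}: {GOVERNED_STORE[t]}" for t in terms if t in GOVERNED_STORE)
    return f"Governed definitions:\n{grounding}\nQuestion: {question}"

print(build_grounded_prompt("What was Q3 revenue?", ["revenue"]))
```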
The agents that improve consistently over time are not the ones running on the latest model release. They are the ones running on continuously maintained context.
The compounding advantage of investing early
Each layer of context that is built, governed, and maintained becomes the foundation for the next. Column lineage feeds better column descriptions. Better descriptions feed more accurate metric definitions. Accurate metric definitions feed a bootstrapped ontology. A richer ontology produces better agents. Better agents generate feedback — corrections, annotations, evaluation failures — that enrich the context further.
The compounding effect is real. The tenth agent an organization deploys will be substantially better than the first. Not because the model improved. Because the context did.
Enterprises that invest in shared, governed context infrastructure early build an advantage that is genuinely hard to replicate. Models commoditize. Orchestration frameworks converge. The accumulated organizational knowledge encoded in a well-maintained context layer with enterprise memory is specific to the organization, and it compounds with every agent deployed.
How Atlan approaches the enterprise AI agent stack
The challenge
Enterprises ship agents on top of fragmented context. Metric definitions conflict across systems, business rules live in unmaintained documentation, and governance is appended rather than architected. Each new agent rebuilds from scratch, and none of them draws from a shared source of truth. When autonomy increases, these gaps compound into production failures the standard stack cannot explain.
The approach
Atlan treats the context layer as metadata-native enterprise infrastructure, assembled from the data systems already in use rather than rebuilt from scratch. Context Engineering Studio bootstraps context from lineage, query history, BI usage, and existing glossaries. The Atlan MCP server exposes governed context to agents across frameworks, so every agent in the enterprise draws from the same definitions, access policies, and lineage. Domain experts own meaning; platform teams own connectivity.
The outcome
Enterprises move from isolated agent pilots to a shared, governed foundation that every agent consumes. The work done to define a metric once is reused by every subsequent agent. Corrections and evaluations flow back into the context layer, so the system gets measurably better with use. The tenth agent compounds on the first, rather than starting from zero.
What the complete stack looks like at scale: DigiKey and Mastercard
DigiKey
DigiKey’s data organization needed infrastructure that could power discovery, AI governance, data quality, and an MCP server delivering context to AI models, all from the same metadata foundation. The team now treats Atlan as a context operating system rather than a catalog.
DigiKey activates metadata as a context operating system
"Atlan is much more than a catalog of catalogs. It's more of a context operating system… Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
Sridher Arumugham, Chief Data & Analytics Officer
DigiKey
Mastercard
Mastercard’s data organization moved from privacy by design to data by design to context by design as it scaled AI initiatives across hundreds of millions of data assets. The team chose Atlan for a metadata lakehouse that could meet the contextual demands of AI at enterprise scale.
Mastercard moves to context by design at enterprise AI scale
"When you're working with AI, you need contextual data to interpret transactional data at the speed of transaction. So we have moved from privacy by design to data by design to now context by design. AI initiatives require more context than ever. Atlan's metadata lakehouse is configurable, intuitive, and able to scale to hundreds of millions of assets."
Andrew Reiskind, Chief Data Officer
Mastercard
Why a stack without context is a stack built for demos
The standard AI agent stack is a good description of how to get an agent to work. It is not a complete description of how to get an agent to work reliably, at scale, on real business data, with real governance requirements, and real consequences when it is wrong.
The missing layers are not advanced features for mature teams to add later. They are the architectural prerequisites for agents that earn production trust. Without a governed context layer, you cannot systematically improve agent quality. You cannot share context across multiple agents without rebuilding from scratch each time. You cannot audit why an agent said what it said.
Most teams discover this the hard way, after a production failure that reveals not a model gap but a context gap. The organizations that invest in a governed context layer before that moment are the ones that scale from one agent to twenty without rebuilding every time.
FAQs about the AI agent stack
How do I tell if my AI agent failures are model problems or context problems?
Run a simple diagnostic: replay the failed task through a different frontier model with the same inputs. If the second model produces the same wrong answer or a differently wrong answer, you have a context problem — the underlying information the agent received was ambiguous, incomplete, or conflicting. If the second model gets it right, you may have a reasoning or prompt problem specific to the first model. In practice, the majority of production agent failures involving business metrics, policy interpretation, or data permissions are context problems that no model upgrade will fix.
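A sketch of that diagnostic as a small harness. The run_model callable is a hypothetical stand-in for whatever model client your stack uses; the classification follows the heuristic above.

```python
from typing import Callable

def diagnose(inputs: str, expected: str,
             run_model: Callable[[str, str], str],
             models: tuple[str, str] = ("model-a", "model-b")) -> str:
    """Replay the same inputs through a second frontier model to localize the failure."""
    first = run_model(models[0], inputs)
    second = run_model(models[1], inputs)
    if first == expected:
        return "no failure reproduced"
    if second != expected:
        # Both models wrong, whether identically or differently: the inputs
        # themselves were ambiguous, incomplete, or conflicting. A context problem.
        return "context problem"
    # The second model got it right: likely a reasoning or prompt issue in the first.
    return "model/prompt problem"

# Usage with a hypothetical client wrapper:
# verdict = diagnose(failed_inputs, known_answer, run_model=my_llm_client.complete)
```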
Where should my team start if we already have agents running in production without a context layer?
Start with the highest-frequency failure case you already have on record. Pick one agent, one recurring wrong-answer pattern, and trace it back to which of the five context types is missing or conflicting: data, knowledge, meaning, user, or operational. Fix one, instrument the fix so corrections flow back into a persistent layer, then move to the next.
Why do agents provide different answers to the same question?
Usually because of a conflicting context problem. Each department has its own interpretation of the same metrics. If no governing context layer reconciles those interpretations into a single definition per metric, agents default to one interpretation, and different agents (or the same agent across sessions) may default differently. With a shared context layer, every agent resolves the same authoritative definition, eliminating the conflict.
Who should own the context layer — the data team, the ML platform team, or individual product teams?
Federated. Domain experts own the definitions (what “revenue” means, what “active customer” includes, how escalation policies work), and the platform team owns connectivity and enforcement. Centralizing ownership with one team creates a bottleneck; fully decentralizing it recreates the silo problem at the agent layer.
What should I look for when evaluating platforms that claim to provide an enterprise context layer?
Three things. First, interoperability — does the platform serve context through open standards like MCP, or lock it into one vendor’s proprietary format? Second, bootstrapping — can it derive an initial context layer from signals you already have (lineage, query history, BI usage, existing glossaries), or does it require manual build-out from scratch? Third, persistence and feedback — does the platform capture agent corrections, human reviews, and evaluation failures back into the context layer over time, or does that institutional memory get lost after each session?
If I have no existing context layer and want to build one, where do I start?
You almost certainly have more context than you think, just scattered and ungoverned. BI dashboards already encode metric definitions. SQL query histories reveal common analytical patterns. Lineage metadata describes data relationships. Business glossaries define terminology. Use those existing signals to bootstrap a first-draft ontology automatically, then refine it with domain experts and evaluate against known-answer questions pulled from existing dashboards.
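One bootstrapping pass might look like the sketch below, which mines candidate metric definitions from SQL query history with a crude pattern match. A real system would read the warehouse query log and parse SQL properly; everything here is illustrative.

```python
import re
from collections import Counter

QUERY_HISTORY = [  # stand-in for the warehouse query log
    "SELECT SUM(amount - refunds) AS revenue FROM finance.fct_revenue",
    "SELECT SUM(amount - refunds) AS revenue FROM finance.fct_revenue WHERE region='EU'",
    "SELECT SUM(amount) AS revenue FROM raw.billing_events",  # the conflicting variant
]

def draft_definitions(queries: list[str]) -> dict[str, Counter]:
    """Count how each aliased metric is computed; frequent expressions become drafts."""
    drafts: dict[str, Counter] = {}
    for q in queries:
        for expr, alias in re.findall(r"(\w+\([^)]*\))\s+AS\s+(\w+)", q, re.IGNORECASE):
            drafts.setdefault(alias.lower(), Counter())[expr] += 1
    return drafts

# Surfaces both the majority definition and the conflicting minority variant
# for domain experts to adjudicate.
for metric, variants in draft_definitions(QUERY_HISTORY).items():
    canonical, count = variants.most_common(1)[0]
    print(f"{metric}: draft = {canonical} ({count}/{sum(variants.values())} queries agree)")
```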
Sources
- The Widening AI Value Gap: Build for the Future 2025, BCG (September 2025)
- 2025: The State of Generative AI in the Enterprise, Menlo Ventures (December 2025)
- Data Quality: Why It Matters and How to Achieve It, Gartner
- State of AI in the Enterprise 2026: The Untapped Edge, Deloitte
- IBM Global AI Adoption Index 2024