AI agent planning is the process by which an agent takes a user request, breaks it into steps, selects which tools to call, and executes those steps in sequence. Most enterprise teams have learned that planning quality does not come from the model alone. It comes from the context the model is planning against. A state-of-the-art agent given ambiguous context will build a confident, wrong plan. The same agent given precise organizational context will build a tight, accurate one.
| What it covers | Most discussed paradigms | Key dependency | Observed inefficiency | Measured improvement | When humans should review |
|---|---|---|---|---|---|
| Decomposition, tool selection, execution monitoring | ReAct, Chain-of-Thought, Tree-of-Thought | Context quality | Unnecessary tool calls from uncertain agents | 39% fewer tool calls (Snowflake, 2026) | High-stakes decisions above governance threshold |
This page explains how planning paradigms work, why context poverty produces over-calling and hallucination, and what the evidence says about governed context as the fix.
## How do chain-of-thought, tree-of-thought, and reactive planning work?
Planning paradigms differ in how the agent structures its reasoning before acting. The choice of paradigm shapes how the agent handles ambiguity, but none of them compensate for missing context.
| Paradigm | How it works | Strengths | Context dependence |
|---|---|---|---|
| Reactive (ReAct) | The agent interleaves reasoning steps with tool calls, observing results after each action and adjusting the next step accordingly | Fast for well-defined tasks; adapts to real-time feedback from tool responses | High — each reasoning step inherits whatever context is in the prompt; bad context produces bad reasoning at every step |
| Chain-of-Thought | The agent generates an explicit reasoning chain before taking any action, decomposing the problem into a sequence of logical steps | Reduces errors on multi-step problems; makes reasoning auditable | High — the chain is only as good as the definitions and constraints the agent is working with; ambiguous terms produce plausible but wrong chains |
| Tree-of-Thought | The agent explores multiple reasoning branches simultaneously, evaluates each branch, and selects the most promising path | Better performance on problems with high uncertainty or multiple valid approaches | Very high — branching multiplies the context dependency; each branch can diverge confidently in the wrong direction if foundational definitions are unclear |
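The reactive (ReAct) row above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `policy` is a hypothetical stand-in for the LLM and `tools` is a plain dict of callables, both invented for this sketch.

```python
# Minimal ReAct-style loop: interleave reasoning with tool calls,
# feeding each observation back into the agent's context.
def react_loop(policy, tools, question, max_steps=5):
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        step = policy(context)                 # thought plus either action or answer
        context.append(f"Thought: {step['thought']}")
        if "answer" in step:                   # enough context: stop calling tools
            return step["answer"], context
        tool_name, args = step["action"]
        observation = tools[tool_name](args)   # act, then observe
        context.append(f"Observation: {observation}")
    return None, context                       # budget exhausted without an answer

# A scripted policy standing in for the LLM: one lookup, then answer.
def scripted_policy(context):
    if not any(line.startswith("Observation:") for line in context):
        return {"thought": "I need the certified Q4 revenue metric.",
                "action": ("query_metric", "revenue_q4")}
    return {"thought": "The certified metric answers the question.",
            "answer": "Q4 revenue was $12.4M."}

tools = {"query_metric": lambda name: f"{name} = $12.4M (certified)"}
answer, trace = react_loop(scripted_policy, tools, "What was Q4 revenue?")
```

Note how every reasoning step reads from `context`: this is exactly why the paradigm's quality is bounded by the quality of what that context contains.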
## Why is over-calling a symptom of context poverty?
When an agent does not know which table is authoritative, it queries several. When it cannot tell which definition of “pipeline” applies to the CFO’s question, it hedges by calling multiple tools. The result is a plan that executes, but does unnecessary work at every step.
Snowflake Engineering’s 2026 research on the agent context layer documented the downstream effect of this directly. Adding an organizational ontology — structured definitions of business entities and their relationships — to the same underlying model produced a 39% reduction in tool calls and a 20% improvement in answer accuracy. The model did not change. The context did.
Over-calling also compounds the hallucination risk. Each tool call produces a response that goes back into the agent’s context window. More tool calls mean more intermediate results to reconcile. When those results conflict — because the agent queried overlapping sources — the agent must choose between them without a reliable signal for which to trust. The result is a confident synthesis of contradictory data. Better context means fewer calls, fewer conflicts, and fewer opportunities for the agent to construct a plausible but wrong answer.
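The over-calling pattern is easy to see in miniature. In this sketch, the table names and the `ontology` mapping are invented for illustration: an agent with no authoritative mapping plans a query against every candidate table, while an agent given an ontology plans exactly one call.

```python
# Illustrative only: candidate tables an agent might discover for "revenue".
CANDIDATE_TABLES = {
    "revenue": ["finance.revenue_v2", "sales.revenue_raw", "legacy.revenue_2019"],
}

def plan_queries(metric, ontology=None):
    """Return the list of tables the agent will query for a metric."""
    if ontology and metric in ontology:
        return [ontology[metric]]   # certified source: one call, nothing to reconcile
    return CANDIDATE_TABLES[metric] # ambiguity: hedge by querying everything

poor = plan_queries("revenue")                                       # three calls
governed = plan_queries("revenue",
                        ontology={"revenue": "finance.revenue_v2"})  # one call
```

The context-poor plan produces three overlapping results the agent must reconcile; the governed plan produces one, which is the mechanism behind the tool-call reduction described above.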
## Why planning and context are complementary, not substitutes
A common assumption in early agentic deployments is that a more capable planning paradigm can compensate for weak context. The evidence does not support this. A more sophisticated planning paradigm applied to ambiguous context produces more sophisticated wrong plans.
> “Better reasoning without context still fails — it just fails with more steps and more confidence.”
The relationship between planning quality and context quality is additive, not compensatory. The 2×2 matrix below illustrates the four possible combinations:
| Scenario | Planning quality | Context quality | Real-world outcome |
|---|---|---|---|
| Strong model, strong context | High | High | Tight plans, accurate answers, minimal tool calls — production-ready |
| Strong model, weak context | High | Low | Sophisticated wrong answers; agent reasons confidently to incorrect conclusions |
| Weak model, strong context | Low | High | Simple plans that work because the agent has reliable ground truth to work from |
| Weak model, weak context | Low | Low | Frequent failures, high retry rates, low trust — pilots that never reach production |
The practical implication is that enterprises that upgrade the model without addressing context quality will move from the bottom-left quadrant to the top-left quadrant. They will get more sophisticated failures. The path to the top-right quadrant requires both. But for most production deployments, context quality is the binding constraint — it is what determines whether the plan the agent builds maps to the actual state of the business.
## When should high-stakes planning surface for human review?
For routine analytical queries — summarize this report, pull last quarter’s numbers, classify this support ticket — autonomous execution is appropriate. The stakes are low, the reversibility is high, and the cost of an occasional error is manageable.
The threshold shifts when the plan’s execution produces an irreversible or high-consequence outcome. Deal exceptions, contract creation, policy modifications, changes to data access permissions, budget reallocations above a defined threshold — these are decisions where some business judgment lives outside the systems the agent can query. The agent may have access to every relevant metric and still lack the organizational context needed to weigh them correctly.
A practical governance pattern is to surface the decision trace, not just the conclusion, for human review on any plan above the threshold. The reviewer sees what the agent checked, what it concluded at each step, and what action it proposed. That visibility makes the review fast and the approval meaningful. It also creates an audit trail that satisfies the question every regulator now asks: how do you know the AI’s decision was based on current, approved information?
In multi-agent environments, the stakes compound. One agent’s planning output becomes another agent’s context input. A wrong plan executed autonomously by the first agent can cascade as a wrong premise for every downstream agent reading from the same shared layer. Surfacing the plan for human review before execution is the point in the pipeline where governance has the highest leverage.
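One way to implement the pattern described above is a simple risk gate that routes any plan above a threshold to a reviewer, together with its full decision trace. The dataclasses, field names, and the 0.7 threshold are assumptions for illustration, not a specific Atlan construct.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    checked: str      # what the agent looked at
    concluded: str    # what it decided at this step

@dataclass
class Plan:
    action: str
    risk: float                               # 0.0 reversible .. 1.0 irreversible
    trace: list = field(default_factory=list)

REVIEW_THRESHOLD = 0.7                        # illustrative governance threshold

def execute(plan, approve):
    """Run low-risk plans autonomously; surface the decision trace,
    not just the conclusion, for anything above the threshold."""
    if plan.risk < REVIEW_THRESHOLD:
        return f"executed: {plan.action}"
    if approve(plan.trace, plan.action):      # reviewer sees each checked/concluded pair
        return f"approved and executed: {plan.action}"
    return f"blocked: {plan.action}"

routine = Plan("summarize last quarter's report", risk=0.1)
risky = Plan("modify data access policy", risk=0.9,
             trace=[PlanStep("current policy owners", "change affects 3 teams")])
```

The routine plan runs straight through without consulting the reviewer; the high-risk plan blocks or proceeds on the reviewer's explicit decision, and the trace it carries becomes the audit record.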
## How Atlan approaches planning through governed context
The planning problem is a context problem. Atlan’s approach is to solve it at the infrastructure layer — making context available, accurate, and governed before any agent starts to plan.
The Context Engineering Studio is where this happens. It combines several capabilities that directly address the planning failures described above:
- Enterprise Data Graph: A structured representation of every data asset, its relationships, its business definitions, and its ownership. When an agent plans against the Enterprise Data Graph, it knows which table is authoritative for revenue, which definition of “pipeline” applies in which context, and which team owns the answer. Ambiguity — the root cause of over-calling — is resolved before the plan starts.
- Context agents: Specialized agents that run in advance of query-answering agents, retrieving the specific organizational context a downstream agent needs for a given request. The query-answering agent receives a pre-populated context package rather than a raw schema.
- Atlan MCP server: Exposes governed context directly to agent frameworks via the Model Context Protocol, making Atlan’s Enterprise Data Graph available as a structured context source without custom integration work.
- AI governance workflows: Approval routing, decision traces, and provenance tracking built into the platform, so that plans above the governance threshold surface for human review without requiring custom tooling.
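The “context package” idea in the list above can be sketched as a pre-resolution step. Everything here is hypothetical — the graph contents, field names, and the `build_context_package` helper are invented — but it shows the shape of handing a downstream agent resolved context instead of raw schemas.

```python
# Illustrative fragment of an organizational context graph.
GRAPH = {
    "pipeline": {
        "definition": "open opportunities weighted by stage",
        "authoritative_table": "sales.pipeline_certified",
        "owner": "revenue-ops",
    },
}

def build_context_package(request_terms, graph=GRAPH):
    """Resolve each business term before the query-answering agent plans.
    Unresolved terms are surfaced explicitly, so the downstream agent
    can ask rather than guess."""
    resolved, unresolved = {}, []
    for term in request_terms:
        if term in graph:
            resolved[term] = graph[term]
        else:
            unresolved.append(term)
    return {"resolved": resolved, "unresolved": unresolved}

package = build_context_package(["pipeline", "churn"])
```

Keeping unresolved terms visible, rather than silently dropping them, is what lets an agent ask a clarifying question instead of hedging with extra tool calls.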
### The outcome
Tighter plans. Fewer unnecessary tool calls. Agents that know what they do not know and ask rather than hallucinate. And a shared context layer that compounds in accuracy as each agent’s certified learnings propagate to every downstream agent reading from the same layer.
## How enterprises ground planning in governed context
### Workday
> “We built a revenue analysis agent and it couldn’t answer one question. We started to realize we were missing this translation layer. All of the work that we did to get to a shared language amongst people at Workday can be leveraged by AI via Atlan’s MCP server.” — Joe DosSantos, VP Enterprise Data & Analytics, Workday
Workday’s experience captures the core planning failure precisely. The agent had access to the right data. It lacked the translation layer — the organizational ontology — that would have told it how to interpret that data in Workday’s specific business context. Adding Atlan’s governed context layer via the MCP server gave the agent the shared language that Workday’s human teams had spent years building. The planning problem resolved because the context problem resolved.
### Mastercard
> “When you’re working with AI, you need contextual data to interpret transactional data at the speed of transaction (within milliseconds). So we have moved from privacy by design to data by design to now context by design. We needed a tool that could scale with us. We chose Atlan, a platform that’s configurable, intuitive, and able to scale with our 100M+ data assets.” — Andrew Reiskind, Chief Data Officer, Mastercard
Mastercard’s framing — context by design — is a useful summary of the shift required. Planning that works at transaction speed, at Mastercard’s scale, requires context that is already structured, governed, and available before the agent asks. You cannot resolve ambiguity at inference time when inference must complete in milliseconds. The context must be right before the plan starts.
## Why enterprise planning starts with context, not reasoning
The teams shipping trusted agents in 2026 are not the teams with the most sophisticated planning paradigms. They are the teams that solved the context problem first. Chain-of-thought, ReAct, and tree-of-thought are all capable paradigms. None of them produce reliable plans when the foundational definitions are wrong, ambiguous, or missing.
Planning without AI agent context is planning to the wrong destination. The agent may execute every step correctly and still deliver an answer that contradicts what the business actually intended. Governance is not an obstacle to planning capability — it is the infrastructure that makes planning capability trustworthy. The enterprises that understand this distinction are the ones that have moved from pilots to production.
For teams evaluating where to invest, the research is consistent: context quality is the binding constraint. A tighter context layer produces tighter plans. Tighter plans produce fewer errors, fewer unnecessary tool calls, and answers that the business can act on with confidence.
## FAQs about AI agent planning
### 1. Why do agents over-call tools? Is that a planning failure?
Over-calling is almost never a pure planning failure. It is context poverty in disguise. The agent does not know which table is authoritative, so it queries multiple tables. It does not know which definition applies, so it hedges by checking several sources. Better context means the agent arrives at each step already knowing what it needs, and calls only the tools that will move the plan forward. Snowflake’s research documented a 39% reduction in tool calls simply by adding an organizational ontology to the same model. The planning paradigm did not change. The context did.
### 2. Can a better model fix bad planning caused by weak context?
No. A smarter model will produce more sophisticated wrong answers. If the agent does not know which definition of “pipeline” the CFO uses, a better model will reason more confidently to the wrong conclusion. Context quality is a prerequisite for planning quality, not a nice-to-have. The only way to fix a context problem is to fix the context. Upgrading the model without addressing the context layer moves you from simple wrong answers to elaborate wrong answers — which are harder to detect and more dangerous to act on.
### 3. When should planning surface to humans for review?
For routine queries, autonomous execution is fine. The cost of an occasional error is low and the reversibility is high. The threshold shifts when the plan’s execution produces an irreversible or high-consequence outcome. Deal exceptions, contract creation, policy modifications, data access changes, budget reallocations above a defined threshold — these are decisions where some business judgment lives outside the systems the agent can query. Surface the plan before execution, show the reviewer the decision trace (what the agent checked, what it concluded at each step), and get explicit approval. That visibility makes the review fast and creates the audit trail that governance requires.
### 4. How much does organizational context improve planning efficiency?
Research from Snowflake shows approximately 39% reduction in unnecessary tool calls when agents receive semantic views and organizational context, alongside a 20% improvement in answer accuracy. The same model, different context, substantially different outcomes. The efficiency gain compounds across a multi-agent system: every additional agent benefits from the context layer earlier agents helped build, and every certified correction propagates to all downstream agents without requiring any per-agent work.
### 5. Is planning different from reasoning, or are they the same thing?
Reasoning is how the agent thinks. Planning is how it decides what to do. An agent can reason brilliantly about a problem it misunderstands. Chain-of-thought, ReAct, and tree-of-thought are reasoning paradigms — they shape how the agent generates and evaluates steps. Planning is the output: the sequence of actions the agent commits to based on its reasoning. You need both reasoning capability and context quality for reliable plans. Strong reasoning over weak context produces a well-structured wrong plan. Weak reasoning over strong context produces a simple right plan. The goal is strong reasoning over strong context — but for most enterprise deployments, context is the binding constraint.