Data Quality for AI: Why Bad Data Breaks Models

Emily Winks

Data Governance Expert

Updated: 05/27/2026

Published: 05/27/2026

Key takeaways

BI-ready data fails AI when agents cannot read source authority, canonical definitions, lineage, and freshness at retrieval.
Around 65% of AI agent failures trace to context drift: stale definitions accumulate while pipelines keep running unchanged.
Stronger models make bad context more dangerous: confident wrong answers survive review longer than weaker ones.
AI quality goes beyond BI checks: agents need meaning, source authority, lineage, freshness, and trust before retrieval.

What is data quality for AI?

Data quality for AI makes enterprise data trustworthy for autonomous agent use: source authority, canonical definitions, freshness, lineage, and ownership must all be readable at retrieval time. Classical row-level checks stay in place; AI adds a context requirement on top. Gartner projects that, through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data. That risk applies even when Snowflake, dbt, and Monte Carlo all report green.

What data quality for AI requires beyond row-level validation

Source authority: the agent must know which of three ARR tables carries authority for the question being asked
Canonical definitions: resolve when the same metric name means different things across teams or pipelines
Definition-level freshness: catches drift between definition updates, not just table refresh timestamps
Answer-level lineage: traces wrong outputs back to the specific source and definition that caused them
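
As a rough illustration of the four requirements above, the sketch below models them as fields on a per-table context record and reports which ones an agent could not read. The AssetContext class, its field names, and missing_context are hypothetical names chosen for this example, not part of any product API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AssetContext:
    """Hypothetical retrieval-time context for one warehouse table."""
    table: str
    authoritative_for: Optional[str] = None        # question type this table is certified for
    canonical_definition: Optional[str] = None     # id of the governed metric definition
    definition_reviewed_on: Optional[date] = None  # freshness of the meaning, not the rows
    lineage_ref: Optional[str] = None              # pointer used to trace an answer to its source


def missing_context(asset: AssetContext) -> list[str]:
    """Return the names of the context fields an agent could not read."""
    required = ("authoritative_for", "canonical_definition",
                "definition_reviewed_on", "lineage_ref")
    return [name for name in required if getattr(asset, name) is None]


# A table that passes row-level checks but carries no context at all.
crm_sync = AssetContext(table="recurring_revenue_total")
print(missing_context(crm_sync))
```

A table can load on time and still return all four names here, which is the gap row-level validation does not see.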

A finance team deploys a revenue analysis agent. Their data warehouse holds three ARR tables: arr_net maintained by sales ops, annual_recurring_revenue maintained by finance reporting, and recurring_revenue_total synced from the CRM. In testing, the agent answers cleanly because the finance lead pointed it at the correct source. In production, the agent picks whichever source responds first in the retrieval sequence. Three percent separates the highest and lowest figures. No analyst on the team can trace which ARR calculation backed the answer that went into the board deck.

This is not a pipeline failure. Every table passes its quality checks. The failure is a data quality problem that BI tooling was never designed to catch: three valid sources exist in the same warehouse, and nothing in the data path tells the agent which one carries authority for the question being asked.

	BI quality threshold	AI quality threshold
What it verifies	Values support known reports with human review	Sources carry machine-readable context, meaning, lineage, freshness, certification, and ownership at inference time
Common failure mode	Missing rows, late loads, schema drift	Context drift: stale definitions and unowned glossary terms accumulate while pipelines keep running
Production symptom	Dashboard number differs from the report	The same ARR question returns different answers on different days and the team cannot trace why
Resolution path	Fix the pipeline, reload the table	Canonical definitions, source authority, answer-level lineage, freshness signals on meaning, and ownership metadata
Revenue agent example	Three ARR tables all pass quality checks	The agent picks different sources for the same question on different days

Why does BI-ready data fail AI?

BI survives with analyst judgment. When a finance team sees two conflicting ARR numbers, someone checks Slack threads, consults the metric review notes, and applies the institutional memory that tells them which table carries authority this quarter. That judgment never appears in a column. It lives in people, calendars, and tribal knowledge accumulated across dozens of quarterly reviews.

For an agent answering without human review, fit-for-purpose has to be in the data path itself. There is no analyst to consult. There is no Slack thread the agent can search. There is no memory of last quarter’s clarification unless that clarification was written into the metadata layer as a machine-readable fact. IBM puts it directly: “Data can be technically correct and fail the AI system using it” (IBM, 2026). A table can pass every row-level check, every freshness threshold, every schema validation, and still be the wrong source for the question the agent is answering.

This is the gap BI quality leaves for AI. The checks tell you values are correct. They do not tell you which source carries authority for a given question, which definition was current when the values were loaded, or whether the team that owns the asset is still maintaining it. Context-aware AI agents need all of this information in machine-readable form before they retrieve anything - not after.

How does bad context create hallucinations?

AI agent hallucinations in enterprise settings rarely come from a model inventing facts. They come from a model reasoning correctly from the wrong source. The source exists. The values are real. The retrieval looks valid. The problem is that the agent selected an asset whose meaning had drifted since the agent’s retrieval rules were written.

How does the drift timeline develop?

Consider the ARR scenario six weeks after the revenue agent goes live. Finance changed their mid-quarter contraction treatment as part of a methodology revision. Sales ops updated arr_net within two days. Finance reporting updated annual_recurring_revenue roughly ten days later. The CRM sync table, recurring_revenue_total, was never updated. No one filed a ticket. No alert fired. The pipeline kept running on schedule, passing every freshness check because the rows were arriving on time and the schema had not changed.

The agent continued answering ARR questions using whichever source its retrieval logic reached first. On days when recurring_revenue_total was fastest, it answered from the stale CRM definition. On days when arr_net was fastest, it answered correctly. The three-percent gap was not a bug anyone could find by checking row counts or load times. It was a context gap: the meaning of the metric had changed in two of three sources, and the agent had no way to read which definition was current.
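
The first-responder behavior described above can be reproduced in a few lines. This is a minimal sketch: the ARR values, the latencies, and the first_responder helper are all hypothetical, chosen only so that the stale table differs from the updated ones by the three percent mentioned in the scenario.

```python
# Hypothetical ARR values in $M; the scenario states only the ~3% spread.
ARR_BY_TABLE = {
    "arr_net": 142.0,                   # definition updated after the revision
    "annual_recurring_revenue": 142.0,  # definition updated after the revision
    "recurring_revenue_total": 146.26,  # definition never updated
}


def first_responder(latency_ms: dict[str, int]) -> str:
    """Pick the table that answered fastest, as the agent in the scenario does."""
    return min(latency_ms, key=latency_ms.get)


# Hypothetical per-day response times in milliseconds.
days = {
    "Monday": {"arr_net": 40, "annual_recurring_revenue": 55, "recurring_revenue_total": 70},
    "Thursday": {"arr_net": 90, "annual_recurring_revenue": 85, "recurring_revenue_total": 30},
}

answers = {day: ARR_BY_TABLE[first_responder(lat)] for day, lat in days.items()}
spread = (max(answers.values()) - min(answers.values())) / min(answers.values())
print(answers, f"spread={spread:.1%}")
```

Nothing in this selection logic reads a definition or a certification, so every input is valid and the answer still changes with network timing.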

Why is context drift harder to detect than missing data?

Around 65% of enterprise AI agent failures trace to context drift rather than missing or malformed data. Missing data leaves a signal. A failed row count check, a null where a value was expected, a schema mismatch - these are detectable. Drifted meaning often leaves none. The pipeline runs. The values load. The quality dashboard stays green. The failure only surfaces when an analyst challenges an answer and discovers the agent has been answering from a definition that changed weeks ago.

This is why context engineering for AI agents has emerged as a distinct discipline. Preventing context drift requires writing definition changes, ownership updates, and certification status into the metadata layer as machine-readable facts - not just updating the values in the table. The agent needs to read the meaning of a source, not just its freshness timestamp, before deciding whether to use it.

What data quality gaps does BI tooling leave for AI agents?

Why is validity not the same as source authority?

In the ARR scenario, arr_net, annual_recurring_revenue, and recurring_revenue_total are all valid tables. All three pass row-level checks. All three load on schedule. None of that tells the agent which one is authoritative for a board-level ARR question on the last day of the quarter. Source authority is the declaration - written into the metadata layer, not inferred from the data - that this specific asset is the certified source for this specific question type.

Without source authority as a machine-readable attribute, an agent operating across context engineering frameworks must guess. It might use the freshest table, the first table in its retrieval order, or the table whose schema most closely matches the query. None of these heuristics are the same as knowing which source the organization has designated as authoritative. Context engineering versus prompt engineering comes down to exactly this: prompts tell the model how to reason; context tells the model which sources to trust.
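
One way to picture source authority as a machine-readable attribute is a small declaration table that the agent consults before any heuristic. The sketch below assumes arr_net is the designated source for board-level ARR questions; the SOURCE_AUTHORITY mapping, the question-type labels, and resolve_source are illustrative names, not an existing API.

```python
from typing import Optional

# Hypothetical authority declarations: question type -> certified table.
# Assumes arr_net is the designated source for board-level ARR.
SOURCE_AUTHORITY = {
    "board_arr": "arr_net",
}


def resolve_source(question_type: str, candidates: list[str]) -> Optional[str]:
    """Return the declared authoritative table, or None rather than guessing."""
    declared = SOURCE_AUTHORITY.get(question_type)
    return declared if declared in candidates else None


candidates = ["recurring_revenue_total", "annual_recurring_revenue", "arr_net"]
print(resolve_source("board_arr", candidates))
print(resolve_source("pipeline_forecast", candidates))
```

Returning None for an undeclared question type is a deliberate choice in this sketch: the agent stops and escalates instead of falling back to the freshest or fastest table.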

Why does answer-level lineage matter for debugging?

When a revenue agent returns a number three percent below what the CFO expected, the debugging path matters. Row-level lineage can tell you where a table’s values came from. It cannot tell you which table the agent used to answer a specific question, which definition was active at retrieval time, or which transformation step introduced the discrepancy.

Answer-level lineage traces the path from output back to source table, source definition, and transformation history for that specific inference event. Without it, debugging an agent failure means reconstructing the retrieval sequence from logs - if those logs exist - and comparing the definition the agent read against the definition that was supposed to be current. Decision traces for AI agents make this path auditable by design, not recoverable by forensics.
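
A minimal sketch of what an answer-level trace might hold, assuming it is written as one JSON line when the answer is generated. The AnswerTrace fields, the sample values, and both helper functions are hypothetical and only mirror the elements named above: source table, definition, and transformation history.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AnswerTrace:
    """One inference event: what was answered and what it was answered from."""
    question: str
    answer: str
    source_table: str
    definition_version: str
    transformations: list[str] = field(default_factory=list)
    answered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_trace(trace: AnswerTrace, log: list[str]) -> None:
    """Append the trace as one JSON line at the moment the answer is produced."""
    log.append(json.dumps(asdict(trace)))


def answers_from(log: list[str], table: str) -> list[str]:
    """List the answers that were generated from a given table."""
    return [t["answer"] for t in map(json.loads, log) if t["source_table"] == table]


trace_log: list[str] = []
record_trace(
    AnswerTrace(
        question="What is ARR this quarter?",
        answer="ARR is $142M",
        source_table="recurring_revenue_total",
        definition_version="pre-revision",  # hypothetical version label
        transformations=["crm_sync"],
    ),
    trace_log,
)
print(answers_from(trace_log, "recurring_revenue_total"))
```

With a record like this, finding every answer that came from a stale table becomes a lookup rather than a reconstruction from partial logs.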

Why do freshness signals need to attach to the definition, not just the table?

A table freshness timestamp tells you when the last row arrived. It does not tell you when the definition governing those rows was last reviewed or updated. In the ARR scenario, recurring_revenue_total had a current freshness timestamp - rows were arriving on schedule - but a stale definition. The contraction treatment methodology had changed, the table’s values had not reflected the change, and nothing in the standard quality dashboard showed the discrepancy.

Definition-level freshness is a separate signal: the timestamp of the last review or update to the meaning, not the data. When an agent reads a source, it needs both signals. Current data loaded from a stale definition is not trustworthy for production AI workloads, and standard warehouse freshness checks cannot tell the difference. The related piece "What are decision traces for AI agents" explores how this signal feeds back into the audit layer.
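
The two signals can be checked side by side. The sketch below is illustrative only: the dates are invented to match the scenario's ordering, the one-day row-age threshold is an assumption, and is_trustworthy is a hypothetical helper rather than a feature of any warehouse tool.

```python
from datetime import datetime, timedelta


def is_trustworthy(
    rows_loaded_at: datetime,
    definition_reviewed_at: datetime,
    methodology_changed_at: datetime,
    now: datetime,
    max_row_age: timedelta = timedelta(days=1),  # assumed threshold
) -> bool:
    """Require fresh rows AND a definition reviewed since the last methodology change."""
    rows_fresh = now - rows_loaded_at <= max_row_age
    definition_fresh = definition_reviewed_at >= methodology_changed_at
    return rows_fresh and definition_fresh


# Hypothetical dates consistent with the scenario's ordering of events.
now = datetime(2026, 5, 27, 9, 0)
methodology_changed = datetime(2026, 4, 15)
loaded_this_morning = datetime(2026, 5, 27, 6, 0)

arr_net_ok = is_trustworthy(loaded_this_morning, datetime(2026, 4, 17), methodology_changed, now)
crm_sync_ok = is_trustworthy(loaded_this_morning, datetime(2026, 1, 10), methodology_changed, now)
print(arr_net_ok, crm_sync_ok)  # True False
```

Both tables have identical row freshness here; only the definition review date separates them.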

How does ownership metadata prevent the wrong-source failure from recurring?

Ownership metadata assigns a named human to each data asset - not a team name, a named individual with an active record in the organization’s identity system. When an agent retrieves from the wrong source, ownership metadata gives the repair path a starting point: one escalation target, one person whose job it is to update the definition and re-certify the asset.

Without named ownership, the repair path is organizational archaeology. Who is responsible for recurring_revenue_total? Who approved the CRM sync configuration? Who should be notified when the contraction treatment methodology changes? These questions take days to answer in large organizations. With named ownership as a queryable attribute on the asset, the path from wrong answer to corrected definition is a direct line. AI agent memory governance depends on this escalation structure being built into the metadata layer before agents go to production.
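
As a sketch of ownership as a queryable attribute, the lookup below returns a named, active individual for an asset or nothing at all. The Owner record, the sample people and addresses, and escalation_target are hypothetical; a real deployment would read the active flag from the organization's identity system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Owner:
    name: str
    email: str
    active: bool  # whether the person still has an active identity record


# Hypothetical ownership records keyed by table name.
OWNERS = {
    "arr_net": Owner("A. Example", "a.example@example.com", active=True),
    "annual_recurring_revenue": Owner("B. Example", "b.example@example.com", active=False),
    # recurring_revenue_total has no owner on record.
}


def escalation_target(table: str) -> Optional[Owner]:
    """Return the named, active owner to notify, or None if the asset is orphaned."""
    owner = OWNERS.get(table)
    return owner if owner is not None and owner.active else None


for table in ("arr_net", "annual_recurring_revenue", "recurring_revenue_total"):
    target = escalation_target(table)
    print(table, "->", target.email if target else "no escalation path")
```

An owner who has left the organization is treated the same as no owner, since neither gives the repair path a starting point.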

Why do stronger models amplify bad data?

A common assumption when a revenue agent starts returning wrong answers is that the model needs to be upgraded. In practice, a stronger model makes bad context more dangerous, not less. Stronger models produce more fluent reasoning. When the agent uses the wrong source, the explanation around the wrong number is more polished, more internally consistent, and more likely to survive a cursory review. Analysts reading a well-constructed paragraph with correct-looking citations are less likely to challenge the underlying source than they would be reading a halting, uncertain output.

Hallucinations accounted for over a third of reviewed incidents in deployed LLM applications during recent evaluations of production AI systems. The distribution is not random. Failures concentrate in systems where model quality and context quality are not governed as separate problems - where teams upgraded the model assuming the improvement would compensate for gaps in the data layer.

Model quality and context quality must be governed independently. The work that prevents the wrong-source failure lives in the meaning, lineage, and trust signals the agent reads at retrieval time - not in the model’s ability to reason around whatever source it finds first. AI agent governance treats these as parallel tracks, not a single dial to turn up.

What does data quality for AI require?

How does canonical definition governance work for agents?

Canonical definition governance means the organization has designated one authoritative definition for each metric, written that definition into the metadata layer as a machine-readable attribute, and established a process for updating it when the underlying methodology changes. For agents, this is not optional. When multiple valid-looking definitions exist for the same metric name - annual recurring revenue calculated before contraction, after contraction, net of expansion - an agent without canonical definition governance will choose among them arbitrarily.

The enterprise context layer provides the infrastructure for this: a governed layer above the warehouse where canonical definitions are stored, versioned, and queryable by agents at retrieval time. Without this layer, definition governance exists only in documentation systems that agents cannot read.
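
To make "stored, versioned, and queryable" concrete, the sketch below keeps a dated history of one metric's definition and returns the version in force on a given day. The dates, the definition wording, and definition_as_of are assumptions for illustration; they do not describe how any particular context layer stores definitions.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for the ARR definition, sorted by effective date.
ARR_DEFINITION_VERSIONS = [
    (date(2026, 1, 1), "ARR calculated before mid-quarter contraction"),
    (date(2026, 4, 15), "ARR calculated after mid-quarter contraction"),
]


def definition_as_of(versions: list[tuple[date, str]], day: date) -> str:
    """Return the definition that was in force on the given day."""
    starts = [effective_from for effective_from, _ in versions]
    idx = bisect_right(starts, day) - 1
    if idx < 0:
        raise LookupError(f"no definition in force on {day}")
    return versions[idx][1]


print(definition_as_of(ARR_DEFINITION_VERSIONS, date(2026, 3, 1)))
print(definition_as_of(ARR_DEFINITION_VERSIONS, date(2026, 5, 27)))
```

Keeping the history, rather than overwriting the definition, is what lets a later audit ask which meaning applied when a past answer was generated.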

Why does lineage have to reach the answer, not just the dashboard?

Dashboard lineage traces values to their source tables and transformation steps. This is sufficient for BI because a human analyst closes the gap between the dashboard and the business question. For agents, the gap is the answer itself. The agent is not returning a dashboard; it is returning a claim - “ARR is $142M” - that needs to be traceable back through source selection, definition choice, and transformation history.

Answer-level lineage extends the trace from output backward to every decision point the agent made during retrieval and reasoning. This is what makes agent harness failures debuggable in production: not reconstructing the sequence from incomplete logs, but reading the trace that was written at the time the answer was generated. Unstructured data AI lineage addresses the additional complexity when agents reason across structured and unstructured sources simultaneously.

What does a definition-level freshness signal catch that a table refresh timestamp does not?

A table refresh timestamp answers the question: when did the last row arrive? A definition-level freshness signal answers the question: when was the meaning of this asset last reviewed and confirmed as current? These are different facts, and they fail independently. The ARR scenario demonstrates this directly - recurring_revenue_total had a current refresh timestamp and a stale definition simultaneously.

Definition-level freshness is a human-in-the-loop signal. It requires someone with domain knowledge to confirm that the metric’s definition still reflects the organization’s current methodology. Automated pipelines cannot produce it. The active metadata AI agent memory model treats this signal as a first-class attribute, queryable alongside refresh timestamps and certification status during retrieval.

Why do trust signals have to be machine-readable?

An asset can be certified in a wiki, approved in a ticket, and documented in a Confluence page - and still be unreadable to an agent at inference time. Trust signals have to be machine-readable: attributes stored in the metadata layer that an agent can query as part of its retrieval logic. Certification status, review date, ownership assignment, usage policy, and access classification all need to exist as queryable attributes, not prose in a documentation system.

Context-aware AI agents use these signals to filter candidate sources before reasoning begins. An agent that cannot read certification status at retrieval time has no way to distinguish between an asset that is actively maintained and one that has been orphaned for six months. Context infrastructure for AI agents covers how this metadata layer is built and maintained across the enterprise data estate.
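
Filtering candidates on trust signals before reasoning begins might look like the following. This is a sketch under stated assumptions: the TrustSignals fields, the sample values, and the 180-day review window (standing in for the six-month example above) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TrustSignals:
    table: str
    certified: bool
    last_reviewed: date
    owner: Optional[str]


def eligible(
    candidates: list[TrustSignals],
    today: date,
    max_review_age: timedelta = timedelta(days=180),  # assumed six-month window
) -> list[str]:
    """Keep only certified, owned assets whose review is recent enough."""
    return [
        c.table
        for c in candidates
        if c.certified
        and c.owner is not None
        and today - c.last_reviewed <= max_review_age
    ]


# Hypothetical signal values for the three ARR tables.
candidates = [
    TrustSignals("arr_net", True, date(2026, 4, 17), "a.example"),
    TrustSignals("annual_recurring_revenue", True, date(2026, 4, 25), "b.example"),
    TrustSignals("recurring_revenue_total", False, date(2025, 9, 1), None),
]
print(eligible(candidates, today=date(2026, 5, 27)))
```

The filter narrows the field but does not pick a winner; choosing between the two remaining tables is still a source-authority decision.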

How do upstream quality signals flow to the agent?

Upstream quality signals - test results from Great Expectations, anomaly alerts from Monte Carlo, freshness checks from Soda - need to propagate into the metadata layer as queryable attributes, not just dashboard indicators. When an agent is evaluating whether to retrieve from a given asset, it should be able to read the most recent quality check result, the most recent anomaly flag, and the current freshness status as part of the retrieval decision, not after the fact.

Atlan’s context layer enterprise memory aggregates these signals from existing observability tools and exposes them as queryable attributes on each asset. The agent does not need to query Monte Carlo directly. It reads the consolidated quality signal from the metadata layer, alongside certification status and definition freshness, before deciding which source to use.
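
The idea of a consolidated quality signal can be sketched without calling any observability tool directly. The example below assumes each tool's latest result has already been exported as a simple status string; the status vocabulary, the dictionary keys, and the worst-status-wins rule are assumptions of this sketch, not a description of how Atlan or any of the named tools behave.

```python
# Assumed status vocabulary, ordered from best to worst.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def consolidate(signals: dict[str, str]) -> str:
    """Collapse per-check statuses into one asset-level status: the worst one wins."""
    if not signals:
        return "unknown"  # no upstream evidence is not the same as passing
    return max(signals.values(), key=SEVERITY.__getitem__)


# Hypothetical exported results for one asset.
recurring_revenue_total = {
    "validation_tests": "pass",
    "anomaly_detection": "warn",
    "freshness_check": "pass",
}
print(consolidate(recurring_revenue_total))  # warn
print(consolidate({}))                       # unknown
```

Worst-status-wins is a conservative choice; a team could equally weight signals by check type, as long as the result is one attribute the agent can read at retrieval.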

How does a feedback loop close output failures back to source?

When an agent returns a wrong answer and an analyst identifies the source of the error, that correction needs to flow back to the metadata layer - not just be fixed in the immediate output. If recurring_revenue_total was used when arr_net was authoritative, the metadata layer needs to record that event: which asset was used, which asset should have been used, and what the resolution was. The next time the agent encounters a similar retrieval decision, the corrected signal is available.

Decision traces for AI agents provide the structure for this feedback loop. Each inference event is logged with enough detail to identify the source selection decision, compare it against the correct choice, and write the correction back as a signal. Without this loop, agent systems require manual intervention every time a similar failure recurs. Gartner estimates that “poor data quality costs organizations an average of $12.9 million per year” (Gartner, 2020) - a figure that compounds when agent failures are not closed back to their source.
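
A minimal sketch of that write-back step: each correction records which asset was used, which should have been used, and the resolution, and later retrieval decisions can read the accumulated signal. The Correction record, the in-memory list, and preferred_source are hypothetical stand-ins for whatever the metadata layer actually persists.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correction:
    question_type: str
    used: str
    should_have_used: str
    resolution: str


# In-memory stand-in for corrections persisted in the metadata layer.
corrections: list[Correction] = []


def record_correction(correction: Correction) -> None:
    """Write an analyst's correction back so later retrievals can read it."""
    corrections.append(correction)


def preferred_source(question_type: str) -> Optional[str]:
    """Return the table most often named as the correct source for this question type."""
    counts = Counter(
        c.should_have_used for c in corrections if c.question_type == question_type
    )
    return counts.most_common(1)[0][0] if counts else None


record_correction(Correction(
    question_type="board_arr",
    used="recurring_revenue_total",
    should_have_used="arr_net",
    resolution="CRM sync definition flagged for re-certification",
))
print(preferred_source("board_arr"))  # arr_net
```

The point of the sketch is the direction of flow: the fix lands on the asset's metadata, not only in the one answer that was challenged.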

How Atlan approaches data quality for AI

Traditional data quality tooling is still necessary, but it stops short of what AI needs. A table can pass every pipeline and observability check and still mislead an agent if the agent cannot tell which source finance trusts, which definition is current, and whether the surrounding business context has changed.

Atlan provides the context layer that makes quality signals usable at retrieval time, alongside the meaning of the metric, the freshness of the definition, the source relationships behind the number, and the path back to the upstream issue if the answer is wrong. Instead of treating data quality for AI as a monitoring problem alone, it turns it into a context problem the agent can resolve before generation.

In the ARR example, that means the agent is not choosing from three plausible tables based on surface similarity or convenience. It is working from the right business definition, the right source for the question, and the right freshness signals in the same retrieval flow.

Why the context layer sets AI data quality

The ARR agent fails because three valid tables exist in the same warehouse without a context layer that tells the agent which source finance trusts, which definition is current after the latest methodology change, and how to interpret those signals before generation. With that layer in place, the same question on Monday and Thursday resolves through the same trusted source and the same active definition. Clean warehouse data provides usable values; the context layer tells the agent which values, definitions, and signals it should actually rely on.

FAQs about data quality for AI

1. What is the difference between data quality for AI and data quality for BI?

BI quality assumes a human analyst reads the dashboard and supplies missing context from memory. AI quality has to make that context machine-readable so an agent can read it during retrieval. The same warehouse can support BI and break an agent because the rules a finance team carries in their heads are not stored in any column. Context engineering for AI agents is the discipline of making that implicit knowledge explicit.

2. Why does AI hallucinate when the underlying data is technically correct?

The model retrieves a source that passes every row check, then reasons from a definition that has shifted, an ownership gap, or a metric two teams calculate differently. The output looks supported because the source exists. The wrongness lives in the meaning behind the data, which is the layer most warehouses and quality tools do not store. LLM hallucinations in enterprise settings are predominantly context failures, not model failures.

3. Do existing data quality tools work for AI agents?

Tools like Monte Carlo and Great Expectations cover row-level and pipeline-level checks well, and that work stays valuable. On their own, they do not give agents machine-readable trust, certification, or definition-level freshness, which is what an agent needs to read during retrieval. AI workloads usually require a layer above existing observability that aggregates these signals and exposes them as queryable attributes on the asset. Data quality for AI agent harnesses covers the specific requirements when agents are retrieving across multiple systems simultaneously.

4. What is context drift in AI systems?

Context drift is the silent accumulation of stale definitions, schema changes that did not propagate, and unowned glossary terms while pipelines continue to run on time. Around 65% of enterprise AI agent failures trace to it. It spreads because nothing breaks at first. Dashboards keep producing numbers, and the failure only surfaces when someone challenges an answer. Context poisoning describes the more severe version, where drifted context corrupts agent memory across sessions.

5. Will upgrading to a stronger model fix data quality issues for AI?

A stronger model produces more fluent reasoning around the same source. If the agent uses the wrong source, the explanation around the wrong number is more polished and more likely to survive review. Model quality and context quality have to be governed as separate problems. The work that prevents the failure lives in the meaning, lineage, and trust signals the agent reads at retrieval time. AI agent observability is how teams verify the context layer is working independently of model upgrades.

6. What is the role of metadata in data quality for AI?

Metadata is what makes a data asset legible to an agent. It carries the certified status, the canonical definition, the lineage from source to inference, the freshness signal on meaning, and the named owner. An agent without this metadata can read values but cannot tell which valid-looking source it is allowed to trust for the question being asked. The related piece "What is Atlan MCP" explains how agents query this metadata layer in real time through the Model Context Protocol. The metadata layer for AI overview covers the full architectural picture.

Sources

Gartner. “Gartner Predicts 60% of Organizations Will Abandon AI Projects Due to Lack of AI-Ready Data Through 2026.” Gartner, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
IBM. “The cost of poor data quality: Why data quality is more important than ever.” IBM, 2026. https://www.ibm.com/think/insights/data-quality
IBM. “AI data quality: How data quality affects AI outcomes.” IBM. https://www.ibm.com/think/topics/ai-data-quality
Gartner. “How to Stop Data Quality Undermining Your Business.” Gartner, 2020. https://www.gartner.com/smarterwithgartner/how-to-stop-data-quality-from-undermining-your-business

Atlan is the Context Layer for AI — a Leader in the Gartner Magic Quadrant for D&A Governance (2026) and the Forrester Wave for Data Governance (Q3 2025). Atlan unifies your data, business knowledge, and the meaning behind your terms into one Enterprise Data Graph that gives every team and every AI agent the trusted context they need. Trusted by Mastercard, Workday, General Motors, CME Group, HubSpot, FOX, Virgin Media O2, Elastic, and 400+ enterprises representing $10T+ in market cap.
