A finance team deploys a revenue analysis agent. Their data warehouse holds three ARR tables: arr_net maintained by sales ops, annual_recurring_revenue maintained by finance reporting, and recurring_revenue_total synced from the CRM. In testing, the agent answers cleanly because the finance lead pointed it at the correct source. In production, the agent picks whichever source responds first in the retrieval sequence. Three percent separates the highest and lowest figures. No one on the team can trace which ARR calculation backed the answer that went into the board deck.
This is not a pipeline failure. Every table passes its quality checks. The failure is a data quality problem that BI tooling was never designed to catch: three valid sources exist in the same warehouse, and nothing in the data path tells the agent which one carries authority for the question being asked.
| | BI quality threshold | AI quality threshold |
|---|---|---|
| What it verifies | Values support known reports with human review | Sources carry machine-readable context, meaning, lineage, freshness, certification, and ownership at inference time |
| Common failure mode | Missing rows, late loads, schema drift | Context drift: stale definitions and unowned glossary terms accumulate while pipelines keep running |
| Production symptom | Dashboard number differs from the report | The same ARR question returns different answers on different days and the team cannot trace why |
| Resolution path | Fix the pipeline, reload the table | Canonical definitions, source authority, answer-level lineage, freshness signals on meaning, and ownership metadata |
| Revenue agent example | Three ARR tables all pass quality checks | The agent picks different sources for the same question on different days |
Why does BI-ready data fail AI?
BI survives with analyst judgment. When a finance team sees two conflicting ARR numbers, someone checks Slack threads, consults the metric review notes, and applies the institutional memory that tells them which table carries authority this quarter. That judgment never appears in a column. It lives in people, calendars, and tribal knowledge accumulated across dozens of quarterly reviews.
For an agent answering without human review, fit-for-purpose has to be in the data path itself. There is no analyst to consult. There is no Slack thread the agent can search. There is no memory of last quarter’s clarification unless that clarification was written into the metadata layer as a machine-readable fact. IBM puts it directly: “Data can be technically correct and fail the AI system using it” (IBM, 2026). A table can pass every row-level check, every freshness threshold, every schema validation, and still be the wrong source for the question the agent is answering.
This is the gap BI quality leaves for AI. The checks tell you values are correct. They do not tell you which source carries authority for a given question, which definition was current when the values were loaded, or whether the team that owns the asset is still maintaining it. Context-aware AI agents need all of this information in machine-readable form before they retrieve anything - not after.
How does bad context create hallucinations?
AI agent hallucinations in enterprise settings rarely come from a model inventing facts. They come from a model reasoning correctly from the wrong source. The source exists. The values are real. The retrieval looks valid. The problem is that the agent selected an asset whose meaning had drifted since the agent’s retrieval rules were written.
How does the drift timeline develop?
Consider the ARR scenario six weeks after the revenue agent goes live. Finance changed their mid-quarter contraction treatment as part of a methodology revision. Sales ops updated arr_net within two days. Finance reporting updated annual_recurring_revenue roughly ten days later. The CRM sync table, recurring_revenue_total, was never updated. No one filed a ticket. No alert fired. The pipeline kept running on schedule, passing every freshness check because the rows were arriving on time and the schema had not changed.
The agent continued answering ARR questions using whichever source its retrieval logic reached first. On days when recurring_revenue_total was fastest, it answered from the stale CRM definition. On days when arr_net was fastest, it answered correctly. The three-percent gap was not a bug anyone could find by checking row counts or load times. It was a context gap: the meaning of the metric had changed in two of three sources, and the agent had no way to read which definition was current.
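The "whichever source responds first" behavior can be sketched in a few lines. This is a minimal illustration, not the agent's actual retrieval code: the helper `pick_first_responder`, the latency figures, and the day labels are all hypothetical. Only the table names and the stale-versus-current status come from the scenario.

```python
# Hypothetical per-day response latencies in milliseconds (illustrative only).
LATENCIES_BY_DAY = {
    "monday": {
        "arr_net": 120,
        "annual_recurring_revenue": 180,
        "recurring_revenue_total": 150,
    },
    "thursday": {
        "arr_net": 210,
        "annual_recurring_revenue": 190,
        "recurring_revenue_total": 95,
    },
}

# Whether each table reflects the revised contraction treatment, per the scenario.
DEFINITION_IS_CURRENT = {
    "arr_net": True,
    "annual_recurring_revenue": True,
    "recurring_revenue_total": False,
}


def pick_first_responder(latencies):
    """Return the fastest source. Nothing about meaning is consulted."""
    return min(latencies, key=latencies.get)


for day, latencies in LATENCIES_BY_DAY.items():
    source = pick_first_responder(latencies)
    status = "current" if DEFINITION_IS_CURRENT[source] else "stale"
    print(f"{day}: answered from {source} ({status} definition)")
```

The selection rule never reads `DEFINITION_IS_CURRENT`, which is the point: the information that would have prevented the wrong answer exists, but it is not in the retrieval path.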
Why is context drift harder to detect than missing data?
Around 65% of enterprise AI agent failures trace to context drift rather than missing or malformed data. Missing data leaves a signal. A failed row count check, a null where a value was expected, a schema mismatch - these are detectable. Drifted meaning often leaves none. The pipeline runs. The values load. The quality dashboard stays green. The failure only surfaces when an analyst challenges an answer and discovers the agent has been answering from a definition that changed weeks ago.
This is why context engineering for AI agents has emerged as a distinct discipline. Preventing context drift requires writing definition changes, ownership updates, and certification status into the metadata layer as machine-readable facts - not just updating the values in the table. The agent needs to read the meaning of a source, not just its freshness timestamp, before deciding whether to use it.
What data quality gaps does BI tooling leave for AI agents?
Why is validity not the same as source authority?
In the ARR scenario, arr_net, annual_recurring_revenue, and recurring_revenue_total are all valid tables. All three pass row-level checks. All three load on schedule. None of that tells the agent which one is authoritative for a board-level ARR question on the last day of the quarter. Source authority is the declaration - written into the metadata layer, not inferred from the data - that this specific asset is the certified source for this specific question type.
Without source authority as a machine-readable attribute, an agent operating across context engineering frameworks must guess. It might use the freshest table, the first table in its retrieval order, or the table whose schema most closely matches the query. None of these heuristics is the same as knowing which source the organization has designated as authoritative. Context engineering versus prompt engineering comes down to exactly this: prompts tell the model how to reason; context tells the model which sources to trust.
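A minimal sketch of source authority as a queryable attribute follows. The `SourceMetadata` class, the `authoritative_for` field, the `board_arr` question type, and the choice of arr_net as the designated source are assumptions made for illustration; they do not describe any particular catalog's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceMetadata:
    table: str
    # Question types this asset is declared authoritative for (hypothetical field).
    authoritative_for: frozenset


# Hypothetical catalog: arr_net is assumed to be the designated source.
CATALOG = [
    SourceMetadata("arr_net", frozenset({"board_arr"})),
    SourceMetadata("annual_recurring_revenue", frozenset()),
    SourceMetadata("recurring_revenue_total", frozenset()),
]


def resolve_authoritative_source(question_type: str, catalog: list) -> Optional[str]:
    """Return the single declared authority, or None so the agent does not guess."""
    matches = [s.table for s in catalog if question_type in s.authoritative_for]
    # Zero or multiple declarations are both treated as "no authority".
    return matches[0] if len(matches) == 1 else None


print(resolve_authoritative_source("board_arr", CATALOG))
print(resolve_authoritative_source("churn_rate", CATALOG))
```

Returning `None` when no single authority is declared is a design choice in this sketch: it forces the caller to stop or escalate rather than fall back to a freshness or ordering heuristic.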
Why does answer-level lineage matter for debugging?
When a revenue agent returns a number three percent below what the CFO expected, the debugging path matters. Row-level lineage can tell you where a table’s values came from. It cannot tell you which table the agent used to answer a specific question, which definition was active at retrieval time, or which transformation step introduced the discrepancy.
Answer-level lineage traces the path from output back to source table, source definition, and transformation history for that specific inference event. Without it, debugging an agent failure means reconstructing the retrieval sequence from logs - if those logs exist - and comparing the definition the agent read against the definition that was supposed to be current. Decision traces for AI agents make this path auditable by design, not recoverable by forensics.
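One way to picture an answer-level lineage record is a small structure written at the moment the answer is produced. The sketch below assumes a hypothetical `AnswerTrace` shape and an in-memory list standing in for a log store; the version label and transformation name are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AnswerTrace:
    question: str
    answer: str
    source_table: str
    definition_version: str
    transformations: list = field(default_factory=list)
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_trace(trace: AnswerTrace, log: list) -> None:
    """Append the trace as a JSON line at answer time, not reconstructed later."""
    log.append(json.dumps(asdict(trace)))


trace_log = []  # stand-in for a durable trace store
record_trace(
    AnswerTrace(
        question="What is ARR at quarter end?",
        answer="ARR is $142M",
        source_table="arr_net",
        definition_version="v2",  # hypothetical version label
        transformations=["apply_contraction_treatment"],  # hypothetical step name
    ),
    trace_log,
)
print(trace_log[0])
```

With a record like this, the debugging question "which table and which definition backed this number?" becomes a lookup on `source_table` and `definition_version` rather than a reconstruction from retrieval logs.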
Why do freshness signals need to attach to the definition, not just the table?
A table freshness timestamp tells you when the last row arrived. It does not tell you when the definition governing those rows was last reviewed or updated. In the ARR scenario, recurring_revenue_total had a current freshness timestamp - rows were arriving on schedule - but a stale definition. The contraction treatment methodology had changed, the table’s values did not reflect the change, and nothing in the standard quality dashboard showed the discrepancy.
Definition-level freshness is a separate signal: the timestamp of the last review or update to the meaning, not the data. When an agent reads a source, it needs both signals. Current data loaded from a stale definition is not trustworthy for production AI workloads, and standard warehouse freshness checks cannot tell the difference. The guide “What are decision traces for AI agents” explores how this signal feeds back into the audit layer.
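The two signals can be checked side by side. The following is a rough sketch: the `FreshnessSignals` fields, the one-day data-age threshold, and every date are hypothetical, chosen only to mirror the scenario in which recurring_revenue_total loads on time but was never reviewed after the methodology change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FreshnessSignals:
    table: str
    last_row_loaded_at: datetime      # what a table freshness check sees
    definition_reviewed_at: datetime  # when the meaning was last confirmed


def is_trustworthy(sig, now, methodology_changed_at, max_data_age=timedelta(days=1)):
    """A source qualifies only if both the data and the definition are current."""
    data_fresh = now - sig.last_row_loaded_at <= max_data_age
    definition_current = sig.definition_reviewed_at >= methodology_changed_at
    return data_fresh and definition_current


# All dates below are illustrative assumptions.
now = datetime(2025, 3, 15)
methodology_changed_at = datetime(2025, 3, 1)

arr_net = FreshnessSignals(
    "arr_net", now - timedelta(hours=2), datetime(2025, 3, 3)
)
crm_sync = FreshnessSignals(
    "recurring_revenue_total", now - timedelta(hours=1), datetime(2024, 12, 1)
)

for sig in (arr_net, crm_sync):
    print(sig.table, is_trustworthy(sig, now, methodology_changed_at))
```

Note that the CRM table has the more recent load time of the two, so a check on `last_row_loaded_at` alone would rank it first.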
How does ownership metadata prevent the wrong-source failure from recurring?
Ownership metadata assigns a named human to each data asset - not a team name, a named individual with an active record in the organization’s identity system. When an agent retrieves from the wrong source, ownership metadata gives the repair path a starting point: one escalation target, one person whose job it is to update the definition and re-certify the asset.
Without named ownership, the repair path is organizational archaeology. Who is responsible for recurring_revenue_total? Who approved the CRM sync configuration? Who should be notified when the contraction treatment methodology changes? These questions take days to answer in large organizations. With named ownership as a queryable attribute on the asset, the path from wrong answer to corrected definition is a direct line. AI agent memory governance depends on this escalation structure being built into the metadata layer before agents go to production.
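As a rough sketch, named ownership can be validated by checking each asset's owner against the identity system. The user IDs, the `ACTIVE_USERS` set standing in for an identity system, and the owner assignments are all hypothetical.

```python
from typing import Optional

# Stand-in for the organization's identity system (hypothetical user IDs).
ACTIVE_USERS = {"j.doe", "a.khan"}

# Hypothetical owner attribute on each asset.
OWNERS = {
    "arr_net": "j.doe",
    "annual_recurring_revenue": "a.khan",
    "recurring_revenue_total": "crm-integrations-team",  # a team name, not a person
}


def escalation_target(table: str) -> Optional[str]:
    """Return the named, active owner for an asset, or None if there is none."""
    owner = OWNERS.get(table)
    return owner if owner in ACTIVE_USERS else None


def unowned_assets() -> list:
    """List assets whose owner is missing, inactive, or not an individual."""
    return sorted(t for t in OWNERS if escalation_target(t) is None)


print(escalation_target("arr_net"))
print(unowned_assets())
```

Running a check like `unowned_assets()` before agents go to production surfaces the assets that would otherwise require organizational archaeology after a wrong answer.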
Why do stronger models amplify bad data?
A common assumption when a revenue agent starts returning wrong answers is that the model needs to be upgraded. In practice, a stronger model makes bad context more dangerous, not less. Stronger models produce more fluent reasoning. When the agent uses the wrong source, the explanation around the wrong number is more polished, more internally consistent, and more likely to survive a cursory review. Analysts reading a well-constructed paragraph with correct-looking citations are less likely to challenge the underlying source than they would be reading a halting, uncertain output.
Hallucinations accounted for over a third of reviewed incidents in deployed LLM applications during recent evaluations of production AI systems. The distribution is not random. Failures concentrate in systems where model quality and context quality are not governed as separate problems - where teams upgraded the model assuming the improvement would compensate for gaps in the data layer.
Model quality and context quality must be governed independently. The work that prevents the wrong-source failure lives in the meaning, lineage, and trust signals the agent reads at retrieval time - not in the model’s ability to reason around whatever source it finds first. AI agent governance treats these as parallel tracks, not a single dial to turn up.
What does data quality for AI require?
How does canonical definition governance work for agents?
Canonical definition governance means the organization has designated one authoritative definition for each metric, written that definition into the metadata layer as a machine-readable attribute, and established a process for updating it when the underlying methodology changes. For agents, this is not optional. When multiple valid-looking definitions exist for the same metric name - annual recurring revenue calculated before contraction, after contraction, net of expansion - an agent without canonical definition governance will choose among them arbitrarily.
The enterprise context layer provides the infrastructure for this: a governed layer above the warehouse where canonical definitions are stored, versioned, and queryable by agents at retrieval time. Without this layer, definition governance exists only in documentation systems that agents cannot read.
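A stored, versioned, queryable definition might look like the following. This is a sketch under assumptions: the `DefinitionVersion` fields, the registry contents, and the effective dates are hypothetical, and a real context layer would expose this through its own interface rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DefinitionVersion:
    metric: str
    version: int
    effective_from: date
    text: str


# Hypothetical registry with two versions of the ARR definition.
REGISTRY = [
    DefinitionVersion("arr", 1, date(2024, 1, 1), "ARR before mid-quarter contraction"),
    DefinitionVersion("arr", 2, date(2025, 3, 1), "ARR after mid-quarter contraction"),
]


def canonical_definition(metric: str, as_of: date, registry: list) -> Optional[DefinitionVersion]:
    """Return the definition in force on `as_of`, or None if none applies."""
    candidates = [
        d for d in registry if d.metric == metric and d.effective_from <= as_of
    ]
    return max(candidates, key=lambda d: d.effective_from, default=None)


current = canonical_definition("arr", date(2025, 3, 15), REGISTRY)
print(current.version, current.text)
```

Querying by `as_of` date also supports audits: the same lookup answers which definition was canonical when a past answer was generated.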
Why does lineage have to reach the answer, not just the dashboard?
Dashboard lineage traces values to their source tables and transformation steps. This is sufficient for BI because a human analyst closes the gap between the dashboard and the business question. For agents, the gap is the answer itself. The agent is not returning a dashboard; it is returning a claim - “ARR is $142M” - that needs to be traceable back through source selection, definition choice, and transformation history.
Answer-level lineage extends the trace from output backward to every decision point the agent made during retrieval and reasoning. This is what makes agent harness failures debuggable in production: not reconstructing the sequence from incomplete logs, but reading the trace that was written at the time the answer was generated. Unstructured data AI lineage addresses the additional complexity when agents reason across structured and unstructured sources simultaneously.
What does a definition-level freshness signal catch that a table refresh timestamp does not?
A table refresh timestamp answers the question: when did the last row arrive? A definition-level freshness signal answers the question: when was the meaning of this asset last reviewed and confirmed as current? These are different facts, and they fail independently. The ARR scenario demonstrates this directly - recurring_revenue_total had a current refresh timestamp and a stale definition simultaneously.
Definition-level freshness is a human-in-the-loop signal. It requires someone with domain knowledge to confirm that the metric’s definition still reflects the organization’s current methodology. Automated pipelines cannot produce it. The active metadata AI agent memory model treats this signal as a first-class attribute, queryable alongside refresh timestamps and certification status during retrieval.
Why do trust signals have to be machine-readable?
An asset can be certified in a wiki, approved in a ticket, and documented in a Confluence page - and still be unreadable to an agent at inference time. Trust signals have to be machine-readable: attributes stored in the metadata layer that an agent can query as part of its retrieval logic. Certification status, review date, ownership assignment, usage policy, and access classification all need to exist as queryable attributes, not prose in a documentation system.
Context-aware AI agents use these signals to filter candidate sources before reasoning begins. An agent that cannot read certification status at retrieval time has no way to distinguish between an asset that is actively maintained and one that has been orphaned for six months. Context infrastructure for AI agents covers how this metadata layer is built and maintained across the enterprise data estate.
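Filtering candidates on trust signals before reasoning can be sketched as a simple predicate. The `TrustSignals` fields, the 180-day review window (a stand-in for the "orphaned for six months" case), and all dates and owner IDs are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrustSignals:
    table: str
    certified: bool
    last_reviewed: date
    owner: str  # empty string means no named owner


def eligible_sources(candidates, today, max_review_age_days=180):
    """Keep only certified, owned assets reviewed within the allowed window."""
    return [
        c.table
        for c in candidates
        if c.certified
        and c.owner
        and (today - c.last_reviewed).days <= max_review_age_days
    ]


# Hypothetical signal values.
today = date(2025, 6, 30)
candidates = [
    TrustSignals("arr_net", True, date(2025, 6, 1), "j.doe"),
    TrustSignals("annual_recurring_revenue", False, date(2025, 5, 20), "a.khan"),
    TrustSignals("recurring_revenue_total", True, date(2024, 12, 1), ""),
]

print(eligible_sources(candidates, today))
```

The filter runs before any values are read, so an uncertified or orphaned asset never reaches the reasoning step.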
How do upstream quality signals flow to the agent?
Upstream quality signals - test results from Great Expectations, anomaly alerts from Monte Carlo, freshness checks from Soda - need to propagate into the metadata layer as queryable attributes, not just dashboard indicators. When an agent is evaluating whether to retrieve from a given asset, it should be able to read the most recent quality check result, the most recent anomaly flag, and the current freshness status as part of the retrieval decision, not after the fact.
Atlan’s context layer enterprise memory aggregates these signals from existing observability tools and exposes them as queryable attributes on each asset. The agent does not need to query Monte Carlo directly. It reads the consolidated quality signal from the metadata layer, alongside certification status and definition freshness, before deciding which source to use.
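The consolidation step can be pictured as collapsing per-tool results into one status per asset. The sketch below is not Atlan's implementation and does not use the real output formats of Great Expectations, Monte Carlo, or Soda: the dictionary shape, the check names, and the results are hypothetical, and the failing anomaly is invented for illustration (in the ARR scenario every check passes).

```python
def consolidate(signals: list) -> dict:
    """Collapse per-tool results into one status per asset; any failure wins."""
    summary = {}
    for s in signals:
        entry = summary.setdefault(
            s["asset"], {"status": "pass", "failing_checks": []}
        )
        if not s["passed"]:
            entry["status"] = "fail"
            entry["failing_checks"].append(f'{s["tool"]}:{s["check"]}')
    return summary


# Hypothetical normalized payloads, one per upstream check.
signals = [
    {"tool": "great_expectations", "asset": "arr_net", "check": "row_count", "passed": True},
    {"tool": "monte_carlo", "asset": "recurring_revenue_total", "check": "volume_anomaly", "passed": False},
    {"tool": "soda", "asset": "recurring_revenue_total", "check": "freshness", "passed": True},
]

quality = consolidate(signals)
print(quality["arr_net"]["status"])
print(quality["recurring_revenue_total"])
```

The agent then reads `quality[asset]` as one attribute alongside certification and definition freshness, without calling each observability tool itself.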
How does a feedback loop close output failures back to source?
When an agent returns a wrong answer and an analyst identifies the source of the error, that correction needs to flow back to the metadata layer - not just be fixed in the immediate output. If recurring_revenue_total was used when arr_net was authoritative, the metadata layer needs to record that event: which asset was used, which asset should have been used, and what the resolution was. The next time the agent encounters a similar retrieval decision, the corrected signal is available.
Decision traces for AI agents provide the structure for this feedback loop. Each inference event is logged with enough detail to identify the source selection decision, compare it against the correct choice, and write the correction back as a signal. Without this loop, agent systems require manual intervention every time a similar failure recurs. Gartner estimates that “poor data quality costs organizations an average of $12.9 million per year” (Gartner, 2020) - a figure that compounds when agent failures are not closed back to their source.
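The write-back can be sketched as a small store that records each correction and consults it on the next retrieval decision. The `Correction` and `FeedbackStore` names, the `board_arr` question type, and the resolution text are hypothetical; a production system would persist this in the metadata layer rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    question_type: str
    used: str              # asset the agent actually retrieved from
    should_have_used: str  # asset the analyst identified as authoritative
    resolution: str


class FeedbackStore:
    """In-memory stand-in for corrections written back to the metadata layer."""

    def __init__(self):
        self._preferred = {}
        self.history = []

    def record(self, correction: Correction) -> None:
        # Keep the full event for audit and update the routing signal.
        self.history.append(correction)
        self._preferred[correction.question_type] = correction.should_have_used

    def choose(self, question_type: str, default_choice: str) -> str:
        # A recorded correction overrides whatever the agent would have picked.
        return self._preferred.get(question_type, default_choice)


store = FeedbackStore()
before = store.choose("board_arr", "recurring_revenue_total")
store.record(
    Correction(
        question_type="board_arr",
        used="recurring_revenue_total",
        should_have_used="arr_net",
        resolution="CRM sync definition flagged as stale",  # hypothetical note
    )
)
after = store.choose("board_arr", "recurring_revenue_total")
print(before, "->", after)
```

The correction is recorded once and applied to every later decision of the same type, which is what removes the need for manual intervention on each recurrence.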
How Atlan approaches data quality for AI
Traditional data quality tooling is still necessary, but it stops short of what AI needs. A table can pass every pipeline and observability check and still mislead an agent if the agent cannot tell which source finance trusts, which definition is current, and whether the surrounding business context has changed.
Atlan provides the context layer that makes quality signals usable at retrieval time, alongside the meaning of the metric, the freshness of the definition, the source relationships behind the number, and the path back to the upstream issue if the answer is wrong. Instead of treating data quality for AI as a monitoring problem alone, it turns it into a context problem the agent can resolve before generation.
In the ARR example, that means the agent is not choosing from three plausible tables based on surface similarity or convenience. It is working from the right business definition, the right source for the question, and the right freshness signals in the same retrieval flow.
Why the context layer sets AI data quality
The ARR agent fails because three valid tables exist in the same warehouse without a context layer that tells the agent which source finance trusts, which definition is current after the latest methodology change, and how to interpret those signals before generation. With that layer in place, the same question on Monday and Thursday resolves through the same trusted source and the same active definition. Clean warehouse data provides usable values; the context layer tells the agent which values, definitions, and signals it should actually rely on.
FAQs about data quality for AI
1. What is the difference between data quality for AI and data quality for BI?
BI quality assumes a human analyst reads the dashboard and supplies missing context from memory. AI quality has to make that context machine-readable so an agent can read it during retrieval. The same warehouse can support BI and break an agent because the rules a finance team carries in their heads are not stored in any column. Context engineering for AI agents is the discipline of making that implicit knowledge explicit.
2. Why does AI hallucinate when the underlying data is technically correct?
The model retrieves a source that passes every row check, then reasons from a definition that has shifted, an ownership gap, or a metric two teams calculate differently. The output looks supported because the source exists. The wrongness lives in the meaning behind the data, which is the layer most warehouses and quality tools do not store. LLM hallucinations in enterprise settings are predominantly context failures, not model failures.
3. Do existing data quality tools work for AI agents?
Tools like Monte Carlo and Great Expectations cover row-level and pipeline-level checks well, and that work stays valuable. On their own, they do not give agents machine-readable trust, certification, or definition-level freshness, which is what an agent needs to read during retrieval. AI workloads usually require a layer above existing observability that aggregates these signals and exposes them as queryable attributes on the asset. Data quality for AI agent harnesses covers the specific requirements when agents are retrieving across multiple systems simultaneously.
4. What is context drift in AI systems?
Context drift is the silent accumulation of stale definitions, schema changes that did not propagate, and unowned glossary terms while pipelines continue to run on time. Around 65% of enterprise AI agent failures trace to it. It spreads because nothing breaks at first. Dashboards keep producing numbers, and the failure only surfaces when someone challenges an answer. Context poisoning describes the more severe version, where drifted context corrupts agent memory across sessions.
5. Will upgrading to a stronger model fix data quality issues for AI?
A stronger model produces more fluent reasoning around the same source. If the agent uses the wrong source, the explanation around the wrong number is more polished and more likely to survive review. Model quality and context quality have to be governed as separate problems. The work that prevents the failure lives in the meaning, lineage, and trust signals the agent reads at retrieval time. AI agent observability is how teams verify the context layer is working independently of model upgrades.
6. What is the role of metadata in data quality for AI?
Metadata is what makes a data asset legible to an agent. It carries the certified status, the canonical definition, the lineage from source to inference, the freshness signal on meaning, and the named owner. An agent without this metadata can read values but cannot tell which valid-looking source it is allowed to trust for the question being asked. The guide “What is Atlan MCP” explains how agents query this metadata layer in real time through the Model Context Protocol. The metadata layer for AI overview covers the full architectural picture.
Sources
- Gartner. “Gartner Predicts 60% of Organizations Will Abandon AI Projects Due to Lack of AI-Ready Data Through 2026.” Gartner, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
- IBM. “The cost of poor data quality: Why data quality is more important than ever.” IBM, 2026. https://www.ibm.com/think/insights/data-quality
- IBM. “AI data quality: How data quality affects AI outcomes.” IBM. https://www.ibm.com/think/topics/ai-data-quality
- Gartner. “How to Stop Data Quality Undermining Your Business.” Gartner, 2020. https://www.gartner.com/smarterwithgartner/how-to-stop-data-quality-from-undermining-your-business
