Why AI Agents Forget: The Stateless LLM Problem

Emily Winks
Data Governance Expert
Published: 04/02/2026 | Updated: 04/02/2026
18 min read

Key takeaways

  • LLMs are stateless by design — every inference call starts with a fresh context window and no carry-forward state.
  • The deeper enterprise failure is organisational ignorance: agents never knew your business definitions, data lineage, or governance policies in the first place.
  • Session memory tools solve chatbot-style recall but cannot fix the organisational cold-start problem in enterprise AI.

Why do AI agents forget?

AI agents forget because large language models are stateless by design: each inference call processes a fresh context window and discards all intermediate computation when the call ends. For enterprise data teams, the deeper problem is not session forgetting — it is that agents never knew the organisation's metric definitions, governance policies, or data lineage in the first place.

Core components

  • Stateless architecture: transformers compute over a fixed input window with no persistent internal state between calls
  • Session forgetting: conversation history lost between sessions — addressed by memory tools like Mem0 and Zep
  • Organisational ignorance: agents lack business definitions, lineage, and governance context — requires a context layer


AI agents forget because large language models are stateless by design: each inference call processes a fresh context window with no carry-forward of internal state. 32% of enterprise teams cite output quality as their top barrier to production deployment, and this traces directly to statelessness. For enterprise data teams, the forgetting problem runs deeper than session persistence. Agents never knew your data estate, metric definitions, or governance policies in the first place. This guide covers the stateless architecture, why session memory tools fall short, the two cold-start model, and what a context layer provides that memory layers do not.


Quick facts

| Quick fact | Detail |
| --- | --- |
| What it is | The architectural property that causes AI agents to start every session with no memory of prior interactions or organisational context |
| Root cause | Transformer architecture: self-attention over a fixed token window, no persistent internal state between inference calls |
| Two failure types | Session forgetting (chatbot-style) AND organisational ignorance (enterprise-specific) |
| Session memory tools | Mem0, Zep, LangMem, LangGraph memory; solve session recall, not organisational knowledge |
| The deeper fix | A context layer: governed metadata, business definitions, lineage, and policies made queryable at inference time |
| Enterprise impact | Gartner predicts 40% of agentic AI projects canceled by 2027; MIT found 95% of GenAI pilots deliver zero measurable ROI |

What makes LLMs stateless?


A large language model is a stateless function: given a sequence of input tokens, it produces a sequence of output tokens, then discards all intermediate computation. The self-attention mechanism in the transformer architecture computes over the complete input sequence at inference time. That sequence, and everything computed from it, disappears when the call ends.

The transformer architecture in plain terms


The 2017 “Attention Is All You Need” paper (Vaswani et al., arXiv:1706.03762) established that transformers compute relationships between all token pairs in an input sequence statelessly. There is no recurrent loop, no carry-forward. A 2026 paper formalises this precisely: a causal transformer layer is mathematically equivalent to a stateless differentiable neural computer where the controller has no recurrent internal state.

The external “memory” is a write-once matrix of value vectors: the context window itself. Multi-head attention reads from it. When the inference call ends, the matrix is discarded.

This is not a design flaw. Statelessness is what makes LLMs horizontally scalable, reproducible, and parallelisable. You can run the same model across 10,000 concurrent users because each call is fully independent. The trade-off is that any continuity must be engineered into the system around the model.
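The engineering consequence can be sketched in a few lines of Python. Here `fake_llm` is a hypothetical stand-in for any model API: because each call is stateless, the caller must rebuild and resend the entire context on every turn.

```python
def fake_llm(prompt: str) -> str:
    # Deterministic placeholder for a real model API: it reports how
    # much context it received, and retains nothing afterwards.
    return f"answer (saw {len(prompt)} chars of context)"

history: list[str] = []  # continuity lives OUTSIDE the model

def ask(question: str) -> str:
    # Every call rebuilds the full context from scratch.
    prompt = "\n".join(history + [question])
    answer = fake_llm(prompt)
    # Persisting the exchange is the caller's job, not the model's.
    history.extend([question, answer])
    return answer

first = ask("What is NRR?")
second = ask("And how is it trending?")  # coherent only because we resent turn 1
```

Drop the `history` list and the second question arrives with no context at all, which is exactly the blank-slate behaviour described above.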

The context window is not memory


The context window is the complete input sequence for one inference call. Modern models range from 128K tokens for GPT-4o to 1M+ for Gemini 1.5 Pro. When the call ends, the window is discarded.

The next call starts with whatever you put in the prompt, and nothing else.

A 2026 paper on LLM context window limitations adds an important nuance: Paulsen (2025) found accuracy degradation begins at 1,000 tokens, far below advertised limits. Relevant information buried in long windows is effectively forgotten even within a single call. The “lost in the middle” phenomenon is real.

Stuffing more tokens into context does not solve the problem. It makes prompts more expensive and noisier. The right fix is structured, relevant context from a governed data estate, not a bigger bag of text.

Comparison: How LLM inference actually works

| Aspect | What it is | What it is NOT |
| --- | --- | --- |
| Context window | Complete input for one inference call | Persistent memory |
| Model weights | Patterns baked in during training | Updateable at runtime |
| Attention computation | Stateless, call-by-call | Continuous or cumulative |
| Session recall | Possible if engineered externally | Automatic |
| Organisational knowledge | Must be injected explicitly | Learned from training data |


What agents actually lose when they “forget”


When an AI agent “forgets,” it loses two distinct things: the conversation record from previous sessions (session forgetting), and the ability to reason about organisational context it never had access to in the first place (organisational ignorance). The first is a continuity problem. The second is an infrastructure problem. Most tooling addresses only the first.

Session forgetting: the chatbot problem


The consumer experience of an agent forgetting is well-documented. 83% of customers report having to repeat information to multiple agents; 33% cite it as their most frustrating experience. This is the problem Mem0, Zep, LangMem, and LangGraph memory address: persist conversation extracts, inject them at the next session start.

For customer-facing agents, support chatbots, and personal assistants, session memory tools are appropriate and effective. This page does not dismiss them. The argument is about scope, not validity.

See the full taxonomy in types of AI agent memory for a detailed breakdown.

Organisational ignorance: the enterprise problem


For a data team’s agent running SQL queries, diagnosing pipelines, or answering “what is our NRR?”, session memory solves nothing. The agent does not need to remember last week’s conversation.

It needs to know what revenue_net_of_returns means, that orders_v1 is deprecated, that the Q3 revenue definition changed on January 15, and that the analyst asking the question does not have access to the raw PII column. None of that is conversation history. It is organisational knowledge infrastructure.

This gap is what Atlan calls the AI context gap. As an illustrative example drawn from Atlan’s practitioner research: a revenue analysis agent told a finance team Q4 revenue was $12M when the actual figure was $8.4M. The agent was not stateless in the session sense; it had access to the database. It was contextless: it pulled revenue_recognized instead of revenue_net_of_returns because no governed definition existed to guide it. Session memory would not have prevented this.

Understanding common context problems data teams face building agents makes clear how pervasive this gap is.



The two cold-starts: session vs. organisation


Enterprise AI agents face two distinct cold-start problems. The first is the session cold-start: every new conversation begins with no memory of prior exchanges. The second, and more consequential, is the organisational cold-start: agents begin with zero knowledge of your data estate, business definitions, governed metrics, or cross-system entity relationships.

Cold-start 1: the session reset


Each new agent session is a blank slate. The agent that helped debug a pipeline last Tuesday has no memory of that work today.

Practitioners describe a “constant re-orientation overhead”: re-explaining the same context at the start of every session. Session memory tools address this by persisting conversation extracts and injecting them at the next session start. For a support chatbot or personal assistant, this is a reasonable and effective fix.

Read more about the AI agent cold-start problem and when session memory tools are the right choice.

Cold-start 2: the organisation


The organisational cold-start is not solved by any memory framework on the market. An agent can have perfect recall of every conversation it ever had and still not know:

  • Which of 14 definitions of revenue is authoritative at your company
  • That customer_id in Salesforce, account_id in Stripe, and org_id in Zendesk are the same entity
  • That pricing questions must use certified_pricing_v3, not draft_pricing
  • How to trace NRR through 4 dbt models back to 2 Snowflake source tables
  • That the Q3 revenue metric definition changed on January 15 at the request of the finance team

The memory layer for AI agents pillar page captures why this distinction is foundational, not a footnote.

The distinction matters in practice

| | Session cold-start | Organisational cold-start |
| --- | --- | --- |
| What is forgotten | Conversation history | Business definitions, lineage, governance policies |
| Fix | Session memory layer (Mem0, Zep, etc.) | Context layer (governed data estate) |
| Scope | Per-agent, per-session | Shared, cross-system, persistent |
| Scales with | Number of conversations | Data estate size and governance complexity |
| Solved by more tokens? | Partially | No |

Why session memory tools aren’t enough for enterprise


Session memory tools (Mem0, Zep, LangMem, LangGraph memory) store conversation extracts and replay them at session start. They solve the chatbot-style recall problem well. They cannot store governed business definitions, cross-system entity mappings, data lineage, or column-level access policies. For enterprise data agents, these are not the same class of problem.

What session memory tools actually store


Memory tools extract entities and summaries from conversations: user preferences, past decisions, named entities mentioned. At the next session start, relevant extracts are injected into the context window. This is episodic memory: time-indexed records of what was said.

For a customer support agent, this is sufficient. For an agent running SQL against an enterprise data warehouse, it is categorically insufficient.
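The persist-and-inject loop these tools implement can be approximated in miniature. This is a simplified sketch of the pattern, not any vendor's actual API:

```python
from collections import defaultdict

# Conversation extracts keyed by user. Real tools (Mem0, Zep, etc.) add
# entity extraction and relevance ranking on top of this basic shape.
store: dict[str, list[str]] = defaultdict(list)

def end_session(user_id: str, extracts: list[str]) -> None:
    # Persist what this session established.
    store[user_id].extend(extracts)

def start_session(user_id: str) -> str:
    # Inject prior extracts at the start of the next session's context.
    memories = store[user_id]
    if not memories:
        return "New session. No prior context."
    return "New session. Known from past sessions: " + "; ".join(memories)

end_session("ana", ["prefers weekly summaries", "asked about churn last week"])
opening = start_session("ana")
```

Note what the store holds: records of what was said. Nothing in it could answer which revenue definition is authoritative, which is the organisational gap this page describes.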

A 2025 survey of AI agent memory research (arXiv:2512.13564) confirms the fragmentation problem: agent memory frameworks diverge substantially in motivation, implementation, and evaluation. There is no consensus framework designed for enterprise-grade semantic and procedural memory.

Consider the three memory types that actually matter for enterprise:

| Memory type | What it means | Enterprise relevance |
| --- | --- | --- |
| Episodic | Past interactions, conversation history | Limited; chatbot-oriented; does not capture enterprise knowledge |
| Semantic | Business definitions, entity relationships, domain facts | High; this is where enterprise context lives (“what does revenue mean here?”) |
| Procedural | Workflow steps, authorisation rules, routing patterns | High; governance policies, access controls, certification rules |

No current memory framework is designed to store business definitions, asset certification status, cross-system entity mappings, or column-level lineage. They are optimised for chatbot personalization, not enterprise data estate knowledge.

The four things enterprise agents actually need to know


Agents operating on enterprise data need four types of context that conversation history cannot supply:

  1. Organisational context: owners, glossary terms, policies, authoritative metric definitions
  2. Technical context: schemas, lineage traces, data tests, deprecation status for assets like orders_v1
  3. Access and safety context: permissions, PII sensitivity, data residency rules for the requesting user
  4. Temporal context: freshness, recent changes, incidents, definition version history

None of these live in conversation artifacts. They live in data catalogs, business glossaries, governance systems, and metadata stores. See common context problems data teams face building agents for the full breakdown.
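One way to make the four categories concrete is as a typed bundle that a context layer would assemble per request. The field names and values below are illustrative, not a real schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Illustrative grouping of the four context types an enterprise
    agent needs; none of these come from conversation history."""
    # 1. Organisational: owners, glossary terms, authoritative definitions
    metric_definitions: dict[str, str] = field(default_factory=dict)
    # 2. Technical: schemas, lineage, deprecation status
    deprecated_assets: set[str] = field(default_factory=set)
    # 3. Access and safety: what THIS requesting user may see
    masked_columns: set[str] = field(default_factory=set)
    # 4. Temporal: freshness, incidents, definition version history
    recent_changes: list[str] = field(default_factory=list)

bundle = ContextBundle(
    metric_definitions={"revenue": "revenue_net_of_returns"},
    deprecated_assets={"orders_v1"},
    masked_columns={"customers.raw_pii"},
    recent_changes=["Q3 revenue definition updated on January 15"],
)
```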

Context collapse in long-running workflows


Even when memory tools inject conversation history into a long-running agent workflow, a new failure mode emerges: context collapse.

As the context window fills with conversation replay, system prompt constraints get pushed out. Agents make contradictory decisions, forget routing rules, and produce degraded reasoning. A 2026 paper on memory control vs. context accumulation argues for active management of what enters context, not passive accumulation. Structured, relevant context from a governed catalog prevents collapse. Raw conversation replay amplifies it.
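The “active management” the paper argues for can be sketched as a budgeted context builder that pins system constraints first and admits conversation replay newest-first only while budget remains. The priority scheme is an assumption for illustration:

```python
# Sketch of active context management: constraints always enter the
# window first, so replay can never push them out (the collapse failure).
def build_context(system_rules: list[str],
                  conversation: list[str],
                  budget_chars: int) -> list[str]:
    context = list(system_rules)
    used = sum(len(s) for s in context)
    # Admit replay newest-first until the budget is exhausted.
    replay: list[str] = []
    for turn in reversed(conversation):
        if used + len(turn) > budget_chars:
            break
        replay.append(turn)
        used += len(turn)
    return context + list(reversed(replay))

ctx = build_context(
    system_rules=["Always use certified_pricing_v3"],
    conversation=[f"turn {i}: " + "x" * 50 for i in range(20)],
    budget_chars=300,
)
```

Passive accumulation would do the opposite: append every turn and let the oldest pinned rules scroll out of the window.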


The enterprise dimension: agents that never knew your organisation


The fundamental enterprise AI problem is not that agents forget. It is that they never knew the organisation. No training data includes your governed metric definitions, your cross-system entity mappings, your deprecation log, or your access control policies. Agents operating without this knowledge are not forgetful. They are organisationally blind.

What agents cannot learn from training data


LLMs are trained on internet-scale text. Your internal revenue definition is not on the internet. Five specific things agents cannot learn from training data:

  1. Which of 14 definitions of revenue is authoritative at your company
  2. That customer_id in Salesforce = account_id in Stripe = org_id in Zendesk
  3. That pricing questions must use certified_pricing_v3, not draft_pricing
  4. How to trace NRR through 4 dbt models back to 2 Snowflake source tables
  5. That the Q3 revenue metric definition changed on January 15 at the request of the finance team

None of this is on GitHub. None of it is in your model’s weights. All of it is in your data estate. The context-aware AI agents companion page covers the three failure patterns this creates: the cold-start problem, the validation bottleneck, and the replication problem.

The validation bottleneck


When agents lack organisational knowledge, teams cannot validate their outputs. If there is no source of truth, no governed definition, no certified metric, no lineage trace, there is no baseline to validate against.

One enterprise team described spending five months manually testing 1,000+ use cases because no source of truth existed. This is not a model capability problem. It is a context infrastructure problem. And it is entirely preventable with governed metadata.

The multi-system identity problem

Permalink to “The multi-system identity problem”

The average enterprise runs 3-5 data platforms. Each platform uses different identifiers for the same entity. An agent with only platform-native context is blind to 60-80% of the data estate.

Cross-system entity resolution, the mapping that establishes customer_id, account_id, and org_id as the same real-world entity across systems, is not a conversation artifact. It lives in an enterprise data graph.

Snowflake’s internal experiment confirms the impact: adding an ontology layer improved agent answer accuracy by 20% and reduced tool calls by 39%. Session memory played no role in this improvement. The gains came from structured context, not conversation replay.

See what is a context graph for how enterprise data graphs support cross-system entity resolution.

Why the popular discourse misses this

Here is where the popular discourse goes wrong. The SERP consensus frames agent forgetting as a session persistence problem and proposes memory layers as the fix. For enterprise data teams, this is the right answer to the wrong question.

The question is not “how do we make agents remember conversations?” It is “how do we give agents access to the organisational knowledge they need to reason accurately?”

Session memory and context infrastructure are not competing approaches. They operate at different layers. But conflating them leads your team to build the wrong thing, which explains why Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to inadequate risk controls and unclear business value.


What a context layer provides that memory layers don’t


A context layer is not a memory store for conversations. It is the governed infrastructure that makes an organisation’s data estate legible to AI agents. It provides business definitions, asset certification status, cross-system entity mappings, lineage traces, access policies, and decision histories; all assembled and injected at inference time from a live, queryable metadata graph.

The five things a context layer gives agents


No session memory tool stores any of these:

  1. Governed metric definitions: authoritative, versioned, linked to certified sources (revenue_net_of_returns, not revenue_recognized)
  2. Cross-system entity identity: the mapping that resolves customer_id to account_id to org_id across platforms
  3. Routing rules for authoritative sources: agents ask pricing questions of certified_pricing_v3, not ad-hoc tables
  4. Provenance for every answer: the lineage trace from NRR back through 4 dbt models to 2 Snowflake source tables
  5. Institutional memory: decision histories, metric version events, deprecation logs linked to data entities

This is what context engineering means in practice: building the infrastructure that assembles relevant, governed context at inference time.
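A minimal sketch of items 1 and 3, governed definitions plus routing rules, queried before the agent touches any data. The registry contents are invented for illustration:

```python
# Sketch of a queryable context layer: agents ask it which definition
# and which source are authoritative before running any query.
CONTEXT_LAYER = {
    "metrics": {
        "revenue": {"column": "revenue_net_of_returns",
                    "certified_source": "finance.certified_revenue"},
    },
    "routing": {
        "pricing": "certified_pricing_v3",  # never draft_pricing
    },
}

def authoritative_metric(name: str) -> dict:
    metric = CONTEXT_LAYER["metrics"].get(name)
    if metric is None:
        # No governed definition: fail loudly rather than guess a column.
        raise KeyError(f"No governed definition for {name!r}")
    return metric

def route(topic: str) -> str:
    return CONTEXT_LAYER["routing"][topic]

rev = authoritative_metric("revenue")
```

The failure mode in the $12M-vs-$8.4M example earlier is exactly the `None` branch: with no governed entry, the agent silently picked `revenue_recognized` on its own.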

Structured context vs. raw text injection


Teams sometimes approximate a context layer by dumping metadata into the context window: a full data catalog as raw text, a PDF of business definitions pasted as a prompt. This approach works until it doesn’t.

Research from ICLR 2026 (arXiv:2510.04618) shows structured context management improved agent benchmark performance by 10.6%, with 8.6% gains in financial-domain tasks where governance complexity is highest. Anthropic’s context engineering framework identifies four context types (working, session, long-term, and tool context) and recommends just-in-time context assembly. The benefit is in structure and relevance, not volume.

The full architectural comparison is in memory layer vs context layer: which do you actually need?

Memory layer vs. context layer at a glance

| Dimension | Session memory layer | Context layer |
| --- | --- | --- |
| What it stores | Conversation extracts, user preferences | Business definitions, lineage, governance policies, entity mappings |
| Designed for | Chatbot personalization, session recall | Enterprise data estate knowledge |
| Scope | Per-agent, per-conversation | Shared across all agents and use cases |
| Updates | After each conversation | Continuously, from live data systems |
| Solves session forgetting | Yes | Partially (injected at session start) |
| Solves organisational ignorance | No | Yes |

How Atlan approaches the AI context gap


Atlan’s context layer gives AI agents governed access to the complete enterprise data estate: business glossary, certified metric definitions, cross-system lineage, asset certification status, and access policies, assembled dynamically at inference time. The layer is not a static metadata dump. It is a live, queryable graph that updates as the data estate changes.

Traditional approaches try to solve agent forgetting at the prompt level: longer context windows, better retrieval, richer system prompts. These help at the margins. They do not solve the foundational problem: there is no governed, machine-readable representation of what your data means.

The business definitions, entity relationships, and governance policies that human analysts carry in their heads have never been structured in a way agents can query. Context engineering starts here: not with prompt design, but with infrastructure. For enterprise teams evaluating whether this applies to their situation, does an enterprise need a context layer between data and AI? is a direct resource.

Atlan’s context layer is built on the Enterprise Data Graph, a unified entity relationship map that spans all data systems, linking assets to owners, definitions, lineage, certifications, and policies.

When an agent asks a question, the context layer assembles the relevant context: the authoritative metric definition, the certified source table, the access policy for the requesting user. It injects this at inference time. The agent does not need to know your entire data estate. It needs the right slice at the right moment: revenue_net_of_returns and certified_pricing_v3 at the point of query, not a 50,000-token metadata dump on every call.

Teams using Atlan’s context layer report agents that produce auditable, traceable answers. Not because the model got smarter, but because it finally knew what it was talking about.

Adding a context layer (ontology plus lineage plus definitions) improved agent answer accuracy by 20% and reduced unnecessary tool calls by 39% in Snowflake’s internal experiment. MIT found 95% of GenAI pilots deliver zero measurable ROI when the barrier is organisational, not technological. Structured context directly addresses both.

Explore how Atlan’s context layer works and see the five-layer architecture in detail. Or go directly to the context layer product page to see how it applies to your data estate.


Wrapping up


The stateless transformer architecture is real, well-understood, and not going away. The question is what you build around it.

For enterprise data teams, the session memory discourse is incomplete. It addresses the visible symptom (chatbot-style forgetting) and misses the deeper pathology: organisational ignorance. An agent that perfectly remembers every conversation it has ever had still does not know what revenue_net_of_returns means, which table is certified, or that a metric definition changed in January.

The right architectural response is a context layer: governed, live, queryable organisational knowledge that agents can access at inference time.

The teams building durable agent workflows are not solving the forgetting problem with bigger prompts or more conversation history. They are solving it by making the enterprise data estate legible to machines. That is context engineering, not prompt engineering.

For the full architectural breakdown, start with what is a memory layer for AI agents?. Then see how the context layer for enterprise AI takes it further.


External citations: Vaswani et al. (2017), arXiv:1706.03762 | arXiv:2603.19272 (2026) | arXiv:2512.13564 (2025) | arXiv:2601.11653 (2026) | arXiv:2510.04618 (ICLR 2026) | Gartner, June 2025 | Fortune/MIT NANDA, August 2025 | Snowflake blog | LangChain State of Agent Engineering | Anthropic context engineering framework | Mindset AI

FAQs about why AI agents forget


1. Why do AI agents forget between sessions?


AI agents forget between sessions because large language models are stateless by design. Each inference call processes a fresh context window (the complete input sequence) and discards all intermediate computation when the call ends. No internal state carries forward. To give agents session continuity, teams must engineer external memory systems that persist conversation extracts and inject them into the context window at the next session start.

2. What is the stateless LLM problem?


The stateless LLM problem refers to the architectural property that large language models have no persistent internal state between inference calls. A transformer computes self-attention over its input sequence and produces output; nothing is stored after the call completes. Statelessness is a deliberate trade-off that enables scale and reproducibility. All continuity, including session history, organisational knowledge, and user preferences, must be built externally and injected at runtime.

3. What is the AI agent cold-start problem?


The AI agent cold-start problem describes an agent beginning a task with zero knowledge of the organisation it is meant to serve. Every new session or deployment starts without business definitions, data asset context, governance policies, or cross-system entity mappings. Session memory tools address the session cold-start (conversation recall between sessions). They do not address the organisational cold-start, which is the structural absence of enterprise knowledge from the agent’s context.

4. What is the difference between a memory layer and a context layer?


A memory layer stores and replays conversation history: what was said, what was decided, what the user prefers. A context layer provides structured organisational knowledge: governed metric definitions, data lineage, asset certification status, cross-system entity mappings, and access policies. Memory layers are optimised for chatbot-style session recall. Context layers are designed for enterprise agents that must reason accurately over a complex, multi-system data estate.

5. Why do enterprise AI agents fail in production?


Enterprise AI agents fail in production primarily because of inadequate context infrastructure, not model capability. Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to escalating costs, unclear business value, and risk control failures. MIT research found 95% of GenAI pilots deliver zero measurable ROI. In both cases, the barrier is organisational: agents lack the business definitions, governance context, and entity knowledge needed to produce accurate, auditable answers.



Atlan is the next-generation platform for data and AI governance. It is a control plane that stitches together a business's disparate data infrastructure, cataloging and enriching data with business context and security.
