AI memory systems move data through four stages (ingestion, storage, retrieval, and eviction) to give agents persistent context beyond a single conversation. Most engineering attention concentrates on retrieval: vector search, graph traversal, hybrid ranking. The ingestion stage, where trust is either established or broken, gets the least scrutiny. Eight in ten enterprises cite data limitations, not model limitations, as the bottleneck to scaling agentic AI. This guide covers all four stages with equal rigor, then names the failure mode that retrieval architecture alone cannot fix.
| What it is | A pipeline of four stages (ingestion, storage, retrieval, eviction) giving AI agents persistent memory across sessions |
|---|---|
| Key benefit | Agents act on accumulated knowledge rather than starting from scratch each session |
| Critical stage | Ingestion: where trust in the source is established or broken |
| Core failure mode | Eviction policies (LRU, TTL, decay) prune on access frequency and time, not on whether the source is still authoritative |
| Frameworks | Mem0, Zep, LangMem, Letta/MemGPT, MemOS |
| Enterprise gap | 80% of companies cite data limitations as the agentic AI bottleneck (McKinsey, 2026) |
What is an AI memory system?
An AI memory system is a structured pipeline that captures, stores, and retrieves context for an AI agent across multiple sessions. Unlike a context window, which resets at the end of each conversation, memory systems persist information over time, enabling agents to accumulate knowledge, recall past interactions, and build on prior decisions.
The simplest analogy: the context window is RAM. Memory is disk. When the working session ends, RAM clears. The memory system keeps what matters for next time.
Why this matters now: enterprise agent deployment more than doubled from 11% in Q1 2025 to 26% in Q4 2025, yet three-quarters of large enterprises still haven’t scaled agentic AI. Agents without persistent memory can’t compound knowledge. Each session is a cold start. The pipeline that enables compounding is the memory system, and it must function correctly at every stage.
That pipeline has four stages: ingestion, storage, retrieval, and eviction. A failure at any stage propagates forward. An error introduced at ingestion cannot be corrected by better retrieval. A memory that survives eviction because of high access frequency, but draws from a deprecated source, will be retrieved with confidence. The field’s dominant assumption is that ingestion is unproblematic. That assumption is the root of most enterprise AI memory failures.
For a full taxonomy of memory types, see What Is an AI Memory System? and Memory Layer vs Context Window.
Stage 1: Ingestion — where trust is established or broken
Ingestion is the stage at which external data enters the memory pipeline. Every major framework (Mem0, LangMem, MemOS) uses an extraction phase to determine what to store. The critical question no framework answers by default: is the source authoritative? Ingestion without source governance is how confident-but-wrong memories are born. The pipeline accepts data. It has no mechanism to ask whether that data should be trusted.
What the ingestion stage actually does
Raw inputs enter the pipeline from multiple channels: documents, conversation turns, tool outputs, database snapshots, API responses. An extraction phase processes them, scoring content for importance and formatting it for storage.
The frameworks differ in how they score:
- Mem0 uses LLM-based importance scoring: the model judges relevance and decides what to retain
- MAGMA uses JSON schema enforcement at ingestion: structural quality control, but not source authority validation
- MemOS tracks an origin signature, access control, TTL, version chain, and compliance tags per memory unit via its MemCube provenance model. This is the most advanced provenance tracking currently available.
What none of them do by default: ask whether the source of that data is currently authoritative, certified, or deprecated within your organization’s data infrastructure. Relevance and importance are not the same as trustworthiness.
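The missing check can be sketched in a few lines. Below is a hypothetical ingestion gate (the names, thresholds, and authority scale are illustrative, not any framework's actual API) that verifies source authority alongside importance; the second and third checks are exactly what the frameworks above omit by default:

```python
from dataclasses import dataclass

# Hypothetical authority levels, modeled on a Policy > Standard > Guideline > Opinion hierarchy.
AUTHORITY_RANK = {"policy": 3, "standard": 2, "guideline": 1, "opinion": 0}

@dataclass
class Candidate:
    text: str
    importance: float   # e.g. an LLM-assigned relevance score in [0, 1]
    authority: str      # tag inherited from the upstream source of record
    deprecated: bool    # has the source been superseded?

def admit(c: Candidate, min_importance: float = 0.5,
          min_authority: str = "guideline") -> bool:
    """Admit a memory only if it is relevant AND from a source that is
    still authoritative. Importance scoring alone skips the second check."""
    if c.deprecated:
        return False
    if AUTHORITY_RANK[c.authority] < AUTHORITY_RANK[min_authority]:
        return False
    return c.importance >= min_importance

# A highly "important" memory from a deprecated source is still rejected.
stale = Candidate("Q4 revenue was $12M", importance=0.95,
                  authority="standard", deprecated=True)
fresh = Candidate("Revenue means recognized revenue", importance=0.8,
                  authority="policy", deprecated=False)
```

An importance-only scorer would admit both candidates; the authority and deprecation checks reject the first.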
The ingestion trust gap
The failure modes compound quickly.
Letta/MemGPT allows an agent to edit its own core memory directly. This enables adaptation, but it also means the agent can overwrite a certified definition with an inferred approximation, and the pipeline has no mechanism to flag the difference. The certified fact and the inferred approximation are stored identically.
Multi-agent provenance disappears at ingestion. Mem0’s own 2026 report documents this explicitly: “a memory that reads ‘user needs help with deployment’ is ambiguous about whether the user stated this directly, a monitoring agent inferred it, or a planning agent generated it.” Once ingested, that provenance is lost. Every downstream retrieval treats all memories as equally authoritative.
The attack surface is also at ingestion, not retrieval. A 2024 study (PoisonedRAG) found that injecting just 5 malicious documents into a corpus of millions caused a RAG system to return a false answer 90% of the time for targeted queries. The vector database cannot distinguish legitimate from malicious content, because the problem was introduced at ingestion.
The practitioner framing that’s circulating: “garbage in is a retrieval problem.” It’s not. Garbage in is an ingestion governance problem. Better retrieval ranking of a malicious or deprecated memory returns the wrong answer faster, with higher confidence.
As CXToday’s governance roundup puts it: “If the knowledge base is outdated, RAG just retrieves the wrong answer faster.”
What governed ingestion looks like
Governing ingestion means asking authority questions before data enters the pipeline.
The Infosys enterprise memory architecture describes content authority tagging at ingestion: Policy, Standard, Guideline, Opinion. Each has a different trust weight. Not all content that could enter memory should enter memory at the same authority level. The principle: “Do not copy chaos. Connect to truth.”
The Governed Memory architecture paper (arXiv:2603.17787) takes this further: schema lifecycle enforcement, 99.6% fact recall, and zero cross-entity leakage across 3,800 adversarial queries. The paper identifies five structural problems of ungoverned multi-agent memory, including “silent degradation”: facts that are wrong for months before any user surface detects the failure. Governance at ingestion prevents silent degradation before it starts.
The principle: trusted ingestion connects to governed infrastructure rather than extracting from ungoverned sources. Enterprises already maintain this infrastructure: the data catalog, with certified definitions, enforced lineage, and active freshness signals.
See AI Memory Ingestion Pipeline: What Should (and Shouldn’t) Enter AI Memory for the full conviction treatment of this stage.
Stage 2: Storage — tiers, types, and trade-offs
AI memory systems use tiered storage modeled on how operating systems manage RAM and disk. Hot memory holds immediate working state; warm memory holds recent history; cold memory holds long-term knowledge. The taxonomy (episodic, semantic, procedural, in-context) maps to distinct retrieval patterns and decay rates.
Memory tier architecture
Hot (in-context): The system prompt, active tool outputs, and current session state. Fits inside the context window. Fastest access, but resets at session end. This is what most practitioners mean when they say “AI memory”; it is actually the most transient tier.
Warm (recall/episodic): Recent conversation summaries, session-level entities, short-term behavioral patterns. Stored in a vector database or key-value store. MemGPT’s architecture calls this “recall storage”: the layer between working memory and long-term archival.
Cold (archival/semantic): Long-term knowledge base, enterprise documents, glossary definitions, certified metric specifications. Accessed via similarity search over a vector index, often augmented with graph traversal for relational queries. The Analytics Vidhya memory architecture overview covers hot/warm/cold with async consolidation: how memories migrate between tiers as they age and access frequency changes.
Framework mapping: Letta uses core/recall/archival; the names differ, the tier logic is consistent across all major frameworks.
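The tier logic above can be illustrated with a toy classifier (the age thresholds are assumptions for the example; real frameworks also weigh access frequency and consolidate tiers asynchronously):

```python
import time

class TieredMemory:
    """Toy hot/warm/cold classifier: items cool down as time since
    last access grows. Accessing an item re-warms it."""

    def __init__(self, hot_ttl=60.0, warm_ttl=3600.0):
        self._items = {}  # key -> (value, last_access_timestamp)
        self.hot_ttl, self.warm_ttl = hot_ttl, warm_ttl

    def put(self, key, value):
        self._items[key] = (value, time.time())

    def get(self, key):
        value, _ = self._items[key]
        self._items[key] = (value, time.time())  # access refreshes recency
        return value

    def tier(self, key, now=None):
        now = time.time() if now is None else now
        _, last = self._items[key]
        age = now - last
        if age < self.hot_ttl:
            return "hot"
        return "warm" if age < self.warm_ttl else "cold"
```

With the default thresholds, a memory untouched for two minutes has cooled to warm; untouched for two hours, it is cold and a candidate for archival storage.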
Memory type taxonomy
The tier (where it’s stored) is separate from the type (what kind of knowledge it represents):
- Episodic: Specific events and past interactions. “The agent ran the quarterly revenue analysis last Tuesday. The query timed out.”
- Semantic: Factual knowledge, definitions, relationships. “Revenue means recognized revenue. It’s owned by Finance. Last certified Q4 2025.”
- Procedural: Learned skills and workflows. “When the user asks for a board report, call the formatting tool before running the query.”
- In-context (working memory): Active session state. This is the tier most teams confuse with the memory system itself.
- Org context memory: The fifth type, institutional knowledge. Certified metrics, approved definitions, business glossary terms. Often absent from framework taxonomies but essential for enterprise data agents.
For the full five-type breakdown, see Types of AI Agent Memory.
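The tier/type orthogonality can be made concrete with a record that carries both fields independently (a hypothetical schema for illustration, not any framework's actual model):

```python
from dataclasses import dataclass

MEMORY_TYPES = {"episodic", "semantic", "procedural", "working", "org_context"}
TIERS = {"hot", "warm", "cold"}

@dataclass
class MemoryRecord:
    text: str
    mem_type: str  # what kind of knowledge it represents (taxonomy above)
    tier: str      # where it currently lives; migrates as it ages

    def __post_init__(self):
        # Validate both axes independently: type never constrains tier.
        if self.mem_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.mem_type}")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier}")

# A semantic, org-level definition can sit in cold storage while an
# episodic event from the current session is still hot.
definition = MemoryRecord("Revenue means recognized revenue", "semantic", "cold")
event = MemoryRecord("Quarterly analysis timed out last Tuesday", "episodic", "hot")
```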
Ungoverned vs governed storage
| Aspect | Ungoverned storage | Governed storage |
|---|---|---|
| Source verification | None (importance scored by LLM) | Certified, traces to authoritative source |
| Freshness signals | TTL or access frequency | Live freshness from upstream catalog |
| Provenance | Often lost in extraction | Tracked per memory unit (MemOS MemCube model) |
| Conflict resolution | Last-write wins or merge | Authority-level hierarchy (Policy > Standard > Opinion) |
| Staleness detection | Reactive (wrong answer surfaced by user) | Proactive (freshness signal triggers refresh) |
The ungoverned column describes how every major memory framework operates today. The governed column describes what the research literature and enterprise requirements say should happen.
Stage 3: Retrieval — getting the right memory at the right time
Retrieval is the most-engineered stage of AI memory systems. When an agent needs context, it queries stored memories using vector similarity, graph traversal, or hybrid ranking to surface the most relevant items. The field’s attention is rightly here; retrieval unreliability is a genuine primary failure mode. The mistake is assuming that better retrieval compensates for ungoverned ingestion. It doesn’t.
Retrieval mechanisms
Vector similarity search: Embeddings of stored memories are compared to query embeddings. Fast, scalable, sensitive to embedding quality. Semantic drift, where a term’s meaning changes in the domain but not in the embedding model, can cause highly confident but wrong retrievals.
Graph traversal: Entity relationships stored as graph edges enable multi-hop reasoning. “recognized_revenue_q4 → owned by Finance → last certified 2025-Q4 → flagged for review 2026-Q1.” This kind of lineage-aware retrieval is unavailable with vector search alone.
Hybrid retrieval: Vector search combined with graph traversal and keyword matching. Mem0’s 2026 architecture uses dual vector+graph storage for multi-scope retrieval across user, agent, session, and global memory scopes.
Retrieval scope: Memory can be scoped per-user, per-agent, per-session, or globally. Different scopes carry different trust requirements. A global memory accessible to all agents in a multi-agent system has higher authority requirements than a single-session episodic memory.
Redis’s engineering analysis frames retrieval latency (sub-millisecond with in-memory stores) as a solved engineering problem. Retrieval quality (returning the right answer, not just the most similar one) is not.
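A stripped-down sketch of hybrid scoring (toy vectors and illustrative weights; production systems add graph-derived signals and learned rerankers). Note what the score never sees: whether the memory's source is still authoritative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query_vec, query_terms, memory, w_vec=0.7, w_kw=0.3):
    """Blend embedding similarity with keyword overlap. Nothing in this
    score distinguishes a certified source from a deprecated one."""
    sim = cosine(query_vec, memory["embedding"])
    overlap = len(query_terms & set(memory["text"].lower().split()))
    kw = overlap / max(len(query_terms), 1)
    return w_vec * sim + w_kw * kw

# A memory ingested from a deprecated model still scores near the maximum,
# because it is semantically and lexically close to the query.
q_vec, q_terms = [1.0, 0.0], {"recognized", "revenue"}
memory = {"text": "recognized revenue is $12M", "embedding": [0.9, 0.1]}
score = hybrid_score(q_vec, q_terms, memory)
```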
What retrieval cannot fix
The arXiv survey of 218 memory papers identifies retrieval unreliability as a primary failure mode: ranking algorithms can surface results that are irrelevant or misleading. This is real: retrieval is not a solved problem.
But retrieval quality is bounded by ingestion quality. If recognized_revenue_q4 was ingested from a deprecated financial model, vector similarity will still retrieve it, because it is semantically similar to the query. The retrieval is technically correct. The answer is factually wrong.
The Workday case makes this concrete: an agent “couldn’t answer one question” until a semantic layer bridged business language to the underlying data structure. The retrieval engine was working. The ingestion foundation (the absence of governed semantic definitions) was the problem.
The quantified impact: Atlan and Snowflake research shows a 3x text-to-SQL accuracy improvement when retrieving against governed metadata vs. bare schemas. A 20% accuracy gain and 39% tool call reduction with an ontology layer. The governed source at ingestion drives retrieval quality outcomes. Better retrieval algorithms don’t move these numbers. A better source does.
For the RAG comparison, see AI Memory System vs RAG: When Retrieval-Augmented Generation Isn’t Enough.
Stage 4: Eviction — the failure mode the field hasn’t named yet
Eviction is how AI memory systems manage capacity, removing lower-priority memories when storage limits are reached. Standard policies include LRU (least recently used), TTL (time-to-live), decay functions, and recursive summarization. The failure mode none of these policies detect: a memory from a deprecated source with high access frequency will survive eviction, while a certified fact with low access gets pruned. The agent will then answer confidently with the deprecated data.
How eviction policies work today
LRU (least recently used): Prunes memories not accessed recently. Operates on access pattern, not source authority. A deprecated metric definition that your financial agents query constantly scores highly. It survives.
TTL (time-to-live): Expires memories after a fixed duration. Operates on time, not freshness of the underlying source. If the source was superseded a week after ingestion but the TTL is 90 days, the memory persists for 89 more days: still confidently wrong.
Decay functions: Score memories by a combination of recency and frequency. Same blind spot as LRU: the signal is access pattern, not source authority.
Recursive summarization (Letta/MemGPT): Compresses old memories into summaries to manage storage. Loses granularity. Loses provenance. Loses the ability to detect when a specific fact within the summary has been superseded by an update to the source.
The mechanism missing from all of these: a signal from the source that says “I have been updated” or “I have been deprecated.” Every eviction policy in current production use is operating blind to source authority.
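The blind spot is visible in a few lines. A standard recency-times-frequency decay score (the half-life below is an illustrative assumption) takes no input that could represent the source at all:

```python
import math

def decay_score(seconds_since_access, access_count, half_life_s=86_400.0):
    """Recency x frequency score typical of decay-based eviction.
    Nothing in the signature can express 'the source was deprecated'."""
    recency = math.exp(-seconds_since_access / half_life_s)
    frequency = math.log1p(access_count)
    return recency * frequency

# A deprecated-but-popular memory outranks a certified-but-quiet one,
# so access-pattern eviction keeps exactly the wrong item.
deprecated_popular = decay_score(3_600, 500)  # hit an hour ago, 500 accesses
certified_quiet = decay_score(604_800, 3)     # hit a week ago, 3 accesses
```

Under this scoring, the deprecated memory wins by several orders of magnitude, and the certified fact is the one pruned at capacity.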
The eviction blind spot — named explicitly
Here is the failure mode the field has not named directly.
A memory ingested six months ago from a deprecated financial dataset may still score highly on recency (your agents accessed it last week), relevance (high embedding similarity to incoming revenue queries), and frequency (it’s your most-queried memory). All three LRU/decay signals say: keep this memory. The eviction policy keeps it. The agent answers your CFO’s revenue question from a deprecated source, with full confidence.
Mem0’s own 2026 report admits this is unresolved: “a highly-retrieved memory about a user’s employer is highly relevant until it is not, at which point it becomes confidently wrong.” This is not a retrieval failure. The retrieval was correct; the query matched the memory. It is an eviction failure: the wrong signal (access frequency) drove the keep-or-prune decision.
The enterprise stakes are significant. In 2024, 47% of enterprise AI users admitted to making at least one major business decision based on hallucinated content. Confident-but-stale memory is a direct contributor. Gartner predicts 60% of organizations will fail to realize expected AI value by 2027 because of incohesive data governance frameworks. Staleness surviving eviction is a governance failure, not a model failure, not a retrieval failure.
What eviction needs to work correctly
Eviction policies need active freshness signals from the upstream source: not “when was this memory last accessed” but “has the source changed since this was ingested?”
MemOS (arXiv:2507.03724), published July 2025, is the most advanced memory framework currently available on this dimension. Its MemCube model tracks origin signature, provenance metadata, and lifecycle state per memory unit. This is the closest existing system to governance-aware eviction.
The gap: MemOS builds this provenance infrastructure from scratch. It doesn’t connect to the governed data infrastructure enterprises already maintain. A data catalog with active metadata (continuously updated signals, not point-in-time snapshots) already carries the freshness signals eviction policies need. The question is why memory frameworks build provenance tracking in isolation rather than connecting to the governed source that already has it.
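A sketch of what governance-aware eviction could look like if the policy could consult upstream source signals (all field names here are hypothetical illustrations, not MemOS's or any catalog's actual API):

```python
from dataclasses import dataclass

@dataclass
class SourceState:
    """Signals a governed catalog could expose per source (hypothetical schema)."""
    deprecated: bool
    last_updated_ts: float

def keep(access_score, ingested_at_ts, source: SourceState, min_score=0.1):
    """Keep-or-prune decision where source signals take precedence over
    access-pattern scoring: a deprecated or changed source invalidates the
    memory no matter how often it is accessed."""
    if source.deprecated:
        return False  # prune regardless of popularity
    if source.last_updated_ts > ingested_at_ts:
        return False  # source changed since ingestion: re-ingest, don't keep
    return access_score >= min_score

# A popular memory from a deprecated source is pruned; a quietly accessed
# memory from a still-certified source survives.
popular_deprecated = keep(5.9, ingested_at_ts=1_000.0,
                          source=SourceState(deprecated=True, last_updated_ts=900.0))
quiet_certified = keep(0.2, ingested_at_ts=1_000.0,
                       source=SourceState(deprecated=False, last_updated_ts=900.0))
```

The design choice is that the first two checks short-circuit the score entirely: freshness is a veto, not one more weighted term.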
Related: Long-Term vs Short-Term AI Memory: The Design Problem Teams Get Wrong and Active Metadata as AI Agent Memory: Why Live Context Beats Stored Extracts.
Why the field’s attention is inverted
The AI memory literature (218 papers analyzed from 2023 Q1 through 2025 Q4) concentrates research effort on retrieval mechanisms and storage architecture. Ingestion quality and eviction provenance receive surface treatment. This inversion matters: retrieval is the most visible failure mode; ingestion is the root cause. Fixing the wrong stage is why scaling agentic AI remains elusive for most enterprises.
The survey is diagnostic: five pipeline operations identified, with retrieval and storage dominating the literature. Ingestion governance is not a recognized research category in the 218 papers analyzed. The field is optimizing the stages that are easy to measure (query latency, similarity scores, recall rates) and underinvesting in the stage where trust is determined.
McKinsey’s 2026 agentic AI analysis is explicit about the actual bottleneck: “one data foundation for analytics and AI — build data once, use everywhere, with clear common definitions.” That is not a retrieval recommendation or a vector store recommendation. It is an ingestion architecture and source governance recommendation, describing a governed source of truth that feeds both analytics and AI and pointing directly at the data catalog layer.
By 2028, Gartner projects that 60% of agentic analytics projects relying solely on the Model Context Protocol will fail due to the absence of a consistent semantic layer. The market is moving toward governed ingestion. The engineering community is still optimizing retrieval.
To be precise: the field is not ignoring ingestion quality entirely. Mem0’s extraction phase uses LLM importance scoring; MAGMA applies JSON schema enforcement at the ingestion boundary; MemOS tracks provenance per memory unit via its MemCube model. These are real quality controls, and they represent genuine progress. But they are all framework-level substitutes for source governance: they score and validate data after it enters the pipeline, without asking whether the source was authoritative to begin with. A relevance-scored memory from a deprecated dataset is still a deprecated memory, held with model-level confidence. Framework-level ingestion controls are necessary but not sufficient when the underlying source is ungoverned.
The gap no competitor article has named: enterprises are building extraction pipelines around their data catalogs, pulling definitions, metrics, and lineage out of the governed source, processing them through LLM scoring, and re-ingesting them into memory frameworks, instead of connecting the memory pipeline directly to the governed source. The data catalog already has certified definitions, enforced lineage, active freshness signals, and authority hierarchies. The memory framework builds all of these from scratch.
One scope note: this conviction applies to enterprise data agents, meaning AI systems that reason over organizational data. Consumer chatbot memory (user preferences, conversation history, behavioral adaptation) has different trust requirements. The governed source-of-truth argument is strongest where organizational data governance already exists and enterprise AI agents need to reason over it.
How Atlan approaches AI memory as a source-of-truth problem
Atlan’s context layer provides what every memory pipeline’s ingestion stage is missing: a governed, certified, semantically enriched source of truth. Rather than extracting from ungoverned sources, agents connecting to Atlan start memory with certified metric definitions, enforced lineage, and active freshness signals already built in. The ingestion problem becomes a connection problem.
Every memory framework today builds its ingestion extraction from scratch: an LLM scores importance, a vector store indexes the result, an eviction policy manages capacity. None of it knows whether the source was authoritative when the memory was created, or whether it still is. The result: agents that answer with confidence from deprecated data, outdated definitions, or inferred approximations treated as certified facts.
The typical enterprise response to this problem is to build a better extraction pipeline. The actual fix: connect to the governed source that already knows what’s certified, what’s stale, and who owns what.
Atlan’s context layer provides certified semantic definitions (what recognized_revenue_q4 means, who owns it, when it was last validated), column-level lineage (where data originated, what transformations it passed through, what systems consume it downstream), active metadata (continuously updated signals, not point-in-time snapshots), and inference-time governance (policies enforced at decision time, not just data-at-rest). Eviction policies can then consult the catalog’s freshness signal rather than relying solely on LRU or TTL.
An agent reading from Atlan at ingestion starts from a different epistemic position. Not “this is what I extracted from the document corpus and scored as probably important.” But “this is what the organization has certified as true, who owns it, and when it was last validated.”
CME Group has cataloged 18M+ assets with 1,300+ glossary terms: exactly the kind of governed source that memory pipelines could connect to rather than build around. The question isn’t whether the governed infrastructure exists. For most enterprises running a data catalog, it already does. The question is why AI memory teams are building extraction pipelines around it instead of connecting to it.
The outcomes follow from the architecture. Atlan and Snowflake research shows 3x text-to-SQL accuracy improvement with governed metadata vs. bare schemas, and a 20% accuracy gain with 39% tool call reduction when an ontology layer is added. The Governed Memory architecture paper shows 99.6% fact recall, 50% token reduction via progressive delivery, and zero cross-entity leakage across 3,800 adversarial queries. This is what governance-first memory achieves: not by improving retrieval algorithms, but by ensuring what enters memory is trustworthy.
The shift: from “build a memory pipeline that extracts from everything” to “connect your memory pipeline to the source that already governs everything.”
Learn more: How Atlan’s Context Layer Functions as an Enterprise Memory Layer and Context Layer as AI Memory Foundation: Why the Build Problem Is a Connection Problem.
Also relevant: Agent Memory Layer on Your Data Catalog.
Real stories from real customers: memory layers in production
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP of Enterprise Data & Analytics, Workday
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
Why the ingestion stage is the trust inflection point for enterprise AI
The four-stage AI memory pipeline (ingestion, storage, retrieval, eviction) is well understood at the storage and retrieval layers. The stages that determine whether memory is trustworthy (ingestion and eviction) receive the least engineering attention. The field has concentrated research and tooling investment on retrieval because retrieval failures are visible. Ingestion failures are silent until they surface as confident wrong answers.
The named failure mode: eviction policies that operate on access patterns and time cannot detect deprecated sources. A high-frequency deprecated memory survives eviction. A low-frequency certified fact gets pruned. The agent answers confidently from stale data. This is not a retrieval architecture problem; it is a source governance problem that retrieval optimization cannot correct.
Enterprises scaling agentic AI already maintain the governed infrastructure memory pipelines need. The data catalog has certified definitions, enforced lineage, active freshness signals, and authority hierarchies. The architectural question is no longer “how do we build better memory?” It is: why are we building extraction pipelines around the source of truth instead of connecting to it?
Ready to connect the governed source your AI memory pipeline needs?
FAQs about how AI memory systems work
1. What are the four stages of an AI memory system?
The four stages are ingestion (data enters the pipeline from external sources), storage (data is organized across memory tiers: hot, warm, cold), retrieval (relevant memories are surfaced using vector similarity, graph traversal, or hybrid ranking), and eviction (low-priority memories are pruned to manage capacity). Each stage introduces distinct failure modes. Ingestion is where source trust is either established or permanently lost; failures at this stage propagate through all subsequent stages.
2. What is memory eviction in AI agents?
Memory eviction is the process by which an AI agent removes stored memories when capacity is reached. Common policies include LRU (least recently used), TTL (time-to-live expiration), decay functions, and recursive summarization. The critical gap: all standard eviction policies operate on access frequency and time, not on whether the original source is still authoritative. Highly accessed memories from deprecated sources survive eviction; certified facts with low access frequency get pruned. The agent then answers from stale data with full confidence.
3. What is the difference between AI memory and RAG?
RAG (retrieval-augmented generation) retrieves from a static knowledge base at query time; there is no persistent agent state between sessions. AI memory systems maintain a dynamic, continuously updated store that the agent reads from and writes to across sessions. The distinction matters for enterprise data agents: RAG retrieves from what was indexed; memory systems accumulate from what agents experience across sessions. Both fail identically when the underlying source is ungoverned; better retrieval architecture doesn’t fix an ungoverned knowledge base.
4. How does an AI agent decide what to store in memory?
Most frameworks use an extraction phase that scores incoming content by estimated importance; typically an LLM judges what is worth retaining. Mem0 uses LLM-based scoring; MAGMA uses JSON schema enforcement; MemOS uses provenance tagging per memory unit. What none of these do by default: verify that the source of the content is currently authoritative, certified, or still valid within the organization’s governed data infrastructure. Relevance and importance are not the same as trustworthiness.
5. Can AI memory become stale or incorrect?
Yes, and this is one of the field’s most under-addressed risks. Mem0’s own 2026 report identifies memory staleness as “unresolved”: a highly-retrieved memory becomes confidently wrong when the underlying context changes. Standard eviction policies don’t detect staleness; they detect low access frequency. The fix requires active freshness signals from the upstream source, not better retrieval ranking of the stale memory. A memory framework connected to a governed data catalog inherits the catalog’s freshness signals automatically.
6. What happens when an AI agent’s context window fills up?
When the context window reaches capacity, the agent must offload content to long-term memory storage. Frameworks like Letta/MemGPT handle this by moving content from in-context (hot) memory to recall or archival tiers via compression and summarization. The agent then uses retrieval to pull relevant items back in as needed. The context window is working memory; the memory system is the agent’s persistent disk. The quality of what ends up on that disk depends entirely on the ingestion stage: what was selected, from what sources, with what authority.
7. How do AI memory frameworks like Mem0 and Zep compare?
Mem0 uses dual vector+graph storage with LLM-based extraction and integrates with all major agent frameworks including LangChain, LlamaIndex, CrewAI, and AutoGen; it’s the most widely adopted framework but treats ingestion quality as a relevance problem, not a governance problem. Zep focuses on session-level episodic memory with fast retrieval. LangMem integrates with LangGraph for agent workflows. MemOS is the emerging framework with the most advanced provenance model, tracking origin signature, provenance metadata, and lifecycle state per memory unit via its MemCube model. None of the frameworks, by default, connect to an existing governed data catalog as the ingestion source.