Zep and Mem0 are the two leading frameworks for giving stateless LLMs persistent memory, but they take fundamentally different architectural approaches: Mem0 is vector-first with optional graph enhancement, while Zep is built on Graphiti, a temporal knowledge graph that models when facts were true, not just that they were true. On LongMemEval, Zep scores 63.8% vs Mem0’s 49.0% on GPT-4o; on LOCOMO, the two vendors dispute each other’s methodology, and the benchmark war remains unresolved. This guide breaks down what each tool does, how their architectures differ, what the benchmark numbers actually mean, when each fits, and where both hit an architectural ceiling for enterprise data agents.
| Dimension | Zep | Mem0 |
|---|---|---|
| What it is | Context engineering platform powered by Graphiti temporal knowledge graph | Memory layer with vector-first storage and optional graph enhancement (Mem0g) |
| Core storage | Bi-temporal knowledge graph (valid time + transaction time) | Vector embeddings (base) + directed labeled graph (Mem0g) |
| Who owns it | Zep AI (startup, active commercial SaaS + Graphiti OSS) | Mem0 (YC + Basis Set; Apache 2.0 open source) |
| Key strength | Temporal reasoning: tracks when facts changed, not just what they are | Breadth + ecosystem: AWS Strands, CrewAI, Flowise, 41K+ GitHub stars (at Series A) |
| Best for | Agents requiring causal/temporal reasoning; enterprise graph relationships | Developers who need a functional memory layer quickly; consumer + B2B copilots |
| LongMemEval score | 63.8% (GPT-4o) | 49.0% (GPT-4o) |
| Self-hosting | Community Edition deprecated April 2025; requires Graphiti + graph DB | Full stack self-hostable; Apache 2.0; Docker-ready |
| Pricing | Credit-based; Graphiti graph at all tiers (~$25/mo Flex) | Free to $19/mo Starter; graph locked to Pro ($249/mo, 13x jump) |
What’s the difference between Zep and Mem0?
Zep and Mem0 both solve the stateless LLM problem: they give agents a way to remember past interactions. But their architectures diverge at the storage layer. Mem0 extracts salient facts into vector embeddings and optionally a directed graph. Zep stores everything in Graphiti, a temporal knowledge graph that tracks when facts were true, enabling queries like “what did the user prefer last month?”
The core architectural distinction
Both tools extract facts from conversations and return relevant context at query time. That’s the shared job. Where they diverge:
- Mem0: extraction-first pipeline; an LLM identifies salient facts, stores them in a vector DB, and optionally promotes them to Mem0g’s directed graph; retrieval is by semantic similarity
- Zep: graph-first pipeline; every conversation episode becomes a graph update; Graphiti models entities, relationships, and validity windows; retrieval combines semantic embeddings, BM25 keyword search, and graph traversal
A concrete example: a user changes their shipping address. With Mem0’s base configuration (absent an explicit contradiction signal in the update phase), the old address may be retrieved if it is semantically closer to the query than the updated fact. With Zep/Graphiti, the old address is marked invalid with a timestamp, and only the current address surfaces on subsequent queries. (Sources: arXiv 2501.13956; arXiv 2504.19413)
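To make the contrast concrete, here is a minimal Python sketch of the two retrieval behaviors. It is a toy illustration, not either vendor’s API: the `Fact` shape, the similarity scores, and the addresses are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    text: str
    score: float                           # stand-in for semantic similarity to the query
    invalid_at: Optional[datetime] = None  # set when a newer fact supersedes this one

facts = [
    Fact("ships to 12 Oak St", score=0.91),  # old address; happens to match the query better
    Fact("ships to 9 Elm Ave", score=0.87),  # current address
]

# Vector-first retrieval: highest similarity wins; staleness is invisible
def retrieve_by_similarity(facts):
    return max(facts, key=lambda f: f.score)

# Temporal-graph retrieval: a supersession first closes the old fact's validity window
def supersede(old_fact: Fact, now: datetime) -> None:
    old_fact.invalid_at = now

def retrieve_current(facts, now: datetime):
    live = [f for f in facts if f.invalid_at is None or f.invalid_at > now]
    return max(live, key=lambda f: f.score)

now = datetime(2026, 1, 15)
print(retrieve_by_similarity(facts).text)  # -> ships to 12 Oak St (stale fact surfaces)
supersede(facts[0], now)
print(retrieve_current(facts, now).text)   # -> ships to 9 Elm Ave (only current state)
```

The point is structural: with similarity-only retrieval, nothing in the store records that the old address stopped being true, so a higher score is enough to surface it.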
Why the distinction matters now
Gartner predicts that 40% of enterprise applications will feature task-specific AI agents by 2026, up from less than 5% in 2025. More agents means a growing requirement for memory that reasons about change over time, not just static facts. Zep’s v3 rebrand to “context engineering platform”, citing Andrej Karpathy and Shopify’s Tobias Lutke as endorsers, signals that the market is maturing past simple vector retrieval.
Why confusion persists
Both tools call themselves memory layers, and both now support graph-based retrieval in 2026. The benchmark dispute makes objective comparison harder: Zep originally claimed 84% on LOCOMO; Mem0 corrected this to 58.44% (alleging adversarial category inclusion errors); Zep counter-claimed 75.14%. Neither vendor’s benchmarks measure enterprise data context (governed definitions, lineage, access policy), which is where the real gap for enterprise requirements emerges. (Source: GitHub: getzep/zep-papers/issues/5)
What is Zep?
Zep is a memory and context engineering platform for AI agents, built on Graphiti, an open-source temporal knowledge graph engine. Unlike vector-first memory systems, Zep models facts with validity windows: when a fact was true, and when it was recorded. This enables agents to reason accurately about change over time. Zep’s DMR benchmark score is 94.8%; the Graphiti engine has 24K+ GitHub stars.
Core definition and purpose
Zep started as a stateless LLM memory service. With v3, it repositioned as a “context engineering platform”. Its core engine is Graphiti (Apache 2.0), a real-time temporal knowledge graph for AI agents built on three subgraph layers:
- Episodic layer: raw conversation sessions stored as episodes; every episode drives graph updates without overwriting history
- Semantic layer: extracted entities (people, organizations, preferences, facts) stored with typed edges and validity windows (9 node types, 8 relationship types in v3)
- Community layer: higher-order clusters aggregating related entities; reduces retrieval noise for long-running agent sessions
Bi-temporal modeling gives every fact two timestamps: valid time (when it was true in the world) and transaction time (when Graphiti ingested it). This supports queries like “What was the contract status in March?”, a capability that pure vector retrieval cannot provide. (Source: arXiv 2501.13956)
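A bi-temporal lookup can be sketched in a few lines of Python. This is an illustrative model, not Graphiti’s implementation; the `BiTemporalFact` shape and the contract history are invented for the example.

```python
from datetime import date
from typing import Optional

class BiTemporalFact:
    """Toy bi-temporal fact: valid time (true in the world) vs transaction time (when ingested)."""
    def __init__(self, subject: str, value: str, valid_from: date,
                 recorded_at: date, valid_to: Optional[date] = None):
        self.subject = subject
        self.value = value
        self.valid_from = valid_from    # valid time start
        self.valid_to = valid_to        # valid time end (None = still true)
        self.recorded_at = recorded_at  # transaction time

history = [
    BiTemporalFact("contract", "under_review", date(2026, 2, 1), date(2026, 2, 1),
                   valid_to=date(2026, 4, 10)),
    BiTemporalFact("contract", "signed", date(2026, 4, 10), date(2026, 4, 11)),
]

def as_of(history, subject: str, when: date) -> Optional[str]:
    """Answer 'what was true about <subject> on <when>?' using valid time."""
    for f in history:
        if (f.subject == subject and f.valid_from <= when
                and (f.valid_to is None or when < f.valid_to)):
            return f.value
    return None

print(as_of(history, "contract", date(2026, 3, 15)))  # -> under_review
print(as_of(history, "contract", date(2026, 5, 1)))   # -> signed
```

Because validity windows are first-class, “what was the contract status in March?” is an ordinary query rather than a prompt-engineering workaround.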
Why Graphiti matters now
Graphiti MCP Server v1.0 shipped November 2025, compatible with Claude Desktop, Cursor, and any MCP client, reaching hundreds of thousands of weekly MCP users. Zep scaled 30x in two weeks in late 2025; infrastructure optimizations brought P95 graph search from 600ms to 150ms. The vector database vs knowledge graph distinction is central to Zep’s competitive advantage here.
Maturity and evolution
Zep Community Edition was deprecated in April 2025, with additional feature retirements in February 2026. Self-hosting now requires Graphiti plus a compatible graph database (Neo4j, FalkorDB, or Kuzu), meaning at minimum three systems to provision. A Docker image staleness issue (v0.10 vs current v0.22, reported six months post-CE deprecation) signals an operational gap for smaller teams. Zep’s positioning shift from “memory” to “context engineering” is genuine, but important to parse: Graphiti is still a conversation graph, not a governed enterprise context layer.
Core components of Zep
- Graphiti temporal knowledge graph: Open-source engine (Apache 2.0) modeling entities, relationships, and validity windows with bi-temporal timestamps
- Episodic memory layer: Stores raw conversation sessions as episodes; every episode drives graph updates without overwriting history
- Semantic entity layer: Extracted entities with typed edges and validity windows (9 node types, 8 relationship types in v3)
- Community summaries: Higher-order clusters aggregating related entities; reduces retrieval noise for long sessions
- Hybrid retrieval: Combines semantic embedding search, BM25 keyword search, and graph traversal rather than vector-only retrieval
- MCP Server: Native integration with Claude Desktop, Cursor, and any MCP-compatible client (v1.0, November 2025)
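The hybrid retrieval idea, fusing several ranked result lists into one, can be sketched with reciprocal rank fusion (RRF), a common technique for this kind of combination. Zep’s actual scoring internals are not public; the ranked lists and the use of RRF here are illustrative assumptions.

```python
# Fuse semantic, BM25, and graph-traversal rankings with reciprocal rank fusion.
# A result appearing near the top of several lists accumulates the highest score.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["fact_a", "fact_b", "fact_c"]   # embedding similarity order (invented)
bm25     = ["fact_b", "fact_a", "fact_d"]   # keyword match order (invented)
graph    = ["fact_b", "fact_c", "fact_a"]   # graph-traversal proximity order (invented)

print(rrf([semantic, bm25, graph]))  # fact_b ranks first: strong in all three lists
```

The design point: fusion rewards agreement across retrieval modes, so a fact that is merely semantically close (but absent from keyword and graph results) is demoted.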
What is Mem0?
Mem0 is an open-source memory layer for AI agents that extracts salient facts from conversations, stores them as vector embeddings, and retrieves relevant context at inference time. Its graph variant (Mem0g) adds a directed labeled graph for relationship modeling. On LOCOMO, Mem0 scores 66.9% vs OpenAI memory’s 52.9%, a 26% relative improvement. It has 41,000+ GitHub stars (at Series A in October 2025) and a $24M Series A.
Core definition and purpose
Mem0 addresses the fixed context window problem: instead of replaying full conversation history, it extracts key facts (preferences, entities, decisions) and retrieves the relevant subset at inference time. Two variants exist:
- Base Mem0: vector + LLM extraction pipeline
- Mem0g: directed labeled graph (entities as nodes, typed relationships as edges) for relationship-aware retrieval
The incremental processing pipeline runs an extraction phase (an LLM identifies salient facts from new conversation turns) followed by an update phase (consolidates with existing memory, resolves contradictions). The result: 90%+ token savings vs a full-context baseline and a 91% reduction in p95 latency; practical production benefits that are real and documented. This makes it a strong choice alongside the broader ecosystem of AI agent memory frameworks.
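A crude sketch of that two-phase shape, with the LLM extractor stubbed out and contradiction resolution reduced to key-based replacement (both are stand-ins; Mem0’s real pipeline is considerably more sophisticated):

```python
# Two-phase memory pipeline sketch. In Mem0 an LLM performs extraction and the
# update logic lives inside the library; this stub only illustrates the shape.
def extract_facts(message: str) -> list[str]:
    # Stand-in for the LLM extraction phase: split on sentences
    return [part.strip() for part in message.split(".") if part.strip()]

def update_memory(memory: dict, facts: list[str]) -> dict:
    # Update phase: a new fact about the same subject replaces (rather than
    # duplicates) the existing one -- a crude stand-in for contradiction resolution
    for fact in facts:
        key = fact.split()[0]  # naive subject key, e.g. "address"
        memory[key] = fact
    return memory

memory = {}
update_memory(memory, extract_facts("address is 12 Oak St. plan is Pro"))
update_memory(memory, extract_facts("address is 9 Elm Ave"))
print(memory["address"])  # -> address is 9 Elm Ave (old value replaced, not duplicated)
```

The token savings come from this shape: only the consolidated fact set, not the full transcript, is replayed into the context window.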
Why Mem0 matters now: ecosystem momentum
AWS selected Mem0 as the exclusive memory provider for the Strands Agent SDK in May 2025, the most significant enterprise validation signal in the memory layer market to date. A $24M Series A followed in October 2025, led by Basis Set Ventures with strategic investors including Dharmesh Shah and the CEOs of Datadog, Supabase, and PostHog. API call growth from 35M (Q1 2025) to 186M (Q3 2025), approximately 30% month-over-month, confirms real production adoption.
Mem0 integrates natively with CrewAI, Flowise, Langflow, and AWS Strands, giving it the broadest ecosystem coverage in the category. For teams evaluating options, the Mem0 alternatives landscape is also worth reviewing.
Maturity and evolution
Mem0g adds approximately 2% overall score improvement over base Mem0, meaningful but not transformational; the graph is supplementary, not architectural. Graph memory is locked to the Pro tier at $249/mo, a 13x jump from Starter at $19/mo, which is the top developer complaint in community forums. GitHub Issue #2066 documents extreme graph token costs in production: saving 62 photo descriptions took over an hour at costs 15x higher than generation. The full stack is self-hostable under Apache 2.0, a clear advantage over Zep post-Community Edition deprecation.
Core components of Mem0
- Vector memory store: Fact extraction pipeline that converts conversations into semantic embeddings; retrieval by cosine similarity
- LLM extraction layer: Configurable LLM identifies salient facts, user preferences, and entities from raw conversation text
- Mem0g graph variant: Directed labeled graph (entities as nodes, typed relationships as edges) for relationship-aware retrieval; Pro tier only on the managed API
- Conflict resolution: Update phase resolves contradictions between new facts and existing memory (e.g., old and new address for the same user)
- Multi-agent memory: Shared memory namespaces across agent sessions; supports cross-agent context passing
- SDK integrations: Native SDKs for Python and JavaScript; integrations with CrewAI, Flowise, Langflow, and AWS Strands Agent SDK
Not sure which architecture fits your agent stack? See our decision framework: How to choose an AI agent memory architecture
Zep vs Mem0: Head-to-head comparison
The sharpest differences between Zep and Mem0 appear in three dimensions: storage architecture (temporal knowledge graph vs vector-first), pricing model (graph available at all Zep tiers vs Mem0’s $249/mo Pro paywall), and self-hosting posture (Mem0 maintains full Apache 2.0 stack; Zep deprecated Community Edition). On benchmarks, both claim state-of-the-art performance, and both contest the other’s methodology.
| Dimension | Zep | Mem0 |
|---|---|---|
| Primary storage | Bi-temporal knowledge graph (Graphiti) | Vector embeddings + optional Mem0g directed graph |
| Temporal reasoning | Native: fact validity windows, temporal queries | Improving: Mem0g adds ~2% on temporal benchmarks over base |
| LongMemEval (GPT-4o) | 63.8% (independent test) | 49.0% (same benchmark) |
| LOCOMO benchmark | Disputed: 84% (original) → 58.44% (Mem0 corrected) → 75.14% (Zep counter) | 66.9% LLM-as-Judge; 26% relative improvement vs OpenAI memory |
| Self-hosting | Graphiti (Apache 2.0) only; full Zep SaaS stack not self-hostable | Full stack (Apache 2.0); Docker images maintained |
| Graph access pricing | All tiers (credit model; ~$25/mo Flex) | Pro tier only ($249/mo, 13x jump from $19/mo Starter) |
| MCP ecosystem | Graphiti MCP Server v1.0 (November 2025) | No MCP server in current release |
| Ecosystem integrations | MCP clients (Claude Desktop, Cursor) | AWS Strands, CrewAI, Flowise, Langflow |
| GitHub traction | 24K+ stars (Graphiti) | 41K+ stars at Series A (mem0) |
| Failure mode | Self-hosting complexity post-CE deprecation; SaaS gaps for smaller users | Graph paywall blocks production evaluation; graph token costs at scale |
The benchmark dispute: what it actually means
Zep’s original 84% LOCOMO claim was challenged by Mem0, which published a correction asserting that Zep had included adversarial question categories that the benchmark specification explicitly excludes, reducing Zep’s score to 58.44%. Zep rebutted this, claiming Mem0 misconfigured Zep in their evaluation and that the corrected Zep score is 75.14%, approximately 10% above Mem0’s best configuration. Both methodologies have been questioned by independent observers.
The honest read: treat any single benchmark number from either vendor with caution. The dispute itself reveals that benchmark construction for conversational memory is still immature.
Real-world example: a B2B SaaS customer success agent
Your agent assists account managers by remembering customer preferences, open issues, and renewal history. A customer calls to change their billing contact and mentions that the Q4 upsell discussion from November is no longer relevant.
- With Mem0 (base): The old billing contact remains in vector memory unless explicitly contradicted; the Q4 upsell context may still surface by semantic similarity in future queries.
- With Zep/Graphiti: The billing contact update marks the old entity invalid with a timestamp; the Q4 opportunity is flagged as superseded. A future query for “active opportunities for Acme Corp” returns only current state.
Both approaches work. The difference matters when your agent’s reasoning depends on what is true now vs what was said at some point in the past.
Build Your AI Context Stack
A practical guide to layering memory, context, and governance in production agent systems.
When to use Zep, when to use Mem0, and when you’ll need something more
Zep and Mem0 solve the same core problem (stateless LLMs forgetting what happened in previous sessions) with different trade-offs. In most production agent stacks, you choose one, not both. Where both hit their limit is the same place: neither was designed to serve as the governed context layer for agents that reason about enterprise data assets, business definitions, or access-controlled data pipelines.
When to prioritize Zep
- Your agent needs to reason about how facts change over time (customer status, preferences, entity relationships)
- You are building on MCP-compatible tooling (Claude Desktop, Cursor) and want native Graphiti integration
- Your use case involves complex entity graphs where relationship history matters
- You have engineering capacity to manage Graphiti plus a compatible graph database
When to prioritize Mem0
- You need a functional memory layer quickly on a managed SaaS or self-hosted Apache 2.0 stack
- Your primary integrations are CrewAI, Flowise, Langflow, or AWS Strands
- You are building for consumer or B2B copilot use cases where semantic similarity is sufficient
- You want graph memory later but need to evaluate before committing to the Pro pricing tier
The benchmark dispute as a signal
Zep’s “Stop Using RAG for Agent Memory” post and the LOCOMO dispute both reveal something instructive: the industry is arguing over recall accuracy on conversation benchmarks. LOCOMO and LongMemEval measure how well an agent remembers what a user said, not whether an agent knows what net_revenue means in your finance data warehouse.
For teams building enterprise AI agent memory systems, recall accuracy matters, but context quality and governance matter more. Understanding the full landscape of types of AI agent memory helps clarify where conversational memory fits relative to other memory types.
When you’ll need something more
McKinsey finds that 8 in 10 companies cite data limitations, not recall accuracy, as the primary roadblock to scaling agentic AI. Gartner predicts that more than 40% of agentic AI projects will be canceled by end of 2027, citing governance gaps and unclear business value.
The failure mode isn’t that Zep or Mem0 retrieves the wrong conversation fact. It’s that neither tool can tell your agent what revenue_recognized means in your data warehouse, who certified the orders table, or whether this agent is authorized to query that data at all. This is the memory layer vs context layer distinction, and it is architectural, not a feature gap.
Inside Atlan AI Labs & The 5x Accuracy Factor
How governed context produces materially better results than memory alone.
How Atlan approaches the context problem that Zep and Mem0 don’t solve
Atlan operates at a different layer from Zep and Mem0. Where they provide conversation memory (what the user said before), Atlan provides governed enterprise context: what net_revenue means in finance vs product, who certified the orders table, and whether this agent is authorized to query that column at runtime. Gartner named Atlan a Leader in its 2026 Data & Analytics Governance Platforms Magic Quadrant, citing its metadata control plane as central to agentic solutions.
The challenge
Most enterprise AI teams discover the architectural gap late: the agent recalls user preferences correctly (Mem0/Zep doing their job) but pulls revenue_recognized instead of net_revenue because no governed definition was available at inference time. Neither Zep nor Mem0 was designed to surface semantic layer definitions, column-level lineage, or runtime access policies; these are architectural gaps, not feature gaps. Gartner’s projection that 40%+ of agentic AI projects will be canceled by 2027 cites governance gaps, not recall failures, as the primary cause.
Atlan’s unified approach
Atlan’s context layer addresses five governed memory types that neither Zep nor Mem0 provides:
- Semantic definitions: governed business metrics like net_revenue, distinct from revenue_recognized
- Ontology: cross-system identity resolution; the same “customer” in Salesforce, Snowflake, and SAP
- Operational playbooks: routing and disambiguation logic for agent decisions
- Column-level data lineage: provenance across cloud systems, tracked and certified
- Runtime access enforcement: governance at inference time, not just at retrieval; agents cannot access unauthorized data even when memory retrieval succeeds
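As a generic pattern (not Atlan’s API; the policy table and function names here are invented), runtime enforcement means the authorization check sits in front of every fetch, so an unauthorized asset never reaches the agent even if memory retrieval would have returned it:

```python
# Generic runtime-enforcement sketch: check policy before any retrieval result
# reaches the agent. The policy shape and agent/asset names are illustrative.
POLICY = {
    "support_agent": {"orders"},
    "finance_agent": {"orders", "net_revenue"},
}

def authorized_fetch(agent: str, asset: str, fetch):
    """Gate every data access through the policy, at inference time."""
    if asset not in POLICY.get(agent, set()):
        raise PermissionError(f"{agent} is not authorized to read {asset}")
    return fetch(asset)

print(authorized_fetch("finance_agent", "net_revenue", lambda a: f"<{a} rows>"))
# authorized_fetch("support_agent", "net_revenue", ...) would raise PermissionError
```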
The Context Engineering Studio provides a systematic workflow for building, testing, and deploying governed enterprise context through specialist AI agents and human-in-the-loop workflows. The Context Repo lets every agent read governed definitions via MCP (the same protocol Graphiti’s MCP Server uses), serving governed enterprise context rather than conversation history. Snowflake named Atlan its 2025 Data Governance Partner of the Year and selected Atlan as the launch partner for Snowflake Intelligence.
See how Atlan’s context layer works as enterprise memory for AI agents: atlan.com/know/atlan-context-layer-enterprise-memory/
AI Context Maturity Assessment
Find out whether your team is ready to move from memory layers to governed context.
Real stories: enterprise context layers in production
"AI initiatives require more context than ever. Atlan's metadata lakehouse is configurable, intuitive, and able to scale to hundreds of millions of assets."
— Andrew Reiskind, Chief Data Officer, Mastercard
"Context is the differentiator. Atlan gave our teams the shared vocabulary and lineage to move from reactive data management to proactive AI enablement across CME Group."
— Kiran Panja, Managing Director, Data & Analytics, CME Group
Memory architecture is a genuine architectural decision
Zep and Mem0 are both serious, production-ready frameworks, and the choice between them is a genuine architectural decision, not a “pick the better one” verdict. Zep’s temporal knowledge graph excels when your agent needs to reason about change over time. Mem0’s vector-first approach with broad ecosystem coverage is the faster path to a functional memory layer for most teams.
The benchmark dispute between the two vendors is real, but it signals immaturity in the evaluation methodology, not a clear winner. Both tools are optimizing for recall accuracy on benchmarks that measure conversation memory, not enterprise data context. Zep’s pivot to “context engineering” is the clearest market signal that the field is evolving past simple vector recall.
For enterprise data agents (systems that need to reason about governed business definitions, certified data lineage, and access-controlled pipelines), both Zep and Mem0 are necessary layers, but neither is sufficient as the enterprise context layer. That requires a different architectural component underneath them.
Explore how a governed context layer works alongside agent memory frameworks: Memory layer vs context layer: which do you actually need?
Ready to see how Atlan's context layer works with your agent stack?
Book a Demo
FAQs about Zep vs Mem0
1. What is the difference between Zep and Mem0?
Zep stores agent memory in Graphiti, a temporal knowledge graph that tracks when facts were true, not just what they were. Mem0 uses vector embeddings to extract and retrieve salient facts, with an optional graph variant (Mem0g). The core difference: Zep is graph-first with native temporal reasoning; Mem0 is vector-first with graph as an add-on. Both solve the stateless LLM problem; they disagree on the best storage architecture.
2. Is Zep better than Mem0 for AI agents?
It depends on the use case. On LongMemEval with GPT-4o, Zep scores 63.8% vs Mem0’s 49.0%, a meaningful gap. For agents that need temporal reasoning (tracking how facts change over time), Zep’s Graphiti architecture is more capable. For developers who need broad ecosystem integrations, AWS Strands, CrewAI, Flowise, and a fully self-hostable open-source stack, Mem0 is the faster path to production. Neither is universally better.
3. What is Graphiti and how does it relate to Zep?
Graphiti is Zep’s open-source temporal knowledge graph engine (24K+ GitHub stars, Apache 2.0). It powers all of Zep’s memory capabilities. Graphiti stores conversation episodes as graph updates, models entities with validity windows, and supports hybrid retrieval combining semantic embeddings, BM25 keyword search, and graph traversal. As of April 2025, Graphiti is Zep’s only open-source component; Zep Community Edition was deprecated.
4. Does Mem0 support temporal memory?
Partially. Base Mem0 retrieves by semantic similarity; it can surface outdated facts if they are semantically closer to the query than the updated fact. Mem0g (graph-enhanced variant) improves temporal handling by about 2% on benchmarks. Zep’s Graphiti explicitly models fact validity windows, making temporal reasoning a first-class feature rather than an improvement layered on top of vector retrieval.
5. Can I self-host Zep in 2026?
You can self-host Graphiti (Apache 2.0), but not the full Zep application stack; Zep Community Edition was deprecated in April 2025, with additional feature retirements in February 2026. Self-hosting Graphiti requires provisioning the Graphiti service plus a compatible graph database (Neo4j, FalkorDB, or Kuzu). Mem0 maintains a fully self-hostable Apache 2.0 stack with Docker support.
6. What is the LongMemEval benchmark and how do Zep and Mem0 compare?
LongMemEval is an evaluation framework for AI agent memory that tests recall accuracy over long conversation histories. On GPT-4o, Zep scores 63.8% and Mem0 scores 49.0%, a 14.8 percentage point gap in Zep’s favor according to an independent test. There is also a separate LOCOMO benchmark dispute: Zep originally claimed 84%, Mem0 corrected this to 58.44%, and Zep counter-claimed 75.14%. Both vendors contest the other’s methodology; treat any single benchmark figure with caution.
7. When should I use Zep instead of Mem0?
Use Zep when your agent needs to reason about how facts change over time: changing customer status, evolving relationships, temporal queries like “what was the account state in Q3?” Use Zep if you are building on MCP-compatible tooling (Claude Desktop, Cursor) and want Graphiti’s native MCP server integration. Use Mem0 if you need the fastest path to a production memory layer with broad ecosystem coverage and full open-source self-hosting.
Sources
- Zep: A Temporal Knowledge Graph Architecture for Agent Memory, arXiv, January 2025
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory, arXiv, April 2025
- Mem0 vs Zep (Graphiti): AI Agent Memory Compared (2026), Vectorize.io, 2026
- Mem0 raises $24M Series A, TechCrunch, October 2025
- Graphiti: Build Real-Time Knowledge Graphs for AI Agents, GitHub, getzep/graphiti
- Revisiting Zep’s 84% LoCoMo Claim, GitHub Issue #5, getzep/zep-papers
- Is Mem0 Really SOTA in Agent Memory?, Zep Blog
- Graphiti hits 20K Stars + MCP Server 1.0, Zep Blog, November 2025
- Zep v3: Context Engineering Takes Center Stage, Zep Blog
- AWS and Mem0 partner for Strands Agent SDK, Mem0, May 2025
- Gartner: 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026, Gartner, August 2025
- Gartner: 40%+ Agentic AI Projects Canceled by 2027, Gartner, June 2025
- Building the Foundations for Agentic AI at Scale, McKinsey
- Shopify CEO and ex-OpenAI researcher agree: context engineering beats prompt engineering, The Decoder
- Announcing a New Direction for Zep’s Open Source Strategy, Zep Blog, April 2025
- Mem0 graph token cost issue, GitHub Issue #2066, mem0ai/mem0
