To implement long-term memory for AI agents, choose a persistence framework (Mem0, Zep, or LangGraph Store), install it in Python, and wire three operations into every agent turn: retrieve relevant memories before reasoning, run the LLM call, then store the new exchange. A basic Mem0 integration runs in under 30 minutes and delivers 0.200s p95 retrieval latency — 91% lower overhead than full-context approaches. This guide covers all three frameworks side-by-side with working code, production configuration, and an honest comparison of where each breaks down.
Quick overview:
| Framework | p95 Retrieval Latency | Recall Accuracy | Self-Hosted | Best For |
|---|---|---|---|---|
| Mem0 | 0.200s | 66.9% (vector) | Yes (Qdrant/ChromaDB) | Fast user preference recall; widest integrations |
| Zep | <200ms | 63.8% (LongMemEval) | Cloud only | Temporal/relational queries |
| LangGraph + LangMem | 59.82s (LangMem) | — | Yes (PostgresStore) | LangChain-native teams; procedural memory |
Why implement long-term memory?
Every LLM API call is isolated. Even with a 1M-token context window, accuracy degradation begins around 1,000 tokens of injected context — far below advertised limits. Agents built without external memory cannot recall a user’s name, preferences, or prior decisions after a session ends. According to a 2025 AI agent memory survey, 32% of enterprise teams cite output quality as their top barrier to production deployment, tracing directly to this statelessness. For a full explanation of the problem, see Long-Term vs Short-Term AI Memory.
With a memory layer in place, agents can recall past preferences, skip re-asking known facts, and build compounding context over months. Mem0’s 2026 benchmark data shows 66.9% recall accuracy at 0.200s p95 latency — fast enough for real-time production use.[1] The full-context approach achieves 72.9% accuracy but requires 17.12s p95 latency, making it unusable in any agent with a sub-second SLA.[2]
This guide targets Python developers and AI engineers building conversational agents, data assistants, or any agent that serves the same user across multiple sessions. A solo developer can reach a working prototype in an afternoon. For the concept-level companion to this code-first guide, see Memory Layer for AI Agents and In-Context vs External Memory.
Prerequisites
Before writing any code, confirm the following:
Organizational:
- [ ] Clarity on use case: personal preference recall, cross-session continuity, or temporal/relational queries
- [ ] A `user_id` strategy: how will you identify users across sessions? (UUID, email hash, or existing auth token)
- [ ] Decision on data residency: can user memory data leave your infrastructure, or do you need self-hosted storage?
Technical:
- [ ] Python 3.9+ (3.11 recommended for async performance)
- [ ] OpenAI or Anthropic API key (both Mem0 and LangMem use an LLM for memory extraction)
- [ ] For Mem0 self-hosted: Qdrant or ChromaDB running locally (Docker preferred)
- [ ] For LangGraph production: PostgreSQL instance
Time:
- Basic implementation: 2–4 hours
- Production-ready with persistence, multi-user scoping, and monitoring: 1–2 days
Difficulty: Intermediate
Step 1: choose your memory framework
What you’ll accomplish: Compare Mem0, Zep, LangMem, and LangGraph Store across five dimensions so you can commit to one framework before writing any code.
Time: 30 minutes
| Framework | GitHub Stars | p95 Retrieval Latency | Recall Accuracy (LOCOMO) | Self-Hosted | Best For |
|---|---|---|---|---|---|
| Mem0 | 53.3k | 0.200s | 66.9% vector / 68.4% graph (LOCOMO, ECAI 2025) | Yes (Qdrant/ChromaDB) | Fast user preference recall; widest framework integrations |
| Zep | 4.4k | <200ms | 63.8% (LongMemEval) | Cloud only | Temporal/relational queries; CRM-style “who owned X last quarter?” |
| LangMem | — | 59.82s | — | Yes (any LangGraph Store) | Procedural memory (prompt self-improvement); LangChain-native teams |
| LangGraph Store | — | Depends on backend | — | Yes (InMemoryStore → PostgresStore) | Teams already in LangGraph ecosystem; storage-agnostic flexibility |
For a full framework comparison with scoring, see Best AI Agent Memory Frameworks 2026. For a focused Mem0 vs Zep comparison, see Zep vs Mem0.
Decision guidance
Pick Mem0 if you need something running today, want the largest community and ecosystem (53.3k stars, 19 vector backends, 13 agent integrations), and your use case is user preference recall or conversation continuity. The managed cloud API path requires 5 lines of code.
Pick Zep if your agent needs to reason about how facts changed over time — for example, “Alice was the budget owner in Q4, Bob took over in February.” Zep’s Graphiti engine models valid_at/invalid_at timestamps. No other framework does this cleanly. Be aware that Zep uses 340x more memory per conversation than Mem0 for marginal accuracy gains on most benchmarks.[4]
Pick LangGraph Store + LangMem if your team is already running LangGraph in production and you need procedural memory — an agent that improves its own system prompt over time. Accept the 59.82s extraction latency; it runs asynchronously, not on the hot path.
Never use LangMem for real-time search. The 59.82s p95 latency[3] makes it categorically unusable as a synchronous retrieval step in any agent.
Validation checklist:
- [ ] Use case maps to one framework’s strengths
- [ ] Data residency requirement checked (cloud vs. self-hosted)
- [ ] user_id strategy decided
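The guidance above can be condensed into a small rule-of-thumb helper. This is an illustrative sketch only; the boolean requirement flags are invented for this example, and a real decision should weigh the full comparison table.

```python
def suggest_framework(needs_temporal_reasoning: bool,
                      requires_self_hosting: bool,
                      langgraph_already_in_prod: bool) -> str:
    """Toy decision helper mirroring the Step 1 guidance (illustrative only)."""
    if needs_temporal_reasoning and not requires_self_hosting:
        return "Zep"  # temporal graph with valid_at/invalid_at, but cloud only
    if langgraph_already_in_prod:
        # LangMem extraction is slow, but it runs off the hot path
        return "LangGraph Store + LangMem"
    # Default: fastest start, widest integrations, self-hostable
    return "Mem0"

print(suggest_framework(True, False, False))  # Zep
```

Note the self-hosting check: Zep is cloud only, so a temporal use case with a strict data residency requirement falls back to Mem0 (its graph mode covers some relational needs).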
Step 2: install and configure your chosen framework
What you’ll accomplish: Get your chosen framework installed, environment variables set, and a smoke-test connection working.
Time: 20 minutes
Mem0, self-hosted with Qdrant (recommended for data residency)
```shell
pip install mem0ai
# Start Qdrant: docker run -p 6333:6333 qdrant/qdrant
```
Configure Mem0 to point to your local Qdrant instance:
```python
from mem0 import Memory
import os

config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4.1-nano-2025-04-14",
            "api_key": os.getenv("OPENAI_API_KEY")
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {"url": "localhost", "port": 6333}
    }
}

memory = Memory.from_config(config)
```
Mem0, cloud managed API (fastest start)
```shell
pip install mem0ai
# Set OPENAI_API_KEY and MEM0_API_KEY in your environment
```

```python
import os
from mem0 import MemoryClient

client = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))
# No vector DB setup needed
```
Zep
```shell
pip install zep-cloud
# Set ZEP_API_KEY in your environment
```

```python
import os
from zep_cloud.client import Zep

client = Zep(api_key=os.getenv("ZEP_API_KEY"))

# One-time setup per user
client.user.add(user_id="user_jane", email="[email protected]",
                first_name="Jane", last_name="Smith")
```
LangGraph InMemoryStore (dev) → PostgresStore (prod)
For the deep LangGraph implementation reference, see Long-Term Memory LangChain Agents.
```shell
pip install langgraph langmem
```

```python
from langgraph.store.memory import InMemoryStore    # dev only
from langgraph.store.postgres import PostgresStore  # production

# Note: agent creation with memory tools is shown in Step 3

# DEV
store = InMemoryStore(
    index={"embed": "openai:text-embedding-3-small", "dims": 1536}
)

# PROD: swap to PostgresStore (identical interface — no agent code changes required)
# DB_URI = "postgresql://user:password@localhost:5432/agentdb"
# with PostgresStore.from_conn_string(DB_URI) as store:
#     store.setup()
```
Validation checklist:
- [ ] `OPENAI_API_KEY` (or `ANTHROPIC_API_KEY`) set in environment
- [ ] Framework-specific API key set (`MEM0_API_KEY` for cloud Mem0, `ZEP_API_KEY` for Zep)
- [ ] Qdrant running and reachable at `localhost:6333` (if self-hosted Mem0)
- [ ] PostgreSQL accessible (if LangGraph production path)
- [ ] Import works: `from mem0 import Memory` (or equivalent) without error
Step 3: add memories to your agent
What you’ll accomplish: Wire memory storage into your agent so every conversation turn is captured.
Time: 30 minutes
Memory should be stored after getting the LLM response, not before. Pass both the user message and assistant response so the extraction LLM has full context to decide what to store. For background on what types of AI agent memory are being created here, see the linked reference.
Mem0 add pattern
```python
from mem0 import Memory
from openai import OpenAI

memory = Memory.from_config(config)  # config from Step 2
openai_client = OpenAI()

USER_ID = "user_alice"

def chat_with_memory(user_input: str) -> str:
    # 1. Retrieve relevant memories (Step 4 covers this in detail)
    relevant = memory.search(user_input, user_id=USER_ID, limit=3)
    memory_context = ""
    if relevant:
        memory_context = "Known about this user:\n"
        for m in relevant:
            memory_context += f"- {m['memory']}\n"

    # 2. Build messages with memory injected
    messages = [
        {"role": "system", "content": f"You are a helpful assistant. {memory_context}"},
        {"role": "user", "content": user_input}
    ]

    # 3. Get response
    response = openai_client.chat.completions.create(
        model="gpt-4.1-nano-2025-04-14",
        messages=messages
    )
    answer = response.choices[0].message.content

    # 4. Store new memory from this exchange
    memory.add([
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": answer}
    ], user_id=USER_ID)

    return answer
```
Mem0’s extraction LLM reads these messages and decides whether to ADD, UPDATE, DELETE, or NOOP existing memories. Deduplication is automatic.
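To make that reconciliation step concrete, here is a deliberately naive, purely illustrative stand-in for the decision Mem0's extraction LLM makes. The real system uses an LLM pass, not string matching; this sketch only shows the shape of the ADD/UPDATE/NOOP logic.

```python
def decide_memory_op(new_fact: str, existing_facts: list[str]) -> str:
    """Toy stand-in for Mem0's LLM-based memory reconciliation.

    Returns one of ADD / UPDATE / NOOP (DELETE is omitted for brevity).
    """
    normalized = new_fact.strip().lower()
    for fact in existing_facts:
        if fact.strip().lower() == normalized:
            return "NOOP"  # exact duplicate: store nothing new
        # Same leading subject word ("prefers X" vs "prefers Y"): revise in place
        if fact.strip().lower().split()[:1] == normalized.split()[:1]:
            return "UPDATE"
    return "ADD"  # genuinely new information

print(decide_memory_op("Prefers dark mode", ["prefers light mode"]))  # UPDATE
```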
Zep add pattern
```python
import uuid
from datetime import datetime, timezone

from zep_cloud.types import Message

# Call this at the start of each new session to get a fresh thread_id
def new_conversation() -> str:
    thread_id = uuid.uuid4().hex
    client.thread.create(thread_id=thread_id, user_id="user_jane")
    return thread_id

def chat_with_zep(thread_id: str, user_input: str) -> str:
    # 1. Get assembled context from Zep's knowledge graph
    context = client.thread.get_user_context(thread_id=thread_id)
    context_block = getattr(context, "context", "")

    # 2. Respond with context
    messages = [
        {"role": "system", "content": f"You are a helpful assistant.\n\n{context_block}"},
        {"role": "user", "content": user_input}
    ]
    response = openai_client.chat.completions.create(
        model="gpt-4.1", messages=messages
    )
    answer = response.choices[0].message.content

    # 3. Add both turns to Zep graph (processed asynchronously)
    ts = datetime.now(timezone.utc).isoformat()
    client.thread.add_messages(thread_id, messages=[
        Message(created_at=ts, name="Jane", role="user", content=user_input),
        Message(created_at=ts, name="Assistant", role="assistant", content=answer)
    ])
    return answer

# Usage
thread_id = new_conversation()
answer = chat_with_zep(thread_id, "What datasets have SLA issues this morning?")
```
Note: Zep processes messages asynchronously into its temporal knowledge graph. Retrieval may not reflect new messages immediately — background graph processing has latency. This is a known production trade-off.
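One way to make smoke tests robust against this delay is a generic poll-with-timeout helper. This is a sketch under our own naming; `poll_until` and the `fetch` callable are not Zep APIs, and the timings are yours to tune.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def poll_until(fetch: Callable[[], Optional[T]],
               timeout_s: float = 10.0,
               interval_s: float = 0.5) -> Optional[T]:
    """Call `fetch` until it returns a truthy value or the timeout expires.

    Useful when Zep's background graph processing hasn't caught up yet:
    wrap the retrieval call in `fetch` and assert on the result.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch()
        if result:
            return result
        time.sleep(interval_s)
    return None

# Demo with a stub that becomes ready on the third call
calls = {"n": 0}
def stub_fetch():
    calls["n"] += 1
    return "fact" if calls["n"] >= 3 else None

print(poll_until(stub_fetch, timeout_s=2.0, interval_s=0.01))  # fact
```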
LangGraph store.put() pattern
```python
from langgraph.prebuilt import create_react_agent
from langmem import create_manage_memory_tool, create_search_memory_tool

USER_ID = "alice"

agent = create_react_agent(
    "openai:gpt-4.1",
    tools=[
        create_manage_memory_tool(namespace=("memories", USER_ID)),
        create_search_memory_tool(namespace=("memories", USER_ID)),
    ],
    store=store,  # store from Step 2
)

# The agent decides autonomously when to store/retrieve memories
response = agent.invoke({
    "messages": [{"role": "user", "content": "Remember: I prefer dark mode."}]
})
```
Common mistakes:
- Over-storing: Passing entire long conversations every turn bloats the vector store and degrades retrieval precision. Pass only the current turn, not the full history.
- Under-storing: Calling `memory.add()` only on “important” turns. The extraction LLM determines importance; let it run on every turn.
- Missing user_id: Forgetting to pass `user_id` stores memory in a default namespace, making it visible to all users.
Step 4: retrieve memories before reasoning
What you’ll accomplish: Inject relevant past memories into the system prompt before every LLM call so the agent can reason with accumulated context.
Time: 20 minutes
Semantic search pattern (Mem0)
```python
relevant = memory.search(user_input, user_id=USER_ID, limit=5)
memory_context = "\n".join(f"- {m['memory']}" for m in relevant)
system_prompt = f"You are a helpful assistant.\n\nKnown about this user:\n{memory_context}"
```
`limit=5` keeps the injected context tight. Increasing the limit improves recall but adds token cost and risks injecting irrelevant memories.
Zep context retrieval
```python
context = client.thread.get_user_context(thread_id=thread_id)
context_block = context.context  # pre-assembled by Zep's graph engine
```
Zep returns a pre-assembled context block — no manual formatting needed. The graph engine ranks facts by relevance and recency.
LangGraph: agent-driven retrieval
With `create_search_memory_tool` attached, the LangGraph agent decides when to search. For explicit search: `store.search(namespace, query="language preferences")`.
Context budget management
- Aim for retrieved memories under 300 tokens of the system prompt budget
- Use `limit=3–5` for real-time agents; increase only if accuracy tests show recall gaps
- Test that injected memories don’t push total context past the point of LLM accuracy degradation — research shows degradation begins around 1,000 tokens of injected context
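A small guard can enforce that budget before injection. This sketch uses the crude heuristic of roughly four characters per token; a production version would count with the model's real tokenizer (for example, tiktoken). The function name is our own, not a framework API.

```python
def build_memory_context(memories: list[str], max_tokens: int = 300) -> str:
    """Join retrieved memories into a bulleted block, stopping before the
    token budget is exceeded. Memories are assumed to arrive ranked by
    relevance, so lower-ranked ones are dropped first."""
    lines, used = [], 0
    for mem in memories:
        line = f"- {mem}"
        cost = max(1, len(line) // 4)  # crude ~4-chars-per-token estimate
        if used + cost > max_tokens:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)

context = build_memory_context(
    ["Prefers dark mode", "Works with Snowflake and dbt", "Based in Berlin"],
    max_tokens=10,
)
print(context)  # - Prefers dark mode
```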
For the deeper architectural analysis of why in-context vs external memory trade-offs matter here, see the linked guide.
Validation: Start a fresh Python session, add a known memory, end the session, start a new session, and confirm the agent recalls the fact without it being in the conversation history.
Step 5: handle multi-user and namespace scoping
What you’ll accomplish: Ensure memories for user A are never surfaced to user B — a critical production requirement.
Time: 30 minutes
user_id discipline
Every `memory.add()` and `memory.search()` call must include the `user_id` of the authenticated session. The `user_id` should come from your auth layer, never from a request parameter the client can forge.
Namespace patterns
- Mem0: `user_id` parameter handles isolation natively
- LangGraph Store: Namespace tuple enforces scoping — supports org/team/user hierarchy
- Zep: `user_id` is set at thread creation and cannot be changed
```python
# LangGraph org-level scoping
namespace_shared = ("org", org_id, "shared_context")    # shared across org
namespace_private = ("user", user_id, "preferences")    # user-private

store.put(namespace_private, "memory-key", {"data": "value"})
store.get(namespace_private, "memory-key")
```
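In practice these tuples should be derived from the authenticated session rather than hand-assembled at call sites, so a forged ID can never reach the store. A minimal sketch, with a hypothetical `Session` type standing in for your auth layer's object:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    """Stand-in for your auth layer's session object (hypothetical)."""
    org_id: str
    user_id: str

def private_namespace(session: Session) -> tuple[str, ...]:
    # User-private memories: never readable by other users
    return ("user", session.user_id, "preferences")

def shared_namespace(session: Session) -> tuple[str, ...]:
    # Org-shared context: readable by every agent in the org
    return ("org", session.org_id, "shared_context")

s = Session(org_id="org_acme", user_id="user_alice")
print(private_namespace(s))  # ('user', 'user_alice', 'preferences')
```

Centralizing namespace construction in two functions makes it easy to audit every path by which a namespace can be built.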
Production concern: memory leakage
The most common isolation failure: a shared `Memory()` instance where `user_id` is accidentally omitted or hardcoded. In a multi-tenant API, this exposes one user’s memories to all users.
Validation:
- [ ] Create two test user_ids: `user_alice` and `user_bob`
- [ ] Add a memory under `user_alice`
- [ ] Search under `user_bob` — result must be empty
- [ ] Confirm no cross-contamination in the vector store
Step 6: move to production (persistence and async)
What you’ll accomplish: Replace dev-only in-memory stores with persistent backends, enable async writes so memory storage doesn’t block agent responses, and configure custom extraction prompts for domain-specific memory quality.
Time: 1–2 hours
Mem0: cloud vs. self-hosted
Cloud Mem0 (`MemoryClient`): Zero infrastructure. Preferred for teams without data residency requirements. Write latency is the cloud round-trip.
Self-hosted with Qdrant: Deploy Qdrant via Docker, configure Mem0 with `Memory.from_config(config)` pointing to `localhost:6333`. Required for EU data residency, HIPAA, or any use case where user memory cannot leave your infrastructure.
LangGraph: InMemoryStore → PostgresStore upgrade
```python
# DEV
store = InMemoryStore(index={"embed": "openai:text-embedding-3-small", "dims": 1536})

# PROD: swap in with identical interface — no agent code changes required
DB_URI = "postgresql://user:password@localhost:5432/agentdb"
with PostgresStore.from_conn_string(DB_URI) as store:
    store.setup()
```
Async write pattern
As of Mem0 v1.0.0+, `async_mode=True` is available. Memory writes happen in a background thread so they do not add latency to the agent’s response time. This is critical for any agent with a sub-second SLA.
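If your installed version predates async mode, or you are on a backend without it, the same effect can be approximated with a background worker thread. This is a hedged sketch under our own naming: `AsyncMemoryWriter` is not a framework class, and `store_fn` is whatever blocking write (for example, `memory.add`) you want off the hot path.

```python
import queue
import threading

class AsyncMemoryWriter:
    """Pushes memory writes onto a background thread so the agent's
    response path never blocks on storage latency."""

    def __init__(self, store_fn):
        self._q = queue.Queue()
        self._store_fn = store_fn
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, messages, user_id: str) -> None:
        self._q.put((messages, user_id))  # returns immediately

    def _drain(self) -> None:
        while True:
            messages, user_id = self._q.get()
            try:
                self._store_fn(messages, user_id)  # e.g. memory.add(...)
            finally:
                self._q.task_done()

    def flush(self) -> None:
        self._q.join()  # wait for pending writes (use in tests/shutdown)

# Demo with a stub writer
written = []
writer = AsyncMemoryWriter(lambda msgs, uid: written.append((uid, msgs)))
writer.submit([{"role": "user", "content": "hi"}], user_id="user_alice")
writer.flush()
print(written[0][0])  # user_alice
```

A production version would add error logging and a bounded queue; the single worker preserves write order per process.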
Custom extraction prompts
Use `MemoryConfig` to pass a domain-specific extraction prompt. For a data engineering agent, instruct Mem0 to capture “preferred SQL dialect, data stack, active projects, blockers.” This prevents generic extraction and reduces irrelevant memory retrieval.
```python
from mem0 import Memory
from mem0.configs.base import MemoryConfig

custom_extraction_prompt = """
Extract key facts focusing on:
1. Personal preferences and constraints
2. Professional context and role
3. Technical background and stack
4. Goals, blockers, and priorities

Conversation: {messages}

Format as clear, concise facts.
"""

config = MemoryConfig(
    vector_store={"provider": "qdrant", "config": {"url": "localhost", "port": 6333}},
    llm={"provider": "openai", "config": {"model": "gpt-4.1-nano-2025-04-14"}},
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    # Wire the prompt in; field name per Mem0's custom-prompt support,
    # verify against your installed version
    custom_fact_extraction_prompt=custom_extraction_prompt,
)

memory = Memory(config=config)
```
For context on what the AI agent cold start problem means for memory systems at scale, see the linked reference.
Step 7: monitor and govern long-term memory
What you’ll accomplish: Detect and evict stale memories, and understand where personal-context memory frameworks hit their ceiling for enterprise use cases.
Time: Ongoing
Staleness detection and eviction
- Mem0: Supports `memory.update(memory_id, data)` and `memory.delete(memory_id)`. Build a staleness check that retrieves all memories for a user (`memory.get_all(user_id=USER_ID)`) and evicts facts older than a configurable TTL.
- Zep: The temporal graph handles `invalid_at` timestamps natively — facts that are superseded are automatically marked stale without manual eviction.
- LangGraph Store: No built-in TTL. Implement a scheduled job that scans namespaces and removes entries past a defined age.
What no framework solves: the “active forgetting” gap
Community analysis (DEV.to, 2025) notes that all current frameworks treat memory as retrieval but lack a mechanism for deciding when to retrieve and when to forget. No framework implements relevance decay. Irrelevant old memories continue to surface until manually deleted.
As a practical mitigation, implement two patterns:
- TTL sweep: Schedule a daily job that calls `memory.get_all(user_id=USER_ID)` and deletes any memory with a `created_at` timestamp older than 90 days (or your domain-appropriate TTL).
- Re-confirmation prompt: When the agent uses a fact older than 30 days, append to the system prompt: “You last confirmed [X] on [date] — verify this is still accurate before acting on it.” This surfaces staleness to the user without deleting the memory prematurely.
The enterprise governance gap
Personal memory frameworks (Mem0, Zep, LangGraph) store what a user said. They cannot store what an organization knows: which datasets are certified, what lineage relationships apply, who owns which data, or which governance policies are in force. That requires a live metadata layer — not a vector store. For the architectural distinction, see Active Metadata as AI Agent Memory and Vector Database vs Knowledge Graph for Agent Memory.
Common pitfalls
The most expensive implementation mistakes are namespace failures (memory leakage between users), LangMem latency surprises (59.82s p95 on production search), Zep async delay (new memories not immediately retrievable), and treating memory as a complete solution for enterprise data context.
Over-storing (token overhead, irrelevant retrieval)
Passing full conversation histories to `memory.add()` bloats the vector store and causes retrieval to surface irrelevant older context. Mem0’s graph mode with custom extraction prompts mitigates this — but at up to 15x higher token cost for large datasets (GitHub Issue #2066). Pass only the current turn.
Single namespace for all users (privacy and isolation failures)
The most common production bug: a shared `Memory()` instance with no `user_id` — all users’ memories are stored together and retrieved interchangeably. Always bind every read and write call to an authenticated `user_id` from your session layer.
No staleness management (stale context corrupts agent decisions)
Memories added six months ago may contradict current facts — for example, a user changed their preferred stack from PyTorch to JAX. Without TTL policies or manual eviction, stale memories inject false context. Implement a scheduled eviction job from day one.
Ignoring the enterprise knowledge layer
Mem0 and Zep solve personal context memory. They cannot tell an agent which columns in `sales.pipeline` are certified, which ones carry EU data residency restrictions, or who to contact when an SLA breach occurs. This is a different problem requiring live metadata access.
Real stories from real customers: AI context in production
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP of Enterprise Data & Analytics, Workday
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
Implementing long-term memory: what to do next
With a working memory implementation in place, the next steps depend on your production goals.
Multi-agent memory sharing: Once single-agent memory works, explore shared namespaces in LangGraph Store for agents within the same organization that need access to the same context pool.
Procedural memory: If your agent repeatedly makes the same type of mistake, add LangMem’s `create_prompt_optimizer` to let the agent improve its own system prompt from user feedback trajectories.
```python
from langmem import create_prompt_optimizer

optimizer = create_prompt_optimizer(
    "anthropic:claude-3-5-sonnet-latest",
    kind="metaprompt",
    config={"max_reflection_steps": 3}
)

improved_prompt = optimizer.invoke({
    "trajectories": trajectories,  # (conversation_turns, feedback) pairs
    "prompt": "You are a helpful AI assistant"
})
```
Governance layer: For data agents, connect Atlan’s MCP alongside Mem0 or Zep — personal context from the memory framework, organizational data context from Atlan.
How to measure success:
- Recall accuracy: test whether the agent correctly answers questions about facts from previous sessions
- Session continuity score: percentage of sessions where the agent greets the user with accurate prior context
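The session continuity score is straightforward to compute from per-session logs. This sketch assumes a minimal, self-defined record shape (the field names are ours, not a framework's).

```python
def session_continuity_score(sessions: list[dict]) -> float:
    """Fraction of sessions where the agent opened with accurate prior
    context. Each record is assumed to look like
    {"session_id": ..., "greeted_with_accurate_context": bool}."""
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions if s["greeted_with_accurate_context"])
    return hits / len(sessions)

log = [
    {"session_id": 1, "greeted_with_accurate_context": True},
    {"session_id": 2, "greeted_with_accurate_context": True},
    {"session_id": 3, "greeted_with_accurate_context": False},
]
print(round(session_continuity_score(log), 2))  # 0.67
```

Tracking this weekly makes regressions from extraction-prompt changes or TTL sweeps visible quickly.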
Related resources:
- How to Build a Memory Layer for AI Agents — architecture-first companion to this code-first guide
- Best AI Agent Memory Frameworks 2026 — full framework comparison with scoring
FAQs about implementing long-term memory for AI agents
Permalink to “FAQs about implementing long-term memory for AI agents”1. How do you add memory to an AI agent?
Wrap the agent’s LLM call with two operations: before the call, run `memory.search(user_input, user_id=user_id, limit=5)` to retrieve relevant past context and inject it into the system prompt; after the call, run `memory.add([user_turn, assistant_turn], user_id=user_id)` to store the new exchange. That’s the complete pattern for Mem0.
2. What is long-term memory in an AI agent?
Long-term memory is a persistent external store — vector database, knowledge graph, or relational DB — that survives between LLM API calls and sessions. Unlike in-context memory (conversation history in the prompt), long-term memory is indexed and retrieved semantically, so it scales beyond token limits.
3. How does Mem0 work for AI agent memory?
Mem0 intercepts messages passed to `memory.add()`, runs an LLM-based extraction pass to identify ADD/UPDATE/DELETE/NOOP operations, and stores extracted facts as vector embeddings in a configurable backend (Qdrant, ChromaDB, and 17 others). On retrieval, `memory.search()` runs a semantic search and returns ranked facts. Mem0 achieves 66.9% recall accuracy at 0.200s p95 latency on the LOCOMO benchmark.
4. What is the difference between LangMem and Mem0?
Mem0 is a standalone memory framework optimized for speed (0.200s p95 latency) and breadth (19 vector backends). LangMem is a LangChain SDK that adds semantic, episodic, and procedural memory on top of LangGraph Store — its unique feature is prompt optimization (an agent that improves its own system prompt over time). LangMem’s p95 search latency is 59.82 seconds; it should not be used for synchronous retrieval.
5. Can LangChain agents have persistent memory?
Yes. LangGraph Store paired with PostgresStore for production provides persistent memory for any LangGraph agent. Swap `InMemoryStore` (dev-only, lost on restart) for `PostgresStore.from_conn_string(DB_URI)` with a one-line change. The store object interface is identical — no agent code changes required.
6. How do I implement cross-session memory in Python?
Use Mem0’s open-source `Memory` class with a persistent vector backend: configure Qdrant (self-hosted) or use the Mem0 cloud client. Bind every `add()` and `search()` call to a stable `user_id`. Memory persists across process restarts because it lives in the vector store, not in Python memory.
7. What vector database should I use for AI agent memory?
For local development: ChromaDB (no Docker needed). For self-hosted production: Qdrant (best Mem0 integration, scales to billions of vectors). For teams already on LangGraph: PostgresStore with the pgvector extension. For temporal and relational queries: Zep’s Graphiti engine (hosted).
8. What is Zep used for in AI agents?
Zep builds a temporal knowledge graph from conversation messages and structured business data. Its primary advantage over Mem0 is understanding how facts change over time — Zep can answer questions like “who owned this account in Q3 vs Q4?” using `valid_at`/`invalid_at` edge timestamps. Best for CRM-style agents, support agents with account history, and any use case where temporal reasoning about changing facts matters.
Sources
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory, Mem0 / ECAI 2025
- State of AI Agent Memory 2026, Mem0
- Long-Term Memory LangChain Agents: LangGraph and LangMem Guide, Atlan
- Memory Is the Unsolved Problem of AI Agents, DEV Community
- Launching Long-Term Memory Support in LangGraph, LangChain
- MemLayer vs Mem0 vs Zep: Choosing the Right Memory System for Your AI Agents, DEV Community
- LangMem SDK Launch, LangChain
- Long-Term Agentic Memory with LangGraph, DeepLearning.AI
- Zep Quick Start Guide, Zep
- Agent Memory Comparison: Letta vs Mem0 vs Zep vs Cognee, Letta forum