Semantic memory is what an AI agent knows: facts, definitions, entity relationships. Procedural memory is how it behaves: rules, skills, decision logic. Most enterprise teams treat both as interchangeable “memory” and store them in the same places. That architectural conflation produces specific, traceable failures: agents that answer confidently with outdated business rules, policy drift across agent fleets, and governed metrics retrieved from unverified vector stores. This page gives you a diagnostic framework to separate them and build agents that get both right.
| Dimension | Semantic Memory | Procedural Memory |
|---|---|---|
| What it is | The agent’s world knowledge: facts and concepts it can state and reason over | The agent’s behavioral programming: skills and rules it follows automatically |
| What it stores | Definitions, business terms, entity properties, domain facts, certified metrics | Decision rules, workflows, agent persona, tool-selection logic, constraint policies |
| Where it lives | Vector DB, knowledge graph, structured DB, enterprise data catalog | LLM weights (in-weights), system prompt (explicit instructions), agent executor code (code-embedded) |
| How it’s updated | Automated extraction, episodic consolidation, human curation, continuous metadata ingestion | Fine-tuning / RLHF, prompt engineering, LangMem prompt optimization, code deployment |
| Failure mode | Authority vacuum: conflicting facts with no resolution mechanism; silent staleness | Governance gap: unversioned per-agent prompts; procedural drift via self-modification |
| Best for | “What is X?” for knowledge retrieval, entity resolution, fact grounding | “How should I act?” for workflow enforcement, behavioral constraints, compliance rules |
Semantic vs procedural memory: what’s the difference?
Semantic memory covers what the agent knows: facts, definitions, relationships it can retrieve and state. Procedural memory covers how the agent behaves: rules, skills, and decision patterns it follows without consciously retrieving them.
The distinction originates in cognitive science. Endel Tulving (1972) described semantic memory as a “mental thesaurus”: structured, context-independent world knowledge that underlies language use. Larry Squire (1987) formalized procedural memory as “knowing how” vs. semantic memory’s “knowing that,” categorizing it as non-declarative, implicit memory that operates automatically. For AI agents, the distinction is architectural: the two types require different storage backends, different update mechanisms, and different governance regimes.
A concrete example illustrates the difference: “What does net_revenue_q4 mean?” draws on semantic memory; the agent retrieves a fact. “Always use certified tables for revenue queries” is procedural memory; the agent follows a rule automatically. The CoALA framework (arXiv:2309.02427, TMLR 2024) formalizes both as distinct substrates in cognitive architectures for language agents.
The reason teams conflate them is straightforward: both live in what practitioners call the “memory layer”, so teams reach for the same tool (vector DB) for both. The early RAG pattern (“embed everything, retrieve everything”) trained engineers to think of vector stores as universal memory. But the CoALA framework is explicit that procedural memory “must be initialized by the designer” and is the riskiest learning target because code modifications can introduce bugs. Conflation causes business rules stored in vector DBs to have no authority mechanism, while facts in system prompts have no update trigger; both fail silently in production.
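The split can be made concrete in code. In this minimal sketch (all names hypothetical, not from any framework), the semantic fact is retrieved explicitly at inference time, while the procedural rule is injected into every prompt and never retrieved:

```python
# Hypothetical sketch: the same topic split across the two memory types.
# SEMANTIC_STORE and SYSTEM_PROMPT_RULES are illustrative names only.

SEMANTIC_STORE = {
    # "What is X?" content: retrieved at inference time, with provenance.
    "net_revenue_q4": {
        "definition": "Q4 revenue net of refunds and discounts",
        "version": "3.2",
        "approved_by": "Finance",
    },
}

# "How should I act?" content: applied on every call, never retrieved.
SYSTEM_PROMPT_RULES = [
    "Always use certified tables for revenue queries.",
]

def build_prompt(user_question: str) -> str:
    fact = SEMANTIC_STORE["net_revenue_q4"]   # explicit retrieval step
    rules = "\n".join(SYSTEM_PROMPT_RULES)    # implicit, always present
    return f"{rules}\n\nContext: {fact}\n\nQ: {user_question}"
```

The asymmetry is the point: the fact can go stale and be refreshed in the store without touching the agent, while the rule changes only when someone edits the prompt, which is exactly where the two governance problems diverge.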
What is semantic memory in AI agents?
Semantic memory in AI agents is the store of facts, definitions, and world knowledge the agent retrieves at inference time to ground its answers. In enterprise settings, it holds business terms, metric definitions, entity relationships, and certified domain knowledge. Unlike episodic memory (what happened) or procedural memory (how to act), semantic memory answers the question “what is X?”; its reliability depends entirely on whether the underlying store has authority mechanisms, not just similarity search.
Semantic memory takes two forms: explicit (retrievable from an external store) and implicit (baked into model weights at training, the in-weights semantic memory that cannot be updated without retraining). The in-weights form explains the knowledge cutoff problem: facts encoded during pretraining cannot be corrected by retrieval alone. For enterprise data agents, this is why external, governed semantic stores are not optional; the agent’s in-weights world knowledge is both frozen and generalized, lacking the certified business definitions a production data agent requires.
In enterprise contexts, semantic memory stores: asset definitions and business terms (“What is ARR?”), certified metric definitions with approval metadata (net_revenue_q4, version 3.2, approved by Finance 2026-01-15), cross-system entity resolution (“customer” in CRM = “org” in support = “account” in billing), column-level lineage and ownership, and user profile data consolidated from episodic interactions. CME Group cataloged 18 million assets and over 1,300 certified glossary terms in year one: production evidence that governed semantic memory at enterprise scale is achievable, not aspirational.
Semantic memory evolves via four mechanisms: automated extraction (LLMs pull facts from conversation turns, Mem0’s approach), episodic consolidation (patterns distilled from episodic memories become durable semantic facts), human curation (domain experts write and certify definitions, the enterprise governance path), and continuous ingestion (metadata crawlers extract facts from connected sources). In-weights semantic memory cannot be updated without fine-tuning; it carries the highest update cost of any memory type.
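The episodic consolidation mechanism can be sketched in a few lines. This is a deliberately crude illustration, assuming a simple repetition threshold (real systems weigh recency, confidence, and source quality); the function name and episode format are hypothetical:

```python
# Hypothetical episodic-to-semantic consolidation: an observation that
# recurs across episodes is promoted to a durable semantic fact once it
# repeats often enough. The threshold of 2 is illustrative only.
from collections import Counter

def consolidate(episodes: list[str], threshold: int = 2) -> list[str]:
    counts = Counter(episodes)
    # Keep only observations seen at least `threshold` times.
    return [obs for obs, n in counts.items() if n >= threshold]

episodes = [
    "reports revenue in EUR",
    "reports revenue in EUR",
    "asked about churn once",
]
durable_facts = consolidate(episodes)
```

A one-off remark stays episodic; a repeated pattern graduates into the semantic store, where it then needs the same provenance and freshness treatment as any other fact.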
Core components of semantic memory
- Knowledge base / fact store: Structured repository of definitions, concepts, and entity properties; the queryable layer of what the agent “knows”
- Ontology and schema: The relational structure connecting facts, showing how “revenue” relates to “ARR” relates to “net revenue,” enabling multi-hop reasoning beyond single-fact retrieval
- Factual assertions with provenance: Each fact linked to its source, update timestamp, and (in governed systems) approval status and version history
- Temporal validity tracking: Mechanism to flag stale or deprecated content. Native in governed catalogs; absent by default in vector DBs
- Entity resolution layer: Cross-system deduplication ensuring “customer,” “org,” and “account” resolve to the same canonical entity across sources
- In-weights world knowledge: The implicit semantic substrate baked into LLM parameters. Broad but frozen; cannot be corrected at inference without RAG or fine-tuning
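The components above suggest a record shape for a governed semantic fact. This sketch is hypothetical (field and class names are illustrative, not any product’s schema); the point is that authority and freshness live in metadata that can be checked directly, independently of similarity search:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical governed-fact record: provenance, approval, and temporal
# validity are first-class fields, not properties of an embedding.

@dataclass
class GovernedFact:
    term: str
    definition: str
    version: str
    approved_by: str
    approved_on: date
    deprecated: bool = False   # temporal validity flag

    def is_authoritative(self) -> bool:
        # Similarity search cannot answer this question; metadata can.
        return not self.deprecated

fact = GovernedFact(
    term="net_revenue_q4",
    definition="Q4 revenue net of refunds and discounts",
    version="3.2",
    approved_by="Finance",
    approved_on=date(2026, 1, 15),
)
```

A plain vector store keeps only `term` and `definition`; everything else in this record is the governance layer the authority-vacuum failure mode is missing.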
What is procedural memory in AI agents?
Procedural memory in AI agents encodes how the agent behaves: its skills, decision rules, workflow patterns, and behavioral constraints. Unlike semantic memory, it is not retrieved at inference time; it operates automatically, shaping every action the agent takes. Squire’s cognitive science framing maps directly to AI agents: the agent “knows how” to perform without conscious access to how that knowledge is encoded.
Critically, procedural memory is not the same as semantic memory about procedures. A description of “what our invoice approval process is” is semantic content. The rule that makes the agent follow that process is procedural content. The distinction matters because the two require different storage and different governance; the most common conflation failure begins here.
The CoALA framework (arXiv:2309.02427) identifies three distinct storage substrates for procedural memory. The in-model vs. external axis helps contextualize them: in-weights memory lives inside the model’s parameters, while the code-embedded and system-prompt substrates live outside it.
- In-weights (LLM parameters): Skills baked in through pretraining and fine-tuning: reading, coding, reasoning. Deepest and most stable; highest update cost. Risk: catastrophic forgetting during fine-tuning.
- Code-embedded (agent executor): Routing logic, tool definitions, workflow graphs (LangChain tool schemas, LangGraph edge conditions). Auditable via version control; requires deployment to change.
- Explicit instruction sets (system prompt / managed rule libraries): The most flexible substrate. LangMem (LangChain) focuses here specifically; agents update their own system prompt instructions via meta-prompt optimization. Updatable at runtime without model or code changes, but ungoverned by default.
Update mechanisms follow the substrate: fine-tuning for in-weights (slow, expensive, risk of catastrophic forgetting), prompt engineering for explicit instructions (fast, cheap, brittle at fleet scale), LangMem prompt optimization for automated instruction refinement, and code deployment for executor-embedded logic. The MACLA framework (arXiv:2512.18950) represents an emerging research approach: a frozen LLM plus external procedural store, updated via Bayesian selection and contrastive refinement.
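A toy example of the code-embedded substrate, assuming a simple keyword router (tool names and matching logic are hypothetical): the routing rule lives in the executor, sits under version control, and changes only through deployment, which is what makes it auditable:

```python
# Hypothetical code-embedded procedural memory: tool-selection logic as
# executor code. Tool names are illustrative; the substring matching is a
# deliberately crude stand-in for real routing conditions.

def select_tool(query: str) -> str:
    q = query.lower()
    if "revenue" in q or "arr" in q:
        # Compliance rule enforced in code, not retrieved from a store.
        return "certified_sql_tool"
    if "lineage" in q:
        return "catalog_lookup_tool"
    return "general_search_tool"
```

Changing this rule means a code review and a deploy: slower than editing a prompt, but every change is versioned and attributable, which the explicit-instruction substrate lacks by default.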
Core components of procedural memory
- Action policies: Step-by-step rules governing how the agent handles specific task types (invoice approval, PR review, data access request)
- Tool schemas and selection heuristics: Which tools to call, in what order, under what conditions; the routing logic of the agent
- Reasoning patterns: Chain-of-thought scaffolding, decomposition strategies, fallback logic when primary approaches fail
- Constraint and compliance rules: Data access policies, GDPR/SOX/HIPAA enforcement rules, source-of-truth routing (always use certified tables for revenue)
- Agent persona and format rules: Tone, response structure, and escalation triggers; the behavioral style layer
- Learned behavioral updates: User-specific interaction patterns refined over time via LangMem or similar prompt optimization
Semantic vs procedural: head-to-head comparison
The sharpest way to distinguish semantic from procedural memory is to ask what breaks when each is misused. Semantic memory stored in ungoverned vector DBs produces authority vacuums: contradictory answers with no resolution mechanism. Procedural memory stored in per-agent system prompts produces governance gaps: policy drift across agent fleets with no audit trail. The failure modes are different, the storage requirements are different, and the governance regimes are different.
| Dimension | Semantic Memory | Procedural Memory |
|---|---|---|
| Primary focus | What the agent knows (“knowing that”) | How the agent behaves (“knowing how”) |
| Storage substrate | Vector DB, knowledge graph, structured DB, enterprise data catalog | LLM weights, system prompt, agent executor code |
| Retrieval mechanism | Explicit retrieval at inference: similarity search or graph traversal | Implicit; automatically applied with no retrieval call for in-weights or system prompt |
| Update frequency | Continuous (automated extraction) to periodic (human curation) | Infrequent for in-weights; on-demand for prompts |
| Update cost | Low for vector writes; high for certification and lineage | Low for prompt edits; very high for fine-tuning |
| Who owns it | Data teams, domain experts, knowledge management | ML engineers (in-weights), platform engineers (code), agent designers (prompts) |
| Failure mode | Authority vacuum: similarity retrieval returns conflicting or stale facts without flagging | Governance gap: prompt changes unversioned; fleet-wide policy updates require manual per-agent edits |
| Enterprise risk | Stale metric definitions used for production decisions; conflicting “revenue” definitions across teams with no resolution mechanism | Procedural drift (SSGM, arXiv:2603.11768): gradual reinforcement of suboptimal workflows; no rollback path |
| Tool examples | Mem0, Zep, LangMem entity profiles, Atlan Context Layer | LangMem prompt optimizer, Letta instruction blocks, MACLA (arXiv:2512.18950) |
| What breaks when misused | Business rules stored here → retrieval inconsistency, version confusion, outdated procedures applied confidently | Domain facts stored here → facts don’t update when world changes; silent staleness across agent fleet |
The certified table scenario. An enterprise data agent receives the rule: “Always use certified tables for revenue calculations.” The team stores this rule in a vector DB alongside factual content: metric definitions, schema documentation.
The result: the rule is retrieved by cosine similarity, sometimes. When a semantically similar but outdated version exists in the same store (“prefer certified tables where available”), retrieval is non-deterministic. On some queries, the agent applies the current rule; on others, it retrieves the older version. The agent never flags the inconsistency; both answers arrive with equal confidence.
The fix: the rule belongs in the system prompt (explicit procedural memory) or, at enterprise scale, in a centrally managed instruction library with version control and deployment governance. The fact about what “certified” means belongs in the semantic store: certified by whom, when, and with what lineage. Same topic, different memory types, different storage, different update path.
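What a centrally managed instruction library buys over similarity retrieval can be shown in a minimal sketch (class and method names are hypothetical): one current rule, an append-only history, and a rollback path.

```python
# Hypothetical versioned instruction library: deterministic "current rule",
# append-only audit trail, and rollback. Names are illustrative only.

class InstructionLibrary:
    def __init__(self) -> None:
        self._versions: list[str] = []

    def publish(self, rule: str) -> None:
        self._versions.append(rule)   # nothing is overwritten: audit trail

    def current(self) -> str:
        return self._versions[-1]     # deterministic: always the latest

    def rollback(self) -> None:
        self._versions.pop()

lib = InstructionLibrary()
lib.publish("Prefer certified tables where available.")
lib.publish("Always use certified tables for revenue calculations.")
# Every agent reads the same current rule; the deprecated wording can no
# longer be surfaced by a similarity lottery.
```

Contrast this with the vector-store version of the same rule: there, both wordings coexist as embeddings, and which one the agent sees depends on the query.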
The conflation problem: what goes wrong when teams confuse them
The most common enterprise agent failure is not a capability gap; it is an architectural misclassification. Teams store business rules (procedural) in vector databases (semantic infrastructure) and hard-code facts (semantic) into system prompts (procedural infrastructure). Both patterns fail silently, at scale, in production.
Problem 1: authority vacuum (procedural content in semantic stores). Vector databases resolve queries by cosine similarity, not correctness or recency. When a business rule is stored alongside similar-but-outdated rules, retrieval is non-deterministic; the agent may retrieve the current rule on one query and the deprecated version on the next. No conflict resolution mechanism, no staleness flag, no “current version” concept. As practitioners have noted, “Memory Is Not a Vector Database”; agents need beliefs, not just storage. Teams see agents following inconsistent procedures with no error signal. An enterprise data catalog with certification and lineage, like Atlan’s context layer, eliminates the authority vacuum by providing a single, governed semantic source that similarity search cannot override.
Problem 2: silent staleness (semantic content in procedural stores). Facts hard-coded into system prompts do not update when the world changes. A metric is redefined; a product is renamed; a data source is deprecated. The system prompt does not know. At hundreds-of-agents scale, the update surface becomes unmanageable: every agent requires an independent edit with no centralized trigger. EY’s survey (March 2026) found 78% of leaders admit AI adoption already outpaces their ability to manage the risks it creates. Silent staleness is one concrete mechanism behind that figure.
Problem 3: procedural drift via self-modification. LangMem and similar frameworks enable agents to rewrite their own system prompt instructions based on conversation feedback. Without governance guardrails, agents gradually reinforce suboptimal workflows. The SSGM framework (arXiv:2603.11768, 2026) formalizes this as “procedural drift,” a documented production failure mode, and recommends consistency verification and temporal decay modeling: the governance scaffolding LangMem’s prompt optimization currently lacks. The stability-plasticity dilemma is real: agents that learn too readily lose the stable behavioral foundation that makes them trustworthy.
The diagnostic rule:
- If it answers “what is X?” → it is semantic content → it belongs in a governed semantic store with authority and freshness mechanisms.
- If it answers “how should I act?” → it is procedural content → it belongs in a versioned, centrally managed instruction layer with rollback.
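The diagnostic rule can be applied as a triage step when populating memory stores. This helper is a deliberately crude sketch (the function name and the keyword heuristic are hypothetical; in practice a human makes this classification):

```python
# Hypothetical triage helper for the diagnostic rule above: imperative
# phrasing suggests procedural content; declarative phrasing suggests
# semantic content. A keyword check is a stand-in for human judgment.

def classify_memory(content: str) -> str:
    imperative_markers = ("always ", "never ", "must ", "do not ")
    text = content.lower()
    if any(m in f"{text} " for m in imperative_markers):
        return "procedural: versioned instruction layer"
    return "semantic: governed semantic store"
```

Even a crude gate like this, run at ingestion time, would have kept the certified-table rule out of the vector store in the scenario above.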
How semantic and procedural memory work together
Semantic and procedural memory are not alternatives. Well-architected agents need both, working in concert. Semantic memory provides the facts the agent reasons over; procedural memory determines how it applies those facts. The failure modes compound when either is absent: an agent with rich semantic memory but weak procedural memory will know what “certified table” means but not consistently enforce the rule to use only certified tables.
Semantic provides the facts; procedural applies the rules
Semantic memory answers: “What does net_revenue_q4 mean? Who certified it? What is its lineage?” Procedural memory answers: “When this agent runs a revenue query, it must use only certified, Finance-approved tables, no exceptions.” The two work in tandem at inference: the agent retrieves the semantic fact and applies the procedural rule governing how to use it. If the procedural rule is absent or inconsistent, having a certified semantic definition is not enough; agents may still use uncertified sources.
When to invest in each
- Early-stage agents: Invest in semantic memory first. Build the knowledge base and entity resolution layer before tuning behavioral rules. Agents without facts cannot benefit from rules about how to use them.
- Scaling agents: Shift investment to procedural governance. Centralized, versioned instruction management becomes critical once agents operate across dozens of workflows. Unversioned per-agent prompts become unmanageable at this stage.
- Enterprise-regulated agents: Both must be governed simultaneously. Data access policies (procedural) and certified metric definitions (semantic) carry equal compliance weight. A well-defined metric in an ungoverned query rule is just as dangerous as an ungoverned metric in a well-enforced rule.
- Signal: If your agent answers “what” questions poorly → semantic gap. If it behaves inconsistently → procedural gap.
LangMem as a framework handling both
LangMem (LangChain) is currently the only major framework with first-class support for both types. It implements semantic memory via entity profiles stored in a key-value plus vector store, backed by LangGraph. It implements procedural memory via prompt optimization; agents update their own system instructions based on feedback, using meta-prompt, gradient, or single-step algorithms. The important caveat: LangMem’s procedural memory requires governance guardrails to prevent drift (the SSGM warning applies). First-class support for procedural memory is not the same as governed procedural memory. For enterprise agent memory frameworks more broadly, consult the full comparison.
How Atlan approaches semantic and procedural memory
Enterprise data agents face a specific version of the semantic memory governance problem that chatbot-centric frameworks do not address: the need for certified, versioned, lineage-aware definitions at inference time. Standard semantic memory tools (Mem0, Zep, LangMem entity profiles) return facts by similarity. They cannot distinguish between a certified definition and a deprecated one. The authority vacuum is most acute in data agents: “revenue” commonly carries conflicting definitions across finance, marketing, and operations; agents need a single authoritative answer.
Atlan’s context layer was built specifically for this problem. It is not a generic vector store; it is a governed semantic memory substrate where every fact carries approval metadata, version history, and cross-system entity resolution.
What this means in practice:
- Certified canonical definitions: net_revenue_q4, approved by Finance, 2026-01-15, version 3.2. A governed fact with approval metadata, not just an embedding.
- Cross-system entity resolution: “customer” in CRM = “org” in support = “account” in billing, resolved to a single canonical entity, enabling consistent agent reasoning across systems.
- Column-level lineage: Agents know not just what a metric means but where it comes from, who owns it, and what transformations produced it.
- Active metadata: Continuously updated from 100+ connected sources, not a static snapshot subject to silent staleness.
- Inference-time policy enforcement: Governance enforced at reasoning time, not only at ingestion; agents cannot use uncertified definitions for production decisions.
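Inference-time enforcement can be illustrated with a small guard (store shape, status values, and function name are all hypothetical, not Atlan’s API): the lookup itself refuses to hand an uncertified definition to the agent.

```python
# Hypothetical inference-time guard: an uncertified definition cannot be
# used for a production answer. The dict schema is illustrative only.

STORE = {
    "net_revenue_q4": {
        "definition": "Q4 revenue net of refunds and discounts",
        "status": "certified",
    },
    "net_revenue_q4_draft": {
        "definition": "working draft, pending Finance review",
        "status": "draft",
    },
}

def resolve_metric(name: str) -> dict:
    fact = STORE[name]
    if fact["status"] != "certified":
        # Enforced at reasoning time, not only at ingestion.
        raise PermissionError(f"{name} is not certified for production use")
    return fact
```

The design choice is enforcement placement: a check at ingestion time cannot stop an agent from using a definition that was later deprecated, while a check at resolution time can.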
For procedural memory, Atlan’s governance layer serves as the authoritative source for data access policies, enabling centralized rule updates with audit trails, solving the governance gap that unversioned per-agent prompts create.
CME Group cataloged 18 million assets and 1,300+ certified glossary terms in year one using Atlan. The data-catalog-as-LLM-knowledge-base pattern that CME Group deployed demonstrates that governed semantic memory at enterprise scale is production-ready. For the full architectural picture, see how Atlan’s context layer functions as enterprise memory. Similarly, Workday’s revenue agent “couldn’t answer one question” until it gained access to shared semantic definitions via Atlan’s MCP server, a direct demonstration of the authority vacuum problem and its resolution.
Real stories from real customers: governed semantic memory in production
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP of Enterprise Data & Analytics, Workday
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
Architectural clarity is the competitive advantage
Most enterprise teams building AI agents are not failing on capability; they are failing on memory architecture. The distinction between semantic and procedural memory is not academic; it maps directly to specific production failure modes that no LLM capability improvement will fix. An agent that stores business rules in a vector database will fail non-deterministically regardless of the model powering it. An agent with facts hard-coded into system prompts will serve stale answers regardless of how accurate its retrieval is.
The resolution is straightforward in principle: semantic memory belongs in a store with authority, freshness, and governance, not just similarity. Procedural memory belongs in a versioned, centrally managed instruction layer, not a per-agent system prompt edited by hand. Gartner research projects that by 2030, 50% of enterprise AI agent deployment failures will be due to insufficient runtime enforcement by AI governance platforms, not capability gaps. The architectural clarity this page provides is the starting point for avoiding that outcome.
FAQs
1. What is the difference between semantic memory and procedural memory in AI agents?
Semantic memory is what the agent knows: facts, definitions, entity relationships it can retrieve and state. Procedural memory is how the agent behaves: rules, skills, and decision patterns it follows automatically without an explicit retrieval step. Semantic answers “what is X?”; procedural answers “how should I act?” The cognitive science origin: Tulving (1972) for semantic (“knowing that”), Squire (1987) for procedural (“knowing how”).
2. Where is procedural memory stored in AI agents?
The CoALA framework (arXiv:2309.02427) identifies three substrates: in-weights (baked into LLM parameters via pretraining/fine-tuning), code-embedded (agent executor logic, tool definitions, workflow graphs), and explicit instruction sets (system prompts, managed rule libraries). LangMem is currently the only major framework with first-class support for updating the explicit instruction set substrate at runtime.
3. How do AI agents update their semantic memory?
Via four mechanisms: automated extraction (LLMs pull facts from conversation turns, Mem0’s approach), episodic consolidation (patterns from episodic memories become durable facts), human curation (domain experts write and certify definitions), and continuous ingestion (metadata crawlers extract from connected sources). In-weights semantic memory, world knowledge baked into LLM parameters, cannot be updated without retraining.
4. Can an AI agent rewrite its own procedural memory?
Yes, with frameworks like LangMem. Agents can update their own system prompt instructions via prompt optimization algorithms. This is powerful but carries stability risk: ungoverned self-modification leads to procedural drift, gradual reinforcement of suboptimal workflows. The SSGM framework (arXiv:2603.11768, 2026) documents this as a production failure mode and recommends consistency verification and temporal decay modeling.
5. What is the best storage backend for semantic memory in AI agents?
It depends on the use case. Vector databases (Pinecone, Weaviate, pgvector) are fast and scalable but have no concept of authority or staleness. Knowledge graphs (Neo4j) enable multi-hop reasoning and entity deduplication. Hybrid approaches (Mem0’s graph-enhanced memory: 68.4% accuracy vs. 66.9% vector-only) offer the best of both. For enterprise data agents specifically, governed data catalogs add the certification and lineage layer that standard memory tools lack.
6. What happens when you store business rules in a vector database?
Retrieval becomes non-deterministic. Vector databases resolve queries by cosine similarity, not by recency, authority, or “currently in effect” status. When similar but conflicting versions of a rule exist in the store, agents may retrieve the current rule on one query and a deprecated version on the next. There is no conflict resolution, no staleness flag, and no error signal. Teams see agents following inconsistent procedures with no apparent cause. Business rules are procedural memory and belong in a versioned, governed instruction layer, not a similarity-retrieval store.
7. Is the system prompt an example of procedural memory?
Yes. The system prompt is the most accessible substrate for procedural memory in AI agents. It encodes behavioral rules, constraints, personas, and decision logic that the agent follows automatically on every inference. It corresponds to CoALA’s “explicit instruction sets” substrate. The key limitation: system prompts are typically unversioned, per-agent, and manually maintained, which is why enterprises with large agent fleets face governance gaps when policies change.
8. How do LangMem and Mem0 implement different memory types?
LangMem has first-class support for both: semantic memory via entity profiles (key-value + vector store) and procedural memory via prompt optimization (agents rewrite their own instructions). Mem0 focuses primarily on semantic memory, using a hybrid vector + graph architecture optimized for fact extraction, entity deduplication, and retrieval accuracy (91.6 on LoCoMo, 93.4 on LongMemEval). Neither addresses enterprise-grade governance for semantic memory (certification, lineage, approval chains); that gap is Atlan’s specific angle.
Sources
- Cognitive Architectures for Language Agents (CoALA), Sumers et al., TMLR 2024
- Governing Evolving Memory in LLM Agents: SSGM Framework, arXiv
- Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement (MACLA), arXiv
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory, arXiv
- State of AI Agent Memory 2026, Mem0
- LangMem SDK for Agent Long-Term Memory, LangChain Blog
- Memory Is Not a Vector Database: Why AI Agents Need Beliefs Not Storage, DEV Community
- A Benchmark for Procedural Memory Retrieval in Language Agents, arXiv
- Episodic and Semantic Memory, Tulving 1972, APA PsycNet
- Memory and Brain, Squire 1987, Oxford University Press