MCP connects AI agents to data catalogs, but connection is not governance. The Model Context Protocol (MCP) is a delivery mechanism: it standardizes how agents request context from tools and data sources, including data catalogs like Atlan and OpenMetadata. With 97 million monthly SDK downloads and 10,000+ public MCP servers as of 2026, adoption is rapid.[1] But the protocol says nothing about whether the catalog underneath is certified, current, or correct. Governed context starts with a governed catalog.
| Aspect | Summary |
|---|---|
| What MCP does | Standardizes the protocol layer: how AI agents request and receive context from tools and data sources |
| What it does not do | Define, enforce, or maintain the quality, certification, or governance of the context it delivers |
| The risk | Agents receive ungoverned context at AI speed: fast, formatted, and wrong |
| The fix | A governed catalog underneath MCP: certified assets, verified lineage, business-context-rich metadata |
| Atlan’s role | MCP server that exposes governed, certified, lineage-aware metadata to any compatible AI agent |
What is an MCP connected data catalog?
An MCP connected data catalog is a metadata catalog that exposes its assets to AI agents via the Model Context Protocol: tables, columns, business terms, lineage, and data quality scores. MCP handles the connection; the catalog determines what context agents actually receive and how trustworthy it is.
The Model Context Protocol is an open standard introduced by Anthropic in 2024. It defines a consistent interface for AI agents to call tools and retrieve context from external data sources, without requiring custom-built integrations for each source. A data catalog is a governed inventory of data assets, metadata, lineage, and business definitions. “Connected” means an MCP server sits in front of that catalog, translating agent requests into catalog queries and returning metadata as structured context.
Teams are connecting catalogs via MCP because AI agents need data context to reason and act accurately. MCP removes the need for point-to-point integrations between agents and data sources. Adoption is striking: MCP downloads grew from roughly 100,000 in November 2024 to more than 8 million by April 2025, an 80x increase in five months.[3] The use cases driving this growth include natural language SQL generation, automated data discovery, lineage traversal, and agent-powered governance workflows.
What the MCP connection exposes depends entirely on what the catalog contains. The MCP server returns whatever is in the catalog: schemas, business terms, ownership records, lineage graphs, quality scores. If those assets are uncertified, stale, or poorly defined, the agent gets ungoverned context. A connected catalog is not the same as a governed catalog. The difference is in the data inside, not the protocol on top.
How MCP connects AI agents to data catalogs
MCP works as a protocol bridge: an AI agent issues tool calls or resource requests via the MCP interface; the MCP server translates those into catalog API queries and returns structured metadata as context. The protocol governs the connection. It has no mechanism to ensure the content returned is certified, complete, or current.
The protocol layer: what MCP does
MCP defines three primitives: tool calls (actions agents can invoke), resource exposure (data sources agents can read), and prompt templates (pre-structured context patterns). This standardizes the request-response cycle: agents do not need catalog-specific integrations. MCP works with any compatible client: Claude, Cursor, GitHub Copilot, or custom agent frameworks. The protocol handles transport, authentication surface, and context packaging. It does not handle content quality. The MCP market is projected to reach $1.8 billion in 2025, driven by regulated industries including healthcare, finance, and manufacturing.[4]
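The three primitives can be sketched in plain Python. This is an illustrative model only, not the official MCP SDK; all class and method names here are hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical model of MCP's three primitives: tools an agent can
# invoke, resources it can read, and prompt templates it can fill in.
@dataclass
class McpServer:
    tools: dict = field(default_factory=dict)      # name -> callable
    resources: dict = field(default_factory=dict)  # uri  -> reader
    prompts: dict = field(default_factory=dict)    # name -> template

    def register_tool(self, name, fn):
        self.tools[name] = fn

    def call_tool(self, name, **kwargs):
        # The protocol standardizes this request/response cycle;
        # it says nothing about the quality of what the tool returns.
        return self.tools[name](**kwargs)

server = McpServer()
server.register_tool(
    "describe_table",
    lambda table: {"table": table, "columns": ["id", "email"]},
)
print(server.call_tool("describe_table", table="customers"))
```

The point of the sketch: `call_tool` will return whatever the registered function produces, certified or not, which is exactly the content-quality gap the protocol leaves open.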
What the catalog exposes via MCP
Through an MCP server, an AI agent can access:
- Metadata: table names, column types, descriptions, tags
- Lineage: upstream sources, downstream consumers, transformation history
- Business terms: glossary definitions, domain ownership, SLA records
- Data quality signals: freshness scores, completeness metrics, certification status
- Access context: who owns the asset, who has permission to use it
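One way to picture the bundle of context above as a single asset response; the field names here are illustrative, not a standard MCP payload:

```python
from dataclasses import dataclass, asdict, field

# Hypothetical shape of the context an MCP server might return for one
# asset, bundling metadata, lineage, business terms, quality, and access.
@dataclass
class AssetContext:
    name: str
    columns: list                     # metadata
    upstream: list                    # lineage
    glossary_terms: dict = field(default_factory=dict)  # business terms
    freshness_score: float = 0.0      # data quality signal
    certification: str = "unreviewed" # "verified" | "draft" | "deprecated"
    owner: str = ""                   # access / ownership context

ctx = AssetContext(
    name="analytics.orders",
    columns=["order_id", "amount", "placed_at"],
    upstream=["raw.orders", "raw.payments"],
    glossary_terms={"amount": "Gross order value before refunds"},
    freshness_score=0.97,
    certification="verified",
    owner="commerce-data-team",
)
print(asdict(ctx))
```

Note that certification, quality, and ownership default to empty or "unreviewed": if the catalog never populates them, the agent receives a structurally valid response with no trust signals at all.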
What MCP does not control
The protocol has no visibility into whether metadata is accurate, up to date, or certified. It cannot verify that business definitions reflect current organizational meaning. It cannot detect gaps in lineage from undocumented pipelines. And it cannot signal to the agent whether the context it just received should be trusted or flagged as uncertain.
┌─────────────────────────────────────────────────────────┐
│ AI Agent (Claude / Cursor / custom) │
│ ↓ MCP tool call │
│ MCP Server │
│ ↓ catalog API query │
│ Data Catalog │
│ ↓ governed vs. ungoverned substrate │
│ ┌─────────────────┐ ┌──────────────────────────────┐ │
│ │ GOVERNED │ │ UNGOVERNED │ │
│ │ Certified │ │ Raw schema │ │
│ │ assets │ │ Stale metadata │ │
│ │ Complete │ │ Incomplete lineage │ │
│ │ lineage │ │ Undefined terms │ │
│ │ Business │ │ No quality signals │ │
│ │ glossary │ │ │ │
│ │ Quality scores │ │ │ │
│ └─────────────────┘ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
| Dimension | Connected catalog (MCP only) | Governed connected catalog |
|---|---|---|
| Asset certification | No: agents cannot tell trusted from untrusted | Yes: certified assets flagged; agents know what to trust |
| Business context | Missing: terms undefined or inconsistent | Present: glossary terms, ownership, domain classification |
| Lineage quality | Partial: pipelines may be undocumented | Complete: verified end-to-end lineage |
| Data quality signals | Absent: no freshness or completeness signal | Active: quality scores surface alongside metadata |
| Governance enforcement | None: protocol is transport-only | Enforced: access policies, stewardship records, audit trail |
Why connectivity without governance fails
An MCP connection amplifies whatever is in the catalog, for better or worse. When the underlying metadata is stale, uncertified, or missing business context, agents do not fail slowly and visibly. They produce fast, well-formatted, confidently wrong outputs. The governance gap is not a protocol design flaw; it is a catalog content problem that MCP makes urgent.
What agents inherit from an ungoverned catalog
When a catalog has not been governed before MCP is wired up, agents encounter:
- Stale schema definitions: tables renamed, deprecated, or restructured since the last catalog update
- Missing ownership records: agents attribute data to the wrong team or surface assets without a clear steward
- Uncertified assets: agents treat experimental or sandbox tables as production-grade
- Undefined business terms: “revenue” returns three different definitions depending on which schema the agent queries
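The undefined-terms problem above can be surfaced mechanically. A minimal sketch, assuming a flattened catalog dump with hypothetical field names:

```python
from collections import defaultdict

# Illustrative check for the "three definitions of revenue" problem:
# find glossary terms that carry different definitions across schemas.
assets = [
    {"schema": "finance", "term": "revenue", "definition": "GAAP recognized revenue"},
    {"schema": "sales",   "term": "revenue", "definition": "Booked contract value"},
    {"schema": "product", "term": "revenue", "definition": "Gross transaction volume"},
]

definitions = defaultdict(set)
for a in assets:
    definitions[a["term"]].add(a["definition"])

# Any term with more than one definition is one an agent will resolve
# inconsistently, depending on which schema it happens to query.
conflicts = {term: defs for term, defs in definitions.items() if len(defs) > 1}
print(sorted(conflicts))
```

A human analyst resolves this ambiguity with context; an agent picks one definition silently, which is why conflicting terms belong on the governance backlog before MCP goes live.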
The “garbage in, garbage out” problem at protocol scale
MCP does not introduce data quality issues; it accelerates their consequences. A human analyst querying a stale catalog catches inconsistencies through domain knowledge and experience. An agent querying at scale propagates those inconsistencies across every downstream decision that depends on its output.
This creates a false binary that frustrates enterprise teams: IT either blocks MCP altogether, which blocks AI innovation, or allows ungoverned MCP access to run unchecked. There is no middle path without a governed catalog layer.[7]
Real failure patterns
Practitioner communities on Reddit and Hacker News in March 2026 traced production failures consistently to source data quality, not model quality or retrieval architecture. The failure mode is not a crash or an error message. It is systematically incorrect outputs delivered at AI speed, compounding across every downstream decision that relied on them.
Security researchers at Darktrace identified access control as the visible MCP risk.[8] But the silent risk of ungoverned content quality is what creates the largest volume of AI failures in production environments today. The two risks are distinct: who can access data is a permissions problem; whether the data is correct is a governance problem. Solving one does not solve the other.
What makes a data catalog “governed” for MCP
A governed catalog for MCP goes beyond storing metadata; it actively maintains the trustworthiness of what agents receive. Four properties define it: asset certification (what agents can trust), business context (what terms mean in your organization), lineage (where data came from), and access controls (what agents are permitted to see).
Asset certification
Certification is the mechanism that tells agents which data is production-grade. A governed catalog attaches explicit certification status to each asset: verified, draft, or deprecated. It includes data quality scores (freshness, completeness, accuracy signals) that surface alongside every metadata response. Stewardship records name the person or team accountable for each domain. Agents can surface only certified assets, or flag uncertainty when querying assets that have not been reviewed.
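Certification-aware serving can be sketched in a few lines; a hypothetical filter, using the verified/draft/deprecated statuses described above:

```python
# Minimal sketch: return certified assets as trusted, and attach an
# explicit uncertainty flag (or exclude) anything not yet reviewed.
def serve_asset(asset, require_certified=False):
    status = asset.get("certification", "unreviewed")
    if status == "verified":
        return {**asset, "trusted": True}
    if require_certified:
        return None  # strict mode: uncertified assets never reach the agent
    # Permissive mode: serve the asset, but make the uncertainty explicit
    # so the agent can flag it rather than treat it as production-grade.
    return {**asset, "trusted": False, "warning": f"asset is {status}"}

print(serve_asset({"name": "dim_customer", "certification": "verified"}))
print(serve_asset({"name": "tmp_scratch", "certification": "draft"}))
print(serve_asset({"name": "tmp_scratch"}, require_certified=True))
```

The design choice worth noting: even the permissive mode never returns an asset without a trust signal, which is the behavior an ungoverned catalog cannot provide.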
Business context
Business context is what transforms raw schema into semantic context that agents can reason with. A governed catalog maintains a business glossary: organization-specific definitions of “revenue,” “active customer,” “churn,” and every other ambiguous term that means different things in different systems. It attaches domain classification and ownership records so agents understand the organizational meaning of each asset, not just its technical shape.
Lineage
Complete lineage lets agents trace where a number came from, who transformed it, and who depends on it downstream. A partial lineage graph (one that documents some transformations but not others) is as dangerous as no lineage at all, because agents cannot know which parts to trust. A governed catalog maintains end-to-end traceability with transformation documentation and impact analysis.
Access controls
Governing access through MCP means applying RBAC, column-level masking, and row-level security at the catalog level so those restrictions propagate through the MCP server. Agents receive only the context their identity is authorized to see. A complete audit trail records what the agent queried, when, and what it received.
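The propagation pattern can be sketched as a policy filter applied before any response leaves the MCP layer; the policies and agent identities here are illustrative:

```python
# Sketch: catalog-level masking policies applied per agent identity,
# with every response recorded in an audit trail.
POLICIES = {
    "bi_agent":         {"masked_columns": {"email", "ssn"}},
    "compliance_agent": {"masked_columns": set()},
}

def apply_policy(agent_id, columns, audit_log):
    masked = POLICIES[agent_id]["masked_columns"]
    visible = [c for c in columns if c not in masked]
    # Audit trail: what the agent asked for and what it actually received.
    audit_log.append({"agent": agent_id, "returned": visible})
    return visible

log = []
cols = ["customer_id", "email", "ssn", "lifetime_value"]
print(apply_policy("bi_agent", cols, log))
print(apply_policy("compliance_agent", cols, log))
```

The key property: masking and logging happen inside the serving path, so no agent identity can receive context that bypassed the policy.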
| Criterion | Questions to ask |
|---|---|
| Asset certification | Are certified vs. uncertified assets distinguishable? Does the MCP server surface trust signals? |
| Business glossary | Are terms defined, owned, and linked to physical assets? Can agents retrieve definitions alongside schema? |
| Lineage completeness | Is lineage end-to-end or partial? Are transformations documented? |
| Access governance | Do RBAC policies extend through the MCP server? Is there an audit log of agent queries? |
| Data quality scoring | Are freshness and completeness scores attached to assets? Are stale assets flagged? |
| Stewardship records | Does every asset have a named owner? Is ownership current? |
How to build a governed MCP-connected data catalog
Building a governed MCP connection is a two-phase process: govern the catalog first, then expose it via MCP. Most teams reverse this order; they wire the protocol before establishing what the catalog actually contains. The result is fast, ungoverned context delivery. The right sequence starts with the data layer.
Prerequisites before connecting MCP:
- A data catalog with active governance policies, not just metadata storage
- A certification workflow: who approves assets as production-grade, and on what cadence
- A business glossary with organization-wide term definitions linked to physical assets
- An access governance model: which agents (or agent roles) can access which catalog assets
Step 1: Audit what your catalog currently exposes
Run a certified vs. uncertified asset ratio. In practice, most catalogs carry a significant uncertified asset backlog at first audit. Identify stale assets by reviewing last-updated timestamps and deprecated tables that remain in the catalog. Map which assets are most likely to be queried by agents first, and prioritize those for governance cleanup. Document what agents would receive today if MCP were connected. This is the baseline gap that needs to close before the protocol layer goes live.
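The baseline audit can be scripted. A minimal sketch, assuming asset records with hypothetical `certified` and `updated` fields and a 180-day staleness threshold:

```python
from datetime import datetime, timedelta, timezone

# Illustrative audit: certification ratio plus a stale-asset sweep
# by last-updated timestamp. Records and threshold are assumptions.
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
assets = [
    {"name": "dim_customer", "certified": True,  "updated": now - timedelta(days=2)},
    {"name": "fct_orders",   "certified": False, "updated": now - timedelta(days=400)},
    {"name": "tmp_export",   "certified": False, "updated": now - timedelta(days=90)},
]

certified_ratio = sum(a["certified"] for a in assets) / len(assets)
stale = [a["name"] for a in assets
         if now - a["updated"] > timedelta(days=180)]

print(f"certified: {certified_ratio:.0%}, stale: {stale}")
```

Running this against the real catalog before wiring MCP gives the documented baseline the step calls for: the exact gap agents would inherit today.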
Step 2: Establish data governance policies for AI-consumable assets
Define what “certified” means in your organization and who has authority to certify. Assign stewards to high-priority domains before MCP goes live. Set freshness SLAs: how old can metadata be before it is flagged as potentially stale? Link business glossary terms to physical assets so agents retrieve semantic context alongside schema.[7] Without this step, agents receive well-structured responses with undefined terms and unverified data quality.
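One way to make these policies concrete is to express them as data the MCP layer can evaluate; the domains, authorities, and SLA values below are illustrative:

```python
# Hypothetical governance policy expressed as data: who certifies what,
# and how old metadata may be, per domain, before it counts as stale.
POLICY = {
    "certification_authority": {
        "finance": "finance-stewards",
        "product": "product-data-team",
    },
    "freshness_sla_days": {"default": 30, "finance": 7},
    "glossary_links_required": True,
}

def metadata_is_fresh(domain, age_days):
    sla = POLICY["freshness_sla_days"].get(
        domain, POLICY["freshness_sla_days"]["default"]
    )
    return age_days <= sla

print(metadata_is_fresh("finance", 10))  # outside the stricter finance SLA
print(metadata_is_fresh("product", 10))  # within the 30-day default
```

Encoding the SLA as policy rather than prose means the monitoring step later can evaluate it automatically instead of relying on steward memory.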
Step 3: Implement the MCP server on top of the governed catalog layer
Deploy an MCP server configured to expose only certified or steward-reviewed assets by default. Configure which catalog object types are tool-callable: tables, glossary terms, lineage graphs. Test with a constrained agent query set before opening to broader access. Databricks introduced an MCP Catalog (Beta) within Unity Catalog that registers, discovers, and governs MCP servers like any catalog object, with activity recorded in a centralized audit table. This is a useful reference architecture for how governance wraps the protocol layer.[6]
Step 4: Define what agents can access and in what context
Scope agent access to domains, not the full catalog. Apply the principle of least privilege. Row-level and column-level security enforced at the catalog level will propagate through the MCP server if the server is properly configured. Define agent personas: a BI agent, a compliance agent, and a developer assistant should see different slices of the catalog. Snowflake’s managed MCP servers, which reached general availability in November 2025, extend RBAC and masking policies through the MCP layer, a strong reference for the access governance pattern.[5]
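Persona-scoped access can be sketched as domain allowlists; the personas, domains, and assets here are hypothetical:

```python
# Illustrative least-privilege scoping: each agent persona sees only
# its allowed domains, never the full catalog.
PERSONAS = {
    "bi_agent":         {"domains": {"sales", "marketing"}},
    "compliance_agent": {"domains": {"finance"}},
    "dev_assistant":    {"domains": {"engineering"}},
}

def visible_assets(persona, catalog):
    allowed = PERSONAS[persona]["domains"]
    return [a["name"] for a in catalog if a["domain"] in allowed]

catalog = [
    {"name": "fct_orders",   "domain": "sales"},
    {"name": "gl_journal",   "domain": "finance"},
    {"name": "ci_build_log", "domain": "engineering"},
]
print(visible_assets("bi_agent", catalog))
print(visible_assets("compliance_agent", catalog))
```

Scoping by domain rather than by individual asset keeps the policy maintainable as the catalog grows: new assets inherit their domain's visibility automatically.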
Step 5: Monitor what agents are consuming and flag quality degradation
Log every agent query and the asset it accessed. This builds the audit trail and surfaces access patterns that may not have been anticipated at design time. Alert when agents repeatedly query uncertified or stale assets; this signals to the governance team which assets to prioritize next. Track data quality score drift: if freshness degrades, flag before agents propagate stale context downstream.
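The alerting described here can be sketched against the query log itself; the log records and the repeat threshold are illustrative:

```python
from collections import Counter

# Illustrative monitoring pass: count agent queries against uncertified
# assets, and alert on any (agent, asset) pair hit repeatedly.
query_log = [
    {"agent": "bi_agent",      "asset": "tmp_scratch",  "certified": False},
    {"agent": "bi_agent",      "asset": "tmp_scratch",  "certified": False},
    {"agent": "bi_agent",      "asset": "dim_customer", "certified": True},
    {"agent": "dev_assistant", "asset": "fct_orders",   "certified": True},
]

uncertified_hits = Counter(
    (q["agent"], q["asset"]) for q in query_log if not q["certified"]
)
ALERT_THRESHOLD = 2  # hypothetical: repeated hits, not one-off queries
alerts = [pair for pair, n in uncertified_hits.items() if n >= ALERT_THRESHOLD]
print(alerts)
```

Each alert doubles as a prioritization signal: the uncertified assets agents actually query are the ones the governance team should certify next.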
Three common pitfalls:
- Connecting MCP before the catalog is governed: ensures fast delivery of bad context at protocol speed
- Treating access control as equivalent to governance: RBAC governs who can see data; it does not govern whether the data is correct, certified, or meaningful[5]
- Certifying assets once without ongoing stewardship: catalogs degrade; certification needs a refresh cadence tied to SLAs
How Atlan’s MCP server delivers governed context
Atlan’s MCP server does not just expose metadata; it exposes governed metadata. Certified assets, verified lineage, business glossary terms, data quality scores, and ownership records travel with every agent query. Teams using Atlan do not ask whether their agents are getting good context. The governance layer ensures it.
The ungoverned catalog problem in practice: Most MCP implementations connect the protocol to a catalog that has metadata, but not governed metadata. Engineers wire the MCP server in days; the governance backlog takes months. The result is that agents have fast, formatted access to stale schema and undefined terms. The protocol is live; the catalog is not ready.
Atlan’s catalog addresses this through several specific capabilities:
- Certified assets: Every table carries an explicit certification status (verified, deprecated, or draft) that surfaces through the MCP server. Agents know which assets have been reviewed and which have not.
- Business glossary: An agent querying “revenue” through Atlan’s MCP server receives the organization-specific definition alongside the physical table, not just a schema dump. Glossary terms are linked to physical assets and owned by named stewards.
- Data lineage: Agents can trace origin, transformation, and downstream impact through complete, maintained lineage graphs, not partial documentation of a subset of pipelines.
- Data quality scores: Freshness and completeness metrics attach to every asset response. Agents can factor data quality into their reasoning, not just their retrieval.
- Ownership and stewardship: Every asset returned through Atlan’s MCP server includes the accountable team and steward, so agents can surface attribution alongside data.
Atlan also governs metadata across cloud platforms (Snowflake, Databricks, BigQuery, dbt) rather than within a single data warehouse perimeter. This cross-platform scope matters: most enterprises have data spread across multiple systems, and governance that only covers one platform still leaves agents operating with ungoverned context from the others.
To understand what Atlan’s MCP server exposes and how it is configured, see What Is Atlan MCP? For implementation steps, see the MCP Server Implementation Guide.
Real stories from real customers: MCP context delivered with governance behind it
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server... as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
Joe DosSantos, VP of Enterprise Data & Analytics, Workday
Workday’s data organization had spent years building a shared semantic language across business units. Atlan’s MCP server makes that investment AI-accessible: the organizational context that stewards built into the catalog is now the context agents retrieve. The semantic layer does not need to be rebuilt for AI; it surfaces through the protocol layer directly.
"Atlan is much more than a catalog of catalogs. It's more of a context operating system... Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
DigiKey’s framing of “context operating system” captures the distinction this page is built around. A data catalog stores metadata. A context operating system ensures that metadata is governed, certified, and ready to serve AI agents at production scale. The MCP server is the delivery mechanism; the governed catalog is what makes the delivery trustworthy.
Connected is not enough: governed context is the standard
The MCP protocol has made it trivially easy to wire an AI agent to a data catalog. That ease is valuable and real, and it is exactly why the question shifts immediately to what the catalog exposes. With 80 percent of Fortune 500 companies deploying active AI agents in production as of 2026,[2] the volume of agent queries hitting data catalogs is scaling faster than governance programs are maturing.
The consequence is predictable: fast, formatted, wrong. Agents do not produce error messages when they receive ungoverned context. They produce confident outputs, at AI speed, at AI scale, that are built on stale schemas, undefined business terms, and unverified data quality.
The fix is not a better protocol. MCP is not the problem and it does not need to be replaced. The fix is a governed catalog layer underneath the protocol: certified assets, business-context-rich metadata, complete lineage, and access governance that extends through the MCP server to every agent query.
That is the standard worth building toward. Not “are we connected?” but “is what we’re exposing trustworthy?”
FAQs about MCP connected data catalogs
1. What is an MCP connected data catalog?
An MCP connected data catalog is a metadata catalog that exposes its assets (tables, schemas, lineage, business terms, and quality scores) to AI agents via the Model Context Protocol. MCP acts as the protocol bridge between the agent and the catalog. The quality of context agents receive depends entirely on what the catalog contains and whether those assets are governed.
2. How does Model Context Protocol connect to a data catalog?
An MCP server sits in front of the catalog and translates agent requests into catalog API queries. When an AI agent issues a tool call (for example, “describe the customers table”), the MCP server queries the catalog and returns the relevant metadata as structured context. The protocol defines how the request is made; the catalog determines what comes back.
3. What data can an AI agent access through an MCP server?
Through an MCP server connected to a data catalog, an agent can access table schemas, column descriptions, business term definitions, data lineage graphs, ownership records, certification status, and data quality scores, depending on what the catalog stores and what the MCP server is configured to expose. Access is also scoped by the permissions model applied at the catalog and MCP layer.
4. Is MCP secure for enterprise data?
MCP supports authentication and access control at the protocol layer, but enterprise security requires governance at the catalog layer too. RBAC, column-level masking, row-level security, and audit logging need to be enforced in the catalog and extend through the MCP server. A well-configured MCP server connected to a governed catalog can meet enterprise security requirements; an ungoverned connection cannot.
5. What is the difference between an MCP server and a data catalog?
An MCP server is a protocol interface that receives requests from AI agents and returns context using the Model Context Protocol standard. A data catalog is a governed repository of metadata about your data assets. The MCP server handles the communication layer; the data catalog is the source of the content. Both are required for a governed AI context pipeline; neither alone is sufficient.
6. How do I govern an MCP-connected AI agent?
Governing an MCP-connected agent starts with governing the catalog it connects to. Certify assets before exposing them through MCP, assign stewards to high-priority domains, define business glossary terms so agents retrieve consistent definitions, and apply access controls at the catalog level so they propagate through the MCP server. Then monitor what the agent queries and flag quality degradation over time.
7. What happens if an AI agent accesses ungoverned data through MCP?
The agent returns fast, well-formatted, confidently wrong answers. Without certification signals, agents cannot distinguish trusted from untrusted assets. Without business context, terms like “revenue” or “active customer” are ambiguous. Without complete lineage, agents cannot trace where numbers came from. The failure mode is not a crash; it is systematically incorrect outputs delivered at AI speed across every downstream decision.
8. What does data certification mean for MCP outputs?
Data certification is the process of explicitly marking catalog assets as verified, production-grade, and trustworthy, as opposed to draft, deprecated, or unreviewed. When an MCP server exposes certification status alongside metadata, agents can prioritize certified assets and flag or exclude uncertified ones. Without certification, agents treat all catalog assets as equally trustworthy, which is rarely accurate.
Sources
- [1] MCP Ecosystem in 2026: 100 Million Installs and Growing, Effloow; MCP Adoption Statistics 2026, MCP Manager
- [2] MCP Adoption Statistics 2026, MCP Manager
- [3] A Year of MCP: From Internal Experiment to Industry Standard, Pento
- [4] The Complete Guide to Model Context Protocol (MCP) Enterprise Adoption, Deepak Gupta
- [5] Introducing Snowflake Managed MCP Servers for Secure, Governed Data Agents, Snowflake
- [6] Accelerate AI Development with Databricks: Discover, Govern, and Build with MCP and Agent Bricks, Databricks
- [7] AI Agents Are Only as Good as the Data Behind Them, Nexla
- [8] 7 MCP Risks CISOs Should Consider, Darktrace
