Data Mesh Catalog: Manage Federated Domains, Curate Data Products, and Unlock Your Data Mesh
The power of a data mesh is that the people who truly know the data are the ones who define, manage, and share the data. But in a large organization, how can you manage all the data products emerging across different domains within your enterprise?
A data mesh catalog is key to setting up a functional data mesh. It brings the data mesh architecture to life and acts as its brain: the single place where domains, data products, owners, and policies are documented and connected.
This article will delve further into the role of a data mesh catalog in managing decentralized data domains, curating data products, implementing federated governance, and enabling self-service.
Table of contents
- What is the role of the data catalog in a data mesh?
- How does a data catalog help realize data mesh principles?
- Using the data mesh catalog for federated data governance
- Data mesh catalog for productizing data and incorporating federated governance
- Related reads
What is the role of the data catalog in a data mesh?
One of the main concerns with a data mesh is knowing and understanding what data you own. A data mesh can span numerous data domains, domain and data product owners, use cases, and more. Because the mesh is a decentralized data architecture, the people closest to the data own, label, manage, and use it to create value.
Kasia Bodzioch-Marczewska, the Domain Lead of Data Engineering at Brainly, points to data silos and poor cross-team collaboration as significant challenges of this decentralized approach:
“Every product team at Brainly is independent and has their own tools and data. While this model paid dividends for innovation and agility, the siloed nature and ownership of this data meant frustrating back-and-forth whenever one team needed another’s data.”
You can see it as a “bridge the gap” problem between data producers and consumers. Data consumers (users from other data domains/projects) don’t know what assets other data domains/projects have created. They might even find the data they need, but they can’t verify its ownership or authenticity.
The data mesh catalog helps bridge this gap by providing a single interface for all data practitioners to know:
- What data domains do we have?
- What data products do we have?
- What data is available? Can I use it to answer my business question?
- Where did the data domains, products, and assets come from?
- Who owns it now? Who else is using it (and how)?
- Can I trust this data?
The business benefits of a data mesh catalog that can tackle such questions include:
- Faster time-to-value
- Improved data governance
- Greater productivity
- Reduced workload for data stewards and data engineers
Now that we understand the pivotal role of the data mesh catalog, let’s delve deeper into its functions and how it supports decentralized data domains.
How does a data catalog help realize data mesh principles?
The four fundamental data mesh principles are:
- Domain-oriented decentralized data ownership and architecture
- Data as a product
- Self-service data infrastructure as a platform
- Federated computational governance
Here’s how the data mesh catalog can help realize these principles.
1. Data mesh catalog for managing decentralized domains
The data mesh catalog supports the principle of domain-oriented ownership by offering a dedicated space to set up federated data domains, so that each domain maintains autonomy over its data products, people, and policies.
The data mesh catalog provides a unique, personalized workspace for each domain. The domain's landing page displays all the curated data products within that domain and maintains documentation that gives data consumers context.
2. Data mesh catalog for creating and curating data products
Within each domain, data product owners can create and curate data products. Using the data mesh catalog, data producers can provide proper context by including the following:
- A summary describing the data asset
- Asset ownership
- Asset classification and relevant tags
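To make this concrete, here is a minimal Python sketch of how a catalog entry for a data product might carry its summary, owner, classification, and tags, and how a domain landing page could group products by domain. The `DataProduct` class, its field names, and the `products_by_domain` helper are hypothetical illustrations, not Atlan's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical catalog entry for a data product (illustrative fields only)."""
    name: str
    domain: str
    summary: str                      # short description of the asset
    owner: str                        # accountable owner or point of contact
    classification: str = "internal"  # e.g. "public", "internal", "confidential"
    tags: set[str] = field(default_factory=set)


def products_by_domain(products: list[DataProduct]) -> dict[str, list[str]]:
    """Group product names by domain, as a domain landing page might list them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for product in products:
        grouped[product.domain].append(product.name)
    return dict(grouped)


catalog = [
    DataProduct("orders_daily", "sales", "Daily order totals per region", "ana@example.com", tags={"revenue"}),
    DataProduct("churn_features", "growth", "Features for churn models", "li@example.com", "confidential", {"pii"}),
    DataProduct("refunds", "sales", "Refund events", "ana@example.com"),
]
print(products_by_domain(catalog))
# {'sales': ['orders_daily', 'refunds'], 'growth': ['churn_features']}
```

Making ownership and classification explicit fields on the entry, rather than free-form notes, is what later lets a catalog check, score, and govern products automatically.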
The data mesh catalog can leverage automation and AI to create column descriptions at scale by reading the underlying metadata. It can also create complete asset profiles that tell users about:
- Asset relationships
- Freshness scores and trust certifications
- Quality metrics
- Activity — changes made, users working on or leveraging the asset, downstream usage
- Intuitive, actionable data lineage for data products and domains
- Data contracts
The data mesh catalog can also surface the Slack or Microsoft Teams discussions and Jira tickets related to a specific data asset.
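Data contracts and freshness are the most mechanical items in such a profile. As a minimal sketch, assuming a contract is simply an expected schema plus a maximum staleness window (real contract formats carry much more, such as quality rules and SLAs), a catalog could check an asset against its contract like this. `DataContract` and `check_contract` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract: expected schema plus a freshness guarantee."""
    columns: dict[str, str]     # column name -> expected type
    max_staleness: timedelta    # how old the latest load may be


def check_contract(contract: DataContract, actual_columns: dict[str, str],
                   last_updated: datetime, now: datetime) -> list[str]:
    """Return human-readable violations; an empty list means the contract holds."""
    violations = []
    for name, expected_type in contract.columns.items():
        actual_type = actual_columns.get(name)
        if actual_type is None:
            violations.append(f"missing column: {name}")
        elif actual_type != expected_type:
            violations.append(f"type mismatch on {name}: expected {expected_type}, got {actual_type}")
    # Freshness: compare the age of the latest load with the allowed window.
    if now - last_updated > contract.max_staleness:
        violations.append(f"stale: last updated {now - last_updated} ago")
    return violations


contract = DataContract({"order_id": "string", "amount": "decimal"}, timedelta(hours=24))
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(check_contract(contract, {"order_id": "string", "amount": "float"},
                     last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), now=now))
# ['type mismatch on amount: expected decimal, got float',
#  'stale: last updated 1 day, 6:00:00 ago']
```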
Using the data mesh catalog to ensure the ideal characteristics of data products
Data products in a mesh should be discoverable, addressable, understandable, trustworthy, natively accessible, interoperable, valuable on their own, and secure.
The data mesh catalog can help enforce these characteristics with data product scorecards, which score each data product against each of these characteristics.
With these scores, data consumers can gauge the accuracy, completeness, and reliability of data products. Meanwhile, data producers can understand how to improve their data products and make them more useful to data consumers.
3. Data mesh catalog for enabling self-service
“If we think about this (siloed) data and this (decentralized) setup, if a program department like Tutoring, for example, wants to utilize financial data, they would have to go and ask. There was a very long process to get access to the right data, and to figure out which data you could use for your analysis.” Kasia Bodzioch-Marczewska, the Domain Lead of Data Engineering at Brainly
The data mesh catalog enables self-service by furnishing all data practitioners with a personalized homepage, where they can search for the data they need, get context, understand lineage, find the right people to talk to (if required), and more.
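Self-service starts with search. The sketch below is a deliberately naive keyword search over catalog entries, assuming each entry is a dict with hypothetical `name`, `description`, `tags`, and `owner` keys; production catalogs use ranked full-text and semantic search instead. It mirrors the Brainly example: someone from Tutoring looking for financial data finds the asset and its owner without a long back-and-forth.

```python
def search_catalog(entries: list[dict], query: str) -> list[dict]:
    """Return entries whose name, description, or tags contain every query term (case-insensitive)."""
    terms = query.lower().split()
    results = []
    for entry in entries:
        # Build one searchable string from the entry's descriptive fields.
        haystack = " ".join([entry["name"], entry["description"], *entry["tags"]]).lower()
        if all(term in haystack for term in terms):
            results.append(entry)
    return results


entries = [
    {"name": "finance.revenue_monthly", "description": "Monthly recognized revenue",
     "tags": ["finance", "certified"], "owner": "finance-data@example.com"},
    {"name": "tutoring.sessions", "description": "Tutoring session events",
     "tags": ["tutoring"], "owner": "tutoring-team@example.com"},
]
for hit in search_catalog(entries, "revenue finance"):
    print(hit["name"], "->", hit["owner"])
# finance.revenue_monthly -> finance-data@example.com
```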
4. Data mesh catalog for federated governance
“Federated Computational Governance ensures the Data Mesh is interoperable and behaving as an ecosystem, maintaining high standards for quality and security, and that users can derive value from aggregated and correlated data products.” Mark Kidwell, Chief Data Architect, Data Platforms and Services at Autodesk
The data mesh catalog can enable such an ecosystem. For instance, it can:
- Set up a centralized data governance center to create and track domain policies, so that data products are secure and only accessible to the relevant individuals or teams
- Take the pulse of what’s happening to your data domains with a dedicated Reports tab, populating a summary of data products, metrics, activity, domain usage, and more
It can also automate and scale your data documentation efforts, data classification and tagging processes, policy propagation via lineage, data asset quality monitoring, and more.
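Domain policies in a governance center ultimately reduce to access decisions. Here is a minimal sketch, assuming a hypothetical policy model in which a domain restricts products carrying a given tag (for example `pii`) to named groups; the `Policy` class and `can_read` function are illustrative, not a real catalog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical domain policy: which groups may read products carrying a given tag."""
    domain: str
    tag: str
    allowed_groups: frozenset[str]


def can_read(user_groups: set[str], product_domain: str, product_tags: set[str],
             policies: list[Policy]) -> bool:
    """Deny if any policy that applies to this product (same domain, matching tag)
    lists none of the user's groups; otherwise allow."""
    for policy in policies:
        if policy.domain == product_domain and policy.tag in product_tags:
            if not user_groups & policy.allowed_groups:
                return False
    return True


policies = [Policy("finance", "pii", frozenset({"finance-analysts"}))]
print(can_read({"tutoring"}, "finance", {"pii"}, policies))          # False
print(can_read({"finance-analysts"}, "finance", {"pii"}, policies))  # True
print(can_read({"tutoring"}, "finance", {"certified"}, policies))    # True
```

This sketch allows access to products that no policy restricts; a stricter deployment might default to deny instead.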
Also read → How AI data governance can scale data security, privacy, compliance, and more
The role of metadata in enabling federated data governance
For these efforts to become a reality, you need to activate metadata.
Let’s revisit some of the characteristics that feed into the product score, such as discoverability, addressability, and security:
- The business glossary terms linked to an asset can help gauge its discoverability.
- Documentation of data owners and relevant points of contact can help assess whether a data product is addressable.
- Sensitivity classifications and tags attached can help evaluate the security aspect of an asset.
The data mesh catalog can read each asset’s underlying metadata, look for specific fields, and then award a product health score. The score recalibrates when you update the glossary, asset description, ownership details, downstream usage, etc.
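This scoring can be sketched as a set of metadata checks, one per characteristic. The sketch assumes equal weights and a hypothetical metadata dict with `description`, `owners`, `glossary_terms`, and `classification` keys; the check names and weighting are illustrative, not how any particular catalog computes its score.

```python
# Hypothetical mapping from a data product characteristic to the metadata check
# that evidences it, following the examples in the text
# (glossary -> discoverable, owners -> addressable, classification -> secure).
CHECKS = {
    "discoverable": lambda m: bool(m.get("glossary_terms")),
    "addressable": lambda m: bool(m.get("owners")),
    "secure": lambda m: m.get("classification") is not None,
    "understandable": lambda m: bool(m.get("description", "").strip()),
}


def health_score(metadata: dict) -> tuple[int, list[str]]:
    """Return a 0-100 score (share of checks passed) and the characteristics that failed."""
    failed = [name for name, check in CHECKS.items() if not check(metadata)]
    score = round(100 * (len(CHECKS) - len(failed)) / len(CHECKS))
    return score, failed


asset = {"description": "Daily orders by region", "owners": ["ana@example.com"],
         "glossary_terms": [], "classification": None}
print(health_score(asset))  # (50, ['discoverable', 'secure'])

asset["glossary_terms"] = ["Order"]  # linking a glossary term changes the score on the next run
print(health_score(asset))  # (75, ['secure'])
```

Because the score is recomputed from the metadata itself, updating the glossary or ownership details is enough to move it.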
Let’s extend this to lineage. The data mesh catalog can automatically propagate sensitivity classifications along lineage, so that downstream assets derived from PII inherit the classification and can be masked or anonymized consistently.
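Propagating a classification along lineage is essentially a graph traversal. A minimal sketch, assuming lineage is available as a hypothetical adjacency map from each asset to its direct downstream assets: a breadth-first walk from every asset tagged as PII returns all assets that should inherit the tag, and masking policies could then be applied to that set. `propagate_tag` and the asset names are illustrative.

```python
from collections import deque


def propagate_tag(lineage: dict[str, list[str]], tagged: set[str]) -> set[str]:
    """Breadth-first walk from every tagged asset along lineage edges
    (upstream -> downstream); return every asset that should carry the tag."""
    result = set(tagged)
    queue = deque(tagged)
    while queue:
        asset = queue.popleft()
        for downstream in lineage.get(asset, []):
            if downstream not in result:  # skip assets already tagged
                result.add(downstream)
                queue.append(downstream)
    return result


lineage = {
    "raw.customers": ["staging.customers"],
    "staging.customers": ["marts.customer_360", "marts.churn_features"],
    "raw.orders": ["marts.customer_360"],
}
print(sorted(propagate_tag(lineage, {"raw.customers"})))
# ['marts.churn_features', 'marts.customer_360', 'raw.customers', 'staging.customers']
```

The visited check keeps the walk from revisiting assets, so it also terminates if the lineage graph contains a cycle. Note that `raw.orders` is not tagged: propagation only flows downstream from the PII source.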
So, metadata can support everything from data discovery and understanding to governance.
“Metadata allows you to shift from siloed context to embedded context (domains), generalized experiences to personalized experiences (data products), minimum automation to truly autonomous (self-service infrastructure), and top-down governance to democratized governance (federated computation governance).” Prukalpa Sankar, Co-founder at Atlan
Dive deeper → The metadata foundation that your data mesh needs
Using the data mesh catalog for federated data governance: 5 best practices to adopt
The data mesh catalog is instrumental in fostering federated data governance in the data mesh architecture. Here are five best practices you can adopt to make it effective:
- Standardize data products, ensuring uniform data security and privacy policies, data cataloging and lineage practices, data quality metrics, and access controls
- Standardize metadata to ensure consistency in describing, documenting, and managing data
- Activate metadata and enable the bi-directional flow of metadata from various data stack tools to all the connected data products and vice versa
- Embed context within your daily workflows so that data practitioners can collaborate and work with data without switching apps
- Automate for scale and self-service so that anyone can ingest, organize, and publish data with proper documentation
Data mesh catalog for productizing data and incorporating federated governance
“Where Data Mesh could help us was by enabling any team throughout Autodesk to act as a publisher, to ingest their own data, and to present it to consumers for that data domain.” Mark Kidwell, Chief Data Architect, Data Platforms and Services at Autodesk
The data mesh’s value proposition in scaling data and analytics use cases is compelling. Bringing the concept to life requires a data mesh catalog that can support setting up, documenting, and managing decentralized data domains and products. The catalog should also facilitate federated data governance and enable self-service.
Active metadata management is key to building a thriving data mesh and managing it with the data mesh catalog. Bringing metadata of various types and from various sources under one roof can help automate and orchestrate the core processes behind the data mesh.
If you’re considering adopting a data mesh catalog for your data estate that activates your metadata, you can check out Atlan. Unlike most data catalogs, which force-fit features to modern mesh concepts, Atlan offers the first-ever native data mesh experience in a data catalog.
With Atlan’s native data mesh experience, your data products, domains, and contracts come to life as first-class citizens.
Data mesh catalog: Related reads
- What is Data Mesh?: Examples, Case Studies, and Use Cases
- Snowflake Data Mesh: Step-by-Step Setup Guide
- Data Mesh Architecture: Core Principles, Components, and Why You Need It?
- Data Mesh Principles — 4 Core Pillars & Logical Architecture
- Data Mesh Setup and Implementation - An Ultimate Guide
- How to implement data mesh from a data governance perspective
- Atlan Activate: Bringing Data Mesh to Life
- How Autodesk Activates Their Data Mesh with Snowflake and Atlan
- Journey to Data Mesh: How Brainly Transformed its Data & Analytics Strategy