Data Mesh Setup and Implementation: Ultimate Guide for 2024
How to Set Up a Data Mesh? #
Setting up a data mesh architecture involves four primary steps:
- Treat your data as a product
- Map the distribution of domain ownership clearly
- Build a self-serve data infrastructure
- Ensure federated governance
Data mesh is a modern analytics architecture targeted at mid-sized and large organizations. Organizations are keen on implementing the data mesh architecture to move away from a service-oriented view of data.
Instead, they seek to empower business teams to fully own their data and the pipelines enabling data flow across the data ecosystem.
Here we will explore each of these steps at length to understand how to build and implement the data mesh architecture at scale. But first, let’s quickly recap the principles behind the data mesh.
Also read: Snowflake Data Mesh: Step-by-Step Setup Guide
Table of contents #
- How to Set Up a Data Mesh?
- The data mesh architecture
- Reasons for adopting the data mesh
- Data mesh principles
- How can you set up the data mesh?
- Rounding up on Data Mesh Setup
- Data mesh implementation: Related reads
The data mesh architecture #
The term “data mesh” was coined by Zhamak Dehghani in 2019 in her article titled “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.”
Traditionally, organizations have maintained a service-oriented view of data. In this model, a central data team serves the rest of the business and often becomes a bottleneck for decision-makers.
With the data mesh architecture, Dehghani puts the onus of data management on business domain teams. Because they manage their data end to end, they can address their most pressing business questions autonomously instead of waiting on a central team.
Reasons for adopting the data mesh #
Why implement a data mesh in 2024?
Here are some common reasons for adopting it.
Organizations typically move to a data mesh architecture when centralized data ownership has produced a large, monolithic data platform.
The monolith creates silos. Even so, the data platform engineers who run it are expected to provide access to the right data, often with little understanding of the business domains or use cases behind it.
The data mesh breaks these silos with a decentralized approach.
According to Dehghani, the data mesh will:
- Enable autonomous teams to extract value from data
- Support value exchange between independent, yet interoperable data products
- Scale data sharing across domains
- Encourage a culture of embedded innovation, where it is easy to find data, capture insights, and use data for ML model development
Data mesh principles #
Four fundamental principles shape the data mesh architecture:
- Data domains: A data domain contains the data products that belong to one part of the business. Each domain is owned by a department or team that can create these data products on its own.
- Data products: Every table or dashboard can be viewed as a data product. Like any other product an organization offers, a data product should be:
  - Discoverable
  - Addressable
  - Trustworthy
  - Self-describing
  - Interoperable
  - Secure
- Self-serve infrastructure: Because business teams are fully responsible for their data domains, they need a self-serve data infrastructure that lets them create and manage their own data products without depending on a central team.
- Federated governance: For the data mesh to work, the data domains, platform, and products must be interoperable and governed by standardized conventions.
Also read: What is the data mesh? | Data fabric vs. data mesh
How can you set up the data mesh? #
Moving from a centralized data model to a distributed one is primarily a mindset shift. So, the steps you must take to set up a data mesh successfully are more about adapting your organization's mindset than about choosing the right technology.
In the end, you need the various organizational units to follow similar data governance practices to create interoperable and shareable data assets. For this purpose, here’s what you must do.
1. Treat your data as a product #
The first step towards reaching your goal is treating your data as a product. This helps you set a standard for documenting datasets and dashboards while ensuring that they’re interoperable.
To do so, catalog your data in a way that makes it credible and trustworthy. The data catalog should ensure the discoverability, addressability, interoperability, security, and integrity of your data, and it should also provide adequate context for each dataset.
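To make this concrete, here is a minimal sketch of what a catalog entry for a data product might record. It includes a check for the six properties listed earlier (discoverable, addressable, trustworthy, self-describing, interoperable, secure). The `DataProduct` class, its fields, and the `missing_properties` helper are hypothetical and are not the API of any particular catalog. Each property is also reduced to a deliberately simple proxy. For example, "interoperable" here only means that a schema is published.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical catalog entry for one data product (e.g., a table or dashboard)."""

    name: str
    domain: str                 # owning domain, e.g. "orders"
    owner: str                  # accountable team or person
    address: str                # unique, stable location (addressable)
    description: str            # human-readable context (self-describing)
    schema: dict[str, str]      # column name -> type (self-describing, interoperable)
    quality_checks_passed: bool = False   # simple stand-in for trustworthiness
    access_policy: str = ""     # e.g. "pii-restricted" (secure)
    tags: list[str] = field(default_factory=list)  # search keywords (discoverable)


def missing_properties(product: DataProduct) -> list[str]:
    """Return the data-product properties this catalog entry does not yet satisfy."""
    checks = {
        "discoverable": bool(product.tags and product.description),
        "addressable": bool(product.address),
        "trustworthy": product.quality_checks_passed,
        "self-describing": bool(product.description and product.schema),
        "interoperable": bool(product.schema),  # simplification: a published schema
        "secure": bool(product.access_policy),
    }
    return [prop for prop, ok in checks.items() if not ok]


# Usage: an e-commerce "orders" product that has no quality checks or access policy yet.
orders = DataProduct(
    name="daily_orders",
    domain="orders",
    owner="orders-team",
    address="warehouse://orders/daily_orders",
    description="One row per order placed, refreshed daily.",
    schema={"order_id": "string", "amount": "decimal"},
    tags=["orders", "sales"],
)
print(missing_properties(orders))  # ['trustworthy', 'secure']
```

In practice, a real catalog would back each proxy with richer evidence, such as test results for trustworthiness or lineage for context. The point of the sketch is only to show that each property becomes something the catalog can check.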
Also read: Data as a Product: Applying Product Thinking Into Data
2. Map the distribution of domain ownership clearly #
Once your datasets are treated as products, the next step is to decide how their ownership is distributed. Use Domain-Driven Design (DDD) techniques to group your datasets into domains.
For example, if you’re in e-commerce, you could split your datasets into domains such as Users, Traffic, Orders, and so on.
This is a lot easier when your business is already organized by domain. If each department already controls a part of the business, make that department the owner of the datasets and dashboards relevant to its part.
Unless your data platform's users agree on who owns which datasets and domains, you cannot transition to the mesh architecture successfully.
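Agreeing on ownership is mostly a conversation, but it helps to write the outcome down and check it. Below is a minimal sketch that uses the e-commerce example. It flags datasets claimed by more than one domain and datasets assigned to a domain that has no owning team yet. The domain, team, and dataset names, along with the `ownership_issues` helper, are hypothetical.

```python
from collections import defaultdict

# Hypothetical e-commerce domains, each owned by exactly one team.
DOMAIN_OWNERS = {
    "users": "customer-team",
    "traffic": "marketing-team",
    "orders": "sales-team",
}

# Hypothetical (dataset, domain) claims collected from the domain teams.
DATASET_CLAIMS = [
    ("user_profiles", "users"),
    ("web_sessions", "traffic"),
    ("orders_daily", "orders"),
    ("refunds", "orders"),
    ("refunds", "users"),        # conflicting claim: two domains want the same dataset
    ("ad_clicks", "ads"),        # domain with no owning team yet
]


def ownership_issues(claims, domain_owners):
    """Flag datasets whose ownership is ambiguous or points at an unowned domain."""
    domains_by_dataset = defaultdict(set)
    for dataset, domain in claims:
        domains_by_dataset[dataset].add(domain)

    issues = {}
    for dataset, domains in sorted(domains_by_dataset.items()):
        if len(domains) > 1:
            issues[dataset] = f"claimed by multiple domains: {sorted(domains)}"
        else:
            (domain,) = domains  # exactly one claiming domain
            if domain not in domain_owners:
                issues[dataset] = f"domain '{domain}' has no owning team"
    return issues


for dataset, problem in ownership_issues(DATASET_CLAIMS, DOMAIN_OWNERS).items():
    print(f"{dataset}: {problem}")
# ad_clicks: domain 'ads' has no owning team
# refunds: claimed by multiple domains: ['orders', 'users']
```

Each flagged dataset is a prompt for the teams involved to agree on a single owner before the mesh goes further.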
Also read: Understanding the data mesh architecture
3. Build a self-serve data infrastructure #
Once the first data products are available and the domain teams have started managing them, it's time to focus on your data infrastructure.
As soon as even a handful of departments (domains) are working on their own datasets, shared infrastructure needs will emerge. That's when you must take a product-centric approach to building the data infrastructure platform.
All data product owners and domains must therefore align on which technologies to use and for what purpose. That means using the same underlying technology (cloud providers, programming languages, job scheduling tools) to build and handle datasets. As a result, you'll have the level of technical governance your data mesh implementation needs to succeed.
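An agreement to use "the same underlying technology" is easier to enforce when the agreement itself is written down in a machine-readable form. The sketch below assumes a hypothetical `PLATFORM_STANDARD` mapping and a hypothetical `check_pipeline` helper that reports where a domain's pipeline spec deviates from the standard. The category names and values such as `cloud-a` and `scheduler-x` are placeholders, not real products.

```python
# Hypothetical organization-wide technology standard agreed on by all domains.
PLATFORM_STANDARD = {
    "cloud_provider": {"cloud-a"},
    "language": {"python", "sql"},
    "scheduler": {"scheduler-x"},
}


def check_pipeline(domain: str, spec: dict[str, str]) -> list[str]:
    """Return deviations between a domain's pipeline spec and the shared standard."""
    problems = []
    for category, allowed in PLATFORM_STANDARD.items():
        choice = spec.get(category)
        if choice is None:
            problems.append(f"{domain}: missing '{category}'")
        elif choice not in allowed:
            problems.append(f"{domain}: '{choice}' is not an approved {category}")
    return problems


# Usage: one compliant domain and one that deviates from the standard.
orders_spec = {"cloud_provider": "cloud-a", "language": "python", "scheduler": "scheduler-x"}
traffic_spec = {"cloud_provider": "cloud-a", "language": "scala"}
print(check_pipeline("orders", orders_spec))    # []
print(check_pipeline("traffic", traffic_spec))
# ["traffic: 'scala' is not an approved language", "traffic: missing 'scheduler'"]
```

Keeping the standard as data rather than as prose has two benefits. The platform team can change it in one place, and every domain can run the same check as part of its own pipeline.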
Are there data mesh tools that help set up the architecture?
As mentioned earlier, data mesh is a paradigm shift rather than a technological setup, so no out-of-the-box tool sets up a data mesh architecture for you. However, building the right data culture helps substantially.
4. Ensure federated governance #
This step involves establishing best practices and conventions, such as working agreements and a shared nomenclature between domains.
The best way forward is to use the learnings from early adopter domains to document the rules for naming fields and tables, publishing and updating documentation, fixing quality issues, and more.
Governance efforts only succeed when they’re a collaborative effort, involving all the domain and data platform stakeholders. That’s why it’s called federated, and not centralized governance.
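After rules such as field and table naming are documented, some of them can be checked automatically in every domain's pipeline. The sketch below assumes a hypothetical convention: tables are named `<domain>__<entity>` in lowercase snake_case, and fields are lowercase snake_case. The patterns and the `naming_violations` helper are illustrative only, and your domains would agree on their own rules.

```python
import re

# Hypothetical convention agreed across domains:
#   tables: <domain>__<entity>, lowercase snake_case (e.g. "orders__daily_totals")
#   fields: lowercase snake_case (e.g. "order_id")
TABLE_PATTERN = re.compile(r"^(?P<domain>[a-z][a-z0-9]*)__[a-z][a-z0-9_]*$")
FIELD_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def naming_violations(table: str, fields: list[str], known_domains: set[str]) -> list[str]:
    """Check one table and its fields against the shared naming rules."""
    violations = []
    match = TABLE_PATTERN.match(table)
    if match is None:
        violations.append(f"table '{table}' does not follow <domain>__<entity>")
    elif match.group("domain") not in known_domains:
        violations.append(f"table '{table}' uses unknown domain '{match.group('domain')}'")
    for name in fields:
        if not FIELD_PATTERN.match(name):
            violations.append(f"field '{name}' in '{table}' is not snake_case")
    return violations


domains = {"users", "traffic", "orders"}
print(naming_violations("orders__daily_totals", ["order_id", "OrderDate"], domains))
# ["field 'OrderDate' in 'orders__daily_totals' is not snake_case"]
print(naming_violations("DailyOrders", ["order_id"], domains))
# ["table 'DailyOrders' does not follow <domain>__<entity>"]
```

Because every domain runs the same shared check, the rules are enforced consistently while each domain keeps ownership of its own tables. This matches the federated approach described above.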
Also read: Data Governance 101 | Data documentation framework | Data governance has a branding problem
Rounding up on Data Mesh Setup #
Centralized data repositories such as data lakes make it hard to extract value from data quickly. The data mesh addresses this with a decentralized approach. It distributes data ownership among domain experts while following a shared governance framework.
Implementing the data mesh architecture involves following the four steps described above. These steps are modeled on the underlying principles of the data mesh and, as such, play a deciding role in the success of your implementation.
As Dehghani summarizes it, “the approach is a mesh of data that is organized around domains, and owned by cross-functional teams, and managed by centralized governance to allow interoperability, and served by a self-serve infrastructure.”
Written by Xavier Gumara Rigol
Data mesh implementation: Related reads #
- What is Data Mesh?: Examples, Case Studies, and Use Cases
- Snowflake Data Mesh: Step-by-Step Setup Guide
- Data Mesh Architecture: Core Principles, Components, and Why You Need It?
- Data Mesh Principles — 4 Core Pillars & Logical Architecture
- Data Mesh Setup and Implementation - An Ultimate Guide
- Data Mesh Vs. Data Lake — Differences & Use Cases For 2024
- Data Fabric vs Data Mesh: What are the Key Differences?
- Data Catalog: What It Is & How It Drives Business Value
- What Is a Metadata Catalog? - Basics & Use Cases
- Modern Data Catalog: What They Are, How They’ve Changed, Where They’re Going
- Open Source Data Catalog - List of 6 Popular Tools to Consider in 2024
- 5 Main Benefits of Data Catalog & Why Do You Need It?
- Enterprise Data Catalogs: Attributes, Capabilities, Use Cases & Business Value
- The Top 11 Data Catalog Use Cases with Examples
- 15 Essential Features of Data Catalogs To Look For in 2024
- Data Catalog vs. Data Warehouse: Differences, and How They Work Together?
- Snowflake Data Catalog: Importance, Benefits, Native Capabilities & Evaluation Guide
- Data Catalog vs. Data Lineage: Differences, Use Cases, and Evolution of Available Solutions
- Data Catalogs in 2024: Features, Business Value, Use Cases
- AI Data Catalog: Exploring the Possibilities That Artificial Intelligence Brings to Your Metadata Applications & Data Interactions
- Amundsen Data Catalog: Understanding Architecture, Features, Ways to Install & More
- Machine Learning Data Catalog: Evolution, Benefits, Business Impacts and Use Cases in 2024
- 7 Data Catalog Capabilities That Can Unlock Business Value for Modern Enterprises
- Data Catalog Architecture: Insights into Key Components, Integrations, and Open Source Examples
- Data Catalog Market: Current State and Top Trends in 2024
- Build vs. Buy Data Catalog: What Should Factor Into Your Decision Making?
- How to Set Up a Data Catalog for Snowflake? (2024 Guide)
- Data Catalog Pricing: Understanding What You’re Paying For
- Data Catalog Comparison: 6 Fundamental Factors to Consider
- Alation Data Catalog: Is it Right for Your Modern Business Needs?
- Collibra Data Catalog: Is It a Viable Option for Businesses Navigating the Evolving Data Landscape?
- Informatica Data Catalog Pricing: Estimate the Total Cost of Ownership
- Informatica Data Catalog Alternatives? 6 Reasons Why Top Data Teams Prefer Atlan
- Data Catalog Implementation Plan: 10 Steps to Follow, Common Roadblocks & Solutions
- Data Catalog Demo 101: What to Expect, Questions to Ask, and More
- Data Mesh Catalog: Manage Federated Domains, Curate Data Products, and Unlock Your Data Mesh
- Best Data Catalog: How to Find a Tool That Grows With Your Business
- How to Build a Data Catalog: An 8-Step Guide to Get You Started
- The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 | Available Now
- How to Pick the Best Enterprise Data Catalog? Experts Recommend These 11 Key Criteria for Your Evaluation Checklist
- Collibra Pricing: Will It Deliver a Return on Investment?
- Data Lineage Tools: Critical Features, Use Cases & Innovations
- OpenMetadata vs. DataHub: Compare Architecture, Capabilities, Integrations & More
- Automated Data Catalog: What Is It and How Does It Simplify Metadata Management, Data Lineage, Governance, and More
- What is Active Metadata? Your 101 Guide