Data Mesh Setup and Implementation: Ultimate Guide for 2024

Updated September 28th, 2024

How to Set Up a Data Mesh? #

Setting up the data mesh architecture requires you to follow four primary steps:

Treat your data as a product
Map the distribution of domain ownership clearly
Build a self-serve data infrastructure
Ensure federated governance

Data mesh is a modern analytics architecture targeted at mid-sized and large organizations. Organizations are keen on implementing the data mesh architecture to move away from a service-oriented view of data.

Instead, they seek to empower business teams to fully own their data and the pipelines enabling data flow across the data ecosystem.

Here we will explore each of these steps at length to understand how to build and implement the data mesh architecture at scale. But first, let’s quickly recap the principles behind the data mesh.

The data mesh architecture #

The term “data mesh” was coined by Zhamak Dehghani in 2019 in her article titled “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.”

Traditionally, organizations have maintained a service-oriented view of data, where central data teams often become a bottleneck for decision-makers.

With the data mesh architecture, Dehghani puts the onus of data management on business teams. Since they’ll manage their data end to end, they can address the most pressing business questions autonomously, with zero bottlenecks.

Reasons for adopting the data mesh #

Why implement a data mesh in 2024?

Organizations typically move to the data mesh architecture when centralized data ownership has produced a large, monolithic data platform.

The monolith creates silos. Data platform engineers are expected to ensure access to the right data, yet they have little understanding of the business domain or its use cases.

The data mesh breaks these silos with a decentralized approach.

How data mesh compares with traditional data warehousing/lake principles. Source: Data Mesh, Zhamak Dehghani, O'Reilly

According to Dehghani, the data mesh will:

Enable autonomous teams to extract value from data
Support value exchange between independent, yet interoperable data products
Scale data sharing across domains
Encourage a culture of embedded innovation, where it is easy to find data, capture insights, and use them for ML model development

Data mesh principles #

Four fundamental principles shape the data mesh architecture:

Data domains: Data domains contain data products belonging to a part of the business or domain. Each domain is owned by a department or team self-reliant in creating these data products.
Data products: Every table or dashboard can be viewed as a data product. Like any other product that an organization offers, a data product should be:
- Discoverable
- Addressable
- Trustworthy
- Self-describing
- Interoperable
- Secure
Self-serve infrastructure: Since the business teams are fully responsible for their data domains, they need infrastructure that lets them create their own data products without relying on a central team.
Federated governance: For the data mesh to work, the data domains, platform, and products must be interoperable and governed by standardized conventions.
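
To make the six data product properties more concrete, here is a minimal sketch in Python. The `DataProduct` class, its field names, and the `warehouse://` address format are all hypothetical, chosen only for illustration. They are not part of any data mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product (a table or dashboard)."""
    name: str                     # searchable name (discoverable)
    address: str                  # stable, unique location (addressable)
    owner_domain: str             # domain accountable for the product
    description: str = ""         # what the data means (self-describing)
    schema: dict = field(default_factory=dict)          # column -> type (interoperable)
    quality_checks: list = field(default_factory=list)  # evidence of quality (trustworthy)
    allowed_roles: set = field(default_factory=set)     # who may read it (secure)


def missing_properties(product: DataProduct) -> list:
    """Return the properties this descriptor does not yet satisfy."""
    checks = {
        "discoverable": bool(product.name),
        "addressable": bool(product.address),
        "trustworthy": bool(product.quality_checks),
        "self-describing": bool(product.description),
        "interoperable": bool(product.schema),
        "secure": bool(product.allowed_roles),
    }
    return [prop for prop, ok in checks.items() if not ok]


orders = DataProduct(
    name="orders_daily",
    address="warehouse://orders/orders_daily",  # made-up address format
    owner_domain="Orders",
    description="One row per order, refreshed daily.",
    schema={"order_id": "string", "amount": "decimal"},
)
print(missing_properties(orders))  # -> ['trustworthy', 'secure']
```

A check like this only confirms that the metadata exists. Whether a product is actually trustworthy or secure still depends on the domain team that owns it.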

How can you set up the data mesh? #

Going from a centralized model to a distributed one for data is primarily a mindset shift. So, the steps you must take to successfully set up the data mesh are more about adapting your mindset than choosing the right technology.

In the end, you need the various organizational units to follow similar data governance practices to create interoperable and shareable data assets. For this purpose, here’s what you must do.

1. Treat your data as a product #

The first step towards reaching your goal is treating your data as a product. This helps you set a standard for documenting datasets and dashboards while ensuring that they’re interoperable.

To do so, you must catalog your data so that consumers can trust it. The data catalog must ensure the discoverability, addressability, interoperability, security, and integrity of data, besides providing adequate context.
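
As a rough illustration of what cataloging asks of each dataset, the following Python sketch shows a toy in-memory catalog. The `Catalog` class and its methods are hypothetical and only stand in for a real catalog service. It enforces two of the properties above: every entry has a unique address and carries a description for context.

```python
class Catalog:
    """Toy in-memory catalog; a real catalog would be a dedicated service."""

    def __init__(self):
        self._entries = {}

    def register(self, address, description, owner):
        # Addressability: each data product gets exactly one unique address.
        if address in self._entries:
            raise ValueError(f"address already registered: {address}")
        # Context: refuse entries that arrive without documentation.
        if not description.strip():
            raise ValueError("a catalog entry needs a description")
        self._entries[address] = {"description": description, "owner": owner}

    def search(self, keyword):
        # Discoverability: match the keyword against address and description.
        keyword = keyword.lower()
        return sorted(
            address
            for address, entry in self._entries.items()
            if keyword in address.lower() or keyword in entry["description"].lower()
        )


catalog = Catalog()
catalog.register("orders.orders_daily", "One row per order, refreshed daily.", "Orders")
catalog.register("traffic.page_views", "Raw page view events.", "Traffic")
print(catalog.search("order"))  # -> ['orders.orders_daily']
```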

2. Map the distribution of domain ownership clearly #

Once your datasets are treated as products, the next step is to address their distribution. You should use Domain-Driven Design (DDD) techniques to group your datasets into different domains.

For example, if you’re in e-commerce, you could split your datasets into domains such as Users, Traffic, Orders, and so on.

It’s a lot easier when the business is already split by domains. If each department controls a part of your business, have it own the datasets and dashboards that relate to that part.

Unless your data platform customers agree on who owns which datasets and domains, you cannot transition to the mesh architecture successfully.
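
One simple way to check that ownership is mapped clearly is to audit it. The Python sketch below reuses the e-commerce domains from the example above. The dataset names and the `audit_ownership` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical ownership map: domain -> datasets it owns.
OWNERSHIP = {
    "Users": ["user_profiles", "user_signups"],
    "Traffic": ["page_views", "sessions"],
    "Orders": ["orders", "refunds"],
}


def audit_ownership(ownership, all_datasets):
    """Find datasets with no owning domain and datasets claimed by several."""
    owners = defaultdict(list)
    for domain, datasets in ownership.items():
        for dataset in datasets:
            owners[dataset].append(domain)
    unowned = sorted(d for d in all_datasets if d not in owners)
    contested = {d: doms for d, doms in owners.items() if len(doms) > 1}
    return unowned, contested


# "coupons" exists in the platform but no domain has claimed it yet.
known = ["user_profiles", "user_signups", "page_views", "sessions",
         "orders", "refunds", "coupons"]
unowned, contested = audit_ownership(OWNERSHIP, known)
print(unowned)    # -> ['coupons']
print(contested)  # -> {}
```

Any dataset that shows up as unowned or contested is a prompt for a conversation between domains, not something the script can resolve on its own.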

3. Build a self-serve data infrastructure #

Once the first datasets are available as products and the domain teams have started managing them, it’s time to focus on your data infrastructure.

As several departments (domains) start working on their own datasets, shared infrastructure needs will emerge. That’s when you must take a product-centric approach to building the data infrastructure platform.

So, all data product owners and domains must align on the technology being used and its purpose. That means using the same underlying technology — cloud providers, programming languages, job scheduling tools — to build and handle datasets. As a result, you’ll have the required level of technical governance to succeed in your data mesh implementation.
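
The alignment described above can be written down as a shared standard that each domain's stack is checked against. This is only a sketch. The `PLATFORM_STANDARD` values (`aws`, `airflow`, and so on) are placeholders, not recommendations, and `stack_deviations` is a hypothetical helper.

```python
# Hypothetical platform standard agreed on by all domains (placeholder values).
PLATFORM_STANDARD = {
    "cloud_provider": {"aws"},
    "language": {"python", "sql"},
    "scheduler": {"airflow"},
}


def stack_deviations(domain_stack, standard=PLATFORM_STANDARD):
    """Return {category: choice} for each choice outside the shared standard."""
    return {
        category: choice
        for category, choice in domain_stack.items()
        if choice not in standard.get(category, set())
    }


orders_stack = {"cloud_provider": "aws", "language": "scala", "scheduler": "airflow"}
print(stack_deviations(orders_stack))  # -> {'language': 'scala'}
```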

Are there some data mesh tools to help set up the architecture?

As mentioned earlier, data mesh is a paradigm shift, rather than a technological setup. So, there is no out-of-the-box tool to help set up a data mesh architecture. However, engineering the right data culture can help substantially.

4. Ensure federated governance #

This step requires you to establish best practices and conventions, such as working agreements and a shared nomenclature between domains.

The best way forward is to use the learnings from early adopter domains to document the rules for naming fields and tables, publishing and updating documentation, fixing quality issues, and more.
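
Once naming rules are documented, they can also be checked automatically. The sketch below assumes one possible convention: lowercase snake_case names, with every table prefixed by its domain. Both the convention and the `naming_violations` helper are illustrative assumptions. Your domains may agree on different rules.

```python
import re

# Assumed convention: lowercase snake_case, e.g. "orders_daily" or "order_id".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def naming_violations(domain, table, fields):
    """List the ways a table and its fields break the assumed convention."""
    problems = []
    prefix = domain.lower() + "_"
    if not SNAKE_CASE.match(table):
        problems.append(f"table '{table}' is not snake_case")
    elif not table.startswith(prefix):
        problems.append(f"table '{table}' lacks the '{prefix}' prefix")
    for name in fields:
        if not SNAKE_CASE.match(name):
            problems.append(f"field '{name}' is not snake_case")
    return problems


print(naming_violations("Orders", "orders_daily", ["order_id", "totalAmount"]))
# -> ["field 'totalAmount' is not snake_case"]
```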

Governance efforts only succeed when they’re a collaborative effort, involving all the domain and data platform stakeholders. That’s why it’s called federated, and not centralized governance.

Summing up Data Mesh Setup #

The problem with centralized data repositories like data lakes is that extracting value from them quickly is hard. The data mesh addresses this with a decentralized approach that distributes data ownership among domain experts while following a shared governance framework.

Implementing the data mesh architecture warrants following the four steps mentioned above. These steps have been modeled after the underlying principles of the data mesh, and as such, play a deciding role in the success of your mesh architecture implementation.

As Dehghani summarizes it, “the approach is a mesh of data that is organized around domains, and owned by cross-functional teams, and managed by federated governance to allow interoperability, and served by a self-serve infrastructure.”

Written by Xavier Gumara Rigol
