What is Data Mesh? - Examples, Case Studies, and Use Cases
Updated on: March 5, 2023
What is data mesh?
Data mesh is a decentralized data architecture where data is treated as a product and managed by dedicated data product owners.
The data mesh decentralizes data ownership by transferring the responsibility from the central data team to the business units that create and consume data.
It operates on the principles of domain-driven design, product thinking, and federated governance.
Table of contents
- What is data mesh?
- 4 principles of data mesh
- Data mesh architecture
- Advantages of data mesh
- How to set up a data mesh for your organization
- Metadata as the foundation for your data mesh needs
- Data mesh in action: Case studies
- Data Mesh: Related reads
Here’s a quick 101 on the data mesh approach, its principles, popular architecture examples, advantages, basics of setup, and case studies.
Zhamak Dehghani (ex-director of emerging technologies for ThoughtWorks in North America) first proposed the data mesh approach as an alternative to monolithic data architectures.
She defines data mesh as a “decentralized socio-technical approach to share, access, and manage analytical data in complex and large-scale environments—within or across organizations.”
Meanwhile, ThoughtWorks defines data mesh as “an analytical data architecture and operating model where data is treated as a product and owned by teams that most intimately know and consume the data.”
The simplest way to visualize the data mesh is to think of it as “a federation, with independent business owners still agreeing on a common language and common units of exchange.”
How do data leaders interpret the data mesh?
To add more color to the concept, here we’ve compiled perspectives from diverse data leaders on data mesh.
According to Max Schultze, Data Engineering Manager at Zalando, the data mesh is product thinking for data and platform thinking for data infrastructure with federated governance.
Meanwhile, Mohammad Syed, ex-Head of Data Strategy & Management at Credera UK, sees the data mesh as a data platform version of microservices. Here’s how he describes the model:
“In a data mesh model, the data architecture is decentralised into independent, interoperable, and business-owned data products which operate as microservices. While each data product is built to common standards of quality and interoperability, each product’s architecture - from storage types to data models - is optimised for its domain-specific business use case and can deliver value more precisely than a centralised data architecture.”
He also believes CDOs should use the data mesh concept to test new ways of delivering data and analytics solutions at pace and at scale.
Jean-Georges Perrin, Intelligence Platform Lead at PayPal, compares the impact of data mesh with that of the agile movement in software engineering:
Data Mesh is bringing to data engineering many of the concepts you may have been familiar with in agile software engineering.
Adam Bellemare, Staff Technologist at Confluent, believes that the data mesh “places the onus of responsibility for providing clean, available and reliable data on the crew that generates, uses and stores the data — not on a centralized analytics team.”
What problems would such a decentralized, domain-driven design for data solve?
According to James Serra, Data & AI Solution Architect at Microsoft:
“Data mesh tries to solve three challenges with a centralized data lake/warehouse:
- Lack of ownership: Who owns the data – the data source team or the infrastructure team?
- Lack of quality: The infrastructure team is responsible for quality but does not know the data well
- Organizational scaling: The central team becomes the bottleneck, such as with an enterprise data lake/warehouse”
The data mesh also overcomes the issue of lack of context — common with a centralized architecture managed by a central data team. As Eckerson Group’s Kevin Petrie puts it, “business domain knowledge matters, and it gets lost in central platforms.”
The domain-driven design ensures that the teams with complete context create and use data for decision-making.
Here’s how John Cutler, Senior Director of Product Enablement at Toast, makes a case for the data mesh:
“The opposite of data mesh is data meh — producers do whatever they want, and some poor soul needs to make sense of it all so it can be consumed.”
Now, how would you implement data mesh?
Kevin Petrie, VP of Research at Eckerson Group, suggests the following:
- Put the smartest business domain experts – perhaps the operational owners of a certain region or business unit – in charge of their data throughout the lifecycle.
- Have those domain experts transform and deliver their data as a discoverable, consumable product to the rest of the business.
- Give them the people and training needed to handle data engineering.
- Create a federated governance team to devise and implement policies.
- And relegate the rest to a standard, enterprise-wide resource pool, such as a cloud Infrastructure as a Service (IaaS) platform.
Data mesh vs. data fabric
The data fabric is a centralized data architecture design wherein an integrated layer (fabric) of data connects various processes. A central data team orchestrates the data fabric.
Meanwhile, the data mesh is a decentralized approach to designing data architecture. Data is stored across different domains and managed by domain experts.
According to Noel Yuhanna, Principal Analyst at Forrester, the difference between data mesh and data fabric is all about the APIs:
“A data mesh is basically an API-driven [solution] for developers, unlike [data] fabric,” Yuhanna said. “[Data fabric] is the opposite of data mesh, where you’re writing code for the APIs to interface. On the other hand, data fabric is low-code, no-code, which means that the API integration is happening inside of the fabric without actually leveraging it directly, as opposed to data mesh.”
Read more → Data mesh vs. data fabric
Data mesh vs. data lake
The data lake architecture is a centralized approach to designing data platforms. A central data lake would store all organizational data, and a central data team would oversee it.
Meanwhile, the data mesh architecture is decentralized and domain-driven. So, instead of a central data team, each data domain has a dedicated team responsible for managing the data it creates.
Here’s how Barr Moses of Monte Carlo establishes the difference:
“Unlike traditional monolithic data infrastructures that handle the ETL in one central data lake, a data mesh supports distributed, domain-specific data consumers and views “data-as-a-product,” with each domain handling their own data pipelines. Underlying the data mesh is a standardized layer of observability and governance that ensures data is reliable and trustworthy at all times.”
Read more → Data mesh vs. data lake
What are the 4 principles of data mesh?
There are four fundamental principles of the data mesh architecture:
- Domain ownership
- Data as a Product
- Self-serve data platform
- Federated governance
1. Domain ownership
Each domain is responsible for creating, managing, storing, and sharing the data it creates without relying on a central data team. As a result, those with full context are in charge of handling data.
This ensures that domains own and are held accountable for their data.
When the tire manufacturer Michelin started mapping out its data mesh strategy, it identified four core data domains — Manufacturing, Research and Development, Services and Solutions, and Business Services.
Read more → Domain data ownership in the data mesh
2. Data as a Product
Instead of treating data as a by-product of business processes, it should be seen as the product itself. The consumers of this product must be treated as customers and offered a delightful experience by the domain data owners.
When applying product thinking to data, it also extends to the various components of data, such as metadata, code, policies, and more. So, the characteristics of data as a product — discoverable, addressable, understandable, accessible, trustworthy, interoperable, and secure — apply to these components as well.
Continuing with our example of Michelin, the Manufacturing domain would have products, such as production, quality, maintenance, and industry supply chain.
Read more → Data as a product
3. Self-serve data platform
For domain teams to be fully autonomous and manage their data products end-to-end, self-serve data infrastructure must be in place. This infrastructure would remove all the complexities involved in managing the lifecycle of data products. It would also empower cross-functional teams across domains to collaborate and share data.
Read more → Self-serve data infrastructure as a platform
4. Federated governance
One of the main concerns of distributed domain data ownership is the possibility of duplicated effort, the creation of data silos, and a lack of interoperability across data domains.
That’s why a federated governance model establishing a common language — standards, terms, definitions, policies — is crucial for the data mesh to work. The domain data owners follow a set of global data governance rules while retaining their autonomy.
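One way to picture this balance is as a small set of global checks that every domain's data product must pass, while everything else — schema, storage, modeling — stays a domain-local decision. Here is a minimal sketch; all rule names, fields, and thresholds are invented for illustration and don't come from any specific platform:

```python
# Hypothetical global governance rules every domain-owned data product must
# satisfy. Each rule inspects a product's metadata descriptor (a plain dict).
GLOBAL_RULES = [
    ("has_owner",       lambda p: bool(p.get("owner"))),
    ("has_description", lambda p: bool(p.get("description"))),
    ("snake_case_name", lambda p: p.get("name", "") == p.get("name", "").lower()
                                  and p.get("name", "").replace("_", "").isalnum()),
]

def governance_report(product: dict) -> dict:
    """Return {rule_name: passed} for one data product descriptor."""
    return {name: check(product) for name, check in GLOBAL_RULES}

# A domain-local product: the Manufacturing domain chooses its own schema and
# storage; only the federated rules above are imposed from outside.
tire_pressure = {
    "name": "tire_pressure_readings",
    "owner": "manufacturing-domain",
    "description": "Hourly pressure readings from curing presses.",
    "storage": "parquet",  # the domain's own choice, not governed globally
}

report = governance_report(tire_pressure)
print(report)  # every rule passes for this product
```

In practice these checks would run automatically in CI or in the platform's catalog — that is what makes the governance "computational" rather than a policy document.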
Read more → Federated computational governance
What does the data mesh architecture look like?
There are several approaches to designing the data mesh architecture. Let’s look at the most common approach for enterprises.
According to data mesh architect Eric Broda, the components of an enterprise data mesh architecture can be:
- Data products: The building blocks within each domain of the data mesh containing operational and analytical data
- Domain-oriented data pipelines: The data pipelines responsible for consuming, transforming, and serving data for data products in each domain
- Data infrastructure: The infrastructure component that helps you build, deploy, and run data product code, and store and access big data and metadata
- Governance: A governance model that describes global standardization, records data changes, and maps them in a catalog
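To make these components concrete, here is a toy sketch of a data product flowing through a domain pipeline and landing in a governance catalog. Every class, field, and value is invented for illustration; it isn't code from Broda's architecture or any real framework:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Building block of a domain: bundles data with its identifying metadata."""
    name: str
    domain: str
    operational_data: list = field(default_factory=list)
    analytical_data: list = field(default_factory=list)

def domain_pipeline(raw_rows: list) -> list:
    """Domain-oriented pipeline: consume operational rows, transform, serve."""
    return [{"order_id": r["id"], "total": r["qty"] * r["price"]} for r in raw_rows]

# Governance component: a catalog mapping product names to products, so
# changes are recorded and discoverable across the mesh.
catalog = {}

orders = DataProduct(name="orders", domain="sales")
orders.operational_data = [{"id": 1, "qty": 2, "price": 9.5}]
orders.analytical_data = domain_pipeline(orders.operational_data)
catalog[orders.name] = orders

print(catalog["orders"].analytical_data)  # [{'order_id': 1, 'total': 19.0}]
```

The point of the sketch: the pipeline and the data live inside the domain, while the catalog is the shared, standardized surface the rest of the organization sees.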
Read more → Understanding Data Mesh Architecture
Besides this design, the mesh can take on various topologies to strike a balance between decentralization and centralization. These variations offer a feasible alternative for enterprises cautious about embracing a fully decentralized architecture.
As Piethein Strengholt, the author of Data Management at Scale, puts it:
“Larger enterprises enjoy a data product management mindset. However, they don’t like the idea of a fully decentralized architecture, which could result in data duplication when joining data, repeated efforts of platform management, creation of silos and proliferation of technology standards. Others fear the costs and decreased performance of combining data from multiple teams. Or are frightened by the need for deep expertise for complex system management.”
Read more → Data Mesh: The Balancing Act of Centralization and Decentralization
What are the advantages of data mesh?
The data mesh is a way to resolve data quality, ownership, accountability, and trust issues, which are common with monolithic data architectures.
The top advantages of the data mesh architecture include:
- Greater autonomy and control over your data, leading to faster decision-making
- Product thinking gets embedded everywhere
- Easier data discovery and accessibility
- Greater scalability of data systems with autonomous data domains and teams
- Better data quality when the team creating data is in charge of managing it and extracting value from it
- Interoperability across data domains
- Better regulatory compliance and data security
How to set up a data mesh for your organization
The six steps to setting up a data mesh are:
- Treat your data as a product and define the qualifying criteria, characteristics, and KPIs
- Map the distribution of domain ownership clearly
- Build a self-serve data infrastructure that can support all domains and product owners
- Migrate ownership of existing data in lakes/warehouses to the domain teams
- Ensure federated governance by establishing global rules, naming conventions, and best practices for documentation, among others
- Start small and build MVPs before scaling organization-wide
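The first step — defining qualifying criteria and KPIs — can be made concrete as a small set of thresholds a dataset must meet before it counts as a data product. The KPI names and numbers below are illustrative assumptions, not an industry standard:

```python
# Hypothetical KPI thresholds a dataset must meet to qualify as a data product.
KPI_THRESHOLDS = {
    "freshness_hours": 24,     # data no older than a day
    "completeness_pct": 99.0,  # at most 1% missing values
    "uptime_pct": 99.5,        # serving availability
}

def qualifies_as_product(measured: dict) -> bool:
    """Check measured KPIs against the qualifying criteria from step 1."""
    return (
        measured["freshness_hours"] <= KPI_THRESHOLDS["freshness_hours"]
        and measured["completeness_pct"] >= KPI_THRESHOLDS["completeness_pct"]
        and measured["uptime_pct"] >= KPI_THRESHOLDS["uptime_pct"]
    )

print(qualifies_as_product(
    {"freshness_hours": 6, "completeness_pct": 99.8, "uptime_pct": 99.9}))   # True
print(qualifies_as_product(
    {"freshness_hours": 48, "completeness_pct": 99.8, "uptime_pct": 99.9}))  # False
```

Writing the criteria down as code like this is also what lets the federated governance team (step 5) enforce them uniformly across domains.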
Read more → Data mesh setup and implementation
Metadata as the foundation for your data mesh needs
Metadata is the driving force behind putting the underlying principles of the data mesh into practice.
For instance, Dehghani highlights seven characteristics of Data as a Product, such as discoverability, interoperability, security, and more.
How do you ensure that in practice? Here are two examples:
- If you aggregate metadata across data products and make it easy to search for using a Google-like interface, that will make data discoverable and accessible.
- Similarly, building a 360-degree profile of each data asset using metadata makes the asset understandable.
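The first example — a Google-like search over aggregated metadata — boils down to indexing product descriptions so they can be found by keyword. A toy sketch, with made-up product names and descriptions:

```python
# Toy metadata search: aggregate product descriptions from across the mesh
# into an inverted index, so any data worker can find products by keyword.
from collections import defaultdict

products = {
    "tire_pressure_readings": "hourly pressure readings from curing presses",
    "customer_orders": "daily customer orders with revenue totals",
}

index = defaultdict(set)
for name, description in products.items():
    for word in description.split():
        index[word].add(name)

def search(query: str) -> set:
    """Return products whose descriptions contain every query word."""
    words = query.lower().split()
    results = [index.get(w, set()) for w in words]
    return set.intersection(*results) if results else set()

print(search("customer revenue"))  # {'customer_orders'}
```

A real metadata platform would add ranking, synonyms, and lineage-aware context, but the principle is the same: discoverability comes from aggregating metadata, not from centralizing the data itself.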
Another principle of the data mesh is federated computational governance. Implementing it would require feedback loops and bottom-up input from across the organization.
For example, usage metadata can tell you about the frequently used assets. This information can help you set up a product health score to assess data quality and trustworthiness.
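A product health score of this kind might combine usage and quality signals from metadata into one number. The weights and signal definitions below are purely illustrative assumptions, not a standard formula:

```python
# Hypothetical health score: blend usage and quality signals gathered from
# metadata into a single 0-100 number consumers can use to judge trust.
def health_score(usage: dict) -> float:
    """Weighted score; the 0.3/0.3/0.4 weights are illustrative, not a standard."""
    query_signal = min(usage["queries_last_30d"] / 100, 1.0)        # popularity, capped
    freshness_signal = 1.0 if usage["hours_since_update"] <= 24 else 0.5
    failure_signal = 1.0 - usage["failed_checks"] / max(usage["total_checks"], 1)
    return round(100 * (0.3 * query_signal
                        + 0.3 * freshness_signal
                        + 0.4 * failure_signal), 1)

popular_and_fresh = {"queries_last_30d": 250, "hours_since_update": 3,
                     "failed_checks": 0, "total_checks": 40}
print(health_score(popular_and_fresh))  # 100.0
```

Surfacing such a score next to each data product closes the feedback loop that federated computational governance depends on.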
Read more → The metadata foundation that your data mesh needs
Data mesh in action: case studies
1. Intuit
Intuit wanted data-driven systems to enable smarter product experiences. However, its data workers (business analysts, data engineers, and data scientists) struggled with issues around data discoverability, understandability, trust, and use.
Questions such as “Which team supports this data if it breaks?”, “Who can approve my access so that I can see samples of the data?”, and “Am I duplicating data that already exists?” plagued the data workers.
Intuit wanted to empower its data workers to create and own high-quality data-driven systems. They would be responsible for designing, developing, and supporting these systems. So, Intuit decided to leverage the data mesh architecture.
As a result, they developed data products — a set of internal processes and data that produce a set of externally consumable data, all aligned around the same problem in the business domain.
The data workers creating and managing these products must be capable of understanding the business problem, the data required, and the implementation of processes to solve the problem.
Each data product must be well-documented to show who’s answerable for the authorship, detailed description, governance, quality, and operational health of that product. Intuit also developed a framework to help its data workers understand and document their responsibilities.
As a result, its data workers didn’t have to run around asking questions about finding, accessing, using, and managing data.
Read more → Intuit’s data mesh strategy
2. JPMorgan Chase
JPMorgan Chase wanted a cloud-first strategy to modernize its platform. The goal was to cut costs, unlock new opportunities, and reuse data.
JPMorgan Chase set up an environment where each business line (i.e., data domain) could create and own its data lake end-to-end, with as many data producer and consumer accounts as it needed.
All the data products were interconnected and overseen by standardized data governance policies.
The company then used a catalog of metadata to track lineage and provenance, and to ensure that the data is accurate, up to date, consistent, and trustworthy.
Read more → How JP Morgan is implementing a data mesh on the AWS Cloud
3. Delivery Hero
The data teams at Delivery Hero were grappling with issues such as data availability, data ownership, access management, data quality, and security. In addition, they knew they had to work on infrastructure scalability, protection from unauthorized access, and data sharing across functions.
They switched to the data mesh approach as it “described how to organize a team and accountability, in order to build a data-driven organization and create a paradigm shift from centralized ownership to decentralized ownership.”
After addressing the team structure aspect, Delivery Hero worked on building data infrastructure as a platform using GCP. Each domain data unit would get a dedicated GCP project with all the necessary components such as BigQuery, VPC, Kubernetes Cluster, CloudSQL, and Load Balancer.
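In practice a platform team would provision those per-domain projects with infrastructure-as-code; the sketch below only models the idea in plain Python. The project-naming scheme and the helper are invented for illustration, and nothing here calls real GCP APIs — only the component names come from the article:

```python
# Illustrative sketch of per-domain provisioning, modeled on Delivery Hero's
# setup: every domain data unit gets its own project with a standard set of
# components. No real cloud APIs are involved; this only builds a spec.
STANDARD_COMPONENTS = ["BigQuery", "VPC", "Kubernetes Cluster",
                       "CloudSQL", "Load Balancer"]

def provision_domain(domain: str) -> dict:
    """Return the (hypothetical) project spec for one domain data unit."""
    return {
        "project_id": f"data-mesh-{domain}",          # invented naming scheme
        "components": list(STANDARD_COMPONENTS),      # same baseline everywhere
    }

logistics = provision_domain("logistics")
print(logistics["project_id"])  # data-mesh-logistics
```

Giving every domain an identical, centrally maintained baseline is what keeps a decentralized mesh from fragmenting into incompatible stacks.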
Read more → Delivery Hero’s data mesh platform
Summing it up
While several companies, such as those mentioned above, have started implementing the data mesh architecture, it’s crucial to note that these are just a few interpretations of the mesh.
Looking at your organization’s data maturity, needs, use cases, and culture is essential before embarking on a full-fledged data mesh journey. It’s equally important to note that the data mesh is a paradigm shift in how we view, manage, and experience data.
Technologically, this requires a decentralized setup with an ecosystem of data products as opposed to a single monolith. Culturally, it all starts by treating data users as consumers and offering them a delightful experience, ensuring that finding and using data is easy and seamless.
Want to understand more about how a data mesh can supercharge your data initiatives?
Data Mesh: Related reads
- Data Mesh Vs. Data Lake: Definition, principles, and architecture
- Data Fabric vs Data Mesh: Definition, architecture, benefits, and use cases
- What is Data Mesh?: Examples, Case Studies, and Use Cases
- Data Mesh Architecture: Core Principles, Components, and Why You Need It?
- Data Mesh Setup and Implementation - An Ultimate Guide
- Data Mesh Principles: Top 4 Fundamentals and Architecture
- Snowflake Data Mesh: Step-by-Step Setup Guide, with Detailed Notes on Scaling and Maintenance