Is data fabric the same as data mesh?
Managing data strategy for a small organization is, well, manageable. Problems start when the organization grows and there are too many different data sources to keep track of.
Once you have the basics in place, you want to know how to optimize your data management architecture. That’s where two design concepts come into play: Data Fabric and Data Mesh.
Data Fabric vs Data Mesh. What’s the difference? Let's look at a quote from James Serra, a Data Platform Architecture Lead at EY:
“A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change.”
Before we understand the key differences between data fabric and data mesh, let's revisit our basic understanding of these two concepts.
What is a Data Fabric?
According to Gartner, data fabric is a design concept that serves as an integrated layer (fabric) of data and connecting processes. It is a composable, flexible, and scalable way to maximize the value of data in an organization. It's not a single tool or process, but rather an emerging design concept that provides a framework for stacking existing tools, resources, and processes.
Data fabric connects data from multiple sources and prepares it in a way in which you - a business user or a data professional - can access and analyze it easily.
According to Forrester Wave, a data fabric can help enterprise users with the following:
- Increase agility: Data fabric creates a semantic layer that accelerates the delivery of data and insights by automating key processes. This increases agility while engaging business users and analysts in the data preparation process.
- Minimize complexity: Data fabric aims to automate the ingestion, curation, and integration of data sources to enable the analytics and insights that are critical for business success. It minimizes complexity by automating processes, workflows, and pipelines, generating code automatically, and streaming data to simplify deployment.
Data fabric solves the following pain points:
- Data silos: No data is transferred to a “central storage”, so you largely avoid the data hosting and privacy policy problems that come with moving it. As a result, you can spend more time analyzing the data.
- Replication: There's no need to maintain consistency and replication across all repositories.
- Latency: In centralized approaches, the amount of data grows over time, and constantly moving it to central storage often causes delays. With data fabric, there’s no need to move huge amounts of data in the first place.
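The idea of reaching data where it lives, instead of copying it into central storage, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FabricCatalog`, `Source`, and the source names and locations (`crm-db`, `billing-api`) are hypothetical and do not correspond to any specific data fabric product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Source:
    """A registered source. The data stays put; the catalog only keeps a reader."""
    name: str
    location: str                       # hypothetical label such as "crm-db"
    read: Callable[[], Iterable[Row]]   # called at query time, not at registration


class FabricCatalog:
    """Hypothetical unified access layer over sources that remain in place."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, source: Source) -> None:
        self._sources[source.name] = source

    def discover(self) -> List[str]:
        # Data discovery: what is available and where it lives.
        return sorted(f"{s.name} @ {s.location}" for s in self._sources.values())

    def query(self, name: str, **filters: object) -> List[Row]:
        # Read from the owning system on demand and filter the rows.
        rows = self._sources[name].read()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Stand-ins for two separate systems (assumed sample data).
crm_rows = [{"customer": "acme", "region": "EU"}, {"customer": "globex", "region": "US"}]
billing_rows = [{"customer": "acme", "amount": 120}]

catalog = FabricCatalog()
catalog.register(Source("customers", "crm-db", lambda: crm_rows))
catalog.register(Source("invoices", "billing-api", lambda: billing_rows))

print(catalog.discover())
print(catalog.query("customers", region="EU"))
```

Because each reader is invoked only when a query runs, nothing is replicated ahead of time. A real fabric would add metadata management, security, and automation on top of this access pattern.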
What is a Data Mesh?
One of the most recognizable Data Mesh definitions comes from Zhamak Dehghani:
The data mesh platform is an intentionally designed distributed data architecture, under centralized governance and standardization for interoperability, enabled by a shared and harmonized self-serve data infrastructure.
The data mesh paradigm stands for decentralized and domain-specific data ownership that is easily discoverable and ready for consumption for everyone in the organization.
Data mesh has four key principles that distinguish it from other paradigms, as described in the article Data Mesh Principles and Logical Architecture, published on Martin Fowler's website:
- Data ownership: Data mesh stores data across different domains. This data is maintained and managed by domain experts.
- Data as a product: Each data domain is seen as a product, and the users are its customers.
- Self-serve data platform: A data mesh advocates setting up an ecosystem that supports creating, using, and maintaining data products without needing specialized knowledge or expertise in sophisticated tooling and technologies.
- Federated computational governance: Decentralized data products can lead to data silos. A federated approach to governance standardizes rules, definitions, and procedures related to data.
Data mesh moves away from the concept of storing, transforming, and processing analytical data centrally. Instead, it advocates that each business domain be responsible for hosting, preparing, and serving its data, both to its own domain and to a larger audience.
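To make the principles above more concrete, here is a minimal Python sketch of domain-owned data products that are published through a registry enforcing shared governance policies. All names (`DataProduct`, `MeshRegistry`, the two policies, and the sample teams) are hypothetical and chosen for illustration; they are not taken from any data mesh platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataProduct:
    """A dataset that a domain team owns and serves to its consumers."""
    name: str
    domain: str
    owner: str
    schema: Dict[str, str]
    description: str = ""


# A policy returns an error message, or None when the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def require_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner else f"{product.name}: missing owner"


def require_description(product: DataProduct) -> Optional[str]:
    return None if product.description else f"{product.name}: missing description"


class MeshRegistry:
    """Hypothetical self-serve registry applying organization-wide policies."""

    def __init__(self, policies: List[Policy]) -> None:
        self._policies = policies
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> List[str]:
        # Federated governance: every domain is checked against the same rules.
        violations = []
        for policy in self._policies:
            message = policy(product)
            if message is not None:
                violations.append(message)
        if not violations:
            self._products[product.name] = product
        return violations

    def find(self, domain: str) -> List[str]:
        return sorted(p.name for p in self._products.values() if p.domain == domain)


registry = MeshRegistry([require_owner, require_description])

orders = DataProduct(
    "orders", "sales", "sales-data-team",
    {"order_id": "int", "amount": "float"},
    "Confirmed customer orders",
)
draft = DataProduct("leads", "marketing", "marketing-data-team", {"lead_id": "int"})

orders_violations = registry.publish(orders)
draft_violations = registry.publish(draft)

print(orders_violations)
print(draft_violations)
print(registry.find("sales"))
```

In this sketch each domain keeps ownership of its product, while the shared policies keep independently built products discoverable and consistent.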
Several companies have publicly presented their data mesh strategy - here are just a few:
- Intuit’s breakdown of their data mesh strategy
- JPMorgan Chase shared their experiences on how to get started with data mesh implementation
- HSBC uses data mesh in their data strategy
Data Fabric vs Data Mesh: Key Differences
| |Data Fabric|Data Mesh|
|---|---|---|
|Architecture|Data access is centralized through a single integration layer. Data is made available through APIs. Aims to eliminate human effort with machine learning and AI.|Data is stored within each domain of a company. Data is copied into specific datasets for specific use cases. Less emphasis on AI, since work is handled by domain experts.|
|Benefits|Self-service data consumption and collaboration. Automates governance, data protection, and security. Automates data integration and data engineering.|Agility and scalability with fast access and accurate data delivery. Platform connectivity and data security. Robust data governance and end-to-end compliance.|
|Use cases|Business applications: addresses data availability and reliance on legacy systems. Data discovery: what data is available and where. Machine learning: minimizes the data preparation phase when training ML models.|Financial sector: fast fraud threat analysis without copying data to a central database. Sales and marketing: targeted campaigns based on user profiles. Machine learning: virtual data warehouses as a basis for training ML models.|
Data Fabric vs. Data Mesh: Architecture
In data fabric, data access is centralized through a single integration layer, whereas in data mesh the data is stored within each domain of a company. Each node in the data mesh has local storage and computation power, but there is no single point of control.
But probably the key difference between the two is in how data is accessed. In data fabric, data is made available with objective-based APIs or data stores. On the other hand, in data mesh, data is copied into specific datasets for a specific use-case, but under complete control of the business domain.
It’s also worth pointing out that a data fabric requires a central team that owns the critical functions for the fabric orchestration. That team is unlikely to become a bottleneck, as much of its work is automated by artificial intelligence. Data mesh, on the other hand, has no need for such a central team. It places less emphasis on AI, since the work is handled by domain experts.
Data Fabric vs. Data Mesh: Benefits
Right off the bat, there are three data fabric benefits worth discussing:
- Self-service data consumption and collaboration: Enables the relevant users within an organization to find quality data more quickly and spend more time exploring it.
- Automate governance, data protection, and security: AI-enhanced automation creates data governance rules and definitions by extracting content from regulatory documents automatically.
- Automate data integration and data engineering tasks: Data fabric optimizes and accelerates data delivery within the organization, and hence eliminates inefficient, repetitive, and manual data integration workflows.
The benefits of data mesh are as follows:
- Agility and scalability with fast access and accurate data delivery: Data mesh improves time-to-market, scalability, and business domain agility. Businesses can query data from anywhere using SQL, with much lower latency.
- Platform connectivity and data security: Data mesh accesses data where it lives, instead of requiring users to make a copy and route it through a public network to a data warehouse. This lowers the risk of a data breach or information loss.
- Robust data governance and end-to-end compliance: The decentralized data operations simplify compliance with global data governance guidelines.
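The point about querying data where it lives with SQL can be illustrated with a small Python sketch. Here SQLite's `ATTACH DATABASE` stands in for a federated query engine, which is an assumption made purely for the example: two in-memory databases play the role of a sales domain and a marketing domain, and a single query joins them without first copying either one into a central warehouse. The table names and rows are made up.

```python
import sqlite3

# One connection acts as the query layer; each attached database stands in
# for a store owned by a separate domain (both in memory for this example).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS marketing")

# Each domain defines and fills its own table.
conn.execute("CREATE TABLE sales.orders (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE marketing.campaigns (customer TEXT, campaign TEXT)")
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)
conn.executemany(
    "INSERT INTO marketing.campaigns VALUES (?, ?)",
    [("acme", "spring-promo"), ("globex", "newsletter")],
)

# A single SQL statement joins across both domains in place.
rows = conn.execute(
    """
    SELECT c.campaign, SUM(o.amount) AS revenue
    FROM marketing.campaigns AS c
    JOIN sales.orders AS o ON o.customer = c.customer
    GROUP BY c.campaign
    ORDER BY c.campaign
    """
).fetchall()

print(rows)
conn.close()
```

A production setup would use a distributed query engine rather than SQLite, but the access pattern is the same: the query goes to the data, not the other way around.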
Data Fabric vs. Data Mesh: Use cases
Use cases of data fabric:
- Business applications - Data fabric solves the challenges of business data availability, the unreliability of data storage formats and security, poor scalability, and reliance on underperforming legacy systems.
- Data discovery - This layer controls access to the right data. The data discovery layer unfolds what data is available and where.
- Machine learning - AI engineers can use machine learning models efficiently in a data fabric environment because the data preparation time is minimized. They also have access to secure data, which facilitates enhanced machine learning processes.
Data mesh use cases, according to Starburst:
- Financial sector - Data mesh allows international financial bodies to analyze data locally to identify fraud threats without replicating datasets and transporting them to a central database.
- Sales and marketing - These departments can get a 360-degree view of consumer profiles and behaviors from different systems. As a result, they can run more targeted campaigns, which increases customer lifetime value and reduces churn.
- Machine Learning - Data mesh enables AI engineers to create virtual data warehouses from different sources on which machine learning models are trained without having to consolidate data in a central location.
Data Fabric vs. Data Mesh: How to choose?
Data fabric and data mesh aren’t conflicting design approaches but instead solve different problems.
Data fabric is extremely useful if you are dealing with huge amounts of data stored in silos. This design approach provides centralized data integration across multi-cloud, hybrid-cloud, on-premises, and stand-alone hosted systems. All data in a data fabric is made available through APIs or data stores. It’s also worth mentioning that data fabric relies more on machine learning than data mesh does.
On the other hand, data mesh focuses more on organizational changes. Data is managed via controlled datasets that are domain-specific, which implies a decentralized approach. All data in a data mesh is copied into a specific dataset for specific use cases but under the complete control of the business domain.
To summarize, both data fabric and data mesh provide powerful solutions to make your organization data-driven and even data-led. Data fabric allows everyone (within permission) easy access to data at the right time. Data mesh takes a decentralized approach by keeping separate domain-specific datasets.
Choosing one over the other essentially boils down to the problem your organization is dealing with.