Data Fabric: What is It & Why is It Critical for Organizations?
What is a data fabric? #
A data fabric is a technology-agnostic, network-based, automation-focused data architecture and design pattern. It provides you with a consistent and reliable way of working with data. The core idea behind a data fabric is to mimic weaving various data resources into a fabric that holds all of them together.
This article will take you through the fundamental ideas behind a data fabric, the core problem a data fabric solves, and how using a data fabric can help your business.
Table of contents #
- What is a data fabric?
- Why is a data fabric critical for organizations?
- Data fabric architecture and principles
- Data fabric alternatives
- Data fabric: What did we learn?
- Data fabric: Related reads
Why is a data fabric critical for organizations? #
A data fabric saves on data processing and movement costs while future-proofing your architecture, so that you can keep adding data sources, including siloed ones.
There’s been a rapid increase in the variety of data technologies, API-based development, and microservices-based application architectures over the last few years. With that increase, businesses now have many more data sources to integrate.
The emergence of data processing technologies has provided more power and freedom to various business functions that wish to leverage data according to their own needs. This has also resulted in businesses having more dispersed and spread-out data, commonly known as siloed data.
There are a few data platform engineering approaches that can help deal with silos:
- Don’t allow siloed data - a data warehouse can help with this; to an extent, a data lake can also help with this.
- Allow siloed data and let business teams own silos - this is basically the approach data mesh takes.
- Allow siloed data and connect the silos so that the fact that they’re siloed becomes irrelevant to the business, without moving or copying data around.
A data fabric takes the third approach from the ones listed above.
Data fabric architecture and principles #
Although a data fabric, as mentioned earlier, is a technology-agnostic architecture pattern, there are several core features that define what it is. In this section, we’ll talk about what makes a data fabric from a core principles and features point-of-view:
- Seamless data integration and delivery
- Complete data cataloging and discovery
- Data governance and security
- Observability and transparency
- A high degree of automation
Data movement and copying are the banes of a data platform’s existence. Let’s first look at how a data fabric helps square that circle.
1. Seamless data integration and delivery #
One of the primary offerings of a data fabric is a way to seamlessly integrate heterogeneous, spread-out, and often siloed data sources. By using concepts like cross-platform data sharing, clean rooms, and CDC (Change Data Capture), a data fabric can weave your data sources together to fit on a single data plane.
Avoiding traditional data movement patterns, such as ETL or ELT, removes additional workload from the data ecosystem within a business.
A data fabric allows you to achieve precisely that. As the data is not being moved around or copied frequently, it makes data governance much easier, and the data owner can control access.
Although having a data fabric on top of a data lake or a data warehouse might seem antithetical to the idea of not moving or copying data around, it is not.
Any traditional data system can be a part of a data fabric if it can support the basic functionality it needs to operate, such as exposing data via JDBC connectors and REST APIs. Let’s take the example of data sharing across organizations.
A data fabric would provide you with several methods to access shared data right where it currently resides.
Obviously, all the governance and privacy policies will apply before you access the shared data, which is why the control always rests with the business sharing its data with you.
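To make the idea of accessing data where it resides more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular data fabric product works: the `FabricAccessLayer` class and its `register` and `query` methods are hypothetical names, and two in-memory SQLite databases stand in for independent, siloed systems. Each query runs inside the source that owns the data, and only the result rows are combined by the consumer.

```python
import sqlite3

# Two independent "silos"; in-memory SQLite stands in for real source systems.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 75.5), (1, 30.0)]
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)


class FabricAccessLayer:
    """Hypothetical access layer: routes each query to the source that owns the data."""

    def __init__(self):
        self._sources = {}

    def register(self, name, connection):
        self._sources[name] = connection

    def query(self, source, sql, params=()):
        # The query runs inside the source system; only result rows travel.
        return self._sources[source].execute(sql, params).fetchall()


fabric = FabricAccessLayer()
fabric.register("sales", sales_db)
fabric.register("crm", crm_db)

# Aggregate in the sales silo, look up names in the CRM silo, join the small results.
totals = dict(
    fabric.query(
        "sales",
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
    )
)
names = dict(fabric.query("crm", "SELECT id, name FROM customers"))
revenue_by_customer = {names[cid]: total for cid, total in totals.items()}
```

Neither table is copied into a third system here. The aggregation happens in the sales source, and only two small result sets are joined on the consumer side.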
2. Complete data cataloging and discovery #
Data catalogs and data discovery tools are powered by metadata. Metadata can be fetched directly from various data sources along with their business context and data lineage information. In a data fabric, the principles of minimal copying and data movement also apply when dealing with metadata.
What makes a data fabric different in handling the data cataloging and discovery problem is its ability to provide a more complete and up-to-date view of the data ecosystem. This is why a data catalog becomes the first layer at which data consumers explore and interact with data in a data fabric.
The data catalog of a data fabric isn’t entirely like the data catalog of a data warehouse or a data lake. Here the role of data cataloging expands when different types of metadata, such as data dictionary, data lineage, business context, and so on, lead to the building of a semantic network of your data assets. This network is more popularly called a knowledge graph.
In a data fabric, the data catalog becomes the first touchpoint for data consumers and makes the data available to them. This means that the data consumers can search, understand, and access business data using a single interface.
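As a rough illustration of how catalog metadata can form a small knowledge graph, the sketch below stores a few asset descriptions and lineage edges in plain Python structures. The asset names, owners, and the `search` and `all_upstream` helpers are all made up for this example; a real catalog would harvest this metadata from the sources and use a proper graph store.

```python
from collections import defaultdict

# Hypothetical metadata harvested from sources: descriptions plus lineage edges.
assets = {
    "crm.customers": {"description": "Customer master records", "owner": "sales-ops"},
    "sales.orders": {"description": "Raw order transactions", "owner": "finance"},
    "analytics.revenue": {"description": "Monthly revenue per customer", "owner": "finance"},
}
lineage = [  # (upstream, downstream)
    ("crm.customers", "analytics.revenue"),
    ("sales.orders", "analytics.revenue"),
]

# Index the lineage edges so each asset knows its direct upstream assets.
upstream_of = defaultdict(set)
for src, dst in lineage:
    upstream_of[dst].add(src)


def search(term):
    """Return asset names whose name or description mentions the term."""
    term = term.lower()
    return sorted(
        name
        for name, meta in assets.items()
        if term in name.lower() or term in meta["description"].lower()
    )


def all_upstream(asset):
    """Walk lineage edges to collect every direct and indirect upstream asset."""
    seen, stack = set(), [asset]
    while stack:
        for parent in upstream_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)
```

A data consumer could call `search("customer")` to discover relevant assets and then `all_upstream("analytics.revenue")` to see where a derived dataset comes from, all through one interface.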
At this juncture, identity management, permissions, data privacy, data security, and the overarching topic of data governance come into the picture. Let’s discuss that in the next section.
3. Data governance and security #
Everything in a data fabric, including data integration, cataloging, and discovery, happens on top of the virtualization layer that the fabric creates. Data governance and security are no different.
Permissions related to data access, sharing, modification, and profiling are all controlled at the virtualization layer.
Data governance suffers from the same bureaucratic friction that most time-consuming processes in large organizations do.
Data governance tools and processes are introduced to solve data access problems, but they end up making it needlessly more difficult to access data.
A data fabric helps solve that problem by enabling you to govern your data from the virtualization layer without moving any data from your sources.
The virtualization layer becomes a bridge between you and your data. This bridge lets you transport cargo of any shape and size from anywhere, with proper security and checkpoints to check the cargo and its recipients.
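The checkpoint idea can be sketched as a policy check that sits between the consumer and the source. Everything below is a simplified assumption for illustration: the `POLICIES` table, the role names, the rule that only a `steward` sees unmasked values, and the `read` function are hypothetical, and the `ROWS` dictionary stands in for rows fetched from a source at query time.

```python
# Hypothetical policy table: which roles may read a dataset, and which columns are masked.
POLICIES = {
    "crm.customers": {
        "allowed_roles": {"analyst", "steward"},
        "masked_columns": {"email"},
    },
}

# Stand-in for rows fetched from the source at query time.
ROWS = {
    "crm.customers": [
        {"id": 1, "name": "Acme", "email": "ops@acme.example"},
        {"id": 2, "name": "Globex", "email": "it@globex.example"},
    ],
}


def read(dataset, role):
    """Check the policy first, then return rows with restricted columns masked."""
    policy = POLICIES.get(dataset)
    if policy is None or role not in policy["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not read {dataset}")
    unmask = role == "steward"  # assumption: only stewards see raw values
    return [
        {
            col: (val if unmask or col not in policy["masked_columns"] else "***")
            for col, val in row.items()
        }
        for row in ROWS[dataset]
    ]
```

Because the check and the masking happen in the access path, the source data itself is never copied or altered to enforce the policy.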
4. Observability and transparency #
Data observability is an overarching theme that covers data reliability, availability, quality, security, governance, and more.
From basic monitoring of processes and jobs to logging custom, fine-grained messages to understand who is using what data and how - observability covers it all.
Building observability into the system using a data fabric also automatically builds trust in the system. The data fabric, through its virtualization layer, makes it very easy for you to look at any component of the system and see what it is doing, not just after an incident or a raised bug, but in real time.
The SRE approach to observability starts by making it easy for developers to get alerts when something is wrong with their code. Once they get alerts, they can assess the impact of the problem.
This leads to the most important question - what to do now? With properly configured observability, this is taken care of by accessing the right data, which gives the developers a complete understanding of what went wrong.
Data observability has all that, but it also has the governance aspect added to it. Data observability enables you to see if RBAC (role-based access control) and ABAC (attribute-based access control) rules are followed, if PII or PHI data-sharing rules are violated, and if there are blind spots that make the business vulnerable to data-related security incidents.
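One way to picture this governance side of observability is a scan over access logs. The sketch below is only an illustration under assumed inputs: the `AccessEvent` record, the `ROLE_GRANTS`, `PII_COLUMNS`, and `PII_CLEARED_ROLES` tables, and the sample events are all hypothetical, and real access logs and policy models will look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical access-log record emitted by the access layer."""
    user: str
    role: str
    dataset: str
    columns: tuple


# Assumed policy inputs: datasets granted per role, PII columns, roles cleared for PII.
ROLE_GRANTS = {
    "analyst": {"sales.orders", "crm.customers"},
    "engineer": {"sales.orders"},
}
PII_COLUMNS = {"crm.customers": {"email", "phone"}}
PII_CLEARED_ROLES = {"steward"}


def find_violations(events):
    """Flag reads outside a role's grants and PII reads by roles not cleared for PII."""
    violations = []
    for e in events:
        if e.dataset not in ROLE_GRANTS.get(e.role, set()):
            violations.append((e.user, e.dataset, "role not granted"))
        pii = PII_COLUMNS.get(e.dataset, set()) & set(e.columns)
        if pii and e.role not in PII_CLEARED_ROLES:
            violations.append(
                (e.user, e.dataset, "pii columns: " + ", ".join(sorted(pii)))
            )
    return violations


events = [
    AccessEvent("ana", "analyst", "sales.orders", ("amount",)),
    AccessEvent("eli", "engineer", "crm.customers", ("name",)),
    AccessEvent("ana", "analyst", "crm.customers", ("name", "email")),
]
report = find_violations(events)
```

Running a check like this continuously over the log stream, rather than after an incident, is what turns access logs into an observability signal.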
5. A high degree of automation #
None of the areas above can be taken care of without automation at the core of all data-related processes, whether managing permissions, sharing data, updating knowledge graphs, etc. The observability pillar entirely rests upon the automated delivery of logs and messages to a search engine.
On the infrastructure front, technologies like Terraform, Pulumi, and CloudFormation are extremely useful, especially when you are dealing with ever-changing multi-cloud setups.
Then there are CI/CD tools that allow code promotion and delivery with integrated data quality, testing, and profiling.
With data governance, too, you can have data privacy and security-related incidents reported in real time by creating automated governance tests. Such issues, if caught early, can sometimes prevent catastrophes, humongous fines, and a loss of reputation.
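An automated governance test can be as small as a check that runs in a CI/CD pipeline or on a schedule. The following sketch assumes a sample of rows and a set of columns already tagged as PII in the catalog; the `untagged_pii_columns` function, the deliberately simple email pattern, and the sample data are hypothetical and would need to be far more thorough in practice.

```python
import re

# Deliberately simple email pattern; real PII detection needs broader rules.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def untagged_pii_columns(rows, tagged_pii):
    """Return columns that contain email-like values but are not tagged as PII."""
    flagged = set()
    for row in rows:
        for col, val in row.items():
            if col not in tagged_pii and isinstance(val, str) and EMAIL_RE.search(val):
                flagged.add(col)
    return sorted(flagged)


# Hypothetical sample pulled from a dataset; only "email" is tagged as PII.
sample = [
    {"id": 1, "email": "ops@acme.example", "notes": "contact via ops@acme.example"},
    {"id": 2, "email": "it@globex.example", "notes": "prefers phone"},
]
findings = untagged_pii_columns(sample, tagged_pii={"email"})
```

A pipeline could fail the build or raise an alert whenever `findings` is not empty, which surfaces an untagged PII column before the data is shared more widely.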
Data fabric alternatives #
1. Data mesh vs data fabric #
The most prominent alternative to a data fabric is a data mesh. Both data mesh and a data fabric attack the problem of siloed data in their own ways.
Data mesh solves the problem of siloed data by making data into a product that different teams and individuals own. Data mesh, therefore, takes a decentralized approach to data organization. A data fabric takes a slightly different approach.
Like data mesh, a data fabric doesn’t get rid of siloed data. Instead, it attempts to link that data up so that it appears to be part of the same plane; that is, the fact that the data is siloed becomes irrelevant to the end user.
All data sources in a data fabric will be accessible via a centralized data catalog that sits horizontally across every data asset in the system.
2. Data fabric vs data warehouse vs data lake #
Comparing a data fabric to data warehouses and data lakes isn’t a like-for-like comparison. Choosing between these and a data fabric is not an either-or question when you’re thinking of implementing a data solution.
A data fabric isn’t a replacement for a data warehouse or a data lake. It often co-exists with either or both of these. This is why it is fair to compare a data fabric with data mesh but not with data warehouses and data lakes.
Data fabric: What did we learn? #
This article talked about what constitutes a data fabric, when it makes for a good use case, and how it solves the problem of siloed data without removing the silos and without copying or moving a whole lot of data around.
Data fabric: Related reads #
- Data Fabric vs. Data Virtualization: Overview, Comparison, and Differences
- Data Catalog for Data Fabric: 5 Essential Features to Consider
- Data Mesh vs. Data Fabric: How do you choose the best approach for your business needs?