Data Catalog for Data Fabric: 5 Essential Features to Consider

Last updated on: March 01st, 2023, Published on: November 24th, 2021

The data fabric is a design concept that has emerged in response to the growing need to manage large amounts of data efficiently and nimbly. A data fabric is composed of several units, and one of those units is a data catalog.

The data catalog is a fundamental component of the data fabric. It brings in capabilities such as data access, governance, and understanding.

In this article, we will understand:

  1. What is a data fabric?
  2. What is a data catalog?
  3. Why do you need data catalogs to build data fabrics?
  4. Five essential features to look for while evaluating a data catalog for a data fabric
  5. Data catalog for data fabric: Related reads

What is a data fabric? #

A data fabric is a composable, flexible, and scalable design concept that enables effective management and use of large amounts of disparate data.

It can be used across the company to collect distributed data from different sources and prepare that data for business decisions or to build data-powered products.

The data fabric offers a consistent user experience and data access for all members of an organization. The goal of the data fabric is to create a single view of related data to make it easier for applications to access information, regardless of where the data is located.

It is also used to simplify analysis, often with the help of artificial intelligence and machine learning.

An important advantage of a data fabric is that data in this model does not have to move out of its source systems. The data fabric connects to data from different sources, processes it, and prepares it for analysis.

The data fabric also allows you to dynamically connect and interact with new data. This allows you to significantly increase the speed of data processing and analysis.

A data fabric is made up of components that can be selected and combined in various combinations. Consequently, the implementation of a data fabric can be very different depending on your use cases.

Want to know more about data fabrics? Get data fabric 101 here.

What is a data catalog? #

The data catalog is a well-organized list of data assets across all of your data sources. It is used to better discover, understand, and use data. The data catalog makes all of your data and associated metadata organized, indexable, and easily accessible.

The data catalog links data to assets that make it meaningful. It combines metadata with data management and retrieval capabilities to help a company organize its data, find it, and assess whether an asset is appropriate for a specific use case.

Using the data catalog enables teams to easily discover, truly understand, and effectively use the data they need.

The concept of a data catalog, and the ways data teams use one, have evolved drastically over the years. It’s important to appreciate that journey, because what traditionally served as a data catalog wouldn’t truly qualify as a unit in a data fabric. Modern data catalogs (or third-generation data catalogs) do.

If you are new to data catalogs and want to understand more about them, and their evolution over the years, we highly recommend reading the following:

Why do you need data catalogs to build data fabrics? #

The most defining components of a data fabric are active-metadata-driven augmented data management and knowledge graphs. Both of these components are interlinked with augmented data catalogs.

Data catalogs may bring any or all of the following capabilities to data fabrics:

  • Metadata Lake: Acts as a single central metadata store that collects metadata from all tools across the stack
  • Data Discovery: Helps make data easily searchable and understandable
  • Business Glossary: Can use machine learning to connect metadata to organizational terminology, thus forming the business semantic layer in the data fabric
  • Data Quality: Automates data quality checks and profiling
  • Data Lineage: Builds trust in data with end-to-end data lineage
  • Data Governance: Helps set and implement a robust data governance framework

So, what are the must-have features that qualify some data catalogs uniquely for data fabrics over others? Let’s find out.

5 essential features to look for while evaluating data catalog for data fabric #

  1. Acts as a unified active metadata repository.
  2. Offers a fully-automated end-to-end data lineage.
  3. Enables granular access governance.
  4. Has embedded collaboration capabilities.
  5. Supports reverse metadata integration.

#1- Acts as a unified active metadata repository #

With a data catalog, you can populate your data fabric with active metadata. It allows you to collect metadata from different sources, refine it, and weave it into the data fabric.

It is important to note that traditional data catalogs just passively organize and catalog technical metadata. Third-generation data catalogs, on the other hand, actively inventory and enrich operational, business, and social metadata in addition to technical metadata.

The 4 different types of metadata. Image by Atlan.
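To make the idea of a unified metadata repository more concrete, here is a minimal Python sketch. It assumes a single in-memory store that merges technical, operational, business, and social metadata per asset. The names `AssetMetadata`, `MetadataRepository`, and `ingest`, as well as the sample asset and values, are hypothetical and are not part of any catalog product's API.

```python
from dataclasses import dataclass, field

KINDS = ("technical", "operational", "business", "social")


@dataclass
class AssetMetadata:
    """Metadata for one data asset, grouped by the four kinds named above."""
    asset: str
    technical: dict = field(default_factory=dict)    # e.g. schema, column names
    operational: dict = field(default_factory=dict)  # e.g. last refresh time
    business: dict = field(default_factory=dict)     # e.g. owner, glossary terms
    social: dict = field(default_factory=dict)       # e.g. view counts


class MetadataRepository:
    """Hypothetical in-memory stand-in for a unified metadata store."""

    def __init__(self):
        self._assets = {}

    def ingest(self, asset, kind, values):
        """Merge metadata of one kind, reported by any tool, into one record."""
        if kind not in KINDS:
            raise ValueError(f"unknown metadata kind: {kind}")
        record = self._assets.setdefault(asset, AssetMetadata(asset))
        getattr(record, kind).update(values)
        return record

    def get(self, asset):
        return self._assets[asset]


# Illustrative data only: four tools each report a different kind of metadata.
repo = MetadataRepository()
repo.ingest("sales.orders", "technical", {"columns": ["order_id", "amount"]})
repo.ingest("sales.orders", "operational", {"last_refreshed": "2023-03-01"})
repo.ingest("sales.orders", "business", {"owner": "finance-team"})
repo.ingest("sales.orders", "social", {"views_last_30_days": 42})

record = repo.get("sales.orders")
print(record.business["owner"])
```

The point of the sketch is that every tool writes into the same record for an asset, so a consumer sees one enriched view instead of four partial ones.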

#2- Offers a fully-automated end-to-end data lineage #

Data lineage shows how data has evolved over its life cycle. Its main purpose is to make tracing data back to its origin as simple as possible: it keeps track of the sources from which data is derived and the transformation steps the data went through.

Data lineage is essential to data reliability in a data fabric. Knowing the data's source, its path, and every transformation it went through, from the moment it entered the database to the moment it appeared in a report, helps you make the right decisions.

Automated lineage via SQL parsing. Image by Atlan
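As a rough illustration of lineage derived from SQL, the Python sketch below extracts source and target tables from a few statements with simple regular expressions, then walks the resulting graph upstream. This is a toy under stated assumptions: real lineage tools use full SQL parsers, and the regexes here ignore subqueries, CTEs, and dialect quirks. The table names and the helpers `parse_lineage`, `build_graph`, and `upstream_of` are hypothetical.

```python
import re
from collections import defaultdict

# Toy patterns: a statement's target follows INSERT INTO / CREATE TABLE,
# and its sources follow FROM / JOIN. Real SQL needs a proper parser.
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|CREATE\s+TABLE)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def parse_lineage(sql):
    """Return (target_table, set_of_source_tables) for one statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return None, set()
    return target.group(1), set(SOURCE_RE.findall(sql))


def build_graph(statements):
    """Map each table to the set of tables it is directly derived from."""
    graph = defaultdict(set)
    for sql in statements:
        target, sources = parse_lineage(sql)
        if target is not None:
            graph[target] |= sources
    return graph


def upstream_of(asset, graph):
    """Return every table that feeds `asset`, directly or indirectly."""
    seen, stack = set(), [asset]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Illustrative statements only.
statements = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO staging.customers SELECT * FROM raw.customers",
    "CREATE TABLE reports.revenue AS SELECT c.region, SUM(o.amount) "
    "FROM staging.orders o JOIN staging.customers c ON o.customer_id = c.id "
    "GROUP BY c.region",
]

graph = build_graph(statements)
print(sorted(upstream_of("reports.revenue", graph)))
```

Walking the graph upstream from a report is the backtracking described above: it answers which raw tables a number in a report ultimately depends on.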

#3- Enables granular access governance #

Make sure the data catalog you pick for your data fabric helps control access to data without compromising data democratization. It should let you classify and group data according to different criteria, create user groups based on functions or roles, and control the permitted actions using policies.

Well-implemented data governance helps organizations avoid unauthorized access to data and comply with organizational and regulatory policies. It also helps break down data silos to ensure that every team member has access to the data they need, and if they don’t, they at least know who to ask.

Granular access governance: managing access via user groups, actions, or even personas. Image by Atlan
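The combination of data classifications, user groups, and policies can be sketched as a small deny-by-default check in Python. This is only an illustration of the idea, not how any particular catalog implements it. The `Policy` and `AccessController` classes, the group names, and the classification labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    group: str           # user group the policy applies to
    classification: str  # data classification it covers, e.g. "PII"
    actions: frozenset   # actions the group may perform on that data


class AccessController:
    """Hypothetical policy check: deny unless some policy explicitly allows."""

    def __init__(self, policies, memberships, classifications):
        self.policies = policies
        self.memberships = memberships          # user -> set of groups
        self.classifications = classifications  # asset -> classification

    def is_allowed(self, user, action, asset):
        groups = self.memberships.get(user, set())
        label = self.classifications.get(asset)
        return any(
            p.group in groups and p.classification == label and action in p.actions
            for p in self.policies
        )


# Illustrative setup only.
controller = AccessController(
    policies=[
        Policy("analysts", "public", frozenset({"read"})),
        Policy("stewards", "PII", frozenset({"read", "edit"})),
    ],
    memberships={"ana": {"analysts"}, "sam": {"analysts", "stewards"}},
    classifications={"sales.orders": "public", "crm.contacts": "PII"},
)

print(controller.is_allowed("ana", "read", "sales.orders"))  # analyst, public data
print(controller.is_allowed("ana", "read", "crm.contacts"))  # analyst, PII data
print(controller.is_allowed("sam", "edit", "crm.contacts"))  # steward, PII data
```

Because policies attach to groups and classifications rather than to individual users and tables, adding a new table or a new team member does not require writing new rules.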

#4- Has embedded collaboration capabilities #

The best data catalogs for data fabrics help ensure that stakeholders are not working in isolation. They empower even non-technical data consumers to find and use data, enabling data sharing across the company.

They let users create group projects and annotate data, which improves user productivity and increases the usefulness of data throughout the organization.

Embedded collaboration capabilities: in-line chats. Image by Atlan.

#5- Supports reverse metadata integration #

A data fabric will span a number of teams with diverse data users who come with very different tooling preferences. The data catalog you choose should ideally allow reverse metadata integration, which means that centralized, enriched metadata flows back into the tools that users work in every day.

The data fabric is as fluid as it sounds. It’s a design idea, and how you build and use it will dictate the use cases that emerge from it. It’s important to choose a data catalog that is truly extensible and open, so that you can compose a flexible and scalable data fabric on top of it.
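One way to picture reverse metadata integration is as a sync step that pushes the catalog's enriched metadata out to each downstream tool. The Python sketch below assumes a simple connector with a `push` method. `InMemoryTool`, `sync_back`, the tool names, and the metadata fields are hypothetical stand-ins, not a real integration API.

```python
class InMemoryTool:
    """Stand-in for a BI tool or SQL editor that can display metadata."""

    def __init__(self, name, wanted_fields):
        self.name = name
        self.wanted_fields = set(wanted_fields)
        self.received = {}

    def push(self, asset, metadata):
        # Each tool keeps only the fields it knows how to display.
        self.received[asset] = {
            key: value for key, value in metadata.items() if key in self.wanted_fields
        }


def sync_back(catalog, connectors):
    """Push enriched metadata from the central catalog to every connected tool."""
    for asset, metadata in catalog.items():
        for connector in connectors:
            connector.push(asset, metadata)


# Illustrative catalog contents only.
catalog = {
    "sales.orders": {
        "description": "One row per customer order",
        "owner": "finance-team",
        "certified": True,
    }
}

bi_tool = InMemoryTool("bi-dashboard", ["description", "certified"])
sql_editor = InMemoryTool("sql-editor", ["description", "owner"])
sync_back(catalog, [bi_tool, sql_editor])

print(bi_tool.received["sales.orders"])
print(sql_editor.received["sales.orders"])
```

The design choice shown here is that enrichment happens once, centrally, while each tool receives only the slice of metadata that makes sense in its own interface.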

Looking for the ideal data catalog that can act as the foundation for your data fabric? Consider evaluating Atlan, a third-generation data catalog. Here are some quick links to the demo, or, better yet, speak to Team Atlan and discuss all your evaluation queries directly.
