What is Active Metadata Management? Why is it a key building block of a modern data stack?

March 09, 2022

What is active metadata?

As defined by Gartner in their market guide for active metadata management, active metadata is the continuous analysis of all available users, data management, systems/infrastructure, and data governance experience reports determining the alignment and exception cases between data as designed versus actual experience.

Active metadata lives up to its name. It enables a data ecosystem that's always on, intelligent, and action-oriented. It is the key to deriving the most out of a modern data stack.

Our diverse data tools and systems are generating and capturing more metadata than ever before. Yet we have barely scratched the surface of what’s possible with metadata. Realizing its full potential means going from passive to active metadata.

In fact, active metadata has been widely recognized as the heart and soul of some of the most significant data trends of 2021, such as Data Fabric, Data Mesh, the consumerization of data tools, and Autonomous DataOps.

What is the difference between active and passive metadata?

Passive metadata is just technical metadata — schemas, data types, models, etc.

Active metadata is data that defines data, plus data about everything that happens to the data and is done to the data. So it's operational, business, and social metadata — in addition to technical metadata.

Here's a look at the examples of each of these metadata types:

Types of metadata. Image source: Atlan
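The four metadata types can be pictured as a small data model. The following is only a minimal sketch: the class names, the method names, and the sample keys are hypothetical and do not come from any particular catalog product. It treats an asset as "passive only" when nothing beyond technical metadata has been recorded for it.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataKind(Enum):
    TECHNICAL = "technical"      # schemas, data types, models
    OPERATIONAL = "operational"  # what happens to the data
    BUSINESS = "business"        # what the data means to the business
    SOCIAL = "social"            # how people interact with the data


@dataclass
class AssetMetadata:
    """All metadata recorded for one data asset, grouped by kind (hypothetical model)."""
    asset: str
    entries: dict = field(default_factory=lambda: {kind: {} for kind in MetadataKind})

    def add(self, kind, key, value):
        self.entries[kind][key] = value

    def is_passive_only(self):
        """True when only technical metadata has been recorded."""
        return all(
            not values
            for kind, values in self.entries.items()
            if kind is not MetadataKind.TECHNICAL
        )


orders = AssetMetadata("sales.orders")
orders.add(MetadataKind.TECHNICAL, "column_types", {"order_id": "int", "amount": "decimal"})
print(orders.is_passive_only())  # only technical metadata so far

orders.add(MetadataKind.OPERATIONAL, "last_run_status", "success")
print(orders.is_passive_only())  # operational metadata has been added
```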

Traditionally, metadata management platforms stored and organized only technical metadata: basic information about the organization's data. The advent of the modern data stack enabled the generation of many more kinds of metadata, often in real time.

Metadata qualifies as truly "active" when metadata management systems automatically and continuously find, inventory, and use all the different kinds of metadata, across multiple capabilities and domains, to create that single source of truth.

What are the benefits of active metadata?

  1. Improved context of metadata powering data discovery
  2. Auto-generated data quality and lineage impact analysis
  3. Auto-classification of sensitive data enabling easy governance and compliance
  4. Making embedded collaboration possible
  5. Orchestration of metadata across platforms

Let's discuss each of these benefits in detail:

Improved context of metadata powering data discovery

Active metadata adds context to data and, more importantly, often does so without human intervention. It lets you correlate business terms with any data object: tables or columns, saved SQL snippets, and reproducible queries. This is especially helpful in a system of diverse data producers and consumers, where each team may have its own logic for labeling and defining data.

Active metadata thus helps generate a common understanding of the data and how to use it.

Auto-generated data quality and lineage impact analysis

Active metadata enables automatic profiling of data to identify missing values, outliers, and other anomalies, giving users the chance to act on bad data before it impacts applications.
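As a rough illustration of what auto-profiling might compute, the sketch below counts missing values and flags outliers in a single numeric column. The function name `profile_column` and the choice of the 1.5 × IQR rule are assumptions made for this example; real platforms may use very different checks.

```python
import statistics


def profile_column(values):
    """Count missing values and flag IQR-based outliers in one numeric column."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    outliers = []
    if len(present) >= 4:  # too few points make quartiles meaningless
        q1, _, q3 = statistics.quantiles(present, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [v for v in present if v < low or v > high]

    return {"rows": len(values), "missing": missing, "outliers": outliers}


order_amounts = [10, 12, 11, None, 13, 12, 11, 500, None, 12]
print(profile_column(order_amounts))
```

A profile like this can be stored as operational metadata next to the column, so a consumer sees the warning before using the data.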

Systems that use active metadata can also reveal how data has evolved (or is evolving) through its life cycle. This makes it possible to foresee which upstream or downstream assets will be affected by a change.
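Impact analysis over lineage is essentially a graph traversal. The sketch below assumes lineage is available as a simple mapping from each asset to the assets that consume it; the asset names and the helpers `reachable` and `invert` are hypothetical, chosen only for illustration.

```python
from collections import deque


def reachable(edges, start):
    """Breadth-first walk: every asset reachable from `start` along the given edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # a cycle should not report the asset as its own dependency
    return seen


def invert(edges):
    """Flip source -> consumers into consumer -> sources, for upstream lookups."""
    inverted = {}
    for source, consumers in edges.items():
        for consumer in consumers:
            inverted.setdefault(consumer, []).append(source)
    return inverted


lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dash.finance"],
}

print(sorted(reachable(lineage, "stg.orders")))          # downstream impact
print(sorted(reachable(invert(lineage), "stg.orders")))  # upstream sources
```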

Auto-classification of sensitive data enabling easy governance and compliance

True data democratization is possible when data users across the organization have visibility into all existing data. But that in no way means compromising sensitive information. Active metadata-powered platforms can be trained to auto-classify sensitive information and mask it, revealing it only to users with authorized access.

Such systems thus enable automated regulatory compliance and help customize granular access policies across unique governance strategies.
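The classify-then-mask flow can be sketched in a few lines. Note the substitution: the text above describes platforms that are trained to classify data, whereas this example uses simple regular-expression rules to keep it short. The pattern set, the 0.8 threshold, and the `pii_reader` role are all assumptions made for illustration.

```python
import re

# Hypothetical rule set; a trained classifier would replace these patterns.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(samples, threshold=0.8):
    """Return the first tag matching at least `threshold` of non-empty samples, else None."""
    values = [s for s in samples if s]
    if not values:
        return None
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return tag
    return None


def read_value(value, tag, user_roles, allowed_roles=frozenset({"pii_reader"})):
    """Return the value as-is for untagged data or authorized users; mask it otherwise."""
    if tag is None or allowed_roles & set(user_roles):
        return value
    return "*" * len(value)


tag = classify_column(["a@x.com", "b@y.org", "c@z.io"])
print(tag)
print(read_value("a@x.com", tag, user_roles=["analyst"]))
print(read_value("a@x.com", tag, user_roles=["pii_reader"]))
```

Because the tag is stored as metadata on the column rather than in each query, the same access policy can follow the data into every tool that reads it.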

Making embedded collaboration possible

Data teams are composed of data scientists, data engineers, data analysts, product managers, marketing professionals, citizen analysts, and more. Each of these groups has different preferences when it comes to the tools they use and the ecosystems they inhabit.

Embedded collaboration is about work happening where you are, with the least amount of friction, and active metadata makes that possible. It enables all data producers and consumers to operate within their systems of choice while continuing to access and work on data.

Orchestration of metadata across platforms

The modern data stack is ever-evolving and complex. The data lives across multiple tools and systems in very different formats. Active metadata can help create a federated metadata management system that makes these different tools and systems talk to each other, thus making data assets across these systems interoperable.
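One way to picture this kind of federation is a set of small adapters that translate each tool's metadata into a shared record shape, followed by a merge keyed on the asset name. The sketch below is purely illustrative: the two source formats, the field names, and the functions `from_warehouse`, `from_bi_tool`, and `merge_records` are hypothetical and do not reflect any specific tool's API.

```python
def from_warehouse(row):
    """Adapter for a hypothetical warehouse catalog row."""
    return {
        "asset": f'{row["schema"]}.{row["table"]}',
        "owner": row.get("owner"),
        "source": "warehouse",
    }


def from_bi_tool(doc):
    """Adapter for a hypothetical BI tool dataset document."""
    return {
        "asset": doc["dataset_name"],
        "owner": doc.get("created_by"),
        "source": "bi_tool",
    }


def merge_records(records):
    """Merge normalized records into one entry per asset, keeping the first known owner."""
    merged = {}
    for record in records:
        entry = merged.setdefault(
            record["asset"], {"asset": record["asset"], "owner": None, "sources": []}
        )
        entry["owner"] = entry["owner"] or record["owner"]
        entry["sources"].append(record["source"])
    return merged


records = [
    from_warehouse({"schema": "sales", "table": "orders", "owner": "alice"}),
    from_bi_tool({"dataset_name": "sales.orders", "created_by": "bob"}),
]
print(merge_records(records))
```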

How is active metadata transforming the modern data stack?

Gartner recently scrapped its well-known Magic Quadrant for Metadata Management Solutions and replaced it with a Market Guide for Active Metadata. This is being seen as a transformational leap towards a new approach to metadata, with active metadata right in the driver's seat.

"The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata 'anywhere' orchestration platform," predicts Gartner in this latest market guide. This is perhaps the step forward that finally breaks down data silos and creates that single source of truth across all data assets in an organization.

The major problem this new approach tries to solve is ensuring that metadata management platforms keep up with the speed and variety of metadata being generated and captured. It also recognizes that metadata becomes more valuable once it stops living "passively" within platforms and starts "actively" moving between them.

Active metadata will prompt metadata management platforms to evolve toward the following capabilities:

  • Ability to import and export metadata, workflows, and other optimization strategies
  • Use of machine learning to recommend job flows, resource allocation, etc.
  • Metadata analysis across platforms

What does an active metadata management platform look like?

A great way to activate metadata is by implementing an active metadata platform.

These four characteristics define active metadata platforms:

  1. They are always on.
  2. They don't just collect metadata, they create intelligence.
  3. They don't stop at intelligence; they drive action.
  4. They are API-driven, enabling embedded collaboration.

5 key components of an active metadata platform

  1. The metadata lake: A unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph.
  2. Programmable-intelligence bots: A framework that allows teams to create customizable ML or data science algorithms to drive intelligence.
  3. Embedded collaboration plugins: A set of integrations, unified by the common metadata layer, that seamlessly integrates data tools with each data team’s daily workflow.
  4. Data process automation: An easy way to build, deploy, and manage workflow automation bots that will emulate human decision-making processes to manage a data ecosystem.
  5. Reverse metadata: Orchestration to make relevant metadata available to the end-user, wherever and whenever they need it, rather than in a standalone catalog.
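To make the first two components more concrete, here is a minimal sketch of a metadata lake stored as knowledge-graph triples, with one bot that turns stored metadata into an action. Everything in it is an assumption made for illustration: the `MetadataLake` class, the `is_a` and `owned_by` predicates, and the `unowned_tables_bot` function are hypothetical and are not part of any real platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLake:
    """Tiny in-memory stand-in for a metadata store backed by a knowledge graph."""
    triples: set = field(default_factory=set)  # (subject, predicate, object)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        return {o for s, p, o in self.triples if s == subject and p == predicate}


def unowned_tables_bot(lake, notify):
    """Hypothetical bot: find tables with no owner and trigger an action for each."""
    tables = {s for s, p, o in lake.triples if p == "is_a" and o == "table"}
    flagged = sorted(t for t in tables if not lake.objects(t, "owned_by"))
    for table in flagged:
        notify(f"{table} has no owner")  # the action could be a chat message or a ticket
    return flagged


lake = MetadataLake()
lake.add("sales.orders", "is_a", "table")
lake.add("sales.orders", "owned_by", "alice")
lake.add("web.events", "is_a", "table")

unowned_tables_bot(lake, notify=print)
```

Passing the `notify` callback in from outside is a deliberate choice in this sketch: it keeps the bot independent of where the message ends up, which is the idea behind delivering metadata inside the tools people already use.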

Active Metadata Platform architecture. Image courtesy: Humans of Data

How can you get started with active metadata management?

Getting started with active metadata also sets you on the path to a more forward-looking stack. Whether you are trying to compose a data fabric, build a data mesh, or democratize data across all teams in the organization, the first step is to choose metadata management tools that can use and exchange active metadata.

This helps answer some of the fundamental questions that currently plague data teams, such as:

  • I need this data to make this decision. Is it available in this organization?
  • What does this column even mean?
  • Something about this table doesn't look right. How can I confirm?
  • Who owns this data, and where has it been used? Is it ok if I change it?

Are you also keen to solve these problems for your team? Want to chat about the endless possibilities of active metadata and explore how Atlan can help get you started on active metadata management?
