What is Active Metadata? Why is it the Present and Future of Metadata Management?

August 19th, 2021


What is active metadata?

Active metadata is the key to getting the most out of a modern data stack. It is metadata enriched through machine learning, human intervention, and process outputs.

Our diverse data tools and systems are generating and capturing more metadata than ever before. Yet, we have barely scratched the surface of what’s possible with metadata. Realizing the fullest potential of metadata lies in going from passive to active metadata.

As defined by Gartner in their latest market guide for active metadata management, active metadata is the continuous analysis of all available users, data management, systems/infrastructure, and data governance experience reports to determine the alignment and exception cases between data as designed versus actual experience.

Active metadata lives up to its name: it enables a data ecosystem that is always on, intelligent, and action-oriented. In fact, it has been widely recognized as the heart and soul of some of the most significant data trends of 2021, such as data fabric, data mesh, the consumerization of data tools, and autonomous DataOps.


Active vs Passive Metadata

Passive metadata is just technical metadata - schemas, data types, models, etc.

Active metadata is data that defines data, plus data about everything that happens to the data and is done to the data. So it's operational, business, and social metadata - in addition to technical metadata.

Here's a look at the examples of each of these metadata types:


Types of Metadata. Image source: Altan
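
To make the distinction concrete, here is a minimal Python sketch of a single asset's metadata record. The class name, field names, and sample values are all hypothetical and chosen only for illustration; they are not taken from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Hypothetical record combining the four metadata types for one asset."""
    name: str
    # Technical (passive) metadata: schemas, data types, models.
    technical: dict = field(default_factory=dict)
    # Operational metadata: what happens to the data (illustrative examples).
    operational: dict = field(default_factory=dict)
    # Business metadata: the meaning and ownership of the data.
    business: dict = field(default_factory=dict)
    # Social metadata: how people interact with the data.
    social: dict = field(default_factory=dict)

    def is_passive_only(self) -> bool:
        """True when nothing beyond technical metadata has been captured."""
        return not (self.operational or self.business or self.social)


orders = AssetMetadata(
    name="orders",
    technical={"columns": {"order_id": "INTEGER", "amount": "DECIMAL"}},
)
passive_before = orders.is_passive_only()

# Enrich the record with the other three kinds of metadata (made-up values).
orders.operational["last_refreshed"] = "2021-08-19T06:00:00Z"
orders.business["owner"] = "finance-team"
orders.social["query_count_30d"] = 412
passive_after = orders.is_passive_only()

print(passive_before, passive_after)
```

A record that only ever fills in `technical` stays passive; it starts to become useful for active metadata once the other three fields are populated, ideally by automated processes rather than by hand.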

Traditionally, metadata management platforms focused on storing and organizing only technical metadata - basic information about the organization's data. The advent of the modern data stack enabled the generation of many more kinds of metadata, often in real time.

Metadata qualifies as truly "active" when metadata management systems can automatically find, inventory, and use all the different kinds of metadata, across multiple capabilities and domains, to create that single source of truth.


What are the benefits of active metadata?

  1. Improved context of metadata powering data discovery
  2. Auto-generated data quality & lineage impact analysis
  3. Auto-classification of sensitive data enabling easy governance & compliance
  4. Making embedded collaboration possible
  5. Orchestration of metadata across platforms

Improved context of metadata powering data discovery

Active metadata adds context to data and, more importantly, often does so without human intervention. It makes it possible to correlate business terms with any data object: tables, columns, saved SQL snippets, and reproducible queries. This is especially helpful in a system of diverse data producers and consumers, where each team may have its own logic for labeling and defining data.

Active metadata thus helps generate a common understanding of the data & how to use it.

Auto-generated data quality & lineage impact analysis

Active metadata allows auto-profiling of data to identify missing values, outliers & other data anomalies - thus giving users the chance to proactively act on bad data before it impacts applications.
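
As a rough illustration of what auto-profiling can look like, the sketch below counts missing values and flags outliers in one numeric column. The function name, the sample data, and the choice of the interquartile-range (IQR) rule are assumptions made for this example; real platforms may use very different techniques.

```python
import statistics


def profile_column(values):
    """Return row count, missing-value count, and IQR-based outliers."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    outliers = []
    if len(present) >= 4:
        # Quartiles of the non-missing values.
        q1, _, q3 = statistics.quantiles(present, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [v for v in present if v < low or v > high]
    return {"rows": len(values), "missing": missing, "outliers": outliers}


# Hypothetical order amounts with two gaps and one suspicious value.
amounts = [12.0, 15.5, None, 14.2, 13.8, 950.0, None, 16.1, 14.9, 15.0]
profile = profile_column(amounts)
print(profile)
```

Running a check like this on every refresh, and storing the result as metadata next to the table, is one way a team could be alerted to bad data before a dashboard or application consumes it.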

Systems that use active metadata can also reveal how data has evolved, and is still evolving, through its life cycle. This helps foresee which upstream or downstream assets will be impacted by any future change.
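
Impact analysis of this kind is essentially a graph traversal over lineage. The following sketch assumes a tiny, hand-written lineage map (the asset names are invented) and walks it breadth-first to list everything downstream of a changed asset.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customer_ltv": [],
    "dashboard.weekly_revenue": [],
}


def downstream_impact(lineage, changed_asset):
    """Breadth-first walk returning every asset downstream of a change."""
    impacted = []
    seen = {changed_asset}
    queue = deque([changed_asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


impacted = downstream_impact(LINEAGE, "staging.orders")
print(impacted)
# ['marts.revenue', 'marts.customer_ltv', 'dashboard.weekly_revenue']
```

In practice the lineage map would be constructed automatically, for example from query logs or pipeline definitions, rather than typed in by hand as it is here.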

Auto-classification of sensitive data enabling easy governance & compliance

True data democratization is possible when data users across the organization have visibility into all existing data. But that in no way means compromising on sensitive information. Active metadata-powered platforms can be trained to auto-classify sensitive information and mask it, revealing it only to users with authorized access.

Such systems thus help automate regulatory compliance and customize granular access policies across unique governance strategies.
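
Here is a simplified sketch of the classify-then-mask idea. Note that it swaps in plain regular-expression rules where a real platform would more likely use trained models, and that the pattern set, threshold, and function names are all hypothetical.

```python
import re

# Hypothetical detection rules; a stand-in for a trained classifier.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(sample_values, threshold=0.8):
    """Tag a column as sensitive when most sampled values match one pattern."""
    values = [v for v in sample_values if v]
    if not values:
        return None
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return tag
    return None


def read_value(value, tag, user_is_authorized):
    """Return the raw value to authorized users; mask it for everyone else."""
    if tag is None or user_is_authorized:
        return value
    return "*" * len(value)


tag = classify_column(["ana@example.com", "li@example.org", "sam@example.net"])
masked = read_value("ana@example.com", tag, user_is_authorized=False)
print(tag, masked)
```

The design choice worth noting is that the sensitivity tag is stored as metadata on the column, so the masking decision can be made wherever the data is read instead of being re-implemented in each tool.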

Making embedded collaboration possible

Data teams are composed of data scientists, data engineers, data analysts, product managers, marketing professionals, citizen analysts, and more. Each of these groups has different preferences when it comes to the tools they use & the ecosystems they inhabit.

Embedded collaboration is about work happening where you are, with the least amount of friction, and active metadata makes that possible. It enables all data producers and consumers to operate within their system of choice while continuing to access and work on data.

Orchestration of metadata across platforms

The modern data stack is ever-evolving and complex. The data lives across multiple tools and systems in very different formats. Active metadata can help create a federated metadata management system that makes these different tools and systems talk to each other, thus making data assets across these systems interoperable.
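
One way to picture this is as a normalization step: each tool's metadata is translated into a shared shape and then merged per asset. The payload formats, tool names, and function names below are invented for this sketch and do not correspond to any real product's API.

```python
def from_warehouse(payload):
    """Normalize a hypothetical warehouse payload into the shared shape."""
    return {
        "asset": payload["table_name"].lower(),
        "columns": sorted(payload["cols"]),
        "source": "warehouse",
    }


def from_bi_tool(payload):
    """Normalize a hypothetical BI-tool payload into the shared shape."""
    dataset = payload["dataset"]
    return {
        "asset": dataset["id"].lower(),
        "columns": sorted(f["name"] for f in dataset["fields"]),
        "source": "bi_tool",
    }


def federate(records):
    """Merge normalized records so each asset lists every tool that knows it."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(rec["asset"], {"columns": set(), "sources": []})
        entry["columns"].update(rec["columns"])
        entry["sources"].append(rec["source"])
    return merged


catalog = federate([
    from_warehouse({"table_name": "ORDERS", "cols": ["order_id", "amount"]}),
    from_bi_tool({"dataset": {"id": "orders",
                              "fields": [{"name": "amount"}, {"name": "region"}]}}),
])
print(sorted(catalog["orders"]["columns"]), catalog["orders"]["sources"])
# ['amount', 'order_id', 'region'] ['warehouse', 'bi_tool']
```

Once both tools describe the same `orders` asset in one shape, the combined record can be pushed back to either tool, which is what makes the assets interoperable.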


How is active metadata transforming metadata management?

Gartner recently scrapped its well-known Magic Quadrant for Metadata Management Solutions and replaced it with a Market Guide for Active Metadata. This is being seen as a transformational leap towards a new approach to metadata, with active metadata firmly in the driver's seat.

"The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata “anywhere” orchestration platform" predicts Gartner in this latest market guide - and this is perhaps what can be defined as the step forward to finally breaking down data silos & creating that single source of truth across all data assets existing in an organization.

The major problem this new approach is trying to solve is ensuring that metadata management platforms keep up with the speed and variety of metadata being generated & captured. It also recognizes that metadata becomes more valuable once it stops living "passively" within platforms and starts "actively" moving between them.

Active metadata will prompt metadata management platforms to evolve to the following capabilities:

  • Ability to import & export metadata, workflows, and other optimization strategies
  • Use of machine-learning to recommend job flows, resource allocation, etc.
  • Metadata analysis across platforms

What is an active metadata platform?

A great way to activate metadata is by implementing an active metadata platform.

These four characteristics define an active metadata platform:

  1. They are always on
  2. They don't just collect metadata, they create intelligence
  3. They don't stop at intelligence, they drive action
  4. They are API driven, enabling embedded collaboration

5 key components of an active metadata platform

  1. The metadata lake: A unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph.
  2. Programmable-intelligence bots: A framework that allows teams to create customizable ML or data science algorithms to drive intelligence.
  3. Embedded collaboration plugins: A set of integrations, unified by the common metadata layer, that seamlessly integrates data tools with each data team’s daily workflow.
  4. Data process automation: An easy way to build, deploy, and manage workflow automation bots that will emulate human decision-making processes to manage a data ecosystem.
  5. Reverse metadata: Orchestration to make relevant metadata available to the end-user, wherever and whenever they need it, rather than in a standalone catalog.


Active Metadata Platform architecture. Image courtesy: Humans Of Data

Understand the anatomy of active metadata platforms in detail, here.


Getting started with active metadata

Getting started with active metadata also automatically gets you started towards building a more forward-looking stack. Whether one is trying to compose a data fabric, a data mesh, or trying to democratize data across all teams in the organization - the first thing one needs to do is to choose metadata management tools that have the ability to use & exchange active metadata.

This helps minimize some of the fundamental problems that currently plague data teams such as:

  • I need this data to make this decision, is it available in this organization?
  • What does this column even mean?
  • Something about this table doesn't look right, how can I confirm?
  • Who owns this data, and where has it been used? Is it ok if I change it?

Are you also keen to solve these problems for your team? Want to chat about the endless possibilities of active metadata?

Speak to our team
