What is active metadata?
As defined by Gartner in their market guide for active metadata management, active metadata is the continuous analysis of all available users, data management, systems/infrastructure, and data governance experience reports determining the alignment and exception cases between data as designed versus actual experience.
Active metadata lives up to its name. It enables a data ecosystem that's always on, intelligent, and action-oriented. It is the key to getting the most out of a modern data stack.
Our diverse data tools and systems are generating and capturing more metadata than ever before. Yet we have barely scratched the surface of what’s possible with metadata. Realizing the fullest potential of metadata lies in going from passive to active metadata.
What is the difference between active and passive metadata?
Passive metadata is just technical metadata — schemas, data types, models, etc.
Active metadata is data that defines data, plus data about everything that happens to the data and is done to the data. So it's operational, business, and social metadata — in addition to technical metadata.
Here's a look at the examples of each of these metadata types:
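As a rough illustration, the four types can be pictured as sections of a single record attached to a data asset. The sketch below is hypothetical: the `AssetMetadata` class, its field names, and the sample values are assumptions made for this example, not the schema of any particular catalog.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Hypothetical record grouping the four metadata types for one asset."""

    # Technical metadata: schemas, data types, models.
    technical: dict = field(default_factory=dict)
    # Operational metadata: what happens to the data (refreshes, usage).
    operational: dict = field(default_factory=dict)
    # Business metadata: meaning and ownership in business terms.
    business: dict = field(default_factory=dict)
    # Social metadata: how people interact with the asset.
    social: dict = field(default_factory=dict)

    def is_passive_only(self) -> bool:
        """True when only technical metadata has been captured."""
        return bool(self.technical) and not (
            self.operational or self.business or self.social
        )


# Illustrative values only.
orders = AssetMetadata(
    technical={"schema": "sales", "columns": {"order_id": "INTEGER", "amount": "DECIMAL"}},
    operational={"last_refreshed": "2024-01-15T06:00:00Z", "queries_last_30_days": 412},
    business={"glossary_term": "Customer order", "owner": "sales-analytics"},
    social={"starred_by": 7, "open_questions": 1},
)
legacy = AssetMetadata(technical={"schema": "archive", "columns": {"id": "INTEGER"}})

print(orders.is_passive_only(), legacy.is_passive_only())
```

In this picture, a passive catalog would only ever fill in the `technical` section, while the other three sections are what make the record useful for discovery and governance.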
Traditionally, metadata management platforms stored and organized only technical metadata: basic information about the organization's data. The advent of the modern data stack enabled the generation of many more kinds of metadata, often in real time.
Metadata qualifies as truly "active" when metadata management systems can automatically find, inventory, and use all these kinds of metadata across multiple capabilities and domains, creating a single source of truth.
What are the benefits of active metadata?
- Improved context of metadata powering data discovery
- Auto-generated data quality and lineage impact analysis
- Auto-classification of sensitive data enabling easy governance and compliance
- Making embedded collaboration possible
- Orchestration of metadata across platforms
Improved context of metadata powering data discovery
Active metadata adds context to data, and more importantly, it often does so without human intervention. It lets you link business terms to any data object, such as tables, columns, saved SQL snippets, and reproducible queries. This is especially helpful in a system of diverse data producers and consumers, where each team may label and define data using its own logic.
Active metadata thus helps generate a common understanding of the data and how to use it.
Auto-generated data quality and lineage impact analysis
Active metadata allows auto-profiling of data to identify missing values, outliers, and other data anomalies — thus giving users the chance to proactively act on bad data before it impacts applications.
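A minimal sketch of what such a profiling check might look like is shown below. It assumes a column arrives as a plain Python list with `None` for missing values, and it uses the common 1.5 × IQR rule for outliers. The function name `profile_column` and the sample data are made up for illustration; real platforms will use their own, more sophisticated checks.

```python
import statistics


def profile_column(values):
    """Count missing values and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    report = {"missing": len(values) - len(present), "outliers": []}
    if len(present) < 4:
        # Too few points for quartiles to mean much; skip outlier detection.
        return report
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    report["outliers"] = [v for v in present if v < low or v > high]
    return report


# Illustrative sample: two missing entries and one suspicious spike.
daily_order_amounts = [10, 12, 11, None, 13, 12, 11, 95, None, 10]
report = profile_column(daily_order_amounts)
print(report)
```

A report like this could be stored as operational metadata on the column, so that a warning appears next to the asset before anyone builds on the bad values.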
Systems that use active metadata can also reveal how data has evolved (or is evolving) through its life cycle. This helps foresee which upstream or downstream assets a future change will affect.
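Impact analysis of this kind comes down to walking a lineage graph. The sketch below assumes lineage has already been captured as a simple mapping from each asset to its direct downstream assets. The asset names and the `downstream_assets` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.customer_ltv": [],
    "dashboard.weekly_revenue": [],
}


def downstream_assets(lineage, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(downstream_assets(LINEAGE, "staging.orders"))
```

Running the same traversal over a reversed mapping would give the upstream view, which is useful when tracing a broken dashboard back to its source.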
Auto-classification of sensitive data enabling easy governance and compliance
True data democratization is possible when data users across the organization have visibility of all existing data. But that in no way means compromising on sensitive information. Active metadata-powered platforms can be trained to auto-classify sensitive information and mask it, revealing it only to users with authorized access.
Such systems thus help automate regulatory compliance and let teams customize granular access policies to fit their own governance strategies.
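To make the classify-then-mask idea concrete, here is a deliberately simplified sketch. It uses plain pattern matching in place of a trained classifier, and the labels, the `pii_reader` role, and all function names are assumptions for this example only.

```python
import re

# Hypothetical patterns; a real platform would rely on trained models
# and a much richer rule set.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(samples, threshold=0.8):
    """Return the first label matching at least `threshold` of the samples."""
    values = [s for s in samples if s]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return None


def mask(value):
    """Hide everything except the last two characters."""
    return "*" * max(len(value) - 2, 0) + value[-2:]


def read_value(value, label, user_roles):
    """Reveal sensitive values only to users holding the assumed role."""
    if label is None or "pii_reader" in user_roles:
        return value
    return mask(value)


label = classify_column(["123-45-6789", "987-65-4321"])
print(label, read_value("123-45-6789", label, {"analyst"}))
```

The useful property is that the label lives in metadata: once a column is tagged, every tool that reads through the same layer can apply the same masking policy.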
Making embedded collaboration possible
Data teams are composed of data scientists, data engineers, data analysts, product managers, marketing professionals, citizen analysts, and more. Each of these groups has different preferences when it comes to the tools they use and the ecosystems they inhabit.
Embedded collaboration is about work happening where you are with the least amount of friction, and active metadata seems to make that possible. It enables all data producers and consumers to operate within their systems of choice while continuing to access and work on data.
Orchestration of metadata across platforms
The modern data stack is ever-evolving and complex. The data lives across multiple tools and systems in very different formats. Active metadata can help create a federated metadata management system that makes these different tools and systems talk to each other, thus making data assets across these systems interoperable.
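One simple way to think about this kind of federation is a set of adapters that translate each tool's metadata into a shared shape. The payload fields, source names, and adapter functions below are invented for illustration and do not correspond to any real tool's API.

```python
def from_warehouse(payload):
    """Translate a hypothetical warehouse table description."""
    return {
        "name": f'{payload["schema"]}.{payload["table"]}',
        "owner": payload.get("owner"),
        "source": "warehouse",
    }


def from_bi_tool(payload):
    """Translate a hypothetical BI dashboard description."""
    return {
        "name": payload["dashboard_title"],
        "owner": payload.get("created_by"),
        "source": "bi_tool",
    }


# Registry of adapters, keyed by the (assumed) source system name.
ADAPTERS = {"warehouse": from_warehouse, "bi_tool": from_bi_tool}


def normalize(source, payload):
    """Convert a tool-specific payload into the shared metadata shape."""
    return ADAPTERS[source](payload)


catalog = [
    normalize("warehouse", {"schema": "sales", "table": "orders", "owner": "data-eng"}),
    normalize("bi_tool", {"dashboard_title": "Weekly revenue", "created_by": "analytics"}),
]
print(catalog)
```

Once everything shares one shape, search, lineage, and governance features can treat a warehouse table and a dashboard as the same kind of object.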
How is active metadata transforming the modern data stack?
In 2021, Gartner retired its well-known Magic Quadrant for Metadata Management Solutions and replaced it with a Market Guide for Active Metadata Management. This is being seen as a transformational leap towards a new approach to metadata, with active metadata right in the driver's seat.
"The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata 'anywhere' orchestration platform," predicts Gartner in this market guide. This is perhaps the step forward to finally breaking down data silos and creating that single source of truth across all data assets in an organization.
The major problem this new approach tries to solve is helping metadata management platforms keep up with the speed and variety of the metadata being generated and captured. It also recognizes that metadata becomes more valuable once it stops sitting "passively" within individual platforms and starts moving "actively" between them.
Active metadata will prompt metadata management platforms to evolve to the following capabilities:
- Ability to import and export metadata, workflows, and other optimization strategies
- Use of machine learning to recommend job flows, resource allocation, etc.
- Metadata analysis across platforms
What does an active metadata management platform look like?
A great way to activate metadata is by implementing an active metadata platform.
These four characteristics define an active metadata platform:
- They are always on.
- They don't just collect metadata, they create intelligence.
- They don't stop at intelligence; they drive action.
- They are API-driven, enabling embedded collaboration.
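The step from intelligence to action can be sketched as a small rule engine: metadata events come in, rules evaluate them, and matching rules trigger actions. Everything here (`MetadataEvent`, `RuleEngine`, the 24-hour staleness threshold) is a hypothetical illustration, not a description of how any specific platform is built.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEvent:
    """Hypothetical event emitted whenever metadata about an asset changes."""
    asset: str
    kind: str
    detail: dict = field(default_factory=dict)


class RuleEngine:
    """Pairs predicates (intelligence) with actions, run on every event."""

    def __init__(self):
        self._rules = []

    def when(self, predicate, action):
        self._rules.append((predicate, action))

    def handle(self, event):
        # Run every action whose predicate matches and collect the results.
        return [action(event) for predicate, action in self._rules if predicate(event)]


def is_stale(event):
    # Assumed threshold: data older than 24 hours counts as stale.
    return (
        event.kind == "freshness_check"
        and event.detail.get("hours_since_refresh", 0) > 24
    )


def warn_consumers(event):
    hours = event.detail["hours_since_refresh"]
    return f"WARN {event.asset}: stale for {hours}h"


engine = RuleEngine()
engine.when(is_stale, warn_consumers)
actions = engine.handle(
    MetadataEvent("marts.revenue", "freshness_check", {"hours_since_refresh": 30})
)
print(actions)
```

In practice the action would more likely post to a chat channel or annotate a dashboard through an API than return a string, which is where the API-driven characteristic comes in.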
5 key components of an active metadata platform
- The metadata lake: A unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph.
- Programmable-intelligence bots: A framework that allows teams to create customizable ML or data science algorithms to drive intelligence.
- Embedded collaboration plugins: A set of integrations, unified by the common metadata layer, that seamlessly integrates data tools with each data team’s daily workflow.
- Data process automation: An easy way to build, deploy, and manage workflow automation bots that will emulate human decision-making processes to manage a data ecosystem.
- Reverse metadata: Orchestration to make relevant metadata available to the end-user, wherever and whenever they need it, rather than in a standalone catalog.
How can you get started with active metadata management?
Getting started with active metadata also sets you on the path to a more forward-looking stack. Whether you are trying to compose a data fabric, build a data mesh, or democratize data across all teams in the organization, the first step is to choose metadata management tools that can use and exchange active metadata.
This helps answer some of the fundamental questions that currently plague data teams, such as:
- I need this data to make this decision. Is it available in this organization?
- What does this column even mean?
- Something about this table doesn't look right. How can I confirm?
- Who owns this data, and where has it been used? Is it ok if I change it?
Are you also keen to solve these problems for your team? Want to chat about the endless possibilities of active metadata and explore how Atlan can help get you started on active metadata management?