What is Active Metadata? Your 101 Guide
Active metadata helps you continuously access and process all kinds of metadata to understand your data better, regardless of the tools you use.
With active metadata management, you get an always-on, intelligent, API-driven, and action-oriented system that powers use cases from cost optimization and quality control to data security.
Gartner predicts that through 2024, organizations that adopt active metadata capabilities can decrease the time-to-delivery of new data assets to users by as much as 70%. So, let’s understand active metadata, its characteristics, use cases, and management.
Table of Contents
- What is active metadata?
- The 4 characteristics of active metadata
- Active metadata example
- Active vs. passive metadata: What’s the difference?
- 14 active metadata use cases
- What is active metadata management?
- What does an active metadata management platform look like?
What is active metadata?
Active metadata is a way of managing metadata. It leverages open APIs to connect all the tools in your data stack and ferry metadata back and forth in a two-way stream.
This is what allows active metadata to bring context, say, from Snowflake into Looker, Looker into Slack, Slack into Jira, and Jira back into Snowflake.
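To make the two-way stream concrete, here is a minimal in-memory sketch. The `MetadataHub` class and its methods are hypothetical names invented for illustration; a real platform would call each tool's API rather than append to a Python list. The idea it shows is that every tool both contributes metadata and receives the merged context from all the others.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataHub:
    """Hypothetical in-memory hub; a real platform would call each tool's API."""
    # asset name -> merged context contributed by every tool
    context: dict = field(default_factory=lambda: defaultdict(dict))
    # tool name -> list of (asset, context) pushes that tool has received
    inbox: dict = field(default_factory=lambda: defaultdict(list))
    tools: set = field(default_factory=set)

    def register(self, tool: str) -> None:
        self.tools.add(tool)

    def publish(self, source: str, asset: str, metadata: dict) -> None:
        """Ingest metadata from one tool, then push merged context to the rest."""
        self.context[asset].update(metadata)
        for tool in sorted(self.tools - {source}):
            # Copy so later updates do not mutate what was already delivered.
            self.inbox[tool].append((asset, dict(self.context[asset])))


hub = MetadataHub()
for name in ("snowflake", "looker", "slack"):
    hub.register(name)

# The warehouse contributes ownership and size; the BI tool contributes usage.
hub.publish("snowflake", "orders", {"owner": "finance", "row_count": 1200})
hub.publish("looker", "orders", {"dashboards": 3})
```

After both calls, the Slack inbox holds the combined warehouse and BI context for `orders`, and Snowflake has received the dashboard count that originated in Looker.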
The 4 characteristics of active metadata
There are four fundamental characteristics of active metadata:
- Active metadata is always on
- Active metadata is intelligent
- Active metadata is action-oriented
- Active metadata is open by default
#1- Active metadata is always on
Being always on means automatically and continually collecting metadata from every source and step of the data flow, including logs, query history, usage statistics, and more.
#2- Active metadata is intelligent
Active metadata isn’t just about collecting metadata. It’s about constantly processing metadata to connect the dots and create intelligence from it. This means that with active metadata, the system will only get smarter over time as people use it more and it observes more metadata.
So, you can auto-classify sensitive data, use automatic suggestions to document a data asset’s description, send alerts about critical issues, and more.
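As a rough illustration of auto-classification, the sketch below suggests sensitivity tags from a column's name and a sample of its values. The rules, tag names, and the 80% threshold are assumptions made for this example. Real platforms may rely on machine learning rather than hand-written patterns.

```python
import re

# Hypothetical rules: tag -> (column-name pattern, value pattern)
RULES = {
    "EMAIL": (re.compile(r"e[-_]?mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    "PHONE": (re.compile(r"phone|mobile", re.I), re.compile(r"^\+?[\d\s().-]{7,}$")),
}


def classify_column(name: str, samples: list[str]) -> set[str]:
    """Return the sensitivity tags suggested for one column."""
    tags = set()
    for tag, (name_rx, value_rx) in RULES.items():
        name_hit = bool(name_rx.search(name))
        # Require most sampled values to match so one stray value does not tag the column.
        matches = sum(1 for v in samples if value_rx.match(v))
        value_hit = bool(samples) and matches / len(samples) >= 0.8
        if name_hit or value_hit:
            tags.add(tag)
    return tags
```

For example, `classify_column("contact", ["a@x.com", "b@y.org"])` suggests the `EMAIL` tag from the values alone, even though the column name gives no hint.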
#3- Active metadata is action-oriented
Active metadata doesn’t just stop at intelligence. It should drive action by:
- Curating recommendations
- Generating alerts
- Making it easier for people to make decisions
- Automatically making decisions without human intervention, like stopping downstream pipelines when data quality issues are detected
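The last point, stopping downstream pipelines when a quality issue is detected, can be sketched as a simple decision function. The `QualityCheck` fields and the null-rate threshold are hypothetical. The function only returns the intended action as data, whereas a real system would send the alert and call its orchestrator.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    asset: str
    null_rate: float      # fraction of null values observed in the latest run
    max_null_rate: float  # threshold agreed for this asset (assumed)


def decide_action(check: QualityCheck, downstream: list[str]) -> dict:
    """Turn a quality signal into an action: do nothing, or alert and pause."""
    if check.null_rate <= check.max_null_rate:
        return {"alert": None, "paused": []}
    alert = f"{check.asset}: null rate {check.null_rate:.0%} exceeds {check.max_null_rate:.0%}"
    # Pausing is represented as data here; a real system would call its orchestrator.
    return {"alert": alert, "paused": list(downstream)}


result = decide_action(QualityCheck("orders", 0.25, 0.05), ["daily_revenue"])
```

Keeping the decision separate from its side effects makes the rule easy to test before it is allowed to pause anything in production.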
#4- Active metadata is open by default
Active metadata platforms use APIs to hook into every piece of the modern data stack. This spares data practitioners from endless tool- and context-switching, because the context they need appears in the tool they are already using.
This is called embedded collaboration, which is when work happens where you are with the least amount of effort.
How active metadata works: an example
Let’s take Spotify as an example to understand how active metadata works.
When you open Spotify, the platform’s algorithm analyzes various types of metadata associated with each song, such as the genre, mood, and tempo to automatically suggest similar songs or artists that you might enjoy. This happens as soon as you listen to a song or play one of your playlists.
Spotify also analyzes the metadata associated with each song or album, such as the artist, release date, and popularity, to auto-classify music into playlists such as “Discover Weekly,” “Release Radar,” and “Daily Mix” that are tailored to each user’s taste. These playlists are constantly updated depending on your listening history and tastes.
So, Spotify can create personalized playlists, categorize its music library, and provide intelligent recommendations, all thanks to active metadata.
Active vs. passive metadata: What’s the difference?
Both active and passive metadata refer to how we aggregate, store, and use metadata.
The main difference between active and passive metadata is that passive metadata is the standard way of collecting technical metadata — schemas, data types, models, etc. Meanwhile, active metadata is a way of making metadata flow dynamically across the entire data stack.
This enables a bidirectional flow of metadata, embedding enriched context and information in every tool in the data stack. So, active metadata goes beyond technical metadata to include operational, business, and social metadata.
Here’s how Prukalpa Sankar, co-founder at Atlan, highlights the difference between active vs. passive metadata:
“Think of passive metadata as putting out information on a personal blog. Every so often, it could get picked up and go viral, but most of the time, it’s just going to sit unseen and unused. Think of active metadata as a viral story. It shows up everywhere you already live in what seems like seconds. It’s immediately cross-checked against and combined with other information, bringing together a network of related context into a larger trend or story. And it sparks conversations, making everyone more knowledgeable and informed in the end.”
14 active metadata use cases
While there are numerous active metadata use cases, here’s a list of the top 14 enterprise use cases to get you started:
- Optimize data stack spending with dynamic pipeline optimization. Collect runtime metrics from data processing engines and usage metrics from BI tools. Monitor peak access times, identify the most clunky processes, and track assets that get updated the most/least.
- Purge stale or unused assets. Track the popularity of each data asset with usage metadata to know when it was last used or how many people used it. If the asset hasn’t been updated or used in months, it is likely stale or redundant and is a candidate for purging.
- Reduce the time spent on root cause and impact analysis. Active metadata can automate lineage (tracking data flow across the data universe) and cut the analysis time to mere minutes.
- Instill more trust in your data. Send real-time alerts and announcements about the status of each data asset so that data users are always in the loop.
- Manage security classifications. Active metadata gives you the ability to propagate CIA (confidentiality, integrity, availability) ratings automatically via column-level lineage in real time. This is essential to comply with regulations like GDPR or CCPA.
- Raise security alerts programmatically. Automatically send real-time alerts and announcements about change events to the data security team via channels like Slack or Jira.
- Archive data programmatically. Set up automatic workflows to crawl data, notify the right stakeholders as soon as that data is available, keep track of its storage period, and archive it at the right time to avoid any compliance breaches.
- Generate periodic data security and compliance reports. Understand each data asset fully with 360-degree profiles and explore its relationships with other assets across the data universe via lineage.
- Set up and regulate data access. Define access control policies using contextual metadata (classifications, business glossary terms, etc.) and link them to the relevant data assets and their fields. Set up tag-based or attribute-based access control and propagate it automatically across assets via column-level lineage.
- Streamline analyst service requests. View each data asset (metrics, queries, data sets, etc.) as a product and build GitHub-like repositories for each of these products. Then share those profiles with just a link.
- Streamline data/analytics engineer service requests. Build engineering context by leveraging active metadata so that all users can look up information such as last run status in Airflow or freshness of dbt tags whenever they want.
- Speed up the onboarding of your data team. 360-degree asset profiles offer context about each asset’s origins, ownership, upstream and downstream workflows, quality, freshness, and more. You can trace its full lineage all the way to its source.
- Write better SQL queries. Keep track of SQL queries run for each data asset. From the asset’s profile, you can easily see definitions, recent joins, metrics, and any issues/warnings involved.
- Enrich user experience with BI tools. Bring context into dashboards. Relevant metadata (business terms, descriptions, owners, and lineage) can be pushed into the BI tool.
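Two of the use cases above lend themselves to a short sketch: flagging stale assets from usage metadata, and propagating a classification through column-level lineage. The function names, the 90-day idle window, and the sample asset and column names are all assumptions made for illustration, not any particular platform's API.

```python
from collections import deque
from datetime import date, timedelta


def find_stale_assets(last_used: dict[str, date], today: date, max_idle_days: int = 90) -> list[str]:
    """Return assets not used within the idle window, as candidates for review."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)


def propagate_tag(lineage: dict[str, list[str]], source: str) -> set[str]:
    """Return every column downstream of `source`, via breadth-first traversal."""
    tagged, queue = set(), deque([source])
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            if child not in tagged:  # also guards against cycles in the lineage graph
                tagged.add(child)
                queue.append(child)
    return tagged


last_used = {"orders": date(2024, 5, 1), "legacy_export": date(2023, 11, 2)}
stale = find_stale_assets(last_used, today=date(2024, 6, 1))

# Lineage maps each column to the columns derived directly from it.
lineage = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["mart.customers.contact", "mart.leads.email"],
}
downstream = propagate_tag(lineage, "raw.users.email")
```

Here `stale` contains only `legacy_export`, and `downstream` contains the three columns that should inherit whatever classification is applied to `raw.users.email`.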
What is active metadata management?
Active metadata management is a system for transforming analytical outcomes into operational alerts and suggestions. It detects patterns in data operations, ultimately leading to AI-assisted adjustments of data and corresponding operations.
An active metadata management platform enables the two-way movement of metadata by analyzing all types of metadata from various data sources and then sending enriched metadata back into different tools in the tech stack.
Here’s how Prukalpa Sankar, co-founder at Atlan, envisions a world with active metadata management:
“Imagine a world where data catalogs aren’t standalone tools. Instead, a user can get all the context where they need it — either in the BI tool of their choice or whatever tool they’re already in, whether that’s Slack, Jira, the query editor, or the data warehouse. It’s like reverse ETL, but for metadata.”
Active metadata management and the modern data stack
Metadata is being generated with incredible speed and variety, and metadata management platforms have struggled to keep up. Gartner even scrapped its famous Magic Quadrant for Metadata Management Solutions and replaced it with a Market Guide for Active Metadata Management.
The next generation of platforms will need to shift from augmented data catalogs to a metadata ‘anywhere’ orchestration platform.
What does that mean for the modern data stack? With an always-on, active, and intelligent way of handling metadata, the entire modern data stack will support a bidirectional flow of metadata. This will require an active metadata management platform that:
- Imports and exports metadata, workflows, and other optimization strategies
- Uses machine learning to recommend job flows, resource allocation, etc.
- Analyzes metadata across platforms
Let’s further explore the anatomy of such a platform.
What does an active metadata management platform look like?
An active metadata management platform is always on, intelligent, API-driven, and action-oriented, enabling a bidirectional flow of metadata to ensure embedded collaboration. Such platforms are no longer passive solutions that try to solve the “too many tools” problem by adding yet another tool, which often ends up as expensive shelfware.
“To my many friends/followers doing metadata/catalog startups, I have a request: please integrate the metadata info with my BI tool so that I can see it *while I am doing queries.* I have no desire to *ever* visit a third website to just "browse the metadata."”
- @spite.vc on bluesky (@josh_wills), April 29, 2022
Core components of an active metadata management platform
An active metadata management platform sends metadata back into every tool in the data stack. The five core components of such a platform are:
- The metadata lake: A unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph
- Programmable-intelligence bots: A framework that allows teams to create customizable ML or data science algorithms to drive intelligence
- Embedded collaboration plugins: A set of integrations, unified by the common metadata layer, that seamlessly integrates data tools with each data team’s daily workflow
- Data process automation: An easy way to build, deploy, and manage workflow automation bots that will emulate human decision-making processes to manage a data ecosystem
- Reverse metadata: Orchestration to make relevant metadata available to the end-user, wherever and whenever they need it, rather than in a standalone catalog
So, what’s next? Here’s the first step to getting started with active metadata management
Getting started with active metadata means starting your journey toward building a more forward-looking stack. The first step is to identify your use cases, and then pick the right tools.
Whether you are trying to compose a data fabric, build a data mesh, or democratize data across all teams in the organization, the first thing you need to do is choose metadata management tools that can use and exchange active metadata.
The right active metadata management platform will help you, as Forrester puts it, address the diversity, granularity, and dynamic nature of data and metadata and weave end-to-end visibility into your modern data stack.
With bidirectional communication, collaboration, and data flows, you can set up a living, intelligent, action-oriented data ecosystem that helps you optimize costs, enhance data security, ensure regulatory compliance, and improve data team productivity.