Active Data Governance: What It Is and How to Get Started
According to Gartner, bad data can cost organizations up to $15 million annually. But how do you even know you have bad data if you’re not looking for it? That’s where a proactive approach to data governance can make a difference.
In this article, we’ll outline the importance of active data governance, how it differs from passive data governance, and how you can take concrete steps to shift from a passive to an active approach starting today.
Table of contents
- What is active data governance?
- The need for active data governance
- Benefits of adopting active data governance
- 5 key features of an active data governance platform
- Examples of active data governance in action
- How to get started with active data governance
- Bottom line
- Related Reads
What is active data governance?
Active data governance is a scalable way of securing your data, upholding its privacy, maintaining its integrity, and promoting data democratization.
With active data governance, you can ensure data enablement and encourage collective responsibility for your data assets.
Explaining active data governance with an analogy
The evolution of automobile safety is a great analogy to explain active data governance.
Over recent decades, car manufacturers have developed safety features such as airbags, seat belts, anti-lock braking systems (ABS), and tire pressure monitors. While these advancements mitigate the damage from accidents, they don’t entirely eliminate risk.
Passive data governance takes a similarly reactive approach: it limits the damage after something goes wrong rather than preventing the problem.
What if you could avoid the accident in the first place?
According to Mo Gawdat, former chief business officer at Google X, the vast majority of road accidents are caused by human error. Self-driving cars (in theory) remove the human element from the equation — the root cause of most road accidents.
In the realm of data, active data governance works like a self-driving car: it automates processes and makes governance scalable. This shift reduces the risk of unauthorized access and non-compliance with data privacy regulations.
The origins of active data governance
The surge in data volume and diversity, coupled with compliance mandates, the rise of data-driven decision-making, and the evolution of the modern data stack, has sparked conversations on rethinking data governance.
According to Prukalpa Sankar, co-founder at Atlan, data teams have become mainstream and the modern data stack has evolved to simplify data ingestion and transformation. This, in turn, has brought us to a moment of redemption for data governance:
“For the first time, the need for governance is being felt bottom-up by practitioners, instead of being enforced top-down due to regulation. Let’s finally rebrand data governance and give it the rightful place and respect it deserves in our stacks.”
Forrester, for its part, coined the term “data governance 2.0” for an agile approach that mirrors the definition of active data governance above:
“Data governance 2.0 is an agile approach to data governance focused on just enough controls for managing risk, which enables broader and more insightful use of data required by the evolving needs of an expanding business ecosystem.”
Gartner, similarly, advocates an adaptive, flexible approach to data governance rather than a “one-size-fits-all,” command-and-control-based IT governance capability.
What does active data governance do?
Active data governance uses automation to monitor, document, and manage your data estate continuously.
For example, active data governance can help spot potential compliance and data quality issues proactively. For every issue detected, active data governance would involve taking a remediation action - such as issuing an alert or applying an automatic corrective procedure.
Active vs. passive data governance: What’s the difference?
In passive governance, business users, engineers, and data stewards will likely not detect a problem until something breaks. Someone will only file a bug (manually) after they notice the downstream impact.
In active data governance, that process is proactive. It’s embedded into every workflow, makes sure that your assets are continuously monitored, and notifies the right people whenever there’s an issue.
For example, let’s assume that someone has changed a field in a database table so that it now has a different data type or allows null values.
In a passive data governance approach, a business analyst might not notice this change until one of their reports fails to render right before a major presentation.
In an active data governance approach, a background job could detect this change as soon as data enters the system and send an alert to both upstream producers and downstream consumers about the potential impact.
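To make the background job concrete, here is a minimal sketch in Python. It compares two snapshots of a table schema and builds alerts for both producers and consumers. The `Column` class, the function names, the table, and the team names are all hypothetical and chosen for illustration; a real platform would read schemas from its metadata store and send alerts through its own notification channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Minimal description of one column in a table schema (hypothetical)."""
    name: str
    dtype: str
    nullable: bool


def detect_schema_drift(previous, current):
    """Compare two schema snapshots and describe each potentially breaking change."""
    old = {col.name: col for col in previous}
    new = {col.name: col for col in current}
    changes = []
    for name, before in old.items():
        after = new.get(name)
        if after is None:
            changes.append(f"{name}: column removed")
            continue
        if after.dtype != before.dtype:
            changes.append(f"{name}: type changed from {before.dtype} to {after.dtype}")
        if after.nullable and not before.nullable:
            changes.append(f"{name}: now allows NULL values")
    return changes


def build_alerts(table, changes, producers, consumers):
    """Create one alert per change for every upstream and downstream team."""
    recipients = list(producers) + list(consumers)
    return [
        {"to": team, "message": f"[{table}] {change}"}
        for change in changes
        for team in recipients
    ]


# Hypothetical snapshots taken before and after a pipeline run.
yesterday = [Column("order_id", "INTEGER", False), Column("amount", "DECIMAL", False)]
today = [Column("order_id", "VARCHAR", False), Column("amount", "DECIMAL", True)]

drift = detect_schema_drift(yesterday, today)
alerts = build_alerts("orders", drift, producers=["ingestion-team"], consumers=["bi-team"])
for alert in alerts:
    print(alert["to"], "->", alert["message"])
```

Running the comparison on every load, rather than on a schedule, is what lets the alert reach the analyst before the report breaks.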
The need for active data governance
Active data governance is a powerful concept that organizations can use to shift data governance to the left. A shift-left approach to data governance identifies issues early in the data lifecycle before they lead to work stoppages and business losses.
This approach also puts the onus of accountability on those who create data assets. It makes data quality and compliance a priority while data assets are being created, modified, or updated.
Instead of being an afterthought, data governance becomes an integral part of the daily workflows of data practitioners. That means actively embedding data protection, security, and privacy into each process — moving data governance closer to data asset creation.
The benefits of adopting active data governance over a traditional, passive approach
Active data governance provides numerous benefits to businesses managing large volumes of data:
- Quick detection and resolution of data issues
- Proper classification, protection, and compliance with laws and regulations
- Minimization of the risk of data breaches, fines, and reputational damage
- Facilitation of cooperation between stakeholders
- Enablement of data-driven insights and business value
- Save money by implementing compliance measures
Let’s delve deeper into each benefit.
Quick detection and resolution of data issues
Take our example from above. With passive data governance, a data quality issue can likely go undetected until something breaks. Active data governance proactively looks for issues and implements a remediation strategy.
With active data governance, you can resolve many problems before users even realize they exist. That increases users’ confidence in the company’s data.
Proper classification, protection, and compliance with laws and regulations
In a passive data governance approach, you may not know how much sensitive information exists in your system and whether or not it’s properly classified. You also won’t receive a notification if someone lowers the data classification of sensitive data.
With active data governance, you explicitly monitor how much of your data is tagged and develop automated scripts to identify and tag untagged data. You can also generate alerts if the system detects an anomalous classification change.
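As a rough illustration, the Python sketch below tags untagged columns by matching sample values against a couple of patterns, reports tag coverage before and after, and flags any column that lost its PII tag. The patterns, the 0.8 threshold, the column records, and all function names are assumptions made for this example; production-grade classification needs much broader detection logic.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def suggest_tag(sample_values, threshold=0.8):
    """Return 'PII' if most non-empty samples match one known pattern, else None."""
    values = [v for v in sample_values if v]
    if not values:
        return None
    for pattern in PII_PATTERNS.values():
        matches = sum(1 for v in values if pattern.match(v))
        if matches / len(values) >= threshold:
            return "PII"
    return None


def tag_untagged_columns(columns):
    """Fill in missing tags in place and return tag coverage before and after."""
    before = sum(1 for c in columns if c["tag"]) / len(columns)
    for col in columns:
        if col["tag"] is None:
            col["tag"] = suggest_tag(col["samples"])
    after = sum(1 for c in columns if c["tag"]) / len(columns)
    return before, after


def downgraded_columns(old_tags, new_tags):
    """List columns that were tagged PII before but are not any more."""
    return sorted(
        name for name, tag in old_tags.items()
        if tag == "PII" and new_tags.get(name) != "PII"
    )


# Hypothetical column records with a few sampled values each.
columns = [
    {"name": "contact", "tag": None, "samples": ["a@example.com", "b@example.org", ""]},
    {"name": "country", "tag": "Public", "samples": ["DE", "US"]},
    {"name": "notes", "tag": None, "samples": ["called twice", "n/a"]},
]
before, after = tag_untagged_columns(columns)
print(f"tag coverage: {before:.0%} -> {after:.0%}")
print(downgraded_columns({"contact": "PII"}, {"contact": "Public"}))
```

Columns that stay untagged after the pass, like `notes` here, are the ones worth routing to a data steward for manual review.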
Minimization of the risk of data breaches, fines, and reputational damage
With passive data governance, you may not be aware if the data compromised in a data breach was sensitive customer data, such as Personally Identifiable Information (PII).
With active data governance, you can automatically identify all of the PII in your data estate and take additional steps to ensure it’s secured.
Facilitation of cooperation between stakeholders
In passive data governance, different roles in the data space (data engineers, analytics engineers, business analysts, product owners, data stewards, and so on) may have little to no visibility into what other teams are doing.
With active data governance, data doesn’t live in silos. Everyone can see who owns which assets and review business definitions, descriptions, classifications, and more to get the full context.
They can use data lineage to see exactly how data flows and changes throughout a company’s data estate.
However, only those with the right access permissions can further analyze and modify data for their daily workflows.
Enablement of data-driven insights and business value
Typically, over 50% of a company’s data is “dark data” - i.e., it’s never used. This wastes money in terms of storage and processing. It also makes information harder to find, as it creates unnecessary clutter.
In passive data governance, you may have little information about how much data is actually used in your system.
With active data governance, you can leverage active metadata to see exactly what data your users use and how frequently it’s updated. This enables you to focus on the active, usable data that drives business insights and value.
Save money by implementing compliance measures
According to Investopedia, it costs an average of $5 million to implement compliance. Given that the cost of non-compliance can be as high as $15.5 million, compliance is an essential investment.
However, the cost and complexity of compliance are rising year over year.
Active data governance can help lower compliance costs by leveraging automation to implement data governance processes that were previously tedious manual exercises.
For example, whenever an asset is tagged as PII, your access policies for sensitive data are automatically applied. You can also customize who gets to:
- Query or preview sensitive data
- Edit data definitions, metric formulas, and business taxonomies
- See or edit metadata about your assets
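The tag-driven behavior described above might look something like the following Python sketch. The role names, action names, and the `Asset` class are hypothetical; an actual platform would define policies in its own configuration rather than in application code.

```python
# Hypothetical mapping from a tag to the roles allowed to perform each action.
TAG_POLICIES = {
    "PII": {
        "query": {"data_steward", "compliance"},
        "edit_definition": {"data_steward"},
        "view_metadata": {"data_steward", "compliance", "analyst"},
    },
}

# Policy that applies to assets without a restricted tag (also an assumption).
DEFAULT_POLICY = {
    "query": {"data_steward", "compliance", "analyst"},
    "edit_definition": {"data_steward"},
    "view_metadata": {"data_steward", "compliance", "analyst"},
}


class Asset:
    """A data asset whose access policy follows its tags."""

    def __init__(self, name):
        self.name = name
        self.tags = set()
        self.policy = DEFAULT_POLICY

    def add_tag(self, tag):
        """Tagging an asset immediately applies the policy tied to that tag."""
        self.tags.add(tag)
        if tag in TAG_POLICIES:
            self.policy = TAG_POLICIES[tag]

    def is_allowed(self, role, action):
        """Check whether a role may perform an action under the current policy."""
        return role in self.policy.get(action, set())


customers = Asset("crm.customers")
print(customers.is_allowed("analyst", "query"))   # allowed under the default policy
customers.add_tag("PII")
print(customers.is_allowed("analyst", "query"))   # blocked once the PII policy applies
print(customers.is_allowed("analyst", "view_metadata"))
```

Because the policy is attached at tagging time, nobody has to remember to lock down an asset after it is classified.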
5 key features of an active data governance platform
So how would you implement active data governance? In other words, what features would you expect an active data governance platform to support?
Some of the essential features of modern active data governance platforms include:
- Personalized access controls
- Actionable, column-level data lineage
- Auto-documentation and intelligent suggestions
- Real-time alerting on potential problems
- Data quality resolution
Personalized access controls
An active data governance platform should enable access controls down to the columnar level to ensure fully granular security.
It should also use role-based access controls and permissions to offer personalized experiences to users based on how they use data. These can be personalized depending on your user personas, data domains, or projects.
For example, someone in the sales department would never see customer PII or data fields specifically meant for use by the data engineering team.
As another example, a tag titled “Public Financial Reporting” could curate all of your publicly reported assets and automatically grant access to your Finance and Legal teams.
Actionable data lineage
Actionable data lineage lets users trace a problem back to its source and act on it. For instance, a business analyst could use data lineage to trace an issue with one of their reports back to the upstream producers, who can then fix it.
Actionable data lineage also supports automatic policy propagation: every asset derived from a PII source column inherits its restricted policies, columns with similar names get the same descriptions, and so on.
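Policy propagation of this kind is essentially a graph traversal. The sketch below walks a hypothetical column-level lineage graph breadth-first and applies a tag to every column derived from a source column. The lineage dictionary, column names, and `propagate_tag` function are invented for illustration.

```python
from collections import deque


def propagate_tag(lineage, source, tag, tags):
    """Walk downstream from `source` breadth-first, adding `tag` to every derived column."""
    queue = deque([source])
    visited = {source}
    while queue:
        column = queue.popleft()
        tags.setdefault(column, set()).add(tag)
        for child in lineage.get(column, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return tags


# Hypothetical column-level lineage: each key feeds the columns listed as its value.
lineage = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.customers.contact_email", "marts.campaigns.recipient"],
    "raw.users.country": ["marts.customers.country"],
}

tags = propagate_tag(lineage, "raw.users.email", "PII", {})
for column in sorted(tags):
    print(column, sorted(tags[column]))
```

The `visited` set keeps the walk from looping if the lineage graph happens to contain a cycle.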
Auto-documentation and intelligent suggestions
According to IDC, the world will produce 175 zettabytes of data by 2025. That’s a lot of data - too much to document and track by hand.
An active data governance platform can assist by using AI technology to produce documentation and make intelligent recommendations. The platform can analyze documentation on similar fields, as well as system usage, to suggest initial drafts of business glossary definitions, README files, descriptions, and other key metadata assets.
Real-time alerting on potential problems
Alerting is one of the most useful capabilities of an active data governance platform. It flags issues that need manual attention as soon as they arise.
An active data governance platform enables configuring alerts for a range of conditions, such as:
- Data ownership changes
- Suspected data quality issues and anomalies
- Classification tagging changes
- Lack of classifications
- Missing metadata
- Data formatting changes
Data quality resolution
Active data governance platforms enable setting up automations that detect and - when possible - address issues of data quality. Combined with data lineage, such detection helps fix issues upstream in the originating systems and data pipelines.
An active data governance platform would also provide a range of metrics for tracking and monitoring data quality. Effective metrics include measures of data accuracy, completeness, consistency, validity, uniqueness, and timeliness.
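Three of these metrics are simple enough to sketch in a few lines of Python. The definitions below (share of present values, share of distinct values, share of values matching a format) are one common way to compute completeness, uniqueness, and validity, though not the only one. The sample column and the ID format are made up for the example.

```python
import re


def _present(values):
    """Keep only values that are neither None nor an empty string."""
    return [v for v in values if v not in (None, "")]


def completeness(values):
    """Share of values that are present."""
    return len(_present(values)) / len(values)


def uniqueness(values):
    """Share of present values that are distinct."""
    present = _present(values)
    return len(set(present)) / len(present) if present else 1.0


def validity(values, pattern):
    """Share of present values that fully match the expected format."""
    present = _present(values)
    valid = sum(1 for v in present if re.fullmatch(pattern, v))
    return valid / len(present) if present else 1.0


# Hypothetical column with one missing, one duplicated, and one malformed value.
order_ids = ["A-001", "A-002", "A-002", None, "bad id"]

print("completeness:", completeness(order_ids))
print("uniqueness:", uniqueness(order_ids))
print("validity:", validity(order_ids, r"[A-Z]-\d{3}"))
```

Tracking these numbers per column over time, and alerting when one drops, turns the metrics into an active control rather than a static report.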
Examples of active data governance in action
Active data governance can save both time and money. Take Tide, a UK-based financial services firm, which faced the challenge of how to apply sensitivity tagging to millions of unclassified assets.
Left to manual procedures, Tide would have spent 50 days implementing these changes. Using Atlan’s Playbooks to proactively detect and tag assets, the company reduced that task to a mere five hours.
Elastic, the search company, struggled to notify users and engineers when data pipelines broke. Using Atlan as its data governance platform, Elastic leveraged data lineage to automatically notify both upstream teams and downstream consumers as soon as an error occurred.
This shift to active data governance has helped Elastic increase transparency and build more trust with stakeholders.
How to get started with active data governance
How do you make the move from passive data governance to active data governance? Here are a few key principles to follow:
- Start with a strong data governance framework
- Adopt a data catalog
- Start small
- Gradually shift your entire organization over
Start with a strong data governance framework
Active data governance requires a certain level of organizational maturity. Specifically, every organization tackling it should already have a solid data governance framework in place.
A data governance framework defines the standards, protocols, and processes that an organization uses for maintaining and safeguarding data.
Once those processes are in place, your company can translate them into a set of security policies, access controls, and automation — run and supported by an active data governance platform.
Adopt a data catalog
Some active data governance initiatives involve manual processes. But there are no two ways about it: you can’t implement an active data governance program through manual procedures alone.
Automation is indispensable to making active data governance work. Furthermore, such automation needs to touch every asset in your data estate across the company.
A data catalog serves as the single source of truth for all data in an organization. Using strong active data governance tools like Atlan ensures that you can manage all of your data from a single, central location.
Start small
Data from McKinsey shows that, when organizations undertake large-scale digital transformations, they fail 70 percent of the time.
Instead of aiming to make the shift to active data governance a colossal, one-shot change, start with small wins.
Onboard one team and learn what you can from the experience. Evangelize your efforts and make sure you have adequate training for your data catalog before moving on to the next team. Set realistic metrics to measure data catalog adoption and use them to drive your strategy.
Gradually shift your entire organization over
After you find success with a single team, onboard the next team - and then the next one. With every iteration of your active data governance initiative, re-evaluate your plan and see where you need to make changes or expand your program’s scope.
For example, when Brainly began its shift to active data governance, it implemented Atlan as its data catalog. That brought all of its data assets into one location so that employees could find and manage them. With the basics in place, it then focused on actively improving various aspects of its data governance, including documenting assets and establishing clear lines of ownership.
Bottom line
Active data governance can lower compliance costs as well as catch costly data mistakes and compliance issues before they happen. By using active data governance to shift compliance left, you can increase your overall data quality - and boost your users’ trust in your data.
Active Data Governance: Related Reads
- What is Data Governance? It’s Importance, Principles & How to Get Started?
- Enterprise Data Governance — Basics, Strategy, Key Challenges, Benefits & Best Practices
- Automated Data Governance: How Does It Help You Manage Access, Security & More at Scale?
- 6 commonly referenced data governance frameworks in 2023
- 8 best practices for a robust data governance program
- Data governance roles and responsibilities: The complete list
- 5 Popular Data Governance Certification & Training in 2023
- 8 Best Data Governance Books Every Data Practitioner Should Read in 2023
- Snowflake Data Governance — Features, Frameworks & Best Practices
- What is Active Metadata? — Definition, Characteristics, Example & Use Cases