What is Data Governance? Its Importance, Principles & How to Get Started

Updated August 24th, 2023

80% of organizations aiming to scale their digital business will fail as they do not take a modern approach to data governance.

Traditionally, data governance was synonymous with centralized control, rules, and policies — something conjured up by middle managers to add friction to data scientists’ lives.

Top 10 barriers to achieving data governance objectives. Source: Gartner.

That’s why modern data governance must be community-led, centered on collaboration and data democratization, with privacy at its core. Everyone in an organization should be able to access, understand, and use the right data to unearth valuable business insights.

Here we will cover the principles behind such an approach to governance, followed by an action plan to implement it and data governance tools that can help.

Table of Contents

  1. What is data governance?
  2. Traditional vs. modern data governance
  3. Importance of data governance
  4. Principles driving modern data governance
  5. How to get started with a personalized, bottom-up approach to data governance?
  6. Implementing data governance
  7. Popular data governance tools
  8. Bottom line
  9. Related reads on data governance

What is data governance?

Gartner defines data governance as a way to “specify decision rights and accountability to ensure appropriate behavior as organizations seek to value, create, consume, and control their data, analytics, and information assets.”

Most data governance definitions focus on establishing organization-wide policies on data use. Traditionally, data governance was all about controlling data for compliance purposes. So, the measures were associated with bureaucracy and red tape — a way of slowing things down.

Besides a top-down approach, traditional data governance models also involved appointing a data steward responsible for enforcing data governance policies and standards.

Such an approach isn’t scalable.

Traditional vs. modern data governance: What’s the difference?

The first step is to change the perception of data governance — it isn’t about control, red tape, and bottlenecks. Instead, data governance must help you ensure that your data is trustworthy, useful, and easily available.

That’s why we define data governance as a personalized, community-centered approach toward data enablement.

To implement governance at scale, you must also understand that:

  1. The goal of data governance isn’t merely regulatory compliance
  2. The purpose of governance is to increase the value of data
  3. Everyone — not just data stewards or data governance steering committees — is responsible for data governance

Traditional approach to data governance | Modern data governance
A centralized, top-down approach focused on controlling data access | A decentralized, community-centered approach focused on data enablement
Requires enforcers (data stewards and steering committees) | Is self-governed, as everyone is responsible and held accountable
Involves manual processes for granting approvals, tagging and classifying data, and more | Automates as many processes as possible to eliminate human error and achieve data governance at scale

Traditional and modern data governance: What is the difference? Source: Gartner.

Read more: To explore the primary objectives of modern data governance, check out this article that discusses novel ways to approach them.

Now let’s look at the benefits of implementing data governance at scale.

Importance of data governance for modern data teams

Good governance can maximize the value of data assets and help your data teams work better. The core benefits of modern data governance are:

  • Effective metadata management leads to better data security and accuracy
  • Better productivity and faster data discovery by eliminating time spent on non-value-added tasks
  • Reduced risks and costs associated with poor data management and manual processes

Let’s explore each aspect further.

1. Effective metadata management leading to better data security

Metadata management requires a reliable single source of truth and enterprise-scale visibility into all data assets.

Data governance ensures that data is gathered, organized, and used appropriately, without redundancies. The goal is to know what data you have, where it came from, how it has changed, and how you can use it.

It all starts with metadata. For instance, LinkedIn implemented data governance at scale by:

  • Thoroughly documenting metadata to track data definition and evolution
  • Tracing the changes to metadata using a data catalog (DataHub), so that it’s always accurate, relevant, and up to date
  • Making each data domain owner a data steward, so that they’re responsible for documenting the changes to metadata that they create, manage, or use
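
The first two practices can be sketched as a metadata record with a named owner and an append-only change log, so that a definition's evolution stays traceable. This is a minimal illustration under assumed names (`AssetMetadata`, `ChangeEvent`, `EDITABLE_FIELDS`); it is not DataHub's metadata model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical: the only metadata fields this sketch allows to be edited.
EDITABLE_FIELDS = {"description", "owner"}


@dataclass(frozen=True)
class ChangeEvent:
    """One recorded edit to an asset's metadata (hypothetical schema)."""
    field_name: str
    old_value: Optional[str]
    new_value: str
    changed_by: str
    changed_at: datetime


@dataclass
class AssetMetadata:
    """Metadata for one data asset, maintained by its domain owner."""
    name: str
    owner: str
    description: str = ""
    history: List[ChangeEvent] = field(default_factory=list)

    def update(self, field_name: str, new_value: str, changed_by: str) -> None:
        if field_name not in EDITABLE_FIELDS:
            raise ValueError(f"{field_name} is not an editable metadata field")
        # The domain owner acts as the steward: only they may change metadata.
        if changed_by != self.owner:
            raise PermissionError(f"{changed_by} does not own {self.name}")
        old_value = getattr(self, field_name)
        setattr(self, field_name, new_value)
        # Append instead of overwriting, so every past definition is kept.
        self.history.append(ChangeEvent(
            field_name, old_value, new_value, changed_by, datetime.now(timezone.utc)
        ))


orders = AssetMetadata(name="orders", owner="sales-domain")
orders.update("description", "One row per customer order", changed_by="sales-domain")
orders.update("description", "One row per order line", changed_by="sales-domain")
for event in orders.history:
    print(event.old_value, "->", event.new_value)
```

Keeping the history separate from the current value is what lets a catalog answer both "what does this field mean now?" and "what did it mean before the last change?".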

As a result, the company has improved data security and is well-equipped to tackle compliance and anti-abuse concerns too.

Robust metadata management is the key to better data governance. Source: Atlan.

2. Better productivity and faster data discovery by eliminating time spent on non-value-added tasks

According to Mike Loukides from O’Reilly Media, Inc:

“Data governance and data discovery go together. You can’t use your data if you can’t find it. You can’t use your data if you don’t even know what data you have.”

Without good governance, your teams can end up spending as much as 30% of their time on non-value-added tasks such as data sourcing, processing, cleanup, and manual reporting, which substantially hurts their productivity.

On the other hand, easy access to the right data helps data teams discover opportunities and insights faster, without wasted effort.

For instance, Georgia-Pacific suffered from:

  • Siloed teams: Each team was responsible for its data, making cross-team data exchange and collaboration a challenge.
  • Lack of a structured data governance program: There was no structure in place to identify and define data assets and attributes across all products and make them available to the right people.

So, their teams couldn’t put together the information required to analyze buyer trends.

The company tackled the problem by adopting a company-wide data governance program that was process-centric, rather than function-centric. This got rid of their data silos and enabled seamless interoperability across different divisions and systems.

As a result, the company could launch digital transformation initiatives successfully across all business units.

Faster data discovery: Key building block of modern data governance. Source: Atlan.

3. Reduced risks and costs associated with poor data management and manual processes

According to McKinsey’s 2019 Global Data Transformation Survey, data governance was one of three factors that set apart leading firms, which had “eliminated millions in cost from their data ecosystems and enabled digital and analytics use cases.”

Moreover, the companies underinvesting in governance — think unsustainable, manual processes and temporary controls — also exposed themselves to regulatory risk.

In 2020, a manual processing error at Citibank on a Revlon loan led to the roughly $900 million loan being repaid in full instead of a scheduled interest payment.

Besides dealing with creditors who refused to return the money, the bank was fined $400 million by the Office of the Comptroller of the Currency (OCC) for shortcomings in its data governance, along with persistent issues in risk management and internal controls.

The OCC also demanded a “re-engineering of processes … maximizes straight-through processing and minimizes manual inputting and adjustments,” an apparent reference to the Revlon fiasco.

So, what would a modern data governance solution look like? Let’s start by understanding the underlying principles of an effective governance platform for the modern data stack.

Read more: To further explore the benefits of data governance, check out this article on how good governance can help build great data teams.

Principles driving modern data governance

Four principles are central to ensuring effective data governance for the modern data stack:

  1. Governance is for data and analytics
  2. Governance is personalized
  3. Governance is community-led
  4. Governance is a part of your daily workflow

1. Governance is for data and analytics

As Gartner highlights in its definition, data governance must extend to data, analytics, and information assets.

While it’s important to ensure data sharing is easy and well-managed, it’s vital to ensure the same applies to analytics. That’s because data assets go beyond raw data to include analytics assets such as dashboards and models. Analytics also has real, measurable value and, as such, must be similarly governed.

2. Governance is personalized

As mentioned earlier, your governance policies should be built around your data team’s projects or use cases, and not the other way around.

For instance, if you define policies around the finance team’s projects and curate financial data assets in one place, the finance team can automatically access all public financial KPIs.

Similarly, you can create persona-based policies by displaying custom metadata relevant to each user. This requires understanding the various data domains, projects, and user roles thoroughly and setting up rules that support such personalized experiences.
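
The finance example can be sketched as a persona that bundles the domains a team works in with the metadata fields it cares about: Public assets in those domains become accessible automatically, and each persona sees a trimmed metadata view. All names here (`Persona`, `Asset`, `accessible_assets`, the sample assets) are hypothetical, and this is not how Atlan's Personas feature is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Asset:
    name: str
    domain: str           # e.g. "finance" or "product"
    classification: str   # "Public", "Confidential", or "PII"
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Persona:
    name: str
    domains: FrozenSet[str]           # domains whose assets the persona works with
    visible_fields: Tuple[str, ...]   # metadata fields worth showing this persona


def accessible_assets(persona: Persona, assets: List[Asset]) -> List[Asset]:
    """Grant automatic access to Public assets in the persona's domains."""
    return [a for a in assets
            if a.domain in persona.domains and a.classification == "Public"]


def personalized_view(persona: Persona, asset: Asset) -> Dict[str, str]:
    """Return only the metadata fields relevant to this persona."""
    return {k: v for k, v in asset.metadata.items() if k in persona.visible_fields}


finance = Persona("Finance analyst", frozenset({"finance"}), ("definition", "owner"))
catalog = [
    Asset("net_revenue_kpi", "finance", "Public",
          {"definition": "Revenue minus refunds", "owner": "fin-ops",
           "sql": "SELECT revenue FROM kpis"}),
    Asset("payroll", "finance", "Confidential"),
    Asset("feature_usage", "product", "Public"),
]
print([a.name for a in accessible_assets(finance, catalog)])
print(personalized_view(finance, catalog[0]))
```

Confidential and PII assets fall outside the automatic grant here, so they would still go through a request process.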

Atlan's "Personas" are a way to control access to users who belong to a group/domain. Source: Atlan.

3. Governance is community-led

Across an organization, different teams will have different relationships with data. So, even though product and sales teams refer to the same data sets, they use them differently.

The product team could be using customer survey results to decide which product features to build, whereas the sales team could be using the same data to decide which customer pain points to emphasize in its pitches.

If both teams are to access and use the data effectively, they must be involved in defining the policies around that data — how it gets categorized and what rules govern its usage. This requires a collaborative approach to governance where the processes, policies, and roles that make up data governance are crowd-sourced.

Such an approach also ensures that your people truly adopt the governance structure you build.

Modern data governance tools must foster collaboration. Source: Atlan.

4. Governance is a part of your daily workflow

If governance is to be integral to your organization’s daily workflow, it cannot be an extra step in your operations.

For instance, everyone should be able to know what data exists within the organization and look up business definitions, descriptions, classifications, and more. This ensures that while request processes exist for gaining access, nobody’s barred from seeing what data is available.

Let’s revisit the previous example of product and sales teams. If someone from sales comes across a dataset on product usage that doesn’t have a description or an owner, they can make a request or offer a suggestion to the product team (i.e., the domain owner) via Slack. The product team can review the request and then decide to approve or reject it.
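
The request flow described above can be sketched as a small state machine: anyone can see a dataset's metadata and file a suggestion, but only the dataset's domain owner can approve or reject it, and an approved suggestion is applied to the metadata. The class names are hypothetical, and the Slack notification step is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Dataset:
    name: str
    owner: str                                              # domain owner, e.g. the product team
    metadata: Dict[str, str] = field(default_factory=dict)  # visible to everyone


@dataclass
class Suggestion:
    dataset: Dataset
    requested_by: str
    field_name: str          # e.g. "description"
    proposed_value: str
    status: Status = Status.PENDING

    def review(self, reviewer: str, approve: bool) -> None:
        """Let the domain owner approve (and apply) or reject the suggestion."""
        if reviewer != self.dataset.owner:
            raise PermissionError(f"{reviewer} does not own {self.dataset.name}")
        if self.status is not Status.PENDING:
            raise ValueError("this suggestion has already been reviewed")
        if approve:
            self.dataset.metadata[self.field_name] = self.proposed_value
            self.status = Status.APPROVED
        else:
            self.status = Status.REJECTED


usage = Dataset(name="product_usage", owner="product-team")
suggestion = Suggestion(usage, requested_by="sales-rep",
                        field_name="description",
                        proposed_value="Daily active users per feature")
suggestion.review("product-team", approve=True)
print(suggestion.status, usage.metadata)
```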

Effective data governance enables users to find and understand data right within the tools they use every day. Source: Atlan.

Read more: Delve further into the principles of data governance by exploring this article.

How can you get started with a personalized, bottom-up approach to data governance?

Implementing a modern data governance program means embracing the principles mentioned above and choosing solutions that mirror them. Here’s a checklist to get you started:

  1. Map the various personas, domains, projects, and data and analytics assets
  2. Understand and chart out the data that each persona or project must have access to
  3. Establish data classifications — PII, Confidential, or Public
  4. Set up programmable bots to auto-identify sensitive PII, HIPAA, and GDPR data
  5. Also, define masking and hashing policies for sensitive data
  6. Automate policy propagation: for example, any asset tagged as PII gets its access policies updated immediately, so sensitive data doesn’t fall into the wrong hands
  7. Define permission levels for data assets: who can view data, who can edit metadata, who can query and preview sensitive data, and so on
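
Steps 3 to 5 can be prototyped with simple rules. The sketch below assumes a regex-based "bot" that labels a column PII when most sampled values look like email addresses or US Social Security numbers, plus one masking and one keyed-hashing function. The patterns, function names, and 80% threshold are hypothetical; real PII, HIPAA, or GDPR detection needs much broader coverage than two regular expressions.

```python
import hashlib
import hmac
import re

# Hypothetical detection rules used by the classification "bot".
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(sample_values, threshold=0.8):
    """Label a column "PII" if enough non-empty samples match one pattern.

    Anything else is left "Unclassified" for a human to mark
    Confidential or Public.
    """
    values = [v for v in sample_values if v]
    if not values:
        return "Unclassified"
    for pattern in PII_PATTERNS.values():
        matches = sum(1 for v in values if pattern.match(v))
        if matches / len(values) >= threshold:
            return "PII"
    return "Unclassified"


def mask_email(value: str) -> str:
    """Masking: keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


def hash_value(value: str, secret_key: bytes) -> str:
    """Hashing: keyed SHA-256 (HMAC), joinable but not readable."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


print(classify_column(["jane@example.com", "li@example.org", ""]))
print(mask_email("jane.doe@example.com"))
```

A keyed hash (HMAC) is used instead of a plain hash so that someone without the key can't rebuild the mapping by hashing a list of known emails, while the same input still yields the same token and hashed columns remain joinable.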

Implementing data governance: Capabilities and Tooling

While there are several data governance tools available, most are built by engineers, for engineers, thereby involving a steep learning and setup curve. In other cases, the tool capabilities aren’t clear.

So, we’ve put together a list of capabilities that modern data governance tools must have, followed by a few recommendations.

What capabilities should you look for in data governance tools?

A good modern data governance platform should support:

  • Auto-tagging and classification
  • Role-based and attribute-based access controls (RBAC and ABAC)
  • End-to-end column-level data lineage mapping
  • Auto-policy propagation via lineage hierarchy
  • Simple data sharing and collaboration
  • Rich context via READMEs, business glossary, descriptions, and more
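
Auto-policy propagation via lineage can be sketched as a graph traversal: when a column is tagged PII, the tag (and any access policy tied to that tag) flows to every column derived from it downstream. The lineage graph below is a hypothetical dictionary of column names; real tools build lineage from query logs or transformation code.

```python
from collections import deque
from typing import Dict, List, Set


def propagate_tag(lineage: Dict[str, List[str]], source: str, tag: str,
                  tags: Dict[str, Set[str]]) -> Set[str]:
    """Walk lineage downstream from `source` breadth-first, tagging every column.

    lineage maps a column to the columns derived directly from it;
    tags maps a column to its tags and is updated in place.
    Returns the set of columns that now carry the tag.
    """
    reached: Set[str] = set()
    queue = deque([source])
    while queue:
        column = queue.popleft()
        if column in reached:
            continue  # already handled; this also stops cycles
        tags.setdefault(column, set()).add(tag)
        reached.add(column)
        queue.extend(lineage.get(column, []))
    return reached


# Hypothetical column-level lineage: raw -> staging -> marts.
lineage = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.customers.contact_email", "marts.churn.email_domain"],
}
tags: Dict[str, Set[str]] = {}
reached = propagate_tag(lineage, "raw.users.email", "PII", tags)
print(sorted(reached))
```

Whether every derived column should inherit the tag is a policy decision: an email domain, for instance, may not be sensitive on its own, so a real setup needs a way to stop propagation at chosen columns.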

1. Atlan

Atlan is an active data governance platform that offers a scalable way to secure your data while ensuring data democratization.

Atlan lets you customize your data governance policies according to data team personas, projects, or domains, i.e., RBAC (role-based access controls) and ABAC (attribute-based access control).
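
The practical difference between the two models can be sketched briefly: RBAC asks what a role may do, while ABAC also weighs attributes of the user and the asset at request time. This sketch layers hypothetical attribute checks (domain membership, classification, and a `pii_trained` flag) on top of a role check; the roles, attributes, and rules are illustrative assumptions and do not describe Atlan's policy engine.

```python
from dataclasses import dataclass
from typing import FrozenSet

# RBAC: permissions attach to roles, and users inherit them through their role.
ROLE_PERMISSIONS = {
    "viewer": {"view_metadata"},
    "steward": {"view_metadata", "edit_metadata"},
    "analyst": {"view_metadata", "query_data"},
}


def rbac_allows(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())


@dataclass(frozen=True)
class User:
    role: str
    domains: FrozenSet[str]   # attribute: domains the user belongs to
    pii_trained: bool         # attribute: completed PII handling training


@dataclass(frozen=True)
class Asset:
    domain: str
    classification: str       # "Public", "Confidential", or "PII"


def abac_allows(user: User, asset: Asset, action: str) -> bool:
    """Layer attribute checks on top of the role check."""
    if not rbac_allows(user.role, action):
        return False
    if action == "view_metadata":
        return True  # anyone with the role may see what data exists
    if asset.domain not in user.domains:
        return False
    if asset.classification == "PII":
        return user.pii_trained
    return True


analyst = User("analyst", frozenset({"finance"}), pii_trained=False)
print(abac_allows(analyst, Asset("finance", "Public"), "query_data"))
print(abac_allows(analyst, Asset("finance", "PII"), "query_data"))
```

Keeping metadata visibility separate from data access mirrors the idea that nobody should be barred from seeing what data exists, even when querying it requires more.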

With Atlan, you can build custom bots to auto-classify sensitive data and propagate policies automatically via column-level lineage mapping.

Lastly, you can stop worrying about losing tribal knowledge with Requests, where anyone with view access to data can offer suggestions that add context to data.

Atlan’s open API architecture allows you to integrate it with popular modern data stack tools such as Databricks, Snowflake, and S3, making it easier for you to set up an interoperable data ecosystem.

Check out all the supported sources Atlan has to offer.

Auto-classify all sensitive PII assets in your data ecosystem. Source: Atlan.

2. Apache Atlas

Apache Atlas is an open-source data governance and metadata framework for Hadoop.

With Apache Atlas, you can create classifications for data sensitivity, expiry, and quality. Atlas also captures data lineage, allowing you to propagate metadata properties to entities down the lineage hierarchy.

Atlas is also equipped with data privacy and security features for defining granular access permissions and data masking policies.

The downside is that Atlas was originally built for the Hadoop ecosystem, which makes it harder to deploy and use, especially outside Hadoop-based stacks.

Open source data governance tool: Apache Atlas. Source: Apache Atlas.

3. DataHub

LinkedIn built (and later open-sourced) DataHub primarily for metadata search and discovery. DataHub offers fine-grained access control over metadata; for instance, you can define who can access and work on it. It also supports classification and tagging for better governance and privacy.

The DataHub roadmap includes RBAC (role-based access control) for entities and aspects for fine-grained access controls.

Open source data governance tool: LinkedIn DataHub. Source: LinkedIn DataHub.

Read more: Check out a comparison guide for the most popular open-source data governance tools available.

Data governance: Bottom line

As the modern data stack has evolved, the diversity of data, its consumers, and the technologies being used has also expanded. We’re in an era where new data and analytics use cases crop up every day.

In this environment, building a rigid set of policies that define data and its use in consultation with a handful of middle managers won’t cut it. Most data governance programs fail because of a one-dimensional approach that focuses on control, restrictions, and bureaucratic processes.

That’s why the perception of data governance must change. Data governance programs must be seen as a way to extract value from data. They should be approached holistically and involve everyone in the process.

Evaluating options for deploying best-in-class data access governance across your data ecosystem without compromising on data democratization? Give Atlan a spin.
