7 Best Practices for Data Governance

Updated August 22nd, 2023

What are data governance best practices?

Data governance best practices are a set of guidelines that successful data teams use to scale their data governance efforts effectively.

You can think of them as guard rails and policies that help you answer questions, such as:

  • What data does your organization have?
  • Where does this data live?
  • Where and how does it flow through your organization?
  • How do people use it? What reports and metrics are they generating from it?
  • How does someone access this data?
  • Who owns the data?
  • Who defines, modifies, and uses the data?
  • Can employees share this data?

Table of contents

  1. What are data governance best practices?
  2. Best practices for data governance
     1. Lead with your “why”
     2. Adopt a “data product” mindset
     3. Embed collaboration in daily workflows
     4. Automate wherever possible
     5. Ensure data enablement with DataOps
     6. Invest in the right technology
     7. Keep changing and adapting your outlook on data governance
  3. Why should you follow these data governance best practices?
  4. Next steps
  5. Related reads

Establishing a data governance framework

Before you implement data governance best practices, you need a data governance framework.

A data governance framework defines standards, guidelines, protocols, and processes for managing data securely and effectively. It provides the “how”: a blueprint for enforcing governance.

The three pillars of a successful data governance framework are:

  • Governance encompasses all data assets. All dashboards, reports, code, data science models, etc. are data assets. A successful data governance program encompasses everything within the data and analytics spaces.
  • Data governance is a practitioner-led, bottom-up practice. In the past, companies entrusted a handful of people, such as data stewards, with safeguarding data. That approach doesn’t scale. Companies need to empower every creator to take responsibility for governance and compliance.
  • Every team embeds data governance practices within their daily workflows. No one will use a data governance framework if it’s tacked onto their jobs as an afterthought. Embedding it directly in everyone’s daily data workflows ensures everyone has access to accurate, relevant, high-quality, and trustworthy data.

Having a clear data governance framework in place gives you greater visibility into your data efforts and into your existing data estate. It also establishes transparency in data handling, establishes monitoring protocols for data usage, and lays a strong foundation for regulatory compliance.

Best practices for data governance

How you implement your data governance framework will depend on your organization and its specific needs. However, the following principles represent best practices of data governance that apply to almost all companies:

  1. Lead with your “why”
  2. Adopt a “data product” mindset
  3. Embed collaboration in daily workflows
  4. Automate wherever possible
  5. Ensure data enablement with DataOps
  6. Invest in the right technology
  7. Keep changing and adapting your outlook on data governance

1. Lead with your “why”

The need for an overarching purpose

Most data governance frameworks start with a why — a goal, a corporate driver, or a strategic layer for governance strategy and vision. The “why” defines how your actions will deliver value and align with your organization’s business objectives.

Having an overarching purpose also helps the people in your organization develop a sense of purpose and engagement. According to Kathryn Minshew, co-founder and CEO of the popular career advice site The Muse:

“Younger employees want to believe in the value of their work. They expect to be heard and are less likely to follow orders without context.”

How does creating and communicating the “why” help your teams?

Another reason for starting with your “why” and involving your people in the process is the way data governance itself has evolved over time.

In another data governance article, we highlight how modern data governance can’t be a top-down directive. Instead, it should be a decentralized, community-led initiative. In such an environment, data governance becomes a collective, shared responsibility of everyone in your organization.

So it’s crucial that employees understand the purpose behind your data governance program, policies, and standards. You can start by asking your teams how they visualize your organization’s data culture in the next 12-18 months.

2. Adopt a “data product” mindset

Another best practice for data governance is shifting from a data service mindset to a “data product” mindset.

A data product is anything that extracts value from data and helps you generate meaningful insights. It can be raw data, warehouses, KPI dashboards, domain data, algorithms, and more.

DJ Patil, who was formerly the Chief Data Scientist at the US Office of Science and Technology Policy, adds further context to the term:

When you think about data products more broadly, you start to realize that even the dashboards inside your company count. Suddenly your horizons are open, and you can start creating processes that allow you to understand, make and sell things at scale.

Applying product thinking to data can help you generate meaning from data at scale. Unlike a service, a product is built once, and many customers can use it to solve a problem. You can update the product and optimize the value your customers receive. However, the basic premise remains unchanged.

Here’s how Prukalpa Sankar, co-founder at Atlan, highlights the impact of product thinking on data teams:

A product isn’t measured on how many features it has or how quickly engineers can quash bugs — it’s measured on how well it meets customers’ needs. Similarly, data product teams should be centered on the users (i.e. data consumers throughout the company), rather than questions answered or dashboards built. This allows data teams to focus on experience, adoption, and reusability, rather than ad-hoc questions or requests.

Read more → How to apply product thinking to data

How can you apply a data product mindset to data governance?

Identify each data domain as a data product, and appoint domain data owners, i.e., data product owners, to govern the data they create. When you put the onus of managing data on those who create it, you simplify data accountability and trust issues.

You should treat the consumers of that data product — analysts, scientists, and business managers — as customers. Each data product owner should, as a fundamental objective, aim to provide them with a delightful customer experience.

Therefore, data product owners make sure that the data product is:

  • Reusable
  • Reproducible
  • Well-documented
  • Scalable
  • Accessible
  • Easy to understand and use, enabling self-service

3. Embed collaboration in daily workflows

The role of metadata in data governance

A core outcome of data governance is making your organization’s data easy to access, understand, and consume. Metadata plays a central role in this outcome by offering relevant context that makes data discoverable and understandable to its consumers.

However, it’s a mistake to house metadata in yet another tool that data teams must switch to for the full context. Josh Wills, a software engineer at Slack, described the conundrum in a tweet: he has no desire to ever visit a third website just to “browse the metadata”.

The need for embedding metadata in our daily workflows. Source: Twitter.

What is embedded collaboration?

Embedded collaboration is about work happening where you are, with the least amount of friction.

With embedded collaboration, you can answer several questions about the origins and traceability of data, which further simplifies data governance.

As Atlan’s co-founder Prukalpa Sankar says, “Embedded collaboration can unify dozens of micro-workflows that waste time, cause frustration, and lead to tool fatigue for data teams, and instead make these tasks delightful.”

What does embedded collaboration for data governance look like?

By embedding metadata into the daily workflows of your teams, you help them collaborate and discuss data using their tool of choice. For instance, they could search for data definitions with Slack, or trace lineage without leaving Looker.

Anyone trying to understand a dataset can do so using their BI tool of choice. They can get all the context on that asset — glossary definition, Slack discussions, queries, data lineage mapping, and more.

4. Automate wherever possible

The rise of automation

Statista notes that, from 2020 to 2022 alone, total enterprise data volume increased from 1 petabyte to over 2 petabytes. Matillion and IDG Research report that the number of data sources within a single company averages around 400.

At the same time, regulatory standards are growing stricter. As of 2023, some 65% of the world’s personal data is covered by privacy regulations.

In other words, data is growing at a rate that’s outstripping the ability of humans to manage it by hand.

Why automation must be a data governance best practice

Automation in data governance replaces hard-to-scale and error-prone manual processes with automated, sustainable, and reproducible ones. It’s the only way to scale data governance to meet today’s strenuous demands for data.

Examples of automation in data governance include:

  • Granular column-level access control. Automatically enforce access via users, groups, and teams with high levels of granularity.
  • Auto-constructed data lineage. Visualize the flow of data throughout your systems without the need to construct complex SQL statements by hand.
  • Auto-propagate policies through lineage. Tag all tables or columns derived from another automatically to ensure data remains protected across the system. For example, automatically hiding a sensitive field containing Personally Identifiable Information (PII) in a sales report.
  • Auto-generated audit logs. Record every interaction with data to learn more about how employees at the company are using data.
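As a minimal sketch of policy propagation through lineage, the Python below walks a small lineage graph breadth-first, tags every column derived from a PII source, and then masks tagged fields in a report row. The graph, the column names, and the helpers `propagate_tag` and `mask_row` are illustrative assumptions, not the API of any particular governance tool.

```python
from collections import deque

# Hypothetical lineage graph: each key is an upstream column and each value
# lists the columns derived directly from it. All names are illustrative.
LINEAGE = {
    "crm.customers.email": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": ["reports.sales_summary.contact_email"],
    "crm.customers.region": ["reports.sales_summary.region"],
}


def propagate_tag(lineage, source, tag, tags=None):
    """Attach `tag` to `source` and to every column derived from it."""
    tags = {} if tags is None else tags
    queue = deque([source])
    visited = {source}
    while queue:
        column = queue.popleft()
        tags.setdefault(column, set()).add(tag)
        for child in lineage.get(column, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return tags


def mask_row(row, tags, table):
    """Hide the values of columns in `table` that carry the PII tag."""
    return {
        name: "***" if "PII" in tags.get(f"{table}.{name}", set()) else value
        for name, value in row.items()
    }


tags = propagate_tag(LINEAGE, "crm.customers.email", "PII")
report_row = {"contact_email": "ana@example.com", "region": "EMEA"}
masked = mask_row(report_row, tags, "reports.sales_summary")
print(masked)  # {'contact_email': '***', 'region': 'EMEA'}
```

Tagging only the source column is enough here: the email field in the sales report is hidden because it descends from the tagged column, while the unrelated region field stays visible.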

Automated data governance can dramatically cut the time required to perform repetitive data classification tasks. UK-based digital bank Tide had originally calculated it would take 50 working days to identify all of the PII in their system. By using Atlan Playbooks to identify and tag this data automatically instead, Tide reduced that task to a mere five hours.
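To put the Tide figures on a common scale, here is a rough back-of-the-envelope calculation. The eight-hour working day is an assumption made for illustration; the source states only "50 working days" and "five hours".

```python
# Assumption: one working day is 8 hours (not stated in the source).
WORKING_HOURS_PER_DAY = 8

manual_hours = 50 * WORKING_HOURS_PER_DAY  # 50 working days of manual effort
automated_hours = 5                        # reported time with automation

speedup = manual_hours / automated_hours
print(f"{speedup:.0f}x faster")  # 80x faster
```

Under that assumption, the automated approach takes roughly one-eightieth of the estimated manual effort.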

5. Ensure data enablement with DataOps

DevOps and software development

DevOps rose to prominence because of its mission to deliver applications and services at scale by eliminating the silos in software development and operations.

It emphasizes developing a collaborative culture between the operations and development teams, and advocates using automation to speed up software delivery through continuous integration (CI), continuous delivery (CD), and continuous deployment (also abbreviated CD).

SalesOps and sales productivity

Similarly, SalesOps came into the picture to reduce the friction between the various sales processes. According to HubSpot, SalesOps supports sales teams by offering insights on process bottlenecks, assisting with finding new leads and prospects, and using technology to make sales more efficient.

Both DevOps and SalesOps are a collection of philosophies, practices, and tools that reduce friction and promote collaboration across teams.

Data products need a similar practice, one that focuses on tools, processes, and culture, to make the rest of the organization more data-driven and to improve data governance. That’s where DataOps can help.

Implementing DataOps to elevate from data governance to data enablement

According to Gartner,

DataOps is “a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.”

It applies the principles of lean manufacturing, Agile methodology, and DevOps to data. So, DataOps ensures that you:

  • Develop your data product with the goal of delivering value to end users and the business
  • Ship “data products” just like “software products” using the Agile methodology and automation (i.e., CI/CD pipelines)
  • Weave data governance into the daily workflows of everyone in your organization
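One way to picture shipping data products through CI/CD is an automated check that runs before a dataset is published. The sketch below validates sample rows against a simple contract. The `CONTRACT` layout, the column names, and the `validate` helper are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical contract for an "orders" data product; the column names and
# rules are illustrative assumptions.
CONTRACT = {
    "order_id": {"type": int, "nullable": False},
    "amount": {"type": float, "nullable": False},
    "coupon_code": {"type": str, "nullable": True},
}


def validate(rows, contract):
    """Return a list of human-readable contract violations."""
    violations = []
    for index, row in enumerate(rows):
        for column, rule in contract.items():
            if column not in row:
                violations.append(f"row {index}: missing column '{column}'")
                continue
            value = row[column]
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {index}: '{column}' is null")
            elif not isinstance(value, rule["type"]):
                violations.append(
                    f"row {index}: '{column}' expected {rule['type'].__name__}"
                )
    return violations


sample = [
    {"order_id": 1, "amount": 19.99, "coupon_code": None},
    {"order_id": None, "amount": "free", "coupon_code": "WELCOME"},
]
problems = validate(sample, CONTRACT)
for problem in problems:
    print(problem)
```

In a pipeline, a step like this could exit with a non-zero status whenever `problems` is non-empty, so a change that breaks the contract never reaches data consumers.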

6. Invest in the right technology

Technology has undergone massive shifts in the last decade as production costs have decreased substantially and cloud computing has become the norm.

As a result, we live in an era where the “end users are also employees of enterprises, and their expectations of the digital technologies in the enterprise are conditioned by the technology they use in their everyday lives.”

This phenomenon is called the consumerization of technology. It’s why investing in the right technology requires you to look for:

  • An intuitive, memorable experience
  • Hyper personalization
  • Quick and snappy user interfaces
  • Tools that are alive and constantly adapting
  • Multiple modalities with rich interactions
  • Anytime, anywhere access
  • Collaboration

What tools help enable data governance best practices?

The tool you use to promote data governance across your organization must embody these characteristics.

To ensure you have a solution that lets you embrace data governance best practices, your chosen tool/platform should have the following capabilities:

  • An easily searchable data catalog. A study from Coveo revealed that users can spend up to 3.6 hours a day looking for relevant information. A data catalog cuts down on the time spent finding information by providing a single, one-stop, easy-to-use repository for all data within an organization.
  • Customized data workspaces. Using features such as personas, users can navigate information based on their job function and how they use data. For example, a data catalog can filter out technical data from data sets used by field sales personnel.
  • Business glossary. A business glossary is a set of unique business terms and definitions that help everyone in a company align on common terminology. Business owners typically own and maintain glossary terms, making the glossary’s maintenance dynamic and scalable.
  • Granular, role-based access controls. Granular access controls protect data, not just at the column level, but down to the data level. They also provide scalability via automated enforcement.
  • Automation tooling. Support for tools such as automated data lineage and automated propagation via lineage helps scale data classification processes and enforce data quality standards.
  • Cross-system, column-level data lineage. Column-level data lineage shows how individual columns of a data table have changed through time. This increases trust and transparency in data, data quality and reliability, and data security.
  • Data quality profiling. Data quality profiling tools find and correct inconsistencies in data and enforce data standards no matter where a given piece of information lives. Using real-time data quality metrics, you can measure how quality is improving over time.
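To make the data quality profiling capability above more concrete, here is a minimal sketch that computes two common profiling metrics per column: the null rate and the number of distinct non-null values. It only detects issues and does not correct them. The `profile` function and the sample rows are assumptions for illustration and do not reflect how any specific tool works.

```python
def profile(rows):
    """Compute the null rate and distinct non-null value count per column."""
    columns = {name for row in rows for name in row}
    total = len(rows)
    report = {}
    for name in sorted(columns):
        values = [row.get(name) for row in rows]
        non_null = [value for value in values if value is not None]
        report[name] = {
            "null_rate": (total - len(non_null)) / total if total else 0.0,
            "distinct": len(set(non_null)),
        }
    return report


# Illustrative sample data with a missing email and inconsistent casing.
rows = [
    {"country": "DE", "email": "a@example.com"},
    {"country": "de", "email": None},
    {"country": "FR", "email": "b@example.com"},
    {"country": "FR", "email": "b@example.com"},
]
report = profile(rows)
for column, metrics in report.items():
    print(column, metrics)
```

Here the `country` column reports three distinct values for what are really two countries, which points to inconsistent casing ("DE" versus "de"). Tracking such metrics on a schedule is one way to measure whether quality is improving over time.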

7. Keep changing and adapting your outlook on data governance

The evolution of the data landscape and the modern data stack

The data landscape keeps evolving and the modern data stack keeps changing. Within two decades, we’ve gone from relational databases to cloud data lakehouses, and the ecosystem will continue to evolve as more data and analytics use cases emerge.

Here’s how Matt Turck, VC at FirstMark, describes this evolution:

data warehouses have unlocked an entire ecosystem of tools and companies that revolve around them: ETL, ELT, reverse ETL, warehouse-centric data quality tools, metrics stores, augmented analytics, etc. Many refer to this ecosystem as the “modern data stack”.

The Machine learning, Artificial intelligence and Data (MAD) landscape by Matt Turck and John Wu at FirstMark. Source: Matt Turck.

Why is continuously reviewing your approach to data governance a best practice?

While capturing and ingesting large volumes of data has become easier and cheaper, keeping track of all that data, getting adequate context, and using it for decision-making continue to be painful.

That’s why there’s a lot more room to evolve in the data tooling ecosystem. Matt Turck further mentions how data engineering tools and practices are still very much behind the level of sophistication and automation of their software engineering cousins.

That’s why it’s crucial to view data governance as a constantly evolving project, rather than a one-time exercise, just like the rest of the data stack.

Here’s how Snowflake emphasizes this need:

“As data volumes grow, new data streams emerge, and new access points emerge, you’ll need a policy for periodic reviews of your data governance structure — essentially governance of the data governance process.”

Why should you follow these data governance best practices?

The data governance best practices we’ve identified here address why some data governance programs fail.

Most organizations already have a data governance program in place. However, its effectiveness is far from guaranteed.

According to Gartner’s D&A governance survey in 2021, 61% of respondents said their governance objectives included optimization of data for business processes and productivity. But only 42% of that group believed they were on track to meet that goal.

In the same survey, Gartner estimates that by 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data governance. Such an approach should be decentralized, community-led, and collaborative.

Data governance best practices: Next steps

Adopting a “data product” mindset, embedding collaboration in daily workflows, embracing DataOps, and leveraging highly customizable and programmable tools are critical.

You can start by identifying a high-ROI use case for data governance and following the best practices above. Once you have a working proof of concept, you can scale data governance to the remaining data and analytics use cases.
