What Is Data Curation? How Does It Intersect with Governance?

January 22nd, 2021

What is data curation?

Data curation is an end-to-end process of preparing and managing data so business users can easily understand and readily use it. It is the skill of selecting and bringing together relevant data into structured, searchable data assets that are ready for analysis.

“Governed data curation can bridge the gap between data and business.”

-Tableau

The ultimate goal of data curation is to reduce the time from data to insights. With the growing amount of data in organizations today, data curation is becoming essential. Without it, business users can neither locate useful data nor use it to its maximum potential.

As data curation becomes more important, self-service analytics tools and modern data catalogs are growing in popularity. These tools help curate both data and metadata, which in turn makes data management efforts more likely to succeed.

The need for data curation

For business users, curated data can speed up analysis and drive quicker decisions. It means less time spent on finding, cleaning, or preparing data and more on answering business questions.

As cited by HBR, in 2019, 55% of companies had invested over $50 million in big data and AI. However, 77% reported that business adoption of these initiatives is a big challenge. This gap between data existence and data use is why data curation is essential.

For example, imagine visiting a museum with randomly placed artifacts. As a visitor, would you be able to experience the gallery at its best? Of course not.

Now imagine looking at the artifacts without any description to give them context. You are left confused and helpless. You want to know the name of the painter or the era the painting belongs to, but you can't find out. So you walk away. You don't want this to happen to your business users, so remember to curate your data assets.

Who are data curators?

Data curators are responsible for the entire data lifecycle, right from ingestion to consumption. They are industry experts who understand business context and can create relevant data assets for the business users. If an organization operates in different domains, it can have multiple data curators, each responsible for its own domain.

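Data curators may also add metadata and the context needed to interpret the data. However, the curator's role should not be confused with that of a database administrator: a curator curates datasets and metadata from different databases, whereas a database administrator operates the databases themselves.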

Why can’t everyone in the organization be a data curator? Because it would take time away from the work they do best. However, organizations should find ways to crowdsource tribal knowledge into the curation process.

Data curators must also uphold the principles of data governance while curating data for an organization.

4 benefits of data curation

Data curation can solve four fundamental data needs. Addressing these will help an organization achieve both high-quality curation and good governance.
  1. Easily discover and use data.
  2. Ensure data quality.
  3. Maintain metadata linked with data.
  4. Ensure compliance through data lineage and classification.

Easily discover and use data

The primary objective of data curation is to make data discovery easier.

Modern data catalogs can solve this problem. They bring together data from disparate sources, which a data curator can then neatly organize and maintain.

Modern catalogs’ Amazon-like search and filter-rich browsing can make discovering data fast and intuitive for business users.

Ensure data quality

Curated data builds trust. With curation, users will know that their data assets have been verified and approved by the data curator.

Proper documentation, data profiling, a data dictionary, and status tags are handy tools to help curators demonstrate and maintain this trust. Calculating data quality metrics for every ingested data table can also help the data curator flag bad data for business users.
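As a rough illustration of per-table quality metrics, the sketch below profiles a table held as a list of dictionaries and derives a status tag from two simple checks: the share of missing values per column and the share of duplicate keys. The names `profile_table` and `QualityReport`, the 5% threshold, and the status labels are assumptions made for this example, not part of any particular catalog.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    table: str
    row_count: int
    null_rate: dict[str, float]   # column name -> share of missing values
    duplicate_key_rate: float     # share of rows whose key repeats an earlier row
    status: str                   # "verified" or "needs review" (hypothetical tags)


def profile_table(table, rows, key, max_null_rate=0.05):
    """Compute simple quality metrics for a table given as a list of dicts."""
    row_count = len(rows)
    columns = sorted({col for row in rows for col in row})

    null_rate = {}
    for col in columns:
        # Treat None, empty strings, and absent columns as missing.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        null_rate[col] = missing / row_count if row_count else 0.0

    distinct_keys = len({row.get(key) for row in rows})
    duplicate_key_rate = 1 - distinct_keys / row_count if row_count else 0.0

    passed = (
        row_count > 0
        and duplicate_key_rate == 0
        and all(rate <= max_null_rate for rate in null_rate.values())
    )
    status = "verified" if passed else "needs review"
    return QualityReport(table, row_count, null_rate, duplicate_key_rate, status)
```

A curator could run a check like this on every ingested table and publish the resulting status next to the asset, so business users see at a glance which tables still need review.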

Maintain metadata linked with data

A data curator is responsible for bringing data and metadata together. Metadata should not live far from the data it describes: the column descriptions, date of last update, primary key, and all other important information about a data asset should be accessible right next to it.

Metadata catalogs are great tools that make it easier to build and link metadata to your data assets.
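One way to picture metadata that sits right next to the data is a single record per asset that carries its column descriptions, last update date, and primary key. The sketch below is a minimal illustration; `DataAsset`, `ColumnMeta`, and their fields are hypothetical names chosen for this example rather than the schema of a real catalog.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ColumnMeta:
    name: str
    description: str
    is_primary_key: bool = False


@dataclass
class DataAsset:
    name: str
    owner: str
    last_updated: date
    columns: list[ColumnMeta] = field(default_factory=list)

    def primary_key(self):
        """Names of the columns that make up the primary key."""
        return [c.name for c in self.columns if c.is_primary_key]

    def undocumented_columns(self):
        """Columns that still lack a description, for the curator to fill in."""
        return [c.name for c in self.columns if not c.description.strip()]


orders = DataAsset(
    name="sales.orders",
    owner="sales-curation",
    last_updated=date(2021, 1, 22),
    columns=[
        ColumnMeta("order_id", "Unique identifier of the order", is_primary_key=True),
        ColumnMeta("amount", "Order total in the billing currency"),
        ColumnMeta("channel", ""),
    ],
)
```

Because the descriptions travel with the asset, a helper such as `undocumented_columns()` can tell a curator which columns still need context before the asset is published.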

Ensure compliance through data lineage and classification

A data curator has to troubleshoot data issues quickly, and an end-to-end lineage setup is crucial for that. Lineage lets a curator trace where data came from and see which downstream assets a change or error will affect.
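Tracing origins and impact can be thought of as walking a directed graph in which each edge points from a source asset to an asset derived from it. The sketch below assumes a small hand-written lineage map; the asset names and the helpers `downstream` and `upstream` are invented for illustration, and real lineage tools typically build this graph automatically from query logs or pipeline definitions.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.contacts": ["staging.customers"],
    "erp.orders": ["staging.orders"],
    "staging.customers": ["marts.customer_revenue"],
    "staging.orders": ["marts.customer_revenue"],
    "marts.customer_revenue": ["dashboards.revenue_overview"],
}


def downstream(asset, edges):
    """Impact analysis: every asset that directly or indirectly depends on `asset`."""
    seen = {asset}
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {asset}


def upstream(asset, edges):
    """Origin tracing: every asset that `asset` is directly or indirectly built from."""
    reverse = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(asset, reverse)
```

With this map, `downstream("staging.orders", LINEAGE)` lists what a bad load of the orders table would affect, while `upstream("marts.customer_revenue", LINEAGE)` lists the sources to inspect when the revenue figures look wrong.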

Using AI-enabled bots to auto-classify PII data assets should also be on a data curator’s to-do list. Automatic classification helps govern and protect sensitive data in the organization.
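To make auto-classification concrete, the sketch below uses plain regular expressions as a stand-in for the AI-enabled bots mentioned above: it samples the values of each column and labels the column as PII when most values match a known pattern. The patterns, the 80% threshold, and the names `classify_column` and `classify_table` are assumptions for this example; a production classifier would cover far more cases.

```python
import re

# Illustrative patterns only; checked in this order, most specific first.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{7,}\d$"),
}


def classify_column(values, min_match_rate=0.8):
    """Return a PII label if enough non-empty values match a pattern, else None."""
    samples = [str(v).strip() for v in values if v not in (None, "")]
    if not samples:
        return None
    for label, pattern in PII_PATTERNS.items():
        matches = sum(1 for s in samples if pattern.match(s))
        if matches / len(samples) >= min_match_rate:
            return label
    return None


def classify_table(columns):
    """Map each column name to its PII label, skipping columns with no match."""
    tags = {}
    for name, values in columns.items():
        label = classify_column(values)
        if label is not None:
            tags[name] = label
    return tags
```

Classifying by values rather than by column names means a column called `contact` is still flagged when it holds email addresses. The resulting tags can then drive access rules or masking for the flagged columns.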

According to Tamr’s CTO, maturing enterprises are seeking out new methods of managing and curating data that are built for both scale and speed.
