Data Catalogs in 2024: Features, Business Value, Use Cases

Updated March 02nd, 2023

The data catalog is finally coming of age. It is evolving to a level where it is outgrowing its own name.

The use cases that data practitioners expect to drive from a data catalog have significantly changed in the past couple of years. While this change hasn’t been sudden, we appear to be at an inflection point where organizations are compelled to demand more from their data catalog.



What is a data catalog? How is it maturing? And what should you consider table stakes when evaluating data catalog solutions for your organization? This article explores these questions and more.


Table of contents

  1. What is a data catalog?
  2. How is a data catalog usually defined?
  3. Data catalogs and The Cambrian Explosion (2021-2025)
  4. Data catalogs in 2024: defining capabilities
  5. What value should you be able to drive from your data catalog?
  6. Data catalog use cases
  7. Data catalog: resources for deep dive

What is a data catalog?

A data catalog is a workspace that serves as a context, control, collaboration, and action plane integrating your entire data estate, diverse data users, and divergent data use cases.

Representation of how a third-gen data catalog encompasses your data estate, people, and processes. Source: Atlan Activate



How is a data catalog usually defined?

If you ask ChatGPT what a data catalog is, this is how it defines it:

Screenshot of ChatGPT generated answer to what is a data catalog. Source: OpenAI

While not incorrect, this definition requires a shift in perspective. To start, qualifying a data catalog as a mere repository or inventory of data is limiting.

Forrester listed the following as must-have attributes that define the best data catalog tools


To quote, Forrester recommended that enterprise data catalog customers should look for providers that:

  • Address the diversity, granularity, and dynamic nature of data and metadata
  • Generate deep transparency of the nature and path of data flow and delivery
  • Deliver a UI/UX that reinforces modern DataOps and engineering best practices

Source: The Forrester Wave™: Enterprise Data Catalogs For DataOps, Q2 2022

Quadrant from the latest Forrester Wave Report that includes an assessment of the top data catalog solution providers. Source: Forrester

Gartner too had declared traditional metadata practices insufficient


It is important to note that the ecosystem has been calling for a revised approach to metadata for a while now. Gartner had previously replaced its Magic Quadrant for Metadata Management with a Market Guide for Active Metadata. The starting lines of the report were enough to prompt a call to action.

The increased demand for orchestrating existing and new systems has rendered traditional metadata practices insufficient. Organizations are demanding “active metadata” to assure augmented data management capabilities. Source: Gartner, Market Guide for Active Metadata Management

Not just industry advisors, data practitioners are also vocally restless about how data catalogs fail to keep up with their needs



A tweet by data practitioner Josh Willis on his expectations from a data catalog. Source: Twitter


Data catalogs and The Cambrian Explosion (2021-2025)

In December 2020, Tristan Handy, the founder and CEO of dbt Labs, wrote a blog post expressing his vision for the Modern Data Stack.

In the post, he reflected on best-of-breed tools reaching a certain level of maturity or stasis, and wrote about eagerly awaiting the next Cambrian Explosion, when getting your hands on a tool will feel like being granted superpowers.

So, what should an ideal data catalog look and feel like in 2024? What features in data catalogs make you feel like you have superpowers? Let’s start from our initial definition:

A data catalog is a workspace that serves as a context, control, collaboration, and action plane integrating your entire data estate, diverse data users, and divergent data use cases.

Essentially, when implementing a data catalog, it is important to consider the “why,” “who,” and “how” of the data.

Screenshot from the demo of the Atlan data catalog. Source: Atlan Activate


Data catalogs in 2024: Defining capabilities

All features of a data catalog in 2024 are guided by these four fundamental & transformational capabilities:

  1. End-to-end visibility of your entire data estate
  2. Embedded collaboration that unifies workflows of diverse data users
  3. Programmable bots that can be trained as per different use cases
  4. Architecture that’s fundamentally open-by-default

1. End-to-end visibility of your entire data estate


Users want complete visibility into their data assets, including ownership, source, and permitted usage, without the need to switch between various data quality, lineage, catalog, and governance tools. Data catalogs can make this available in one seamless experience. This manifests through several features:

  • Column-level lineage
  • 360-degree data asset profile
  • Custom metadata to bring in context from ETL tools, orchestration tools, etc.
  • Visual data previews & related queries
  • and more

2. Embedded collaboration that unifies workflows of diverse data users


Embedded collaboration is all about making work happen where you are, with the least amount of friction possible. Data catalogs recognize the diversity of data users and their different tool preferences and ensure seamless integration with teams’ daily workflows.

This can take many forms, including:

  • Requesting and accessing data assets via a link.
  • Approving or rejecting access requests using your preferred collaboration tool.
  • Configuring data quality alerts on Slack, allowing your team to ask questions about a data asset and receive context directly in Slack.
  • Triggering support requests on Jira without leaving the screen where you’re investigating a data asset.

3. Programmable bots that can be trained as per different use cases


No single algorithm can magically create context, identify anomalies, and achieve the dream of intelligent data management for every industry, company, and use case.

That’s why third-generation tools rely instead on programmable bots: a framework that allows teams to create their own algorithms. For instance, companies that have specific naming conventions for their data sets can create bots that automatically organize, classify, and tag their data ecosystem using preset rules.
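To make the naming-convention example concrete, here is a minimal sketch of a rule-based tagging bot in Python. It is an illustration only, not any vendor's bot framework: the `Asset` class, the `RULES` table, the prefixes (`stg_`, `fct_`, `dim_`), and the `_pii` suffix are all hypothetical conventions chosen for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry: just a name and a set of tags."""
    name: str
    tags: set = field(default_factory=set)


# Assumed naming conventions, expressed as (pattern, tag) rules.
RULES = [
    (re.compile(r"^stg_"), "staging"),
    (re.compile(r"^fct_"), "fact"),
    (re.compile(r"^dim_"), "dimension"),
    (re.compile(r"_pii$"), "pii"),
]


def tag_assets(assets, rules=RULES):
    """Add the tag of every rule whose pattern matches the asset name."""
    for asset in assets:
        for pattern, tag in rules:
            if pattern.search(asset.name):
                asset.tags.add(tag)
    return assets


assets = tag_assets([Asset("stg_orders"), Asset("dim_customer_pii"), Asset("misc")])
for asset in assets:
    print(asset.name, sorted(asset.tags))
```

Because the rules are plain data, a team could change its conventions by editing the table rather than the tagging logic, which is the appeal of a programmable approach over a single fixed algorithm.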

4. Architecture that’s fundamentally open-by-default


Metadata will be key to unlocking several operational use cases in the future, such as auto-tuning data pipelines and CI/CD pipelines. It can even serve as the foundation for modern concepts like data fabric and data mesh. To achieve this, the fundamental metadata store needs to have an openly accessible API layer that allows teams to build on top of it.


Download the primer on third-generation data catalogs to dive into each of these principles and how they manifest



What value should you be able to drive from your data catalog?

A data catalog is perhaps one of the best investments that you can make as a data leader in 2024. Here are various ways that you can use a data catalog to generate value:

  • Reduce costs
  • Maximize productivity
  • Mitigate risk
  • Maximize revenue
  • Improve customer experience

#1- Reduce costs


For example, a data catalog can be used to deprecate expensive and unused data assets or to reduce unnecessary data processing and improve resource utilization.
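As a rough sketch of how usage metadata could surface deprecation candidates, the Python snippet below filters assets that are both costly and idle. The `AssetUsage` fields, the 90-day idle window, and the 100 USD cost floor are assumptions made for illustration; a real catalog would supply its own usage and cost metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class AssetUsage:
    """Hypothetical usage metadata for one data asset."""
    name: str
    monthly_cost_usd: float
    last_queried: Optional[date]  # None means the asset was never queried


def deprecation_candidates(assets, today, idle_days=90, min_cost_usd=100.0):
    """Return costly assets not queried within `idle_days`, most expensive first."""
    cutoff = today - timedelta(days=idle_days)
    idle = [
        a for a in assets
        if a.monthly_cost_usd >= min_cost_usd
        and (a.last_queried is None or a.last_queried < cutoff)
    ]
    return sorted(idle, key=lambda a: a.monthly_cost_usd, reverse=True)


usage = [
    AssetUsage("raw_clickstream_2019", 850.0, date(2023, 3, 14)),
    AssetUsage("fct_orders", 1200.0, date(2023, 12, 30)),
    AssetUsage("tmp_export", 20.0, None),
    AssetUsage("legacy_scores", 300.0, None),
]
candidates = deprecation_candidates(usage, today=date(2024, 1, 1))
print([a.name for a in candidates])
```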

Read about more ways in which a data catalog can reduce costs

#2- Maximize productivity


Data catalogs have been known to cut down the onboarding time of new hires from weeks to just days. They also significantly increase data consumer productivity by enabling non-technical users to self-serve data requests.

Learn how a $3.5B startup broke out of the “Data as a Service” trap and made self-service real through reusable data products

#3- Mitigate risk


Data catalogs simplify compliance with global and local regulations: deploying the relevant policies can be accomplished in hours instead of days.

Learn how a UK-based digital bank, with nearly 500,000 small business customers, improved its compliance with GDPR’s “Right to Erasure” by automating its manual processes using a data catalog

#4- Maximize revenue


Secure and accessible data, improved data quality and trust, and the ability to confidently act on data are all important factors that lay the groundwork for businesses to innovate using insights uncovered from data.

Learn how a $20 billion global insurer used a data catalog to deliver better insurance solutions through better data

#5- Improve customer experience


For example, to improve customer satisfaction, you can conduct an impact analysis on downstream data use. This analysis determines how data is used after collection and how any potential issues in downstream processes can affect the customer experience.
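One way to picture a downstream impact analysis is as a traversal of a lineage graph. The sketch below is a minimal Python illustration under stated assumptions: the `LINEAGE` mapping and its asset names are invented, and a real catalog would derive lineage from query logs, pipelines, and BI tools rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["fct.orders", "fct.refunds"],
    "fct.orders": ["dash.revenue", "ml.churn_features"],
    "fct.refunds": ["dash.revenue"],
}


def downstream(asset, lineage):
    """Return every asset reachable downstream of `asset`, breadth-first."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:  # skip assets reached via another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream("stg.orders", LINEAGE)
print(impacted)
```

In this toy graph, a problem in `stg.orders` would reach a revenue dashboard and a churn feature set, which are the kinds of customer-facing surfaces an impact analysis is meant to flag before a change ships.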


Here’s how Nasdaq accelerated key business users’ ability to access data by ramping up on a modern data stack, which included deploying a data catalog tool



Data catalog use cases

Here are some typical use cases that a data catalog can power:

Each use case has been linked to relevant feature previews for you to explore and understand how these abstract concepts manifest in a third-generation data catalog tool.


Read how Brainly implemented a data catalog and prioritized its adoption to improve data discoverability and governance across the company

How Brainly implemented a data catalog and prioritized its adoption to improve data discoverability and governance across the company. Source: Brainly on Medium


When do you need to buy a data catalog tool?


The bottom of this curve is the ideal time to buy a data catalog. Source: HumansOfData

As Austin Kronz explains in his blog on how to kickstart a data governance program, as your team grows, any consistent increase in time to value (for example, quarter over quarter) is a sign you need to invest in a data catalog.

Quoting from the same resource:

Recognizing the inflection point of growth in data and analytics roles and the effects on time to value are tell-tale signs that it is time to formalize data governance efforts and procure a modern data catalog. Without this, organizations would have to invest excessive amounts of money on hiring to manually manage new data products — something that isn’t possible in the economic conditions faced in 2023.
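The quarter-over-quarter signal described above can be checked mechanically. The following Python sketch is one possible reading of "consistent increase": it treats three consecutive quarterly increases as the trigger. That threshold, the function name, and the sample numbers are all assumptions for illustration, not figures from the cited resource.

```python
def consistently_increasing(values, min_quarters=3):
    """True if each of the last `min_quarters` values exceeds the one before it."""
    if len(values) < min_quarters + 1:
        return False  # not enough history to judge a trend
    recent = values[-(min_quarters + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical median days from data request to delivered insight, per quarter.
time_to_value_days = [12, 11, 13, 16, 21]
needs_catalog_review = consistently_increasing(time_to_value_days)
print(needs_catalog_review)
```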


Data catalog: Resources for deep dive

We have compiled a few resources that will help you find answers to more of your questions regarding data catalogs. These will be periodically updated.

What?


How?


Why?


