dbt Data Catalog: Native Features, Alternative Tools & More

October 21st, 2022

dbt is one of the most popular tools among businesses implementing newer data engineering patterns such as the lakehouse and data mesh. Businesses revamping their existing data transformation infrastructure rely on dbt, too.

However, once these teams work with data at scale, search and discovery become a huge problem. This calls for a comprehensive data cataloging solution for dbt.

In this article, we take you through the native cataloging features of dbt and the additional capabilities you should consider bringing into your stack for comprehensive cataloging of dbt assets.

First, let's look at why a data catalog matters for dbt.

Importance of cataloging dbt assets

dbt, like many other tools in the modern data stack, was born out of a need to standardize model implementation for large-scale transformation workloads through better organization and templating. However, to fully leverage dbt's potential, cataloging its assets is key.

Here’s why:

Efficient documentation: With smaller, distributed, and autonomous teams, everyone across the organization needs clear, shared business definitions to collaborate seamlessly. That makes efficient documentation essential.

Accessible and discoverable documentation: In the world of data, efficient documentation isn't just plain old text; it must be interactive. Data users should be able to search for and explore data sources, entities, relationships, models, constraints, lineage, and more through a data catalog.

dbt, with the workloads it encourages and supports, needs such a data catalog. So, where do we get started? Let’s first look at dbt’s native catalog capabilities.

dbt data catalog: Native capabilities

dbt can generate documentation for your data models automatically and publishes it as a static website by default. To get it up and running, run dbt docs generate, which compiles your project and queries your warehouse for metadata, and then dbt docs serve, which serves the site locally.
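
Under the hood, dbt docs generate writes the static site and the data it reads (index.html, manifest.json, and catalog.json) into the project's target/ directory, and dbt docs serve hosts that directory locally, on port 8080 by default. The Python snippet below is a minimal stand-in for the serve step using only the standard library, which may be handy if you want to host the generated site yourself. It assumes the default target/ path, and the helpers missing_artifacts and serve_docs are hypothetical, not part of dbt.

```python
import functools
import http.server
from pathlib import Path

# Hypothetical helpers, not part of dbt.
# Files that `dbt docs generate` writes into the project's target directory.
REQUIRED_ARTIFACTS = ("index.html", "manifest.json", "catalog.json")


def missing_artifacts(target_dir: str = "target") -> list[str]:
    """Return the docs artifacts that are not present in target_dir."""
    target = Path(target_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (target / name).is_file()]


def serve_docs(target_dir: str = "target", port: int = 8080) -> None:
    """Serve the generated static site locally, a rough stand-in for `dbt docs serve`."""
    missing = missing_artifacts(target_dir)
    if missing:
        raise FileNotFoundError(
            "Run `dbt docs generate` first; missing: " + ", ".join(missing)
        )
    # Serve files straight from target_dir without changing the working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=target_dir
    )
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        print(f"Serving dbt docs at http://127.0.0.1:{port}")
        httpd.serve_forever()  # blocks until interrupted (Ctrl+C)
```

Call serve_docs() from the project root after running dbt docs generate, then open the printed URL. Because the site is static, any web server that can serve the target/ directory works the same way.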

This documentation provides you with the following features:

  • Search and discovery for all data models in the dbt project
  • Relationships between different entities and data models
  • Finer details about columns, their types, allowed values, etc.
  • The SQL script for any given entity or data model
  • Table-level lineage for all data models in the dbt project

Let's understand some of these capabilities in detail:

Column-level details in a data model

The dbt data catalog gives you a tabular view of every column in a database object, with details such as the column description, data type, column-level tests, allowed values, and more.

For instance, here's the Columns section for the orders table in the jaffle_shop project.

Column-level data dictionary in dbt. Source: dbt

Notice how each allowed value comes with its own description, so anyone reading the documentation can clearly understand the column's purpose.
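
The Columns section draws on two files that dbt docs generate produces: manifest.json carries the descriptions and tests defined in your project, while catalog.json carries the column types introspected from the warehouse. The Python sketch below joins the two to rebuild a similar column dictionary for one model. The dictionaries are trimmed, made-up excerpts shaped like those files (real artifacts carry many more fields, and field names can shift between dbt versions), and column_dictionary is a hypothetical helper.

```python
from typing import Any

# Trimmed, made-up excerpts shaped like dbt's target/manifest.json and
# target/catalog.json. Real artifacts carry many more fields, and exact
# field names can differ between dbt versions.
MANIFEST: dict[str, Any] = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "columns": {
                "order_id": {"name": "order_id", "description": "Unique identifier for an order"},
                "status": {"name": "status", "description": "Current status of the order"},
            },
        },
        # Generic tests are stored as their own nodes (ids shortened here).
        "test.jaffle_shop.accepted_values_orders_status": {
            "resource_type": "test",
            "column_name": "status",
            "depends_on": {"nodes": ["model.jaffle_shop.orders"]},
            "test_metadata": {
                "name": "accepted_values",
                "kwargs": {"values": ["placed", "shipped", "completed", "return_pending", "returned"]},
            },
        },
        "test.jaffle_shop.unique_orders_order_id": {
            "resource_type": "test",
            "column_name": "order_id",
            "depends_on": {"nodes": ["model.jaffle_shop.orders"]},
            "test_metadata": {"name": "unique", "kwargs": {}},
        },
    }
}

CATALOG: dict[str, Any] = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "columns": {
                "order_id": {"name": "order_id", "type": "integer", "index": 1},
                "status": {"name": "status", "type": "text", "index": 2},
            }
        }
    }
}


def column_dictionary(model_id: str, manifest: dict, catalog: dict) -> list[dict]:
    """Hypothetical helper: join documented columns and tests with warehouse types."""
    documented = manifest["nodes"][model_id].get("columns", {})
    introspected = catalog["nodes"].get(model_id, {}).get("columns", {})

    # Columns the warehouse reports, in ordinal position, then any columns
    # that are documented but were not found in the warehouse.
    names = sorted(introspected, key=lambda c: introspected[c].get("index", 0))
    names += [c for c in documented if c not in introspected]
    rows = {
        name: {
            "column": name,
            "type": introspected.get(name, {}).get("type"),
            "description": documented.get(name, {}).get("description", ""),
            "tests": [],
            "allowed_values": None,
        }
        for name in names
    }

    for node in manifest["nodes"].values():
        if node.get("resource_type") != "test":
            continue
        # Simple filter: tests that reference several models (for example,
        # relationships tests) may over-match here.
        if model_id not in node.get("depends_on", {}).get("nodes", []):
            continue
        column = node.get("column_name")
        if column not in rows:
            continue
        meta = node.get("test_metadata", {})
        rows[column]["tests"].append(meta.get("name"))
        if meta.get("name") == "accepted_values":
            rows[column]["allowed_values"] = meta.get("kwargs", {}).get("values")

    return list(rows.values())
```

For a real project, you could load both files from target/ with json.load after running dbt docs generate and pass them in the same way.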

Search and discovery

Menu-based navigation is handy, but business users who don't know the database schema or the project structure often find it hard to locate what they're after this way.

Fortunately, dbt allows you to perform a full-text search on everything in the catalog using the search bar located at the top of the page, as shown in the image below:

dbt provides native search to discover data assets. Source: dbt

The search feature also lets you search through specific subsets of information, such as names, descriptions, and tags.
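
To make the idea of searching selected subsets concrete, here is a minimal Python sketch of a search that can be restricted to names, descriptions, or tags. It is not how the dbt docs site implements its search; the Asset class, the search function, and the sample assets are hypothetical, and matching is a simple case-insensitive substring check in which every query term must appear.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical, simplified record for one cataloged dbt resource."""
    unique_id: str
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def search(assets: list[Asset], query: str,
           fields: tuple[str, ...] = ("name", "description", "tags")) -> list[str]:
    """Return ids of assets whose selected fields contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for asset in assets:
        # Collect only the fields the user chose to search in.
        haystack: list[str] = []
        if "name" in fields:
            haystack.append(asset.name)
        if "description" in fields:
            haystack.append(asset.description)
        if "tags" in fields:
            haystack.extend(asset.tags)
        text = " ".join(haystack).lower()
        # Case-insensitive substring match: every term must appear somewhere.
        if all(term in text for term in terms):
            hits.append(asset.unique_id)
    return hits


# Hypothetical assets; names follow the jaffle_shop sample project.
ASSETS = [
    Asset("model.jaffle_shop.orders", "orders", "One row per order", ["finance"]),
    Asset("model.jaffle_shop.customers", "customers", "One row per customer with order history", ["marketing"]),
    Asset("model.jaffle_shop.stg_payments", "stg_payments", "Cleaned payments data", ["finance", "staging"]),
]
```

For example, search(ASSETS, "order", fields=("name",)) matches only the orders model, while searching all fields also returns customers because its description mentions order history.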

Table-level data lineage

Finally, the lineage graph is another great feature of the dbt data catalog. You'll find a round, green icon in the bottom-right corner of the documentation page.

Clicking this icon opens a full-size lineage diagram that traces the transformation journey of your data, as shown in the image below.

Understand how the data flows with dbt's table-level data lineage. Source: dbt

The lineage visualization builds on the Depends On and Referenced By relationships that dbt maintains for every data model based on your transformation workflow. You can customize the lineage graph by selecting the resources you want it to include, as shown in the image below.

dbt data lineage customization. Source: dbt
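
The relationship between the two directions can be sketched in Python: each node's depends_on list gives its parents, and inverting that map yields the children shown under Referenced By. Walking parents, children, or both from a starting model gives the kind of subgraph that dbt's graph operators select (+orders, orders+, +orders+). The parent map below is an illustrative jaffle_shop-style excerpt, and referenced_by and lineage are hypothetical helpers, not dbt APIs.

```python
from collections import defaultdict, deque
from typing import Iterable, Optional

# Illustrative parent map for a jaffle_shop-style project, shaped like the
# depends_on.nodes lists in manifest.json.
DEPENDS_ON: dict[str, list[str]] = {
    "model.jaffle_shop.stg_customers": ["seed.jaffle_shop.raw_customers"],
    "model.jaffle_shop.stg_orders": ["seed.jaffle_shop.raw_orders"],
    "model.jaffle_shop.stg_payments": ["seed.jaffle_shop.raw_payments"],
    "model.jaffle_shop.customers": [
        "model.jaffle_shop.stg_customers",
        "model.jaffle_shop.stg_orders",
        "model.jaffle_shop.stg_payments",
    ],
    "model.jaffle_shop.orders": [
        "model.jaffle_shop.stg_orders",
        "model.jaffle_shop.stg_payments",
    ],
}


def referenced_by(depends_on: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: invert parent lists into child lists."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parents in depends_on.items():
        for parent in parents:
            children[parent].append(child)
    return dict(children)


def _reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk collecting every node reachable from start."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage(node: str, depends_on: dict[str, list[str]], upstream: bool = True,
            downstream: bool = True,
            resource_types: Optional[Iterable[str]] = None) -> list[str]:
    """Hypothetical helper: nodes around `node`, similar in spirit to +node+."""
    result = {node}
    if upstream:
        result |= _reachable(node, depends_on)
    if downstream:
        result |= _reachable(node, referenced_by(depends_on))
    if resource_types is not None:
        # dbt unique_ids start with the resource type, e.g. "model." or "seed.".
        keep = set(resource_types)
        result = {n for n in result if n == node or n.split(".", 1)[0] in keep}
    return sorted(result)
```

For example, lineage("model.jaffle_shop.orders", DEPENDS_ON, downstream=False) returns the orders model, its two staging models, and their seeds. In a real manifest.json, the top-level parent_map and child_map entries already provide both directions, so the inversion step is only needed when starting from depends_on alone.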

dbt's data catalog covers enough ground to be worthwhile for data teams, especially dbt developers. However, business teams might need more context about the data they're using.

In the next section, let's explore cataloging capabilities that the dbt data catalog doesn't offer.

Other capabilities to consider while deploying a data catalog for dbt assets

dbt’s native data catalog ticks off some necessary features, such as search and discovery, table-level lineage, and metadata management.

It doesn't, however, address critical areas such as data governance. So, despite being a great addition to dbt, it doesn't remove the need for a comprehensive data cataloging solution.

Here are some other capabilities that you must look for while deploying a cataloging solution for your dbt assets:

  • Data governance features, especially classification, tagging, and criticality
  • Finer data lineage features, such as column-level lineage with custom metadata
  • Automation of data cataloging workflows, personalization, and curation of metadata
  • A holistic view of all the data across the business
  • Ability to promote self-service for teams across all levels of business and technical capability

Several of these capabilities are essential for a data cataloging solution to succeed in a business.

Atlan + dbt: Bridge the gap between your analytics engineers and business users

Atlan significantly improves on dbt's native data cataloging by adding state-of-the-art data governance, data lineage, and search and discovery features. This gives you a 360° view of your data across the board.

Some popular capabilities amongst users include:

  • Active data governance
  • End-to-end column-level lineage
  • dbt metrics as assets with their own profiles
  • Building documentation standards right from dbt
  • Exposing dbt documentation to your entire team
  • Bringing dbt context into other tools
  • Enabling self-service for data consumers and producers

Let's look at some of Atlan's features in more detail.

Active data governance

Atlan's powerful data governance engine fills in the governance capabilities missing from the dbt data catalog. Its extensive ownership and classification features help teams across the organization work with data more efficiently while accounting for compliance and regulatory requirements. A unified permissions model lets you integrate dbt with your other data sources seamlessly.

Atlan's data governance for dbt data assets. Source: Atlan

End-to-end column-level lineage

To get complete context and a 360° view of your data, you need to see how it flows end-to-end, from one system to the next. While dbt provides table-level data lineage, that isn't enough for most business teams.

Atlan's comprehensive column-level data lineage feature allows you to track lineage at the finest level with as much context as possible. With a smooth and interactive user interface, you can easily find your way around data in any system across your organization.

Atlan's column level lineage for dbt data assets. Source: Atlan

dbt metrics as first-class citizens on Atlan

Atlan’s integration with the dbt Semantic Layer brings dbt’s rich metrics into the rest of the data stack. With this integration, company metrics are now a part of column-level lineage, spanning from data sources and data storage to transformation and BI tools.

dbt metrics get their own 360° profile on Atlan. Source: Atlan

Building documentation standards right from dbt

Atlan's deep integration with dbt lets you define repeatable metadata properties, such as table owners and verified tags, in your dbt models. This lays the groundwork for sharing knowledge across your organization by standardizing documentation for developers.

Business glossary and data dictionary — documentation for dbt data assets. Source: Atlan

Expose dbt documentation to your entire team

By combining signals such as certification, freshness, relevance, and popularity with a Google-like search interface, Atlan supercharges your data discovery experience, saving you precious time otherwise spent sorting through documentation and scrambling for information over email and chat.

Search and discover assets through your entire data ecosystem. Source: Atlan

Bring dbt context to your tools (reverse metadata)

Atlan's Chrome extension brings dbt metadata to where you work. If you're in a BI dashboard, you don't have to go searching for context in dbt.

Access dbt metadata in the tools that you use every day. Source: Atlan

Enabling self-service for data consumers and producers

Atlan enables teams across the business to meet their data needs through self-service, a principle at the core of the modern data stack and its evolution to date. Capabilities such as intelligent automation, personalization, and custom metadata make Atlan more intuitive, flexible, and valuable to every team.

Self-service data discovery for everyone who wants to understand their business better. Source: Atlan

Bottom line

Creating a data asset like a data warehouse or a data lake isn't enough. You must ensure that the data is visible, discoverable, and understandable by business and technical teams. Data catalogs play a significant role in making that happen.

The dbt data catalog is a good built-in tool for dbt users with no additional cost or development effort, but if you want a comprehensive solution that gives you data governance, data lineage, and advanced search and discovery features, you should definitely give Atlan a try.
