dbt Data Catalog: Discussing Native Features Plus Potential to Level Up Collaboration and Governance with Atlan

Updated March 27th, 2023


Quick answer:

Here’s a 2-minute explainer on the need for a dbt data catalog and how this article can help you find the right fit for your data estate:

  • A data catalog collects metadata and helps you discover, understand, trust, and use your dbt assets.
  • This article will explore the importance of cataloging dbt assets and the essential capabilities to look for while deploying a data catalog.
  • Looking for a data catalog to manage and activate your dbt assets? Make sure to check out Atlan — a leader in enterprise data catalogs. Book a demo or take a guided product tour.


dbt is a popular tool that helps analysts and engineers transform data in their warehouses more effectively. A data catalog for dbt can help you gather enough metadata and trust signals on your dbt models.

This can help downstream consumers discover, trust, and understand the data assets that are being prepared for them.

Active data catalogs go a step further and can even empower dbt users to work on dbt models with confidence. They provide full visibility of how any change impacts tables or dashboards downstream.

In this article, we will discuss dbt’s native data cataloging capabilities and provide an overview of additional capabilities that you should consider bringing into your stack to achieve comprehensive cataloging & governance of dbt assets.

First, let us appreciate the importance of having a dbt data catalog.


Table of contents #

  1. Importance of cataloging dbt assets
  2. Native capabilities of dbt data catalog
  3. Capabilities to consider while deploying a dbt data catalog
  4. Atlan + dbt: Bridge the gap between your analytics engineers and business users
  5. The bottom line on dbt data catalog
  6. Atlan + dbt implementation: Related Resources

Importance of cataloging dbt assets #

dbt, like many other tools in the modern data stack, was born out of a need to standardize model implementation for large-scale transformation workloads through better organization and templating. However, to fully leverage dbt’s potential, cataloging its assets is key.

Some reasons why data cataloging for dbt is non-negotiable: #


Efficient documentation: With smaller, distributed, and autonomous teams, there’s a need for clarity on business definitions across the organization to ensure seamless collaboration. Efficient documentation is of the utmost importance in this scenario.

Accessible and discoverable documentation: In the world of data, efficient documentation isn’t just plain old text; it must be interactive. Data users should be able to explore data sources, entities, relationships, models, constraints, lineages, etc., which are searchable and discoverable via a data catalog.

Automatic capturing of lineage: A data catalog solution for dbt can automatically capture data lineage, which helps analysts and engineers understand the impact of any change and ensures that downstream consumers can trust and understand the data assets being prepared for them. This can help prevent errors and inconsistencies, increase efficiency, and ultimately improve the quality of the data produced.

dbt, with the workloads it encourages and supports, needs such a data catalog. So, where do we get started? Let’s first look at dbt’s native catalog capabilities.



Native capabilities of dbt data catalog #

dbt offers you a way to generate data model documentation automatically and renders it as a static website. Run dbt docs generate to build the documentation, then dbt docs serve to view the site locally.
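If you script your documentation builds, the two dbt CLI steps can be wrapped in a few lines of Python. This is a minimal sketch, assuming dbt is installed and the script runs from inside a dbt project directory; the function names docs_commands and run_docs are ours, not part of dbt.

```python
import shlex
import subprocess


def docs_commands(port: int = 8080) -> list[list[str]]:
    """Return the two dbt CLI invocations in the order they need to run."""
    return [
        ["dbt", "docs", "generate"],                    # builds the docs artifacts
        ["dbt", "docs", "serve", "--port", str(port)],  # serves the static site locally
    ]


def run_docs(port: int = 8080) -> None:
    """Run both commands; assumes the current directory is a dbt project."""
    for command in docs_commands(port):
        subprocess.run(command, check=True)  # raises if a step fails


# Show what would be executed without actually invoking dbt.
for command in docs_commands():
    print(shlex.join(command))
```

Calling run_docs() executes both steps and stops at the first failure, so the site is never served from a failed build.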

This documentation provides you with the following features:

  • Search and discovery for all data models in the dbt project
  • Relationships between different entities and data models
  • Finer details about columns, their types, allowed values, etc.
  • The SQL script for any given entity or data model
  • Table-level lineage for all data models in the dbt project

Let’s understand some of these capabilities in detail:

Column-level details in a data model #


With the dbt data catalog, you get a descriptive tabular representation of all the database object columns with details, such as column description, data type, column-level tests, allowed values, and more.

For instance, here’s the Columns section for the orders table in the jaffle_shop project.

Column-level data dictionary in dbt. Source: dbt

Check out how the allowed values are enriched with a description so that anyone reading through the documentation can clearly understand the column’s purpose.
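The same column descriptions can also be read programmatically. The sketch below assumes the manifest that dbt writes alongside the docs keeps a nodes map whose entries carry a columns map with name and description keys; to stay self-contained it uses a hand-written stand-in instead of a real file, and the descriptions shown are made up for illustration.

```python
# Simplified stand-in for the "nodes" section of a dbt manifest.
# Only the keys used below are included; descriptions are illustrative.
SAMPLE_NODES = {
    "model.jaffle_shop.orders": {
        "name": "orders",
        "resource_type": "model",
        "columns": {
            "order_id": {"name": "order_id", "description": "Unique identifier for an order"},
            "status": {"name": "status", "description": "Current state of the order"},
            "amount": {"name": "amount", "description": ""},
        },
    },
}


def column_dictionary(nodes: dict, model_name: str) -> list[tuple[str, str]]:
    """Return (column, description) pairs for one model, flagging undocumented columns."""
    for node in nodes.values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return [
                (col["name"], col.get("description") or "(undocumented)")
                for col in node.get("columns", {}).values()
            ]
    raise KeyError(f"no model named {model_name!r}")


for column, description in column_dictionary(SAMPLE_NODES, "orders"):
    print(f"{column:<10} {description}")
```

Flagging empty descriptions this way is a quick check on how complete your data dictionary is before you share it with business users.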

Search and discovery #


Menu-based navigation is handy, but business users who don’t know the database schema or the project structure can still find it hard to locate what they’re after.

Fortunately, dbt allows you to perform a full-text search on everything in the catalog using the search bar located at the top of the page, as shown in the image below:

dbt provides native search to discover data assets. Source: dbt

The search feature also lets you search through specific subsets of information, such as names, descriptions, and tags.
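To make the idea of field-scoped search concrete, here is a minimal sketch of a case-insensitive substring search over names, descriptions, and tags. It is not how dbt implements its search; the Asset record and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical, pared-down record for one documented dbt resource."""
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def search(assets: list[Asset], term: str,
           fields: tuple[str, ...] = ("name", "description", "tags")) -> list[str]:
    """Return names of assets whose selected fields contain the term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for asset in assets:
        haystacks = []
        if "name" in fields:
            haystacks.append(asset.name)
        if "description" in fields:
            haystacks.append(asset.description)
        if "tags" in fields:
            haystacks.extend(asset.tags)
        if any(needle in text.lower() for text in haystacks):
            hits.append(asset.name)
    return hits


ASSETS = [
    Asset("orders", "One row per order placed by a customer", ["finance"]),
    Asset("customers", "One row per customer", ["crm"]),
    Asset("stg_payments", "Staged payment records", ["finance", "staging"]),
]

print(search(ASSETS, "customer"))                    # searches all three fields
print(search(ASSETS, "customer", fields=("name",)))  # name-only search
print(search(ASSETS, "finance", fields=("tags",)))   # tag-only search
```

Restricting the fields narrows the results: the first call matches both orders and customers, while the name-only call matches customers alone.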

Table-level data lineage #


Finally, the lineage graph is another great feature of the dbt data catalog. You’ll see a round green icon in the bottom-right corner of the documentation page.

When pressed, this icon opens a full-size pop-up window with a lineage diagram describing the data transformation journey of your data, as shown in the image below.

Understand how the data flows with dbt's table-level data lineage. Source: dbt

The data lineage visualization builds upon the “depends on” and “referenced by” fields that dbt maintains for every data model based on your transformation workflow. You can customize the lineage graph by selecting the resources it uses to calculate the lineage, as shown in the image below.

dbt data lineage customization. Source: dbt
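The relationship between the “depends on” and “referenced by” fields can be sketched in a few lines: the second is simply the first one inverted. The model names below loosely follow the jaffle_shop project, and the map itself is a hypothetical stand-in rather than real dbt output.

```python
from collections import defaultdict

# Hypothetical "depends on" map: each key lists its direct parents.
DEPENDS_ON = {
    "stg_customers": ["raw_customers"],
    "stg_orders": ["raw_orders"],
    "stg_payments": ["raw_payments"],
    "customers": ["stg_customers", "stg_orders", "stg_payments"],
    "orders": ["stg_orders", "stg_payments"],
}


def referenced_by(depends_on: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a depends-on map so each resource lists its direct children."""
    children = defaultdict(list)
    for child, parents in depends_on.items():
        for parent in parents:
            children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


REFERENCED_BY = referenced_by(DEPENDS_ON)
print(REFERENCED_BY["stg_orders"])
```

Together, the two maps are enough to draw a table-level lineage graph: one gives the edges pointing upstream, the other the edges pointing downstream.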

dbt’s data catalog covers enough ground to be worthwhile for data teams, especially dbt developers. However, business teams might need more information about the data they’re using.

In the next section, let’s explore some cataloging capabilities that the dbt data catalog doesn’t offer.



Capabilities to consider while deploying a dbt data catalog #

dbt’s native data catalog ticks off some necessary features, such as search and discovery, table-level lineage, and metadata management.

It doesn’t, however, cover critical areas such as data governance. So although it is a great addition to dbt, you still need a comprehensive data cataloging solution.

Here are some other capabilities that you must look for while deploying a cataloging solution for your dbt assets:

  • Data governance features, especially classification, tagging, and criticality
  • Finer data lineage features, such as column-level lineage with custom metadata
  • Automation of data cataloging through AI-powered workflows, personalization, and curation of metadata
  • Ability to integrate across your data stack - right from the source layer to the consumption layer
  • Bi-directional flow of metadata between your dbt data catalog and other preferred tools in your dbt workflow
  • UI/UX that’s intuitive and encourages adoption across technical and non-technical data practitioners

Some of these missing features are essential for a data cataloging solution to drive business-critical outcomes.



Atlan + dbt: Bridge the gap between your analytics engineers and business users #

In addition to significant improvements on what dbt offers in data cataloging, Atlan provides value by adding state-of-the-art data governance, data lineage, search, and discovery features. This helps you get a 360° view of your data across the board.

Some popular capabilities amongst users include:

  • Active data governance
  • End-to-end column-level lineage
  • dbt metrics as an asset with their own profiles
  • Embedded impact analysis in GitHub
  • Building documentation standards right from dbt
  • Exposing dbt documentation to your entire team
  • Bringing dbt context into other tools
  • Enabling self-service in data consumers and producers

Let’s look at some of Atlan’s features in more detail.

Active data governance #


Atlan’s powerful data governance engine more than covers everything missing from the dbt data catalog. The extensive ownership and classification features help teams across the organization work with data more efficiently, considering all compliance and regulatory concerns. A unified permissions model allows you to integrate dbt with other data sources seamlessly.

Atlan's data governance for dbt data assets. Source: Atlan

End-to-end column-level lineage #


To get complete context and a 360° view of your data, you need to see how it flows end-to-end, from one system to the next. While dbt provides table-level data lineage, that isn’t enough for most business teams.

Atlan’s comprehensive column-level data lineage feature allows you to track lineage at the finest level with as much context as possible. With a smooth and interactive user interface, you can easily find your way around data in any system across your organization.

Atlan's column-level lineage for dbt data assets. Source: Atlan


Embedded impact analysis in GitHub #


Atlan’s integration with GitHub solves collaboration challenges faced by data engineers who lack visibility around downstream usage of data assets and scramble to account for every change.

This integration enables data governance to move closer to the data creation process by helping data engineers understand how changes in a dbt model impact downstream assets.

Atlan - GitHub integration - GitHub actions screenshot. Image by Atlan

With Atlan and GitHub, data engineers can identify and collaborate with asset owners, and changes impacting high-value assets can be approved or disapproved by stakeholders. Atlan brings lineage to GitHub, making it easy to see the impact of changes made to important data pipelines. Whenever someone opens a pull request to change a dbt model, the Atlan-GitHub action automatically creates a list of all downstream assets that will be impacted.
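Conceptually, listing every downstream asset for a changed model is a graph traversal. The sketch below illustrates that idea with a breadth-first walk; it is not Atlan’s implementation, and the asset names and the CHILDREN map are hypothetical.

```python
from collections import deque

# Hypothetical "referenced by" map: each key lists the assets that read from it directly.
CHILDREN = {
    "stg_orders": ["orders", "customers"],
    "stg_payments": ["orders", "customers"],
    "orders": ["revenue_dashboard"],
    "customers": ["churn_dashboard"],
}


def downstream(children: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first walk returning every asset reachable from the changed one."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # visit each asset once, even with multiple parents
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(downstream(CHILDREN, "stg_orders"))
```

A change to stg_orders reaches both models and both dashboards here, which is the kind of list a reviewer would want to see on a pull request before approving it.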


dbt metrics as first-class citizens on Atlan #


Atlan’s integration with the dbt Semantic Layer brings dbt’s rich metrics into the rest of the data stack. With this integration, company metrics become part of column-level lineage that spans data sources, data storage, transformation, and BI tools.

dbt metrics get their own 360° profile on Atlan. Source: Atlan


Building documentation standards right from dbt #


Atlan’s deep integration with dbt lets you define repeatable metadata properties, such as table owners and verified tags, in your dbt models. This gives you a base for sharing knowledge across your organization by standardizing documentation for developers.

Business glossary and data dictionary — documentation for dbt data assets. Source: Atlan
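A documentation standard is easier to keep when you can check it automatically. Here is a minimal sketch of such a check, assuming each model record exposes a description and a meta map; the owner and verified keys are illustrative choices of ours, not a schema defined by dbt or Atlan.

```python
# Hypothetical model records; "owner" and "verified" are illustrative meta keys.
MODELS = [
    {"name": "orders", "description": "One row per order", "meta": {"owner": "analytics", "verified": True}},
    {"name": "customers", "description": "", "meta": {"owner": "analytics"}},
    {"name": "stg_payments", "description": "Staged payments", "meta": {}},
]

REQUIRED_META = ("owner",)  # meta keys every model must define under this standard


def documentation_gaps(models: list[dict]) -> dict[str, list[str]]:
    """Map each non-compliant model name to the list of things it is missing."""
    gaps = {}
    for model in models:
        missing = []
        if not model.get("description"):
            missing.append("description")
        missing += [f"meta.{key}" for key in REQUIRED_META if key not in model.get("meta", {})]
        if missing:
            gaps[model["name"]] = missing
    return gaps


print(documentation_gaps(MODELS))
```

Running a check like this in CI would surface models that lack an owner or a description before they reach downstream users.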


Expose dbt documentation to your entire team #


With features like certification, freshness, relevance, and popularity, integrated with a Google search-like search interface, Atlan supercharges your data discovery experience, saving you precious time sorting through documentation and scrambling for information on email and chat.

Search and discover assets through your entire data ecosystem. Source: Atlan


Bring dbt context to your tools (reverse metadata) #


Atlan’s Chrome extension brings dbt metadata where you work. If you’re in a BI dashboard, you don’t have to go searching for context in dbt.

Access dbt metadata in the tools that you use every day. Source: Atlan


Enabling self-service in data consumers and producers #


Atlan enables teams across the business to meet their data needs through self-service, a capability at the core of the modern data stack. Features such as intelligent automation, personalization, and custom metadata make Atlan more intuitive, flexible, and valuable to every team.

Self-service data discovery for everyone who wants to understand their business better. Source: Atlan


The bottom line on dbt data catalog #

Creating a data asset like a data warehouse or a data lake isn’t enough. You must ensure that the data is visible, discoverable, and understandable by business and technical teams. Data catalogs play a significant role in making that happen.

The dbt data catalog is a good built-in tool for dbt users with no additional cost or development effort, but if you want a comprehensive solution that gives you data governance, data lineage, and advanced search and discovery features, you should definitely give Atlan a try.

