dbt is one of the most popular tools among businesses trying to implement newer data engineering patterns such as lakehouse and data mesh. Even the businesses revamping their existing data transformation infrastructure rely on dbt.
However, easy search and discovery become a huge problem for these teams when dealing with data at scale. This calls for a comprehensive data cataloging solution for dbt.
Here, we walk through the native features of the dbt data catalog and discuss additional capabilities to consider bringing into your stack for comprehensive cataloging of dbt assets.
First, let’s start off by appreciating the importance of having a data catalog for dbt.
Importance of cataloging dbt assets
dbt, like many other tools in the modern data stack, was born out of a need to standardize model implementation for large-scale transformation workloads through better organization and templatization. However, to fully leverage dbt’s potential, cataloging its assets is key.
Efficient documentation: With smaller, distributed, and autonomous teams, there’s a need for clarity on business definitions across the organization to ensure seamless collaboration. Efficient documentation is of the utmost importance in this scenario.
Accessible and discoverable documentation: In the world of data, efficient documentation isn't just plain old text; it must be interactive. Data users should be able to explore data sources, entities, relationships, models, constraints, lineages, etc., which are searchable and discoverable via a data catalog.
dbt, with the workloads it encourages and supports, needs such a data catalog. So, where do we get started? Let’s first look at dbt’s native catalog capabilities.
dbt data catalog: Native capabilities
dbt offers a way to generate data model documentation automatically and, by default, publishes it as a static website. After building the site with dbt docs generate, you can serve it locally with a simple dbt docs serve command.
This documentation provides you with the following features:
- Search and discovery for all data models in the dbt project
- Relationships between different entities and data models
- Finer details about columns, their types, allowed values, etc.
- The SQL script for any given entity or data model
- Table-level lineage for all data models in the dbt project
Let's understand some of these capabilities in detail:
Column-level details in a data model
With dbt data catalog, you get a descriptive tabular representation of all the database object columns with details, such as column description, data type, column-level tests, allowed values, and more.
For instance, consider the Columns section for the orders table.
Check out how the allowed values are enriched with a description so that anyone reading through the documentation can clearly understand the column's purpose.
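In dbt, column descriptions and tests like these are typically declared in a YAML properties file that sits alongside the model. As a rough sketch, the Python below uses a hand-written dictionary that mirrors the shape of such an entry and flattens it into the kind of rows a Columns section displays. The column names, descriptions, allowed values, and helper functions are illustrative assumptions, not taken from a real project or from dbt's own code.

```python
# Hand-written stand-in for one model entry in a dbt YAML properties file.
# All names, descriptions, and values here are illustrative assumptions.
orders_model = {
    "name": "orders",
    "columns": [
        {
            "name": "order_id",
            "description": "Primary key of the orders table",
            "tests": ["unique", "not_null"],
        },
        {
            "name": "status",
            "description": "Current state of the order",
            "tests": [
                {"accepted_values": {"values": ["placed", "shipped", "returned"]}}
            ],
        },
    ],
}


def get_test_name(test):
    """A test is either a bare string or a single-key dict with arguments."""
    return test if isinstance(test, str) else next(iter(test))


def allowed_values(column):
    """Return the accepted_values list for a column, or an empty list."""
    for test in column.get("tests", []):
        if isinstance(test, dict) and "accepted_values" in test:
            return list(test["accepted_values"]["values"])
    return []


def column_rows(model):
    """Flatten a model entry into one row per column."""
    return [
        {
            "column": col["name"],
            "description": col.get("description", ""),
            "tests": [get_test_name(t) for t in col.get("tests", [])],
            "allowed_values": allowed_values(col),
        }
        for col in model["columns"]
    ]


rows = column_rows(orders_model)
for row in rows:
    print(row)
```

Keeping the allowed values next to the description in one place is what lets the generated documentation show both together.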
Search and discovery
Menu-based navigation is handy, but it still leaves room for many use cases where business users unaware of the database schema or the project structure find it challenging to search for what they're after.
Fortunately, dbt allows you to perform a full-text search on everything in the catalog using the search bar located at the top of the page, as shown in the image below:
The search feature also lets you search through specific subsets of information, such as names, descriptions, and tags.
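To make field-scoped search concrete, here is a minimal Python sketch. It does not reflect how dbt implements its search; the search helper, the sample models, and their tags are all hypothetical and exist only to show the idea of restricting a query to names, descriptions, or tags.

```python
# Hypothetical catalog entries; not taken from a real dbt project.
MODELS = [
    {"name": "orders", "description": "One row per customer order", "tags": ["finance"]},
    {"name": "customers", "description": "Customer dimension with lifetime orders", "tags": ["core"]},
    {"name": "payments", "description": "Payments received", "tags": ["finance", "core"]},
]


def search(models, term, fields=("name", "description", "tags")):
    """Return names of models where `term` appears in any selected field."""
    term = term.lower()
    hits = []
    for model in models:
        for field in fields:
            value = model.get(field, "")
            # Tags are a list, so join them into one searchable string.
            text = " ".join(value) if isinstance(value, list) else value
            if term in text.lower():
                hits.append(model["name"])
                break  # one match per model is enough
    return hits


all_fields = search(MODELS, "orders")
names_only = search(MODELS, "orders", fields=("name",))
tags_only = search(MODELS, "finance", fields=("tags",))
print(all_fields, names_only, tags_only)
```

Narrowing the fields trades recall for precision: searching names only skips models that merely mention the term in a description.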
Table-level data lineage
Finally, the lineage graph is another great feature of the dbt data catalog. You'll see a round-shaped, green-colored icon on the bottom right corner of the documentation page.
When pressed, this icon opens a full-size pop-up window with a lineage diagram describing the data transformation journey of your data, as shown in the image below.
The data lineage visualization builds upon the depends on and referenced by fields that dbt maintains for every data model based on your transformation workflow. You can customize the lineage graph by selecting the resources you want it to use to calculate the lineage, as shown in the image below.
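The relationship between the depends on and referenced by fields can be sketched in a few lines of Python. The snippet assumes a hand-written dictionary shaped like the nodes section of dbt's manifest.json artifact, where each node lists its parents under depends_on.nodes. The node IDs and both helper functions are illustrative assumptions. Inverting the mapping gives the referenced by view, and following it repeatedly gives the full upstream lineage of a model.

```python
from collections import defaultdict

# Illustrative nodes, shaped like entries in manifest.json (an assumption).
nodes = {
    "model.shop.stg_orders": {"depends_on": {"nodes": []}},
    "model.shop.stg_payments": {"depends_on": {"nodes": []}},
    "model.shop.orders": {
        "depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.stg_payments"]}
    },
    "model.shop.customers": {"depends_on": {"nodes": ["model.shop.orders"]}},
}


def referenced_by(nodes):
    """Invert depends_on: map each node to the nodes that depend on it."""
    children = defaultdict(list)
    for node_id, node in nodes.items():
        for parent in node["depends_on"]["nodes"]:
            children[parent].append(node_id)
    return dict(children)


def upstream(nodes, node_id):
    """Collect every ancestor of node_id by following depends_on edges."""
    seen = set()
    stack = list(nodes[node_id]["depends_on"]["nodes"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            parents = nodes.get(current, {}).get("depends_on", {}).get("nodes", [])
            stack.extend(parents)
    return seen


children = referenced_by(nodes)
ancestors = sorted(upstream(nodes, "model.shop.customers"))
print(children["model.shop.stg_orders"])
print(ancestors)
```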
dbt's data catalog covers enough ground to be worthwhile for dbt developers and some data teams. However, business teams might need more information about the data they're using.
In the next section, let's explore some better cataloging opportunities that aren’t possible with the dbt data catalog.
Other capabilities to consider while deploying a data catalog for dbt assets
dbt’s native data catalog ticks off some necessary features, such as search and discovery, table-level lineage, and metadata management.
It doesn't, however, address critical areas such as data governance. So, despite being a great addition to dbt, it still leaves a need for a comprehensive data cataloging solution.
Here are some other capabilities that you must look for while deploying a cataloging solution for your dbt assets:
- Data governance features, especially classification, tagging, and criticality
- Finer data lineage features, such as column-level lineage with custom metadata
- Automation of data cataloging workflows, personalization, and curation of metadata
- A holistic view of all the data across the business
- Ability to promote self-service for teams across all levels of business and technical capability
Some of these missing features are essential for a data cataloging solution to be a success for a business.
Atlan + dbt: Bridge the gap between your analytics engineers and business users
In addition to significant improvements on what dbt offers in data cataloging, Atlan provides value by adding state-of-the-art data governance, data lineage, search, and discovery features. This helps you get a 360° view of your data across the board.
Some popular capabilities amongst users include:
- Active data governance
- End-to-end column-level lineage
- dbt metrics as an asset with their own profiles
- Building documentation standards right from dbt
- Exposing dbt documentation to your entire team
- Bringing dbt context into other tools
- Enabling self-service in data consumers and producers
Let's look at some of Atlan's features in more detail.
Active data governance
Atlan's powerful data governance engine more than covers everything missing from the dbt data catalog. The extensive ownership and classification features help teams across the organization work with data more efficiently, considering all compliance and regulatory concerns. A unified permissions model allows you to integrate dbt with other data sources seamlessly.
End-to-end column-level lineage
To get complete context and a 360° view of your data, you need to see how it flows end-to-end, from one system to the next. While dbt provides table-level data lineage, that isn't enough for most business teams.
Atlan's comprehensive column-level data lineage feature allows you to track lineage at the finest level with as much context as possible. With a smooth and interactive user interface, you can easily find your way around data in any system across your organization.
dbt metrics as first-class citizens on Atlan
Atlan’s integration with the dbt Semantic Layer brings dbt’s rich metrics into the rest of the data stack. With this integration, company metrics are now a part of column-level lineage, spanning from data sources and data storage to transformation and BI tools.
Building documentation standards right from dbt
Atlan’s deep integration with dbt lets you create repeatable metadata properties, such as table owners and verified tags, in your dbt models. It provides the base for sharing knowledge across your organization by standardizing documentation for developers.
Expose dbt documentation to your entire team
With features like certification, freshness, relevance, and popularity, integrated with a Google search-like search interface, Atlan supercharges your data discovery experience, saving you precious time sorting through documentation and scrambling for information on email and chat.
Bring dbt context to your tools (reverse metadata)
Atlan's Chrome extension brings dbt metadata to where you work. If you're in a BI dashboard, you don't have to go searching for context in dbt.
Enabling self-service in data consumers and producers
Atlan enables teams across the business to meet their data needs through self-service, a capability at the core of the modern data stack and its evolution to date. Capabilities such as intelligent automation, personalization, and custom metadata make Atlan more intuitive, flexible, and valuable to every team.
Creating a data asset like a data warehouse or a data lake isn't enough. You must ensure that the data is visible, discoverable, and understandable by business and technical teams. Data catalogs play a significant role in making that happen.
The dbt data catalog is a good built-in tool for dbt users with no additional cost or development effort, but if you want a comprehensive solution that gives you data governance, data lineage, and advanced search and discovery features, you should definitely give Atlan a try.
Atlan + dbt implementation: Related resources
- How to crawl dbt?
- What does Atlan crawl from dbt Core?
- What does Atlan crawl from dbt Cloud?
- How can I reuse my documentation from dbt?