dbt Data Catalog: Discussing Native Features Plus Potential to Level Up Collaboration and Governance with Atlan
Last updated on: March 27th, 2023, Published on: October 21st, 2022.
dbt is a popular tool that helps analysts and engineers transform data in their warehouses more effectively. A data catalog for dbt can help you gather enough metadata and trust signals on your dbt models.
This can help downstream consumers discover, trust, and understand the data assets that are being prepared for them.
Active data catalogs go a step further and can even empower dbt users to work on dbt models with confidence. They provide full visibility of how any change impacts tables or dashboards downstream.
In this article, we will discuss dbt’s native data cataloging capabilities and provide an overview of additional capabilities that you should consider bringing into your stack to achieve comprehensive cataloging & governance of dbt assets.
First, let us appreciate the importance of having a dbt data catalog.
Table of contents
- Importance of cataloging dbt assets
- Native capabilities of dbt data catalog
- Capabilities to consider while deploying a dbt data catalog
- Atlan + dbt: Bridge the gap between your analytics engineers and business users
- The bottom line on dbt data catalog
- Atlan + dbt implementation: Related Resources
Importance of cataloging dbt assets
dbt, like many other tools in the modern data stack, was born out of a need to standardize model implementation for large-scale transformation workloads through better organization and templating. However, to fully leverage dbt’s potential, cataloging its assets is key.
Some reasons why data cataloging for dbt is non-negotiable:
Efficient documentation: With smaller, distributed, and autonomous teams, there’s a need for clarity on business definitions across the organization to ensure seamless collaboration. Efficient documentation is of the utmost importance in this scenario.
Accessible and discoverable documentation: In the world of data, efficient documentation isn’t just plain old text; it must be interactive. Data users should be able to explore data sources, entities, relationships, models, constraints, lineages, etc., which are searchable and discoverable via a data catalog.
Automatic capturing of lineage: A data catalog solution for dbt can automatically capture data lineage, which helps analysts and engineers understand the impact of any change and helps downstream consumers trust and understand the data assets being prepared for them. This can help prevent errors and inconsistencies, increase efficiency, and ultimately improve the quality of the data produced.
dbt, with the workloads it encourages and supports, needs such a data catalog. So, where do we get started? Let’s first look at dbt’s native catalog capabilities.
Native capabilities of dbt data catalog
dbt offers you a way to generate data model documentation automatically and publishes it as a static website by default. Once the documentation has been built with dbt docs generate, you can get the static site up and running with a simple dbt docs serve command.
This documentation provides you with the following features:
- Search and discovery for all data models in the dbt project
- Relationships between different entities and data models
- Finer details about columns, their types, allowed values, etc.
- The SQL script for any given entity or data model
- Table-level lineage for all data models in the dbt project
Let’s understand some of these capabilities in detail:
Column-level details in a data model
With the dbt data catalog, you get a descriptive tabular representation of all the database object columns with details, such as column description, data type, column-level tests, allowed values, and more.
For instance, in the Columns section for the orders table, the allowed values are enriched with a description so that anyone reading through the documentation can clearly understand the column’s purpose.
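The column details described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a model entry shaped roughly like one item in the nodes mapping of dbt's target/manifest.json artifact; the SAMPLE_NODE values and the column_rows helper are hypothetical and exist only for this example.

```python
# Hypothetical sample shaped like one entry of the "nodes" mapping in
# dbt's target/manifest.json; the model and column values are made up.
SAMPLE_NODE = {
    "name": "orders",
    "columns": {
        "order_id": {
            "name": "order_id",
            "description": "Primary key of the order.",
            "data_type": "integer",
        },
        "status": {
            "name": "status",
            "description": "Current order status.",
            "data_type": "text",
        },
    },
}


def column_rows(node):
    """Return sorted (name, data_type, description) tuples for a model."""
    rows = []
    for column in node.get("columns", {}).values():
        rows.append((
            column["name"],
            # The type may be absent or null, so fall back to a placeholder.
            column.get("data_type") or "unknown",
            column.get("description", ""),
        ))
    return sorted(rows)


# Print a simple tabular view, similar in spirit to a Columns section.
for name, data_type, description in column_rows(SAMPLE_NODE):
    print(f"{name:<10} {data_type:<8} {description}")
```

In a real project you would load the node from the generated artifact instead of a hard-coded dictionary; the tabular layout here is only an approximation of what the documentation site renders.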
Search and discovery
Menu-based navigation is handy, but business users who are unfamiliar with the database schema or the project structure can still find it challenging to locate what they’re after.
Fortunately, dbt allows you to perform a full-text search on everything in the catalog using the search bar located at the top of the page, as shown in the image below:
The search feature also lets you search through specific subsets of information, such as names, descriptions, and tags.
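To make the idea of searching within specific subsets concrete, here is a small Python sketch. It is not how dbt implements its search; it simply assumes a list of catalog entries with name, description, and tags fields (all hypothetical) and runs a case-insensitive substring match over whichever fields the caller selects.

```python
# Hypothetical catalog entries; field names are chosen for illustration.
CATALOG = [
    {"name": "orders", "description": "One row per customer order.", "tags": ["finance"]},
    {"name": "customers", "description": "One row per customer.", "tags": ["crm"]},
    {"name": "stg_payments", "description": "Staged payment records.", "tags": ["finance", "staging"]},
]


def search(entries, query, fields=("name", "description", "tags")):
    """Return names of entries whose selected fields contain the query."""
    needle = query.lower()
    matches = []
    for entry in entries:
        for field in fields:
            value = entry.get(field, "")
            # Tags are a list; flatten them into one searchable string.
            text = " ".join(value) if isinstance(value, list) else str(value)
            if needle in text.lower():
                matches.append(entry["name"])
                break  # One hit per entry is enough.
    return matches
```

Calling search(CATALOG, "finance", fields=("tags",)) restricts the match to tags, while the default searches all three fields at once.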
Table-level data lineage
Finally, the lineage graph is another great feature of the dbt data catalog. You’ll see a round, green icon in the bottom-right corner of the documentation page.
When pressed, this icon opens a full-size pop-up window with a lineage diagram describing the data transformation journey of your data, as shown in the image below.
The data lineage visualization builds upon the depends on and referenced by fields that dbt maintains for every data model based on your transformation workflow. You can customize the lineage graph by selecting the resources you want it to use to calculate the lineage, as shown in the image below.
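The relationship between the two fields can be illustrated with a short Python sketch: given which models each model depends on, the referenced by view is simply the inverse mapping. The model names and the referenced_by helper below are hypothetical, and this is an illustration of the idea rather than dbt's own code.

```python
from collections import defaultdict

# Hypothetical models; each maps to the models it selects from ("depends on").
DEPENDS_ON = {
    "stg_orders": [],
    "stg_payments": [],
    "orders": ["stg_orders", "stg_payments"],
    "customers": ["orders"],
}


def referenced_by(depends_on):
    """Invert a 'depends on' mapping into a 'referenced by' mapping."""
    children = defaultdict(list)
    for model, parents in depends_on.items():
        for parent in parents:
            children[parent].append(model)
    # Include every known name so leaf models map to an empty list.
    names = set(depends_on) | set(children)
    return {name: sorted(children.get(name, [])) for name in names}
```

Table-level lineage is then a matter of walking these two mappings in either direction from the model you are interested in.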
dbt’s data catalog covers enough ground to be worthwhile for dbt developers and some data teams. However, business teams might need more information about the data they’re using.
In the next section, let’s explore some better cataloging opportunities that aren’t possible with the dbt data catalog.
Capabilities to consider while deploying a dbt data catalog
dbt’s native data catalog ticks off some necessary features, such as search and discovery, table-level lineage, and metadata management.
It doesn’t, however, address critical areas such as data governance. So although it is a great addition to dbt, you still need a comprehensive data cataloging solution alongside it.
Here are some other capabilities that you must look for while deploying a cataloging solution for your dbt assets:
- Data governance features, especially classification, tagging, and criticality
- Finer data lineage features, such as column-level lineage with custom metadata
- Automation of data cataloging through AI-powered workflows, personalization, and curation of metadata
- Ability to integrate across your data stack - right from the source layer to the consumption layer
- Bi-directional flow of metadata between your dbt data catalog and other preferred tools in your dbt workflow
- UI/UX that’s intuitive and encourages adoption across technical and non-technical data practitioners
Some of these missing features are essential for a data cataloging solution to drive business-critical outcomes.
Atlan + dbt: Bridge the gap between your analytics engineers and business users
In addition to significant improvements on what dbt offers in data cataloging, Atlan provides value by adding state-of-the-art data governance, data lineage, search, and discovery features. This helps you get a 360° view of your data across the board.
Some popular capabilities amongst users include:
- Active data governance
- End-to-end column-level lineage
- dbt metrics as an asset with their own profiles
- Embedded impact analysis in GitHub
- Building documentation standards right from dbt
- Exposing dbt documentation to your entire team
- Bringing dbt context into other tools
- Enabling self-service in data consumers and producers
Let’s look at some of Atlan’s features in more detail.
Active data governance
Atlan’s powerful data governance engine more than covers everything missing from the dbt data catalog. The extensive ownership and classification features help teams across the organization work with data more efficiently, considering all compliance and regulatory concerns. A unified permissions model allows you to integrate dbt with other data sources seamlessly.
End-to-end column-level lineage
To get complete context and a 360° view of your data, you need to see how it flows end-to-end, from one system to the next. While dbt provides table-level data lineage, that isn’t enough for most business teams.
Atlan’s comprehensive column-level data lineage feature allows you to track lineage at the finest level with as much context as possible. With a smooth and interactive user interface, you can easily find your way around data in any system across your organization.
Embedded impact analysis in GitHub
Atlan’s integration with GitHub solves collaboration challenges faced by data engineers who lack visibility around downstream usage of data assets and scramble to account for every change.
This integration enables data governance to move closer to the data creation process by helping data engineers understand how changes in a dbt model impact downstream assets.
With Atlan and GitHub, data engineers can identify and collaborate with asset owners, and changes impacting high-value assets can be approved or disapproved by stakeholders. Atlan brings lineage to GitHub, making it easy to see the impact of changes made to important data pipelines. Whenever someone opens a pull request to change a dbt model, the Atlan-GitHub action automatically creates a list of all downstream assets that will be impacted.
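The underlying idea of listing every impacted downstream asset can be sketched as a breadth-first walk over lineage edges. This is not the Atlan-GitHub action's implementation; the REFERENCED_BY edges, asset names, and the downstream_assets helper are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical "referenced by" edges: asset -> assets that read from it.
REFERENCED_BY = {
    "stg_orders": ["orders"],
    "orders": ["customers", "revenue_dashboard"],
    "customers": ["churn_dashboard"],
}


def downstream_assets(changed, referenced_by):
    """Breadth-first walk returning every asset downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for child in referenced_by.get(current, []):
            if child not in seen:  # Skip assets already reached via another path.
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted
```

For a change to stg_orders, the walk reaches the orders model first and then the models and dashboards that depend on it, which is the kind of list a reviewer would want to see on a pull request.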
dbt metrics as first-class citizens on Atlan
Atlan’s integration with the dbt Semantic Layer brings dbt’s rich metrics into the rest of the data stack. With this integration, company metrics are now part of a column-level lineage, spanning from data sources and data storage to transformation and BI tools.
Building documentation standards right from dbt
Atlan’s deep integration with dbt lets you create repeatable metadata properties, such as table owners and verified tags, in your dbt models. By standardizing documentation for developers, it provides the base for sharing knowledge across your organization.
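One way to enforce such a documentation standard is a small check that every model declares the expected metadata. The sketch below assumes each model carries a dbt-style meta dictionary; the owner and verified key names are hypothetical and are not a documented Atlan or dbt convention.

```python
# Hypothetical models with dbt-style `meta` dictionaries; the "owner" and
# "verified" keys are assumptions made for this example.
MODELS = [
    {"name": "orders", "meta": {"owner": "analytics", "verified": True}},
    {"name": "customers", "meta": {"owner": "crm"}},
    {"name": "stg_payments"},
]

REQUIRED_META_KEYS = ("owner", "verified")


def missing_meta(models, required=REQUIRED_META_KEYS):
    """Map each model name to the required meta keys it does not define."""
    gaps = {}
    for model in models:
        meta = model.get("meta", {})
        missing = [key for key in required if key not in meta]
        if missing:
            gaps[model["name"]] = missing
    return gaps
```

A check like this could run in continuous integration so that models without an owner or verification status are flagged before they are merged.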
Expose dbt documentation to your entire team
With features like certification, freshness, relevance, and popularity, integrated with a Google-like search interface, Atlan supercharges your data discovery experience, saving you the time otherwise spent sorting through documentation and chasing information over email and chat.
Bring dbt context to your tools (reverse metadata)
Atlan’s Chrome extension brings dbt metadata where you work. If you’re in a BI dashboard, you don’t have to go searching for context in dbt.
Enabling self-service in data consumers and producers
Atlan enables teams across the business to meet their data needs through self-service, a capability at the core of the modern data stack and its evolution to date. Capabilities such as intelligent automation, personalization, and custom metadata make Atlan more intuitive, flexible, and valuable to every team.
The bottom line on dbt data catalog
Creating a data asset like a data warehouse or a data lake isn’t enough. You must ensure that the data is visible, discoverable, and understandable by business and technical teams. Data catalogs play a significant role in making that happen.
The dbt data catalog is a good built-in tool for dbt users with no additional cost or development effort, but if you want a comprehensive solution that gives you data governance, data lineage, and advanced search and discovery features, you should definitely give Atlan a try.
Atlan + dbt implementation: Related Resources
- How to crawl dbt?
- What does Atlan crawl from dbt Core?
- What does Atlan crawl from dbt Cloud?
- How to add impact analysis in GitHub?
- How can I reuse my documentation from dbt?