dbt Data Contracts: Quick Primer With Notes on How to Enforce
Besides transforming data in data warehouses and data lakes, dbt also offers rich support for data quality and governance features, such as model contracts.
When applications read and write from tables in a relational database, they must comply with the table structure. The table structure you often see as an SQL DDL statement can be viewed as an automatically enforced contract between the application and the database.
The tools that enable data movement through data pipelines don’t always have a simple and integral option for enforcing a contract. Supporting the enforcement of such contracts is essential for data quality, governance, and accuracy. dbt supports the concept of data contracts using model contracts.
This article will take you through how you can use model contracts to define and enforce schema (and even constraints, in some cases) while transforming data from one layer to another. You’ll learn how having data contracts in place works on the idea of failing fast and shifting left in data governance and how it benefits a business.
We’ll discuss the themes mentioned above under the following headings:
- Understanding dbt model contracts
- Enforcing a dbt model contract
- Comparing dbt model contracts and dbt tests
- Exploring external alternatives to dbt model contracts
Let’s get to it!
Table of contents #
- Understanding dbt model contracts
- Enforcing a dbt model contract
- Comparing dbt model contracts and dbt tests
- Exploring external alternatives to dbt model contracts
- Conclusion
- Related reads
Understanding dbt model contracts #
A dbt model is simply a templatized SQL query that transforms data in your pipeline. The idea of a data contract is to validate the shape and structure of the transformed data before it lands in the next layer. Model contracts provide a way to enforce contracts on SQL-based dbt models (they aren’t available for Python-based dbt models as of now).
Model contracts guarantee the shape of your data as it moves and transforms from one layer to the next. dbt currently supports contracts across three areas:
- SQL models, i.e., the most basic components of a dbt DAG, which contain the logic for transforming data.
- dbt enhancements, such as materializations and incremental models.
- Platform constraints that check for uniqueness, primary keys, foreign keys, etc., on platforms like Spark, Snowflake, Redshift, PostgreSQL, and more.
Using these contracts, you can enforce certain data quality guarantees as part of your dbt DAG workflow. Having such guarantees helps immensely when you have downstream consumers of the output of your dbt model.
Let’s now take a look at how to enforce a model contract.
Enforcing a dbt model contract #
Similar to how you define the structure of a table in a database, you can define the structure of the output of any dbt model. However, there’s a difference. In a database, you’ll use SQL to define the structure. In dbt, you’ll use a YAML file, as shown in the code snippet below:
models:
  - name: dim_users
    config:
      contract:
        enforced: true
    columns:
      - name: user_id
        data_type: int
        constraints:
          - type: primary_key
      - name: user_name
        data_type: string
        constraints:
          - type: not_null
      - name: created_at
        data_type: datetime
...
Once you create a model contract similar to the one shown in the snippet above, the shape of your dbt model output will be checked against the contract definition. If there’s a mismatch, the model build will fail, and the data will not be loaded into the next layer.
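For illustration, here’s a sketch of what a `dim_users.sql` model satisfying that contract might look like. The source relation, column names, and casts are assumptions, and the exact type names (`string`, `datetime`) vary by platform:

```sql
-- models/dim_users.sql (hypothetical; source relation and casts are assumptions)
select
    cast(id as int)             as user_id,    -- must match data_type: int
    cast(username as string)    as user_name,  -- must match data_type: string
    cast(signup_ts as datetime) as created_at  -- must match data_type: datetime
from {{ ref('stg_users') }}
```

If this model dropped a column, renamed one, or returned a different data type, running `dbt run --select dim_users` would fail before the table is built, keeping the mismatched data out of the next layer.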
If you have an xyz.yaml file that contains an enforceable model contract for a dbt model defined in an xyz.sql file, dbt will automatically enforce the contract without any configuration changes to the workflow. Explicit configuration or DAG changes are only required when using a third-party tool for running tests and enforcing data contracts, which we’ll discuss towards the end of this article.
Comparing dbt model contracts and dbt tests #
You can also think of a model contract as a specific type of test with a very limited scope. Technically, you can perform the same tasks as dbt model contracts with dbt tests, especially using pre-packaged test suites like dbt-expectations. Still, by definition, a contract remains a way to validate the output of a dbt model against a statically defined shape.
dbt model contracts are meant to run before your model is built, i.e., they prevent data from loading if there’s an issue. dbt tests are meant to run after your model has been built, checking for data profiling, quality, and integrity issues with lower severity and a higher tolerance for error. To understand more about the differences between model contracts and tests, have a quick read-through of this discussion on the GitHub repository of the dbt-core project.
While dbt tests can be customized quite a bit, dbt model contracts currently have limited customization support. In addition to validating every column’s name and data type, model contracts can validate various constraints (depending on the platform), such as uniqueness and primary keys. However, model contracts only compare bare data types, not granular attributes such as precision and size. For any further level of granularity or customization, you’re better off using custom tests.
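As a point of comparison, the same kinds of checks expressed as dbt tests might look like the sketch below (assuming the hypothetical `dim_users` model from earlier, using dbt’s built-in generic tests):

```yaml
models:
  - name: dim_users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
```

Unlike a contract, these tests run after the table has been built, so they report failures in already-loaded data rather than blocking the load up front.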
Exploring external alternatives to dbt model contracts #
API contracts have been around for several decades, and data contracts aren’t much different. They use similar contract definition languages, such as JSON (and its variations), YAML, and Protobuf. You can use any of the established standards to define and enforce contracts on your dbt models, just not as part of dbt’s native functionality, which means you won’t be able to monitor the results of these validations in dbt.
Nevertheless, you can still build these external data contract enforcers into your DAG and monitor the results either in the orchestrator-level logs or with a data observability tool. One such example is using the Great Expectations platform with its extensive list of expectations to build contracts in GX suites.
Integrating different tools to do different things gives you a siloed view of your data platform. To prevent that from happening, you need a platform that aggregates all this metadata and allows you to do more with it. You need a metadata activation platform like Atlan, which has the features and capabilities to bring all the metadata in one place and lets you automate and perform actions based on metadata changes.
Conclusion #
This article covered the basics of defining and enforcing data contracts in dbt. The use of data contracts fast-tracks the move towards better data quality. Not only that, data contracts also allow for better data cataloging, discovery, and governance, allowing data users to have confidence in the reliability of the underlying data and the structures holding it.
All your data quality, profiling, and schema validation workloads are based on metadata, which can be integrated into and activated by Atlan, an active metadata platform, to enable your business to realize the real potential of using metadata to drive data workloads.
You can learn more about how dbt model contracts work and how dbt integrates with Atlan in dbt’s and Atlan’s official documentation.
dbt Data contracts: Related reads #
- dbt Data Catalog: Discussing Native Features Plus Potential to Level Up Collaboration and Governance with Atlan
- dbt Data Governance: How to Enable Active Data Governance for Your dbt Assets
- dbt Data Lineage: How It Works and How to Leverage It
- dbt Metadata Management: How to Bring Active Metadata to dbt
- Data contracts: What Are They? & How To Implement One?
- Data Contracts Open Questions You Need to Ask
- Snowflake + dbt: Supercharge your transformation workloads