dbt Data Contracts: Quick Primer With Notes on How to Enforce

Updated March 21st, 2024

Besides transforming data in data warehouses and data lakes, dbt also contains rich

When applications read from and write to tables in a relational database, they must comply with the table structure. That structure, which you often see expressed as a SQL DDL statement, can be viewed as an automatically enforced contract between the application and the database.

The tools that enable data movement through data pipelines don’t always have a simple and integral option for enforcing a contract. Supporting the enforcement of such contracts is essential for data quality, governance, and accuracy. dbt supports the concept of data contracts using model contracts.



This article will take you through how you can use model contracts to define and enforce schema (and even constraints, in some cases) while transforming data from one layer to another. You’ll learn how data contracts build on the ideas of failing fast and shifting left in data governance, and how that benefits a business.

We’ll discuss the themes mentioned above under the following headings:

  • Understanding dbt model contracts
  • Enforcing a dbt model contract
  • Comparing dbt model contracts and dbt tests
  • Exploring external alternatives to dbt model contracts

Let’s get to it!


Understanding dbt model contracts

A dbt model is simply a templatized SQL query that transforms data in your pipeline. The idea of a data contract is to validate the shape and structure of the transformed data before it lands in the next layer. Model contracts provide a way to enforce contracts on these SQL-based dbt models (they are not available for Python-based dbt models as of now).

Model contracts guarantee the shape of your data as it moves and transforms from one layer to the next. dbt currently supports three types of contracts:

Using these contracts, you can enforce certain data quality guarantees as part of your dbt DAG workflow. Having such guarantees helps immensely when you have downstream consumers of the output of your dbt model.

Let’s now take a look at how to enforce a model contract.


Enforcing a dbt model contract

Similar to how you define the structure of a table in a database, you can define the structure of the output of any dbt model. However, there’s a difference. In a database, you’ll use SQL to define the structure. In dbt, you’ll use a YAML file, as shown in the code snippet below:

models:
  - name: dim_users
    config:
      contract:
        enforced: true
    columns:
      - name: user_id
        data_type: int
        constraints:
          - type: primary_key
      - name: user_name
        data_type: string
        constraints:
          - type: not_null
      - name: created_at
        data_type: datetime
      ...

Once you create a model contract similar to the one shown in the snippet above, you can expect the shape of your dbt model output to be checked against the contract definition. If there’s a mismatch, the model run will not go through, and the data will not be loaded to the next layer.

If you have an xyz.yaml file that contains an enforceable model contract for a dbt model and an xyz.sql file, dbt will automatically run the model contract without any configuration changes to the workflow. Explicit configuration or DAG changes are only required when using a third-party tool for running tests and enforcing data contracts, which we’ll discuss towards the end of this article.


Comparing dbt model contracts and dbt tests

You can also think of a model contract as a specific type of test that is very limited in scope. Technically, you can perform the same tasks as dbt model contracts with dbt tests, especially using pre-packaged test suites like dbt-expectations. Still, by definition, a contract remains a way to validate the output of a dbt model against a statically defined shape.

dbt model contracts are meant to be run before your model is built, i.e., to prevent data from loading if there’s an issue. dbt tests are meant to be run after your model has been built to check for data profiling, quality, and integrity issues with lower severity and a higher tolerance for error. To understand more about the differences between model contracts and tests, have a quick read-through of this discussion on the GitHub repository of the dbt-core project.
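
To make the split concrete, here is a minimal sketch of a single properties file that carries both a contract and tests for the same model. It reuses the dim_users model from the earlier snippet. The data_tests key assumes dbt 1.8 or later (earlier versions use tests), and the warn severity is only an illustrative choice, not a recommendation from dbt.

```yaml
models:
  - name: dim_users
    config:
      contract:
        enforced: true          # shape is checked as part of building the model
    columns:
      - name: user_id
        data_type: int
        data_tests:             # executed after the model has been built
          - unique
          - not_null
      - name: user_name
        data_type: string
        data_tests:
          - not_null:
              config:
                severity: warn  # illustrative: report the issue without failing
      - name: created_at
        data_type: datetime
```

With a file like this, a column name or data type mismatch should stop the model from being built, while the tests report on the data after it has landed.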

While dbt tests can be customized quite a bit, dbt model contracts currently have limited support. In addition to validating every column’s name and data type, model contracts can also validate various constraints (depending on the platform), such as uniqueness and primary keys. Moreover, model contracts are built to compare only bare data types, not their granular attributes such as precision and size. If you need any further granularity or customization, you are better off using customized tests.
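
As a rough sketch of where that line falls, the snippet below declares a few constraints in the contract and pushes a size check into a test instead. It assumes the dbt-expectations package is installed and a dbt version that accepts the data_tests key. The check expression and the length bounds are hypothetical values chosen for illustration, and whether each constraint is actually enforced depends on your data platform.

```yaml
models:
  - name: dim_users
    config:
      contract:
        enforced: true
    columns:
      - name: user_id
        data_type: int
        constraints:
          - type: not_null
          - type: unique               # some platforms record this without enforcing it
          - type: check
            expression: "user_id > 0"  # hypothetical rule; not supported on every platform
      - name: user_name
        data_type: string              # the contract compares the bare type, not its size
        data_tests:
          # size is validated by a test, since the contract does not compare it
          - dbt_expectations.expect_column_value_lengths_to_be_between:
              min_value: 1             # hypothetical bounds
              max_value: 100
      - name: created_at
        data_type: datetime
```

Keeping structural guarantees in the contract and finer-grained rules in tests lets each mechanism do what it is built for.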


Exploring external alternatives to dbt model contracts

API contracts have been around for several decades. Data contracts aren’t much different from API contracts. They use similar contract definition languages like JSON (and its variations), YAML, and Protobuf. You can use any of the established standards to define and enforce contracts on your dbt models, just not as part of dbt’s native functionality, which means you won’t be able to monitor the results of the validations against these contracts in dbt.

Nevertheless, you can still build these external data contract enforcers into your DAG and monitor the results either in the orchestrator-level logs or with a data observability tool. One such example is using the Great Expectations platform with its extensive list of expectations to build contracts in GX suites.

Integrating different tools to do different things gives you a siloed view of your data platform. To prevent that from happening, you need a platform that aggregates all this metadata and allows you to do more with it. You need a metadata activation platform like Atlan, which has the features and capabilities to bring all the metadata in one place and lets you automate and perform actions based on metadata changes.


Conclusion

This article covered the basics of defining and enforcing data contracts in dbt. The use of data contracts fast-tracks the move towards better data quality. Not only that, data contracts also allow for better data cataloging, discovery, and governance, allowing data users to have confidence in the reliability of the underlying data and the structures holding it.

All your data quality, profiling, and schema validation workloads are based on metadata, which can be integrated into and activated by Atlan, an active metadata platform, to enable your business to realize the real potential of using metadata to drive data workloads.

You can learn more about how dbt model contracts work and how dbt integrates with Atlan in dbt’s and Atlan’s official documentation.


