DataHub Data Contracts: Here’s Everything You Need to Know in 2025
As organizations scale, data contracts become essential to ensure consistent, high-quality data pipelines. Data contracts serve as agreements between data producers and consumers that outline clear standards for data quality, integrity, reliability, and accessibility.
A DataHub data contract acts as a commitment to data quality in the DataHub ecosystem. Without this agreement, data teams risk inconsistencies, misaligned expectations, and broken pipelines, which lead to unreliable insights and wasted resources.
In this article, we’ll explore how to create and manage DataHub data contracts effectively. We’ll also discuss external tools that centralize metadata, enabling you to automate and perform actions based on metadata changes.
Table of contents #
- DataHub data contracts: What are they and what are their key characteristics?
- How to create DataHub data contracts
- How to run DataHub data contracts
- DataHub data contracts: Exploring external alternatives
- Summing up
- DataHub data contracts: Related reads
DataHub data contracts: What are they and what are their key characteristics? #
DataHub data contracts are formal agreements between a data asset’s producer and its consumers. They define expectations about schema, semantics, service level agreements (SLAs), and more.
In DataHub, data contracts include assertions (data quality checks) about a data asset’s schema, freshness, and other aspects of data quality.
“Our vision of data contracts is a bundle of verifiable assertions on physical data assets representing a public producer commitment.” - DataHub
Note: In DataHub, an assertion is a data quality test that checks whether data complies with specified rules. Assertions are the foundation for verifying data contracts.
The three defining characteristics of DataHub data contracts are:
- Verifiable: Based on the actual data rather than its metadata. This covers schema checks, column-level checks, and SLAs, but excludes documentation, ownership, and tags.
- Built on assertions: Includes specific checks like schema, freshness, volume, or custom column-level rules to evaluate the contract’s status.
- Producer-oriented: Each contract is tied to one physical data asset and managed by its producer.
As a result, DataHub data contracts streamline data quality management. All stakeholders, from data engineers to business analysts, can trust the data they work with, knowing it meets predefined standards.
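To make the three characteristics concrete, here is a minimal Python sketch of a contract as a bundle of assertions tied to one dataset. The `Assertion` and `DataContract` classes are hypothetical illustrations, not DataHub's own types, and the rule that a contract passes only when every assertion passes is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Assertion:
    """A single verifiable check run against actual rows (hypothetical type)."""
    name: str
    check: Callable[[List[Row]], bool]


@dataclass
class DataContract:
    """A bundle of assertions tied to exactly one physical data asset."""
    dataset: str
    assertions: List[Assertion] = field(default_factory=list)

    def evaluate(self, rows: List[Row]) -> Dict[str, bool]:
        # Run every assertion and report its individual outcome.
        return {a.name: a.check(rows) for a in self.assertions}

    def is_passing(self, rows: List[Row]) -> bool:
        # Assumption: the contract passes only if all assertions pass.
        return all(self.evaluate(rows).values())


contract = DataContract(
    dataset="customer",
    assertions=[
        Assertion(
            "schema_has_customer_id",
            lambda rows: all("Customer_ID" in r for r in rows),
        ),
        Assertion("volume_at_least_one_row", lambda rows: len(rows) >= 1),
    ],
)

rows = [{"Customer_ID": "c-1"}, {"Customer_ID": "c-2"}]
print(contract.evaluate(rows))
print(contract.is_passing(rows))
```

Keeping each check as a named, independent function mirrors the idea that a contract's status is derived from its assertions rather than stored separately.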
How to create DataHub data contracts #
Creating a data contract in DataHub involves defining schemas and setting validation rules. There are three ways to create DataHub data contracts:
- DataHub CLI (YAML)
- APIs
- DataHub UI
Let’s explore each alternative further.
1. Using the DataHub CLI #
Imagine your organization needs to establish a data contract for a “Customer” dataset. This dataset powers downstream analytics for customer segmentation.
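This requires defining fields like Customer_ID (string), Email (string), Account_Status (enum), and Last_Updated (timestamp). Here’s an example definition of these fields and their rules. Treat it as an illustration of the contract’s intent: the CLI command that follows expects a YAML file, so check the DataHub documentation for the exact contract file syntax.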
{
  "schema": {
    "fields": [
      {
        "name": "Customer_ID",
        "type": "string",
        "required": true,
        "unique": true
      },
      {
        "name": "Email",
        "type": "string",
        "required": true,
        "format": "email"
      },
      {
        "name": "Account_Status",
        "type": "enum",
        "values": ["Active", "Suspended", "Closed"],
        "required": true
      },
      {
        "name": "Last_Updated",
        "type": "timestamp",
        "required": true
      }
    ]
  }
}
To create the data contract in DataHub, you can run the following command:
datahub datacontract upsert -f contract_definition.yml
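To show what the field rules in the definition above mean in practice, the following Python sketch checks a batch of rows against them. It is not part of DataHub: the function name, the loose email pattern, and the assumption that Last_Updated arrives as an ISO 8601 string are all choices made for this illustration.

```python
import re
from datetime import datetime
from typing import Dict, List

ALLOWED_STATUSES = {"Active", "Suspended", "Closed"}
REQUIRED_FIELDS = ("Customer_ID", "Email", "Account_Status", "Last_Updated")
# Deliberately loose email pattern; an assumption, not a full RFC 5322 check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_customers(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable violations; an empty list means every rule holds."""
    violations = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # "required": every field must be present and non-empty.
        missing = [name for name in REQUIRED_FIELDS if not row.get(name)]
        if missing:
            violations.append(f"row {index}: missing {', '.join(missing)}")
            continue
        # "unique": Customer_ID must not repeat within the batch.
        if row["Customer_ID"] in seen_ids:
            violations.append(f"row {index}: duplicate Customer_ID")
        seen_ids.add(row["Customer_ID"])
        # "format": "email"
        if not EMAIL_PATTERN.match(row["Email"]):
            violations.append(f"row {index}: invalid Email")
        # "type": "enum" with a fixed set of values.
        if row["Account_Status"] not in ALLOWED_STATUSES:
            violations.append(f"row {index}: invalid Account_Status")
        # "type": "timestamp" (assumed to be an ISO 8601 string here).
        try:
            datetime.fromisoformat(row["Last_Updated"])
        except ValueError:
            violations.append(f"row {index}: invalid Last_Updated")
    return violations


sample_rows = [
    {"Customer_ID": "c-1", "Email": "a@example.com",
     "Account_Status": "Active", "Last_Updated": "2025-01-15T10:30:00"},
    {"Customer_ID": "c-1", "Email": "not-an-email",
     "Account_Status": "Paused", "Last_Updated": "2025-01-16T08:00:00"},
]
for violation in validate_customers(sample_rows):
    print(violation)
```

Returning a list of violations instead of raising on the first failure makes it easier to report every broken rule for a batch at once.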
2. Using APIs #
Currently, DataHub integrates with dbt test and Great Expectations, making it easier to ingest assertion metadata from these tools.
For all other tools, you can create and run your assertions outside of DataHub using third-party tools and then publish the assertion results to DataHub using its APIs.
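The "run outside, then publish" flow can be sketched as follows. This Python example runs a freshness check locally and builds a result record that a separate step could send to DataHub. The record's field names and status values are hypothetical placeholders, not DataHub's API schema, and the actual publishing call is intentionally left out; consult the DataHub API documentation for the real request format.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict


def run_freshness_check(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """Pass when the dataset was updated within the allowed window."""
    return now - last_updated <= max_age


def build_result_record(assertion_name: str, dataset: str, passed: bool, now: datetime) -> Dict[str, str]:
    # Hypothetical payload shape: these keys are placeholders for illustration
    # and do not mirror DataHub's actual API schema.
    return {
        "assertion": assertion_name,
        "dataset": dataset,
        "status": "SUCCESS" if passed else "FAILURE",
        "evaluated_at": now.isoformat(),
    }


# Fixed timestamps keep the example deterministic.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
last_updated = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)

passed = run_freshness_check(last_updated, max_age=timedelta(hours=6), now=now)
record = build_result_record("customer_freshness_6h", "customer", passed, now)

# A real integration would send this record through DataHub's APIs here.
print(json.dumps(record, indent=2))
```

Separating the check from the record-building step keeps the quality logic independent of whichever API or client ends up doing the publishing.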
3. Using DataHub UI #
The DataHub UI for creating data contracts is designed to make defining and managing data contracts straightforward and intuitive. You can specify schema requirements, define data quality rules (i.e., assertions), assign ownership, monitor contract status, create reusable contract templates, and more.
Note: Before creating data contracts using the DataHub UI, you must configure the assertions for freshness, schema, and data quality. These assertions are automatically available in the Acryl Observe module of DataHub Cloud. However, if you’re using self-hosted DataHub, you must set them up yourself using a third-party tool.
Once you’ve configured the DataHub UI and set up the required assertions, here’s how you create a data contract:
- Navigate to the Dataset Profile for a specific dataset. Under the Validation tab, you can go to Data Contract and click Create.
- Next, select the assertions to be included in the contract and click Save.
- You’ll be able to see the contract in the UI. You can add further details to give more context about the contract, such as ownership details, tags, glossary terms, etc. Once done, you can save and activate the contract.
How to run DataHub data contracts #
There are three ways to run DataHub data contracts:
- If you are a DataHub Cloud customer and have access to Acryl Observe, you can schedule assertions in DataHub itself.
- You can use dbt test and Great Expectations to run your data quality checks and then ingest the results into DataHub.
  With dbt test, DataHub ingests the dbt run_results file, which contains the test run results, and translates it into assertion runs.
  For Great Expectations, DataHub recommends integrating the DataHubValidationAction directly into your Great Expectations Checkpoint so that the assertion results are ingested into DataHub.
- You can use other third-party tools to run the data quality checks and publish them into DataHub using APIs.
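As an illustration of the dbt path described above, this Python sketch translates entries from a run_results-style document into simple assertion-run records. It is not DataHub's ingestion code. The input mirrors the general shape of dbt's run_results file (a `results` list whose entries carry `unique_id` and `status`), while the output record and the status mapping are assumptions made for this example.

```python
from typing import Dict, List


def to_assertion_runs(run_results: Dict) -> List[Dict[str, str]]:
    """Map dbt test results to pass/fail assertion runs (illustrative mapping)."""
    runs = []
    for result in run_results.get("results", []):
        unique_id = result.get("unique_id", "")
        if not unique_id.startswith("test."):
            continue  # models, seeds, and snapshots are not assertions
        status = result.get("status")
        if status == "pass":
            outcome = "SUCCESS"
        elif status in ("fail", "error"):
            outcome = "FAILURE"
        else:
            continue  # assumption: warn and skipped tests are left out here
        runs.append({"assertion": unique_id, "outcome": outcome})
    return runs


# Hand-written sample in the shape of a run_results file; IDs are made up.
sample_run_results = {
    "results": [
        {"unique_id": "model.shop.customers", "status": "success"},
        {"unique_id": "test.shop.not_null_customers_customer_id", "status": "pass"},
        {"unique_id": "test.shop.unique_customers_customer_id", "status": "fail"},
        {"unique_id": "test.shop.accepted_values_customers_status", "status": "skipped"},
    ]
}

for run in to_assertion_runs(sample_run_results):
    print(run["assertion"], run["outcome"])
```

In a real pipeline the dictionary would come from loading the run_results file with `json.load` after a dbt test run.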
Once your data contract is active, you can monitor it using the DataHub UI. You can see the results of the assertion checks, track data quality check history, and review other details about the data contract from the UI.
DataHub also lets you receive notifications as Slack messages (DMs or posts to a team channel), provided you’ve signed up for Acryl Observe.
DataHub data contracts: Exploring external alternatives #
While DataHub data contracts are powerful and tightly integrated with DataHub’s ecosystem, certain organizational needs may require external alternatives to complement or extend their capabilities.
For instance, most organizations use a variety of tools for data processing, storage, and analysis (such as Snowflake, dbt, and Databricks). An external solution can act as a unified control plane for data and metadata across all of these tools, ensuring consistency beyond DataHub.
Specifically, it can offer native integrations with Anomalo, Monte Carlo, Soda, and other data quality tools, letting you conduct granular testing for complex metrics. As a result, you can run more specialized testing frameworks or advanced validation tailored to specific industries or use cases.
Additionally, an external alternative would support a broader governance framework, enabling unified management of data contracts, policies, and quality metrics.
By complementing DataHub with an external alternative that serves as a unified control plane, you can enhance the effectiveness of data contracts across your entire data estate.
Summing up #
Data contracts are vital for maintaining reliable, high-quality data pipelines. DataHub data contracts provide a solid foundation with schema validation, data quality checks, and producer accountability.
To extend these capabilities further, consider adopting a metadata activation platform like Atlan. By complementing DataHub with Atlan, you can automate governance, enhance collaboration, and create a unified framework for managing data assets and contracts.
You can learn more about creating and running data contracts using Atlan by exploring Atlan’s official documentation.
DataHub data contracts: Related reads #
- DataHub: LinkedIn’s Open-Source Tool for Data Discovery, Catalog, and Metadata Management
- DataHub Set Up Tutorial: A Step-by-Step Installation Guide Using Docker
- LinkedIn DataHub Demo: Explore DataHub in a Pre-configured Sandbox Environment
- Amundsen vs. DataHub: Which Data Discovery Tool Should You Choose?
- Atlan vs. DataHub: Which Tool Offers Better Collaboration and Governance Features?
- OpenMetadata vs. DataHub: Compare Architecture, Capabilities, Integrations & More
- Data Contracts Open Questions You Need to Ask
- dbt Data Contracts: Best Practices for Data Reliability