Types of Data Lineage: Understand All Ways to View Your Data

June 27th, 2022

Common Types of Data Lineage

  • Descriptive data lineage
  • Automated data lineage
  • Design lineage
  • Business lineage
  • Operational lineage

Different types of data lineage exist because there are multiple questions you may want to ask about data and multiple stakeholders who can benefit from data lineage visibility.

For example, a lineage view built for compliance is not necessarily the same one you would use for root-cause analysis or data quality. So how should you start thinking about the types of data lineage? Start by understanding how they are classified.

Data lineage can be classified based on:

  1. The way lineage is documented
  2. The techniques used to derive the lineage
  3. The requirements of the stakeholders who work with the lineage

Here we discuss several types of data lineage, including descriptive, automated, design, business, and operational lineage, and the importance of each.


Classification of Data Lineage and Associated Types

  1. Lineage based on the method of documentation
    • Descriptive data lineage
    • Automated data lineage
  2. Lineage based on the choice of technique
    • Design lineage
    • Business lineage
    • Operational lineage
  3. Lineage based on persona-specific use cases
    • Business data lineage
    • Technical and design data lineage
  4. Data provenance

To understand these different types of data lineage, consider the example of a report that tracks the performance of a marketing campaign. The report is updated each week and records data about ad spending and user engagement, which the business, in turn, uses to measure marketing ROI.

Over the course of its lifetime, this data source may undergo a number of transformations. Not only is the report updated on a weekly basis, but information about the most recent sales is also appended periodically; and after the campaign ends, the report may be exported into a data warehouse for longer-term storage. These changes form the basis for tracking the data’s lineage.

Depending on how and why you decide to trace that lineage, you may end up with a different type of lineage.
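To make the example concrete, here is a minimal sketch of how the report's history could be captured as a list of lineage events. The `LineageEvent` structure, the system names, and the dates are all hypothetical and chosen only for illustration; real lineage tools use their own models.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineageEvent:
    """One recorded change to a dataset (hypothetical structure)."""
    when: date
    operation: str   # e.g. "create", "update", "append", "export"
    source: str      # where the data came from
    target: str      # where the data ended up


# Hypothetical history of the marketing report described above.
report_lineage = [
    LineageEvent(date(2022, 1, 3), "create", "ad_platform", "campaign_report"),
    LineageEvent(date(2022, 1, 10), "update", "ad_platform", "campaign_report"),
    LineageEvent(date(2022, 1, 12), "append", "sales_system", "campaign_report"),
    LineageEvent(date(2022, 3, 1), "export", "campaign_report", "data_warehouse"),
]


def describe(events):
    """Return the lineage as readable lines, oldest event first."""
    ordered = sorted(events, key=lambda e: e.when)
    return [f"{e.when.isoformat()}: {e.operation} {e.source} -> {e.target}" for e in ordered]


for line in describe(report_lineage):
    print(line)
```

Each type of lineage discussed below can be thought of as a different way of producing, filtering, or reading a record like this one.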

Data Lineage Based on Method of Documentation

As noted in this resource by lineage expert Irina Steenbeek, there are two types of data lineage from the perspective of how the lineage is documented: descriptive data lineage and automated data lineage.

Descriptive data lineage

A descriptive data lineage is one that is generated manually. In the context of the marketing report described above, a descriptive data lineage could be a Word document or text file that records information about how the report was updated over time and how its contents were later exported to a data warehouse.

Automated data lineage

Alternatively, you could create an automated data lineage for the report. This type of lineage is generated by data lineage tools that automatically trace the report’s changes and transformations over the course of its lifecycle. The tools then make that information available, including details of each data change, along with visualizations that help stakeholders understand how the data changed.
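As a rough illustration of the idea, the sketch below records a lineage entry automatically every time a transformation step runs, rather than relying on someone to write it down. It is a toy under stated assumptions: the `traced` decorator, the in-memory `lineage_log`, and the step names are hypothetical and are not how any particular lineage tool works.

```python
import functools

lineage_log = []  # in-memory store; a real tool would persist this


def traced(source, target):
    """Decorator that records a lineage entry each time the step runs."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            result = func(*args, **kwargs)
            lineage_log.append({"step": func.__name__, "source": source, "target": target})
            return result
        return inner
    return wrap


@traced(source="sales_system", target="campaign_report")
def append_sales(report, sales_rows):
    """Append the latest sales rows to the report."""
    return report + sales_rows


@traced(source="campaign_report", target="data_warehouse")
def export_report(report):
    """Stand-in for an export: returns a copy of the rows."""
    return list(report)


report = [{"week": 1, "ad_spend": 500}]
report = append_sales(report, [{"week": 1, "sales": 1200}])
archived = export_report(report)

for entry in lineage_log:
    print(entry)
```

The point of the design is that lineage is captured as a side effect of running the pipeline, so it cannot drift out of date the way a manually maintained document can.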


Lineages Based on Choice of Technique

Mandy Chessell notes that you can also classify types of data lineage based on the techniques used to generate them. The three main types are design lineage, business lineage, and operational lineage.

Design lineage

Design lineage focuses on identifying the data sources and flows that result in a given data state. For the marketing report, a design lineage would record details about which data sources feed the report, how new data is appended to the report each week, and how the report contents are moved between different reporting systems.

Business lineage

A business lineage describes the origins and evolution of data in terms of business information. Instead of showing every component of each data flow, it filters and focuses on those that have direct business relevance – such as the source of data about ad spending, user engagement, and conversions. While this is similar to design lineage in some respects, the main difference is that business lineage focuses on helping make business-focused decisions – instead of design decisions about how to acquire and process information.

Business lineage helps in data discovery and verifying the integrity of data. Source: Atlan
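One way to picture the relationship between design lineage and business lineage is as a graph and a filtered view of that graph. The sketch below is illustrative only: the node names and the choice of which nodes count as business-relevant are assumptions made for this example.

```python
# Design lineage: every hop, including technical staging steps (hypothetical names).
design_edges = {
    "ad_platform": ["raw_ads_staging"],
    "raw_ads_staging": ["cleaned_ads"],
    "cleaned_ads": ["campaign_report"],
    "web_analytics": ["engagement_staging"],
    "engagement_staging": ["campaign_report"],
    "campaign_report": [],
}

# Nodes a business user cares about (an assumption for this example).
business_nodes = {"ad_platform", "web_analytics", "campaign_report"}


def business_view(edges, keep):
    """Collapse the graph so only `keep` nodes remain, linked through hidden hops."""
    def next_kept(node, seen):
        found = set()
        for child in edges.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            if child in keep:
                found.add(child)
            else:
                # Skip the technical node but keep following its outputs.
                found |= next_kept(child, seen)
        return found

    return {node: sorted(next_kept(node, {node})) for node in sorted(keep)}


print(business_view(design_edges, business_nodes))
```

The staging and cleaning steps disappear from the result, while the connection from each source to the report is preserved.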


Operational lineage

Operational lineage describes data movement and transformation based on which technical operations take place.

Operational lineage, referred to here as technical lineage, tracks data at deeper levels of granularity: systems (databases, applications, services), APIs, transformations, SQL queries, and table columns. It helps with root cause analysis, debugging pipeline issues, guiding testing, and refactoring.

Operational data lineage helps with debugging issues, guiding testing, and refactoring pipelines. Source: Atlan
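Root cause analysis on column-level lineage often comes down to walking upstream from a suspicious column. The following is a minimal sketch of that traversal, assuming the lineage is already available as a simple mapping; all table and column names are hypothetical.

```python
from collections import deque

# Column-level lineage: each column maps to the columns it is derived from.
# All table and column names are hypothetical.
upstream = {
    "campaign_report.roi": ["campaign_report.revenue", "campaign_report.ad_spend"],
    "campaign_report.revenue": ["sales.order_total"],
    "campaign_report.ad_spend": ["ads_raw.cost"],
    "sales.order_total": [],
    "ads_raw.cost": [],
}


def trace_upstream(column, graph):
    """Breadth-first walk returning every column that feeds into `column`."""
    visited = []
    queue = deque(graph.get(column, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.append(current)
        queue.extend(graph.get(current, []))
    return visited


print(trace_upstream("campaign_report.roi", upstream))
```

If the ROI figure looks wrong, the returned list is the set of columns worth checking, nearest dependencies first.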


None of these types of data lineage is better or worse than the others. Instead, think of them as serving different purposes and offering different types of information.


Lineages Based on Persona-Specific Use Cases

The resource by Irina Steenbeek referenced earlier also describes a third way of classifying data lineage: in terms of who uses it.

This approach is similar to categorizing data lineage types based on the lineage generation technique because different techniques align with different use cases.

In general, there are two main types of personas – and, by extension, two types of data lineage – to consider here:

Business data lineage

If the data consumers are non-technical business users whose main goal is to understand how data impacts the business, you’ll typically produce a business lineage. As noted above, business data lineage avoids technical details and focuses on enabling easy data discovery, verifying the freshness and integrity of the data, following data flow into BI dashboards, and tracking changes to data and their downstream impact. These are the kinds of information that matter to business stakeholders, as opposed to technical teams.

Technical and design data lineage

In contrast, technical stakeholders, such as IT engineers and data scientists, are typically more interested in the technical and operational details of the data lineage. Technical lineage helps identify where the data originated (systems, processes, datasets, APIs) and where it’s used (BI/reporting, ML datasets).

This helps data architects build better pipeline designs, understand dependencies, optimize ETL jobs, and ensure compliance with regulatory requirements related to data processing.

Because most businesses include both business-centric and technical stakeholders, you’ll usually need to produce both of these types of data lineage, each tailored to its persona.

Business Lineage vs. Technical Lineage

The main difference between technical lineage and business lineage is that business lineage focuses on aspects of data origins and processing that affect business priorities, such as which business unit produced, consumed, or updated data. In contrast, technical lineage traces data lifecycles based on technical operations, drawing on artifacts such as ETL logs and pipeline workflows to support root cause analysis and impact analysis.



What Is Data Provenance?

It’s difficult to discuss types of data lineage without also thinking about data provenance.

Data provenance is information about the original source of data, such as who created the data, when it was created, and why it was created.

Data provenance details may be included as metadata that accompanies a file, database, or other data source. Examples include the source of the data, data types, data size, version IDs, and transformation steps.

Data Lineage vs. Data Provenance

Data provenance identifies the origins of data. In contrast, data lineage records the complete journey undertaken by data to arrive at its present form. Data provenance is therefore one component of data lineage. But it’s not the only component.

For example, for the marketing report described above, the data lineage would include full details about where the report originated, as well as how the data in it was expanded over time and later exported to a data warehouse. But the report’s data provenance would detail only the report’s original creation. It would lack information about the data appended later or the export operations that moved the data to the data warehouse.
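The distinction can be sketched in a few lines, assuming the lineage is stored as an ordered list of records. The record fields, names, and dates below are hypothetical; the only point is that provenance is the origin record, while lineage is the whole list.

```python
# Hypothetical lineage records for the marketing report, oldest first.
lineage = [
    {"operation": "create", "by": "marketing_team", "on": "2022-01-03", "why": "track campaign ROI"},
    {"operation": "append", "by": "sales_sync_job", "on": "2022-01-12", "why": "add latest sales"},
    {"operation": "export", "by": "archive_job", "on": "2022-03-01", "why": "long-term storage"},
]


def provenance(records):
    """Return only the origin record: who created the data, when, and why."""
    origin = next(r for r in records if r["operation"] == "create")
    return {"created_by": origin["by"], "created_on": origin["on"], "reason": origin["why"]}


print(provenance(lineage))  # origin only
print(len(lineage))         # the lineage keeps every step
```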

Conclusion

Different types of data lineage serve different purposes and are generated in different ways. In many cases, businesses will need a variety of data lineage types and generation techniques to get the most value out of their data assets.

If you are evaluating a data lineage solution for your data stack, take Atlan for a spin. Atlan makes data lineage effortless and provides a new way to experience it that is smooth, intuitive, and interactive.
