How to Track Lineage for Databricks Jobs in Unity Catalog

by Emily Winks, Data Governance Expert at Atlan. Last Updated on: February 1st, 2026 | 22 min read

Quick answer: 5-Step Databricks Job Lineage Tracking Checklist

Here's how to configure job lineage tracking in 15-30 minutes:

  1. Enable Unity Catalog in your workspace (one-time, 10-15 minutes)
  2. Register tables in a Unity Catalog metastore (varies by data volume)
  3. Verify your service principal has BROWSE permissions (2-3 minutes)
  4. Run jobs that read or write Unity Catalog tables (automatic capture)
  5. View job lineage in Catalog Explorer or query system tables (instant)

See Lineage in Action – Product Tour

Without job lineage visibility, your teams waste hours tracking down which jobs broke when an upstream table changed. Unity Catalog solves this by automatically capturing job lineage whenever jobs read from or write to tables registered in the metastore. It captures most job dependencies automatically, compared with the minimal coverage manual documentation provides, reducing impact analysis from hours to minutes.
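Once the checklist is complete, a quick query against the lineage system table confirms that events are actually flowing (a minimal sanity check; it assumes your principal can read the system catalog):

```sql
-- Sanity check: count lineage events captured today
SELECT COUNT(*) AS lineage_events_today
FROM system.access.table_lineage
WHERE event_date = CURRENT_DATE();
```

A non-zero count means Unity Catalog is recording lineage for your workloads.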


What is job lineage in Databricks and why does it matter?


Job lineage shows which Databricks jobs read from or write to specific tables. Unity Catalog captures this automatically at runtime, recording metadata including job_id, job_run_id, and notebook_id. You can trace downstream impact when jobs fail, understand data freshness across pipelines, and support compliance audits with complete data flow documentation.

Without job lineage visibility, you waste hours when a table schema change breaks downstream processes. You have to manually check job configurations, review error logs, and trace dependencies through tribal knowledge or outdated documentation. Organizations with robust lineage see measurably faster issue resolution compared to teams relying on manual tracking methods.

Unity Catalog solves this by tracking different levels of lineage automatically:

| Lineage Type | What It Tracks | Use Case | Captured Automatically? |
|---|---|---|---|
| Table Lineage | Which tables read from or write to other tables | Impact analysis when tables change | Yes, for Unity Catalog tables |
| Column Lineage | Which columns feed into specific columns downstream | Compliance audits, PII tracking | Yes, for most operations (DBR 13.3+ for DLT) |
| Job Lineage | Which scheduled jobs consume or produce tables | Troubleshooting job failures, cost analysis | Yes, includes job_id and job_run_id |
| Notebook Lineage | Which ad-hoc notebook queries access tables | Development audits, usage tracking | Yes, includes notebook_id |
| Dashboard Lineage | Which BI dashboards query tables | Understanding report dependencies | Yes, for Databricks SQL dashboards |

Here’s a practical scenario: You update a table schema in your bronze layer. Without lineage, you spend 2-3 hours reviewing job logs to find which downstream jobs broke. With job lineage, you click the Lineage tab in Catalog Explorer, see three downstream jobs consuming that table, and identify the failures in under a minute. This matters when you’re troubleshooting production incidents under time pressure.


Prerequisites: What you need before tracking job lineage


Before tracking job lineage, you need Unity Catalog enabled and tables registered in a Unity Catalog metastore. Databricks Runtime version requirements vary by feature: 11.3 LTS or above for streaming lineage, 13.3 LTS or above for column-level lineage on Lakeflow pipelines. You’ll also need BROWSE privilege on catalogs and must use Spark DataFrame or Databricks SQL interfaces for automatic capture.

Make sure you have:

  • Unity Catalog-enabled workspace (Premium tier required)
  • Tables registered in Unity Catalog metastore (not external tables referenced by path only)
  • Databricks Runtime 11.3 LTS or above for streaming lineage
  • Databricks Runtime 13.3 LTS or above for column-level lineage on Lakeflow Spark Declarative Pipelines
  • Service principal or user with BROWSE privilege on catalogs where you want to view lineage
  • Jobs using Spark DataFrame or Databricks SQL interfaces (JDBC queries don’t generate lineage automatically)

Unity Catalog captures lineage for all languages—Python, SQL, Scala, and R—without requiring code changes. Once you meet these prerequisites, lineage starts capturing automatically as soon as jobs run against Unity Catalog tables.
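To verify the permission prerequisite up front, a check along these lines works (a sketch; main and data-engineers are placeholder names for your catalog and principal):

```sql
-- Grant BROWSE so a principal can view lineage in this catalog
-- (run as the catalog owner or a metastore admin)
GRANT BROWSE ON CATALOG main TO `data-engineers`;

-- Confirm the grant is in place
SHOW GRANTS ON CATALOG main;
```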


Step 1: How do you view job lineage in Catalog Explorer? (2-3 minutes)


View job lineage by navigating to a table’s Lineage tab in Catalog Explorer and selecting Jobs → Downstream. Job names appear under “Job Name” as consumers of the table. Click any job name to see details in the Job run panel, including job_id, last run time, and associated notebook. This shows you which jobs read from the table instantly.

Navigate to Catalog Explorer

  1. Click “Catalog” in your workspace sidebar
  2. Search for your table or browse through catalogs
  3. Select the table to open its details page

View consuming jobs

  1. Click the “Lineage” tab (shows related tables by default)
  2. Click “Jobs” then “Downstream” to filter for consuming jobs
  3. You’ll see a list of all jobs reading from this table with names and IDs

Each job entry shows the job name, job_id, and the timestamp of the last run. Click any job name to open the Job run panel with complete execution details. This panel includes the notebook associated with the job, runtime duration, and the user or service principal that triggered the run.

Visualize the lineage graph


Click “See Lineage Graph” to open a visual representation showing your table connected to consuming jobs. Click the arrows between nodes to open the Lineage connection panel, which displays the entity_metadata including job_info details. This graph view helps you understand the complete data flow from source tables through jobs to downstream tables.

See Cross-Platform Lineage in Action →

Step 2: How do you query job lineage programmatically? (5-10 minutes)


Query job lineage programmatically using the system.access.table_lineage system table, which includes an entity_metadata.job_info field with job_id and job_run_id for every job-related lineage event. Each read or write operation creates a lineage record that you can join with system.query.history for additional query details like execution duration and user information. Lineage data is retained for one year.

The table_lineage system table contains these key fields:

| Field Name | Data Type | Description | Example Value |
|---|---|---|---|
| source_table_full_name | STRING | Full name of the source table (catalog.schema.table) | main.bronze.customer_events |
| target_table_full_name | STRING | Full name of the target table | main.silver.customer_summary |
| entity_metadata.job_info | STRUCT | Job metadata including job_id and job_run_id | {job_id: 12345, job_run_id: 67890} |
| event_time | TIMESTAMP | When the lineage event was captured | 2024-01-30 14:23:15 |
| event_date | DATE | Partition column for query performance | 2024-01-30 |
| workspace_id | BIGINT | Workspace where the query ran | 8234567890123456 |

Find all jobs consuming a specific table

```sql
SELECT
  source_table_full_name,
  entity_metadata.job_info.job_id,
  entity_metadata.job_info.job_run_id,
  entity_metadata.notebook_id,
  event_time
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.bronze.customer_events'
  AND entity_metadata.job_info IS NOT NULL
ORDER BY event_time DESC;
```

This query returns all jobs that have read from the customer_events table, with their job IDs and the times they accessed the data. You can filter by event_date to limit results to a specific time range for faster query performance.

Find all tables a specific job reads or writes

```sql
SELECT
  source_table_full_name AS input_table,
  target_table_full_name AS output_table,
  event_time
FROM system.access.table_lineage
WHERE entity_metadata.job_info.job_id = 12345
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
```

System tables store lineage for queries across all workspaces attached to the same metastore. This means a single query can reveal cross-workspace dependencies if your organization uses multiple workspaces sharing a metastore.

Join with query history for execution details

```sql
SELECT
  tl.source_table_full_name,
  tl.entity_metadata.job_info.job_id,
  qh.statement_text,
  qh.execution_duration_ms,
  qh.user_name
FROM system.access.table_lineage tl
JOIN system.query.history qh
  ON tl.statement_id = qh.statement_id
WHERE tl.source_table_full_name = 'main.bronze.customer_events'
  AND tl.entity_metadata.job_info IS NOT NULL
  AND tl.event_date >= CURRENT_DATE() - INTERVAL 7 DAYS;
```

This combined query gives you both the lineage relationships and the query performance characteristics. You can use this for compliance audits that require documentation of who accessed data, when, and how long queries took.


Step 3: How do you view job lineage from the Jobs UI? (1-2 minutes)


Access job lineage from the Jobs UI by opening the Job details panel and clicking the upstream and downstream tables link. Unity Catalog-enabled workflows show lineage counts in Job details, Job run details, and Task run details panels. Click any table name to open Catalog Explorer with the full lineage graph showing all relationships.

Navigate from a job to its data dependencies

  1. Click “Jobs & Pipelines” in your workspace sidebar
  2. Select your job from the list to open Job details
  3. Look for the “upstream and downstream tables” link with a count of tables
  4. Click the link to see a list of all tables this job reads from or writes to

The same link appears in Job run details (for a specific execution) and Task run details (for individual tasks within a job). This gives you flexibility to check lineage at different granularity levels depending on whether you’re troubleshooting a specific run or analyzing job behavior over time.

See the complete lineage graph


When you click a table name in the lineage list, Databricks opens that table in Catalog Explorer. From there, you can use the Lineage tab to see not just this job’s relationship to the table, but all jobs and tables in the complete data flow. This reverse perspective—starting from the job instead of the table—helps when you’re asking “What data does this job depend on?” rather than “What jobs use this table?”


Step 4: How do you track lineage for jobs across multiple workspaces? (Advanced)


Track cross-workspace job lineage when multiple workspaces share the same Unity Catalog metastore, as lineage aggregates across all attached workspaces automatically. Lineage from one workspace becomes visible in any workspace sharing the metastore, though workspace-level objects like notebooks show masked details in other workspaces. You need BROWSE permissions on tables across workspaces to view their lineage relationships.

When your organization uses multiple workspaces, Unity Catalog provides a centralized view of data consumption patterns. A data engineering team in Workspace A can see that jobs in Workspace B consume their bronze tables, even without access to Workspace B itself. This matters for governance because you can track complete data lineage across organizational boundaries.

Here’s what you can see cross-workspace versus what remains workspace-specific:

| Lineage Element | Visible Cross-Workspace? | Permission Required | Limitation |
|---|---|---|---|
| Table lineage | Yes | BROWSE on source and target catalogs | None; full visibility |
| Column lineage | Yes | BROWSE on source and target catalogs | None; full visibility |
| Job metadata | Partial | BROWSE on tables plus workspace access | Job name visible, details masked |
| Notebook details | No | Workspace-level notebook permissions | Only notebook_id visible cross-workspace |
| Dashboard links | Partial | Dashboard-specific permissions | Dashboard name visible, query details masked |

To see complete job details including execution history and configurations, you must log into the workspace where the job was created. This security model protects workspace-level resources while still providing visibility into data relationships at the metastore level.

Practical cross-workspace scenario


Your centralized data platform team manages a shared bronze layer in Workspace A. Analytics teams run jobs in Workspaces B, C, and D that consume these bronze tables. By querying system.access.table_lineage from Workspace A, you can see all consuming jobs across workspaces B, C, and D with their job IDs. You’ll see which workspaces generate the most downstream dependencies and can proactively communicate when making changes to shared tables.
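From Workspace A, a query along these lines surfaces that picture (a sketch; the table name is illustrative, and the fields come from the Step 2 reference table):

```sql
-- Count consuming jobs per workspace for a shared bronze table
SELECT
  workspace_id,
  COUNT(DISTINCT entity_metadata.job_info.job_id) AS consuming_jobs,
  MAX(event_time) AS last_access
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.bronze.customer_events'
  AND entity_metadata.job_info IS NOT NULL
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY workspace_id
ORDER BY consuming_jobs DESC;
```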


Step 5: What are common issues when job lineage is missing? (Troubleshooting)


Job lineage may be missing if tables aren’t registered in Unity Catalog, jobs use JDBC connections instead of Spark DataFrame interfaces, or permissions are insufficient. DML operations like UPDATE, DELETE, and INSERT VALUES don’t generate lineage—this is a known limitation across all Databricks lineage methods. External tables referenced by path only show the path in lineage records instead of the table name.

If you don’t see job lineage when you expect it, work through these common causes:

| Issue | Why It Happens | How to Fix | Verification Method |
|---|---|---|---|
| Missing lineage entirely | Table not in a Unity Catalog metastore | Register the table with CREATE TABLE in a Unity Catalog catalog | Check that the table appears in Catalog Explorer |
| Incomplete job details | JDBC query bypassed Unity Catalog | Use Spark DataFrame or Databricks SQL interfaces | Verify the query uses .table(), not a JDBC connection |
| Permission errors | No BROWSE privilege on the catalog | GRANT BROWSE ON CATALOG catalog_name TO principal | Run SHOW GRANTS ON CATALOG catalog_name |
| External table shows path only | Table referenced by cloud path, not name | Reference catalog.schema.table, not s3://bucket/path | Query lineage by source_table_full_name, not source_path |
| DML operations not tracked | UPDATE/DELETE don't generate lineage | No fix; use query history for DML audits | Check system.query.history for operation records |

Tables not in Unity Catalog


The most common issue is trying to track lineage for tables that aren’t registered in a Unity Catalog metastore. Legacy Hive metastore tables don’t generate lineage even if they’re visible in your workspace. Migrate tables to Unity Catalog using CREATE TABLE catalog.schema.table AS SELECT * FROM hive_metastore.schema.table.
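A one-off migration of a single table might look like this (illustrative names; for larger estates, Databricks also provides dedicated upgrade tooling):

```sql
-- Copy a legacy Hive metastore table into Unity Catalog
-- so subsequent reads and writes generate lineage
CREATE TABLE main.silver.orders AS
SELECT * FROM hive_metastore.silver.orders;
```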

JDBC queries bypass lineage capture


When jobs use JDBC connections to query tables, Unity Catalog doesn’t capture the lineage because the query execution happens outside the Spark DataFrame interface. Update your jobs to use Spark’s native .table() method or Databricks SQL commands instead of JDBC drivers. This ensures queries flow through Unity Catalog’s lineage tracking.

Insufficient permissions block visibility


You might have permissions to run queries against tables but lack BROWSE privilege needed to view lineage. Have your workspace admin grant BROWSE on the parent catalog. Without this, the lineage is captured but remains invisible to you in Catalog Explorer and system table queries.

External tables by path


When you reference external tables using their cloud storage path (s3://bucket/path) instead of their registered name (catalog.schema.table), lineage records store only the path. You’ll need to query using the source_path or target_path fields instead of source_table_full_name. Best practice: Always reference tables by their Unity Catalog three-level namespace.
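If you already have path-based records, you can still locate them by querying the path fields instead (a sketch; the bucket path is illustrative):

```sql
-- Path-referenced reads land in source_path, not source_table_full_name
SELECT
  source_path,
  entity_metadata.job_info.job_id,
  event_time
FROM system.access.table_lineage
WHERE source_path LIKE 's3://my-bucket/landing/%'
  AND event_date >= CURRENT_DATE() - INTERVAL 30 DAYS
ORDER BY event_time DESC;
```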

DML operation limitations


UPDATE, DELETE, and INSERT VALUES operations modify data in place without creating new lineage edges. This is a platform limitation, not a configuration issue. If you need to audit these operations for compliance, use system.query.history instead. That system table captures all query executions with statement_text showing the exact DML operation performed.
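For example, an audit of in-place modifications to a table could look like this (a sketch that reuses the query-history columns shown in Step 2; exact column names can vary by release, so check your workspace's system schema):

```sql
-- Audit UPDATE and DELETE activity via query history,
-- since these operations never appear in table_lineage
SELECT
  statement_text,
  user_name,
  execution_duration_ms,
  start_time
FROM system.query.history
WHERE statement_text ILIKE '%main.silver.customer_summary%'
  AND (statement_text ILIKE 'UPDATE%' OR statement_text ILIKE 'DELETE%')
ORDER BY start_time DESC;
```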


How Atlan Approaches Databricks Job Lineage


The Challenge


Unity Catalog provides native job lineage tracking within Databricks, but three limitations show up quickly once you scale beyond a single workspace. First, while the Catalog Explorer UI can show multiple hops of lineage, it defaults to a single level at a time, so tracing deep data flows means manually expanding node by node. Second, lineage stops at the Databricks boundary — you can’t see how jobs connect to external BI tools, ETL pipelines, or other data platforms in your stack. Third, Unity Catalog retains lineage edges for up to one year on a rolling basis, including relationships to tables that have been dropped and recreated with the same name. Because it doesn’t distinguish between CREATE OR REPLACE and INSERT behavior, historical edges to now‑deleted tables can linger in the graph.

Atlan’s Approach


Atlan complements Unity Catalog rather than replacing it. Under the hood, Atlan extracts Databricks lineage primarily from Unity Catalog system tables (and the Unity Catalog REST API where needed), then enhances that information with cross‑system visibility. Our active metadata approach generates column‑level lineage for tables, views, and materialized views across all jobs and languages running on Databricks clusters.

To address stale lineage, Atlan implements statement‑type aware merging powered by Databricks system tables such as system.query.history. When a table is recreated with CREATE or CREATE OR REPLACE, Atlan prunes outdated upstream edges and replaces them with the fresh set of inputs. For INSERT‑style patterns (including backfills), Atlan appends new edges without dropping existing ones. The result is a lineage graph that reflects the present state of your data flows, not a cluttered mix of current and obsolete relationships.

Atlan also builds on Unity Catalog’s cross‑workspace aggregation. When multiple workspaces share a metastore, Atlan can see job and table relationships across those workspaces and then extend that lineage beyond Databricks. You can trace a flow that starts with a Databricks job writing to Unity Catalog, continues through transformations in dbt, and ends in downstream consumption in Tableau, Power BI, or other BI tools — alongside lineage from Snowflake, Redshift, BigQuery, and more.

The Outcome


In recent internal benchmarks, Atlan’s enhanced Databricks lineage engine reduced processing time by ~50%. A previous implementation took ~50 minutes to process ~40,000 lineage records. With the new bloom‑filter‑based and DuckDB‑optimized engine, Atlan processed over 4 million lineage records in about 34 minutes, and cut a multi‑day transform step down to well under an hour. This performance profile lets Atlan handle very large Databricks environments while still publishing up‑to‑date lineage graphs.

Organizations like Kaizen Gaming, Babbel, Kueski, and EasyJet use Atlan’s enriched Databricks lineage to track data flows across their entire technology stack — from lakehouse pipelines in Databricks, to transformations in dbt, to dashboards in BI tools — instead of seeing only what happens inside a single Databricks workspace.

See how Atlan extends Databricks lineage across your entire stack

Book a Demo →

Frequently Asked Questions About Databricks Job Lineage


1. Does Databricks automatically track job lineage?


Yes, Unity Catalog automatically captures job lineage when jobs read from or write to tables registered in the metastore. Lineage includes job_id, job_run_id, and associated notebook_id for every operation. It tracks all languages (Python, SQL, Scala, R) without requiring code changes. However, jobs using JDBC connections or operating on tables outside Unity Catalog won’t have lineage captured. You must enable Unity Catalog first and ensure tables are registered in a Unity Catalog metastore rather than the legacy Hive metastore.

2. How do I see which jobs are using my table?


Navigate to your table in Catalog Explorer, click the Lineage tab, then select Jobs and Downstream. You'll see all jobs consuming the table with job names, IDs, and last run times. Alternatively, query the system.access.table_lineage system table, filtering by source_table_full_name, for programmatic access to job lineage. The visual graph shows job-to-table relationships instantly. For a reverse lookup from a job's perspective, open Job details in the Jobs UI and click the upstream and downstream tables link.

3. What permissions do I need to view job lineage?


You need BROWSE privilege on the parent catalog of tables in the lineage graph. The catalog must be accessible from your workspace. For job and notebook details, you need permissions on those workspace objects per workspace access control settings. For Unity Catalog-enabled pipelines, you need CAN VIEW permission on the pipeline. Without proper permissions, lineage graphs hide table details even though the lineage relationships exist. Have your workspace admin grant BROWSE on relevant catalogs.

4. Why don’t I see lineage for UPDATE or DELETE operations?


Unity Catalog doesn’t capture lineage for UPDATE, DELETE, or INSERT VALUES operations. This is a platform limitation across all Databricks lineage methods (visual graph, system tables, and API), not a configuration issue. These DML operations modify data in place without creating new edges that connect source tables to target tables.

There’s no way to make these operations appear in Unity Catalog’s lineage system, so the recommended approach is to rely on system.query.history (and/or audit logs) for DML auditing. That system table records every query execution, including the full statement_text for UPDATE, DELETE, and other DML.

5. How long does Databricks retain job lineage data?


Unity Catalog retains table and column lineage data for one year in system tables such as system.access.table_lineage and system.access.column_lineage, on a rolling basis. After one year, lineage records are automatically purged from these system tables.

For longer‑term compliance or forensic needs, you should export lineage data to external storage (for example, via scheduled weekly or monthly queries) before the one‑year retention window closes. The Catalog Explorer lineage UI reflects the relationships available in that one‑year window — older relationships are removed as their system table records expire.
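One simple pattern is a scheduled monthly job that appends the previous month's events to an archive table you own (a sketch; main.ops.lineage_archive is a placeholder, and the window assumes the job runs exactly once a month, so adjust it to avoid gaps or duplicates):

```sql
-- One-time setup: empty archive table with the same schema
CREATE TABLE IF NOT EXISTS main.ops.lineage_archive AS
SELECT * FROM system.access.table_lineage WHERE 1 = 0;

-- Scheduled monthly: append the previous month's lineage events
INSERT INTO main.ops.lineage_archive
SELECT * FROM system.access.table_lineage
WHERE event_date >= ADD_MONTHS(CURRENT_DATE(), -1)
  AND event_date < CURRENT_DATE();
```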

6. Can I track lineage for external tables?


Yes, but with important caveats. External tables that are registered in Unity Catalog do generate lineage records. However, if you reference external data only by its cloud storage path (for example, delta.`s3://bucket/path` or a raw s3://bucket/path) instead of by its table name (catalog.schema.table), lineage records store only the path information.

In those path‑based cases, you may need to query on source_path or target_path fields instead of source_table_full_name. Best practice is to always reference external tables by their three‑level Unity Catalog name (catalog.schema.table). That ensures lineage appears correctly in Catalog Explorer and in system table queries.

7. How does Atlan’s job lineage differ from Unity Catalog’s native lineage?


Unity Catalog gives you native job, table, and column lineage inside Databricks; Atlan builds on top of that view rather than replacing it. Atlan pulls lineage primarily from Unity Catalog system tables, then connects it to lineage from tools like Snowflake, Redshift, BigQuery, dbt, Airflow, Tableau, and Power BI so you can see end‑to‑end flows across your whole stack. It also cleans up stale edges that Databricks retains within its one‑year window by using statement‑type‑aware logic (e.g., distinguishing CREATE OR REPLACE from INSERT) to keep only the current, valid upstream relationships for a table. In practice, Databricks stays your source of truth inside the lakehouse, while Atlan acts as a cross‑system lineage hub with a cleaner, present‑state view.

8. What’s the difference between job lineage and notebook lineage?


Job lineage tracks scheduled or triggered job runs and their table dependencies. Notebook lineage tracks ad-hoc queries run interactively in notebooks. Both are captured automatically by Unity Catalog when operations touch Unity Catalog tables. Job lineage includes job_id and job_run_id for workflow tracking and scheduling dependency analysis. Notebook lineage includes notebook_id for interactive analysis tracking and data exploration audits. A job running a notebook task populates both job_info and notebook_id fields in the same lineage record.
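You can see both identifiers side by side in the same system table (a sketch using the fields from Step 2):

```sql
-- Records from job-run notebook tasks carry both identifiers
SELECT
  entity_metadata.job_info.job_id,
  entity_metadata.job_info.job_run_id,
  entity_metadata.notebook_id,
  target_table_full_name,
  event_time
FROM system.access.table_lineage
WHERE entity_metadata.job_info IS NOT NULL
  AND entity_metadata.notebook_id IS NOT NULL
ORDER BY event_time DESC
LIMIT 20;
```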

9. How do I query job lineage for a specific time period?


Filter system.access.table_lineage by the event_date column, for example: WHERE event_date >= CURRENT_DATE() - INTERVAL 7 DAYS. The event_date field is partitioned for query performance, so always include it in your WHERE clause for faster results. You can also filter by event_time for precise timestamps down to the second. Join with the system.query.history table using statement_id for additional context like query duration, execution status, and the user who ran the query. This approach works for compliance audits requiring historical lineage analysis.

10. Can I see job lineage across multiple Databricks workspaces?


Yes, if workspaces share the same Unity Catalog metastore. Lineage aggregates across all attached workspaces automatically. Tables registered in the metastore are visible to users with BROWSE permissions across all workspaces. However, workspace-level objects like job details and notebooks show masked information when viewed from different workspaces. You’ll see job_id but not execution details or configuration. Log into the workspace where the job was created to see full details including job configurations, execution history, and associated notebooks.

11. Why is my job lineage showing stale edges to deleted tables?


In native Databricks, Unity Catalog keeps lineage edges for up to one year on a rolling basis. If you drop and recreate a table with the same name, Databricks does not distinguish that recreation from a regular INSERT pattern. As a result, historical edges pointing to now‑deleted or superseded tables can persist in the lineage graph until those records age out of the one‑year window.

Atlan addresses this by applying statement‑type and history‑aware pruning on top of the Databricks lineage data. When Atlan sees lineage associated with a CREATE or CREATE OR REPLACE statement for a target table, it removes older upstream edges for that target and replaces them with the latest set from query history. For pure INSERT workflows — including intentional backfills — Atlan appends new edges instead of pruning, so you can preserve those histories when you need them.

12. Does job lineage work with Delta Live Tables pipelines?


Yes. Unity Catalog captures lineage for Lakeflow Spark Declarative Pipelines (Delta Live Tables), including table‑ and column‑level lineage when you’re on supported runtimes (column lineage for DLT requires Databricks Runtime 13.3 LTS or above). Lineage records for these pipelines include identifiers such as the pipeline ID and update identifiers in the metadata.

Atlan ingests this DLT lineage via Unity Catalog’s lineage system tables and combines it with other Databricks lineage (jobs, notebooks, dashboards) and cross‑system lineage. That lets you see Delta Live Tables pipelines alongside batch jobs and downstream BI tools in a single, unified lineage graph.


Ready to implement job lineage tracking in your environment?


Unity Catalog makes job lineage automatic. You set it up once in 15-30 minutes and gain instant visibility into data dependencies across your Databricks workspaces. You’ve now configured job lineage tracking that captures most dependencies automatically without code changes. As your jobs run, lineage builds continuously. Use Catalog Explorer for quick visual checks when troubleshooting production incidents. Use system table queries for programmatic analysis, compliance reporting, and integration with external monitoring tools.

Your lineage data becomes more valuable over time. After a few weeks, you’ll have historical patterns showing which jobs access which tables during different time periods. This helps with capacity planning, cost optimization, and understanding seasonal data flow patterns. For end-to-end lineage beyond Databricks that connects job lineage across your BI tools, ETL pipelines, and other data platforms, explore how Atlan integrates with your complete data stack.

Get end-to-end lineage across Databricks and your entire data stack

Book a Personalized Demo →



Atlan is the next-generation platform for data and AI governance. It is a control plane that stitches together a business's disparate data infrastructure, cataloging and enriching data with business context and security.


[Website env: production]