BigQuery Data Lineage: How It Works, Limitations & When to Extend

Quick answer: How BigQuery Data Lineage works

BigQuery Data Lineage is a managed service powered by Google Cloud Dataplex that automatically tracks data movement and transformations. It parses SQL syntax from completed jobs to map dependencies across your warehouse.

When your BigQuery dashboard breaks Friday afternoon, you need to trace the root cause fast, but native lineage can take up to 24 hours to appear. BigQuery data lineage through Dataplex automatically tracks table-level dependencies for queries, jobs, and transformations within BigQuery. However, it operates with delays, offers column-level lineage only for supported BigQuery jobs, and doesn’t extend to dbt models or BI tools. This guide covers when native lineage is sufficient for your needs, when to extend it with a third-party control plane, and how to implement lineage effectively.

What It Is: BigQuery data lineage via Dataplex automatically captures table-level dependencies from query jobs, transformations, and data operations.
How It Works: Tracks query jobs, table operations, and transformations by intercepting lineage events and storing them in Dataplex Universal Catalog.
Key Limitation: 30 minutes to 24 hours delay for lineage to appear, plus limited cross-tool coverage outside BigQuery.

Below: What is BigQuery lineage, How it works, Limitations, When to extend.

BigQuery data lineage at a glance

Aspect	Details
What It Is	BigQuery data lineage via Dataplex automatically captures table-level dependencies from query jobs, transformations, and data operations
How It Works	Tracks query jobs, table operations, and transformations by intercepting lineage events and storing them in Dataplex Universal Catalog
What It Tracks	CREATE TABLE, INSERT, UPDATE, DML/DDL statements, query jobs, and table relationships within BigQuery
What It Doesn’t Track	External sources (CRM, databases), dbt models (without manual integration), BI dashboards (Looker, Tableau), end-to-end column-level dependencies outside supported BigQuery jobs
Retention Period	30 days for deleted resources
Delay	30 minutes to 24 hours for lineage to appear depending on volume and complexity

What is BigQuery data lineage?

BigQuery data lineage is a Dataplex feature that automatically tracks how data moves through BigQuery by capturing table-level dependencies from query jobs, transformations, and data operations. It’s part of the Dataplex Universal Catalog—not a standalone BigQuery feature—and tracks relationships between tables to enable impact analysis. Lineage appears within 24 hours after your BigQuery job completes, capturing events whenever you run query jobs using DDL or DML statements.

BigQuery captures lineage automatically when you run jobs that create or modify tables. Unlike manual documentation that quickly becomes outdated, native lineage updates itself as your data pipelines execute. This automatic capture covers query jobs, load operations, and transformations—giving you visibility into table dependencies without additional engineering effort.

Data lineage tracks the origin, transformations, and movement of data over time, forming a map of your data’s journey through your systems. In BigQuery’s implementation, the system intercepts three core elements: processes (the jobs themselves), runs (individual executions of those jobs), and events (the lineage metadata describing relationships between tables). When you execute a CREATE TABLE AS SELECT statement, BigQuery captures which source tables are fed into your new table, storing this relationship for future analysis.

Operation Type	Examples	Lineage Captured?
Query jobs	CREATE TABLE AS SELECT, MERGE	Yes
Load jobs	bq load, Storage Write API	Yes
Copy jobs	bq cp, dataset copy	Yes
DML operations	INSERT, UPDATE, DELETE	Yes
DDL operations	CREATE TABLE, ALTER TABLE	Yes
BigQuery routines	Stored procedures, UDFs	Partial (tables only, not routine node)

How does BigQuery data lineage work?

BigQuery data lineage works by intercepting lineage events whenever you create or transform tables through query jobs, then storing these relationships in Dataplex Universal Catalog for visualization and API access. The system uses a three-step process: you enable the Data Lineage API for your project, BigQuery automatically reports lineage events when jobs execute, and Dataplex processes and stores lineage for querying through the console graph or API. The Data Lineage API serves as the single entry point for accessing all lineage information.

Here’s how the capture process works in practice:

1. Enable the Data Lineage API

Enable the API on a per-project basis in Google Cloud Console. This activation tells BigQuery to start capturing lineage events for all jobs in that project going forward.

2. BigQuery automatically reports lineage events

Every time you execute a query that creates or modifies a table, BigQuery generates a lineage event describing the source tables, transformation logic, and destination table.

3. Dataplex processes and stores lineage

Dataplex processes incoming lineage events and stores them in query-optimized databases within Universal Catalog, making lineage available for visualization in the console or programmatic access via the API.

This automatic reporting happens in the background—you don’t need to modify your queries or add instrumentation code. When you write CREATE TABLE sales_summary AS SELECT * FROM raw_sales WHERE date > '2024-01-01', BigQuery captures that sales_summary depends on raw_sales without any additional configuration.
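To make the idea of deriving a dependency from SQL concrete, here is a toy sketch that pulls the source and destination tables out of the statement above. It is an illustration only: the function name `extract_table_lineage` and the regular expressions are hypothetical, they handle only simple CREATE TABLE ... AS SELECT statements, and they say nothing about how BigQuery's managed service actually does this.

```python
import re

# Toy illustration only. BigQuery's real lineage capture is a managed
# service; this regex approach handles only simple CTAS statements.
CTAS_TARGET = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?TABLE\s+([\w.\-`]+)\s+AS\b", re.IGNORECASE
)
SOURCE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.\-`]+)", re.IGNORECASE)


def extract_table_lineage(sql: str) -> dict:
    """Return the destination table and the tables it reads from."""
    target = CTAS_TARGET.search(sql)
    if target is None:
        raise ValueError("not a CREATE TABLE ... AS SELECT statement")
    # Deduplicate and strip optional backtick quoting from table names.
    sources = sorted({name.strip("`") for name in SOURCE_REF.findall(sql)})
    return {"target": target.group(1).strip("`"), "sources": sources}


sql = (
    "CREATE TABLE sales_summary AS "
    "SELECT * FROM raw_sales WHERE date > '2024-01-01'"
)
print(extract_table_lineage(sql))
# {'target': 'sales_summary', 'sources': ['raw_sales']}
```

A real SQL parser is needed for subqueries, CTEs, and quoted identifiers, which is part of why automatic capture is useful.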

The system uses a hierarchical information model to organize lineage data. At the top level, a Process represents a data processing job (like a scheduled query). Each execution of that process creates a Run, which captures the specific tables involved in that execution. Events within each run describe the actual lineage relationships—which source tables fed into which destination tables. Assets are the data entities themselves (tables, views), and Lineage Links connect assets to show dependencies.

Component	Function	Example
Process	Data processing job or pipeline	Scheduled BigQuery query, dbt run
Run	Single execution of a process	Query job ID abc123 executed at 10:15 AM
Event	Lineage metadata for that run	Table A → Table B relationship captured
Asset	Data entity being tracked	`project.dataset.sales_table`
Lineage Link	Dependency relationship	sales_summary reads from raw_sales
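The hierarchy in the table can be sketched as a few plain data classes. This is a simplified mental model, not the Data Lineage API's actual schema: the class and field names below are assumptions chosen to mirror the table, and assets are represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Lineage metadata for one run: which sources fed which targets."""
    sources: Tuple[str, ...]
    targets: Tuple[str, ...]


@dataclass
class Run:
    """A single execution of a process, for example one query job."""
    run_id: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Process:
    """A data processing job, for example a scheduled query."""
    name: str
    runs: List[Run] = field(default_factory=list)

    def lineage_links(self) -> List[Tuple[str, str]]:
        """Flatten all events into unique (source, target) asset pairs."""
        links: List[Tuple[str, str]] = []
        for run in self.runs:
            for event in run.events:
                for source in event.sources:
                    for target in event.targets:
                        if (source, target) not in links:
                            links.append((source, target))
        return links


event = Event(
    sources=("project.dataset.raw_sales",),
    targets=("project.dataset.sales_summary",),
)
process = Process(
    name="daily_sales_summary",
    runs=[Run("abc123", [event]), Run("abc124", [event])],
)
print(process.lineage_links())
# [('project.dataset.raw_sales', 'project.dataset.sales_summary')]
```

Two runs of the same process produce the same link once, which matches how repeated executions of a scheduled query add runs rather than new edges in the graph.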

What data sources does BigQuery lineage track?

BigQuery lineage tracks table-level dependencies within BigQuery itself—including native tables, external tables connected via BigQuery, and transformations from query jobs—but doesn’t automatically extend to upstream data sources, dbt models, or downstream BI tools outside the BigQuery ecosystem. When you use Dataplex more broadly, it supports lineage from Cloud Composer pipelines, Dataproc jobs, and Vertex AI workflows. The critical limitation: lineage stops at BigQuery’s boundary unless you manually extend it.

Native BigQuery operations generate lineage automatically. When you query a table, create a view, or run a transformation, BigQuery captures those relationships without configuration. External tables (data stored in Cloud Storage, Drive, or Bigtable) also generate lineage when you query them through BigQuery—the system sees the external table as a source dependency.

What doesn’t get tracked automatically? Your dbt transformations won’t appear in native lineage unless dbt generates BigQuery query jobs (in which case you’ll see table relationships but not dbt-specific metadata like model names or documentation). Looker and Tableau dashboards reading from BigQuery won’t show up in lineage graphs. Fivetran and Airbyte loads into BigQuery don’t generate native lineage events. Source databases feeding data into BigQuery remain invisible to Dataplex lineage.

You can manually extend lineage through the Data Lineage API, which lets you record custom lineage using the OpenLineage standard. This requires writing code to capture lineage from your ETL tools and push it to Dataplex. For teams running modern data stacks with multiple tools, this manual integration quickly becomes a maintenance burden.
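As a rough sketch of what that custom code involves, the function below builds an OpenLineage-style run event for a hypothetical load from PostgreSQL into BigQuery. The job name, namespaces, producer URL, and schema URL are all placeholder assumptions, and the sketch only constructs the payload; sending it to the Data Lineage API, and the exact endpoint and dataset naming Dataplex expects, should be checked against Google's documentation.

```python
import json
import uuid
from datetime import datetime, timezone


def build_openlineage_event(job_name, inputs, outputs, schema_url,
                            job_namespace="example-etl"):
    """Build a minimal OpenLineage run event as a plain dict.

    inputs and outputs are lists of (namespace, name) pairs.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical producer identifier for your own ETL tool.
        "producer": "https://example.com/hypothetical-etl",
        "schemaURL": schema_url,
    }


event = build_openlineage_event(
    job_name="load_orders",
    # Assumed naming; confirm the conventions your tools actually emit.
    inputs=[("postgres://db.example.com:5432", "shop.public.orders")],
    outputs=[("bigquery", "my-project.sales.raw_orders")],
    # Placeholder: substitute the spec version you target.
    schema_url="https://openlineage.io/spec/<version>/OpenLineage.json",
)
print(json.dumps(event, indent=2))
```

Every tool you want covered needs an emitter like this, plus scheduling, retries, and authentication, which is where the maintenance burden comes from.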

Data Source Type	Automatically Tracked?	Notes
BigQuery tables	Yes	Full automatic lineage
BigQuery external tables	Yes	Tracked when queried via BigQuery
Cloud Composer pipelines	Yes	When using Dataplex integration
dbt transformations	Partial	Tables tracked, but not dbt-specific metadata
Looker/Tableau dashboards	No	Requires third-party integration
Fivetran/Airbyte loads	No	No automatic lineage capture
Source databases (PostgreSQL, Oracle)	No	Stops at BigQuery boundary

How do you enable BigQuery data lineage?

Enable BigQuery data lineage by turning on the Data Lineage API for your project in Google Cloud Console, which activates automatic lineage capture for all BigQuery jobs in that project going forward. This operates on a per-project basis—not per-service—and requires specific IAM permissions (Service Usage Admin to enable the API). After enabling the API, lineage information is automatically reported for multiple Google Cloud services in the project, including BigQuery, Dataproc, and Data Fusion.

The setup breaks down into three steps:

1. Verify prerequisites and enable APIs

Confirm billing is enabled for your project and you have Owner or Editor roles. Navigate to the Dataplex API page in Google Cloud Console and enable three APIs: Dataplex API, BigQuery API, and Data Lineage API. You can also enable these via command line:

gcloud services enable dataplex.googleapis.com bigquery.googleapis.com datalineage.googleapis.com

2. Grant IAM roles

Grant roles to users who need to view lineage. At minimum, assign the Data Lineage Viewer role (roles/datalineage.viewer) to see lineage graphs. Most users also need Data Catalog Viewer (roles/datacatalog.viewer) and BigQuery Data Viewer (roles/bigquery.dataViewer) to see table details within lineage nodes. Grant these in both your active project (where you view lineage) and compute project (where jobs run).
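Because the same three roles have to be granted in two projects, it can help to generate the commands rather than type them. The sketch below only builds `gcloud projects add-iam-policy-binding` command strings and does not run them; the project IDs and the member address are placeholders.

```python
import shlex

# The three viewer roles named in this step.
VIEWER_ROLES = [
    "roles/datalineage.viewer",
    "roles/datacatalog.viewer",
    "roles/bigquery.dataViewer",
]


def iam_binding_commands(member, projects):
    """Return one gcloud command per (project, role) pair."""
    commands = []
    for project in projects:
        for role in VIEWER_ROLES:
            commands.append(shlex.join([
                "gcloud", "projects", "add-iam-policy-binding", project,
                f"--member={member}",
                f"--role={role}",
            ]))
    return commands


# Placeholder project IDs: where you view lineage and where jobs run.
commands = iam_binding_commands(
    "user:analyst@example.com",
    ["my-active-project", "my-compute-project"],
)
for command in commands:
    print(command)
```

Review the printed commands before running them, and prefer granting roles to groups rather than individual users where your organization allows it.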

3. Run a query to test lineage capture

Execute a simple CREATE TABLE AS SELECT statement in BigQuery. Wait 30 minutes to 24 hours for lineage to populate. Navigate to Dataplex > Process History in Cloud Console to view the lineage graph for your query job.

Role	Permission	Purpose
Service Usage Admin	`serviceusage.services.enable`	Enable Data Lineage API
Data Lineage Viewer	`datalineage.*`	View lineage graphs and relationships
Data Catalog Viewer	`datacatalog.taxonomies.get`	Access catalog metadata
BigQuery Data Viewer	`bigquery.tables.get`	See table details in lineage nodes
BigQuery Resource Viewer	`bigquery.jobs.get`	View job information for lineage

The most common setup mistake: enabling the API in your active project but not in the compute project where jobs actually run. If lineage isn’t appearing, verify the API is enabled in both locations and that you have the correct viewer roles assigned in both projects.

What are BigQuery data lineage limitations?

BigQuery data lineage through Dataplex gives you automatic visibility into how data moves between tables inside BigQuery. But four practical limitations still matter in day-to-day operations: processing delay before lineage appears, constrained column-level lineage, no automatic coverage of external tools like dbt or BI platforms, and 30-day retention for deleted resources.

The timing limitation matters most for incident response. Lineage can take from roughly 30 minutes up to 24 hours to show up in Dataplex, depending on volume and complexity. During this window, you’re effectively blind to dependencies—you can’t quickly identify which upstream table caused downstream failures. Teams that need near-real-time impact analysis (“If I change this table, what breaks?”) can’t rely on native lineage alone for production debugging.
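One practical way to handle this window is to decide, from the job's end time, whether missing lineage is expected or worth investigating. The helper below is a minimal sketch that hard-codes the 30-minute and 24-hour figures quoted above; the function name and status labels are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

TYPICAL_DELAY = timedelta(minutes=30)  # usual processing time cited above
MAX_DELAY = timedelta(hours=24)        # upper bound cited above


def lineage_status(job_end: datetime, now: datetime) -> str:
    """Classify how worried to be about lineage that has not appeared."""
    age = now - job_end
    if age < TYPICAL_DELAY:
        return "too_early"        # lineage is not expected yet
    if age < MAX_DELAY:
        return "may_be_pending"   # still within the documented window
    return "investigate"          # check API enablement and IAM roles


job_end = datetime(2024, 1, 5, 16, 0, tzinfo=timezone.utc)
print(lineage_status(job_end, job_end + timedelta(minutes=10)))  # too_early
print(lineage_status(job_end, job_end + timedelta(hours=3)))     # may_be_pending
print(lineage_status(job_end, job_end + timedelta(hours=30)))    # investigate
```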

BigQuery now supports column-level lineage for supported BigQuery jobs, but with important constraints. Column-level links are collected only for certain job types (for example, CREATE TABLE, INSERT, UPDATE, MERGE, DELETE, SELECT with a destination table) and only within BigQuery. Column lineage is not collected for load jobs, routines, or upstream external tables, and it isn’t exposed via a dedicated API. That means you can see how columns flow across some BigQuery transformations, but you still can’t use native tools to trace column-level data end-to-end across external sources, ETL, and BI—or programmatically query column-level relationships at scale. For compliance teams tracking PII or sensitive data, this partial coverage is often not enough.

BigQuery lineage also stops at the platform boundary unless you extend it manually through the Data Lineage API or OpenLineage integrations. You can trace data within BigQuery’s ecosystem, but you can’t see the full path from Salesforce (source) → Fivetran (ingestion) → BigQuery (transformation) → dbt (modeling) → Tableau or Looker (visualization). Each tool in your stack becomes a blind spot unless you invest in custom lineage capture and stitching.

Additional constraints include:

BigQuery routines (stored procedures, UDFs) don’t appear as nodes in the lineage graph—you only see table-to-table relationships.
Graph traversal is limited to 20 levels of depth and 10,000 links per direction, which can cause issues for deeply nested pipelines.
Column-level lineage isn’t collected if a job creates more than 1,500 column-level links, which affects very wide tables.
There’s no native SQL visualization of individual queries, and no built-in impact analysis or root-cause functionality—lineage shows relationships but doesn’t help you interpret “what breaks” when something changes.
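Since impact analysis isn't built in, one option is to export the links and walk them yourself. The sketch below runs a breadth-first search over (source, target) pairs and stops at a configurable depth, defaulting to the 20-level limit mentioned above. The table names are invented, and the input format is an assumption about how you might store exported links.

```python
from collections import deque


def downstream_assets(links, start, max_depth=20):
    """Return {asset: depth} for everything reachable downstream of start."""
    children = {}
    for source, target in links:
        children.setdefault(source, []).append(target)

    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for child in children.get(node, []):
            if child not in depth_of:  # also guards against cycles
                depth_of[child] = depth_of[node] + 1
                queue.append(child)

    del depth_of[start]  # report only the affected assets
    return depth_of


links = [
    ("raw_sales", "sales_summary"),
    ("raw_sales", "sales_audit"),
    ("sales_summary", "exec_dashboard_source"),
]
print(downstream_assets(links, "raw_sales"))
# {'sales_summary': 1, 'sales_audit': 1, 'exec_dashboard_source': 2}
```

The result answers “what breaks if I change this table?” only as far as the exported links are complete and current.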

Limitation	Impact	Workaround / Alternative
Delay of up to 24 hours	Can’t do near-real-time impact analysis or fast incident debugging	Use third-party tools for fresher lineage and impact analysis
Limited column lineage	Only some BigQuery jobs tracked; gaps for load jobs, routines, externals; no dedicated API	Use third-party platforms for full, API-accessible column-level lineage
External tool gaps	No automatic visibility into dbt, BI tools, or upstream sources	Manual API / OpenLineage integration or a cross-platform control plane
30-day retention	Lineage disappears shortly after resource deletion	Export lineage data via API or control plane before deletion
No impact analysis features	Graph shows relationships but not “what breaks”	Build custom tooling or use metadata / lineage platforms
Routine tracking gaps	Stored procedures don’t appear as nodes in the lineage graph	Parse routine definitions separately or via external tooling
Graph depth & link limits	20-level, 10K-link maximum per direction	Simplify pipeline architecture or partition lineage views

Unlike Snowflake and Databricks, which provide direct system table access for querying lineage, BigQuery requires you to use either the Dataplex console UI or the Data Lineage API for programmatic access. This adds friction to automation workflows where you want to query lineage data alongside other metadata using SQL.

When is native BigQuery lineage sufficient vs. when do you need a third-party tool?

Native BigQuery lineage is sufficient when your data pipeline lives entirely within BigQuery, you can tolerate lineage delays of up to 24 hours, and table-level dependencies meet your compliance needs. You need a third-party control plane like Atlan when you require end-to-end column-level tracking, real-time impact analysis, or unified lineage across BigQuery, dbt, BI tools, and upstream sources. The decision hinges on your stack complexity and operational requirements.

You can rely on native lineage when your data team works exclusively in BigQuery. Dataplex now provides column-level lineage for supported BigQuery jobs, but only inside BigQuery and with important gaps (no coverage for load jobs, routines, external tables, or downstream BI tools, and no dedicated API). If you need end-to-end, cross-platform column-level lineage, you still need a control plane like Atlan.

You’ll need to extend when your modern data stack spans multiple platforms. Most analytics teams today run BigQuery + dbt (transformation) + Looker or Tableau (visualization) + Fivetran or Airbyte (ingestion), with data flowing from Salesforce, PostgreSQL, or other sources. Native lineage can’t track this end-to-end flow. When executives ask “Where does this dashboard number come from?” and the answer spans five tools, table-level BigQuery lineage won’t give you the complete picture.

Real-time requirements break native lineage immediately. If you need to answer “What breaks if I change this table?” in under 30 minutes (for incident response, production deployments, or high-velocity analytics work), a delay of up to 24 hours makes native lineage unreliable. Organizations using metadata platforms with fresher lineage can typically resolve issues faster than teams relying solely on delayed native tracking.

Column-level lineage becomes critical for compliance, AI governance, and data quality use cases. When you need to prove that customer PII flows from your CRM through transformations to final reports, or trace which columns feed into AI model features, table-level lineage isn’t granular enough. Regulatory audits often require column-level data flow documentation across the whole path, which native BigQuery lineage covers only for supported jobs inside BigQuery.

Scenario	Native BigQuery Lineage	Third-Party Platform (Atlan)
Single-platform (BQ only)	✓ Sufficient for basic table/column views	Optional enhancement
Modern stack (BQ + dbt + BI)	✗ Gaps outside BigQuery	✓ Required for full cross-platform visibility
Column-level needs (end-to-end)	✗ Limited to BigQuery jobs only, no API	✓ Required for complete, API-accessible coverage
Real-time / near-real-time impact analysis	✗ Multi-hour delays	✓ Required for fast incident response and deployments
AI model / feature lineage	✗ Partial (BQ tables only)	✓ Required (source → warehouse → BI/AI)
Compliance audit trails	✓ Basic table-level + partial column-level	✓ Column-level detail across tools and platforms
Root cause in < 30 minutes	✗ Not guaranteed with native delays	✓ Near-real-time lineage plus impact analysis views

Feature	BigQuery	Snowflake	Databricks
Column-level lineage	Supported BigQuery jobs only, no dedicated API	GA via Access History	GA via system tables
Processing delay	30 min to 24 hours	Near real-time	Near real-time
SQL access (system tables)	No (API only)	Yes	Yes
Visual representation	Dataplex console graph	Snowsight lineage	Unity Catalog UI
Cross-platform visibility	No	No	No
Cost	Dataplex pricing	Included	Included

Many teams adopt a hybrid approach: keep native BigQuery lineage enabled for free table-level visibility within BigQuery, then layer a control plane like Atlan on top for column-level detail, real-time impact analysis, and cross-platform stitching. This gives you the best of both worlds—automatic capture within BigQuery plus comprehensive visibility across your full stack.

How Atlan extends BigQuery data lineage across your full data estate

The Challenge

When your executive dashboard breaks Friday afternoon, a lineage delay of up to 24 hours can leave you debugging blind well into the weekend. You can see that sales_summary depends on some upstream table in BigQuery, but you can’t trace the data flow from Salesforce (where the sale was recorded) through Fivetran (which loaded it) into BigQuery, then through your dbt transformation models, and finally to the Tableau dashboard executives are staring at. Native column-level tracking covers only supported BigQuery jobs, leaving your team manually tracing data flows through complex transformations to answer compliance questions. Dataplex lineage stops at BigQuery’s boundary, so you can’t see the full picture across your modern data stack. For teams running production analytics on tight SLAs, these gaps turn every incident into an hours-long investigation.

Atlan’s Approach

Atlan complements BigQuery’s native lineage by automatically parsing SQL query logs and metadata across your stack to build end-to-end, column-level lineage. The platform stitches together BigQuery tables, dbt models, BI dashboards, and upstream sources into a unified view, connecting your Salesforce account to that executive dashboard through every transformation step. Where Dataplex operates with multi-hour delays, Atlan surfaces near-real-time impact analysis; customers report cutting impact-analysis cycles from weeks down to under 30 minutes. With 100+ connectors and OpenLineage support, Atlan gives modern data teams the control plane they need for cross-platform visibility, column-level detail, and AI-ready governance.

The Outcome

Teams using Atlan alongside BigQuery’s native lineage report saving up to 95% of the time they previously spent on manual discovery and documentation, turning multi-week investigations into minutes. Organizations like Nasdaq, Dropbox, and Workday use this cross-platform visibility to maintain compliance while moving faster. Gartner recognized Atlan as a Leader in the 2025 Magic Quadrant for Metadata Management Solutions, with the highest scores for lineage and impact analysis capabilities. Many teams keep Dataplex enabled for free BigQuery-internal lineage, then layer Atlan on top for column-level detail, real-time updates, and visibility across their broader stack, from databases through BigQuery to BI tools and AI models.

FAQ

How long does it take for BigQuery data lineage to appear?

Plan for 30 minutes to 24 hours after your BigQuery job completes for lineage to appear in Dataplex. The standard processing time is 30 minutes, but the maximum can stretch to 24 hours depending on the volume and complexity of your operations. This delay frustrates teams who need real-time impact analysis—you can’t quickly identify which upstream change broke a downstream dashboard when lineage takes hours to populate. There’s no way to force faster processing or prioritize specific jobs. This delay is one of the most common complaints on Stack Overflow and Reddit from data engineers trying to debug production issues.

Does BigQuery support column-level data lineage?

Yes, but only within BigQuery, and with important constraints. Dataplex now supports column-level lineage for BigQuery jobs, so you can see how specific columns flow between BigQuery tables for supported DML and DDL operations (for example, CREATE TABLE, INSERT, UPDATE, MERGE, DELETE, and SELECT with a destination table). Column-level links are visible in the Dataplex and BigQuery lineage UIs alongside table-level relationships.

What’s the difference between BigQuery lineage and data provenance?

Data lineage shows the complete flow and transformations of data over time, while data provenance specifically refers to the original source or first instance of that data. BigQuery lineage captures both—it shows you where data comes from (provenance) and how it transforms through queries and jobs (lineage). In practice, teams use these terms interchangeably when discussing data tracking. For example, lineage shows the full path: your dashboard pulls from a dbt model, which reads from a BigQuery table, which was loaded from a source database. Provenance would point specifically to that source database as the origin.

Can BigQuery lineage track dbt transformations?

Yes, indirectly—when dbt runs generate BigQuery query jobs, those jobs create lineage in Dataplex. You’ll see lineage between the tables dbt creates and their dependencies, but you won’t see dbt model names, the actual .sql files, or dbt-specific metadata like model descriptions and tests. The lineage graph shows table relationships without the dbt context that makes those relationships meaningful to your team. For complete dbt lineage that includes model documentation, tags, and transformation logic, use platforms that parse dbt’s manifest.json file alongside BigQuery query logs. This integration gives you the full picture—both the BigQuery table dependencies and the dbt semantic layer.

Why isn’t my BigQuery lineage showing up?

The most common cause: you enabled the Data Lineage API but haven’t waited the 30 minutes to 24 hours it can take for lineage to appear. Second most common: the Data Lineage API isn’t enabled for both your active project (where you’re viewing lineage) and your compute project (where jobs actually run). Third, verify you have the Data Lineage Viewer role assigned, because without proper IAM permissions the console won’t show lineage graphs. Fourth, BigQuery lineage doesn’t capture recurring load jobs from BigQuery Data Transfer Service, so expect a gap if you’re looking for lineage from scheduled imports. If you’re still blocked after checking these issues, Google’s troubleshooting guide covers permission errors and API enablement problems in detail.

How much does BigQuery data lineage cost?

BigQuery data lineage through Dataplex follows Google Cloud’s standard pricing for Dataplex Universal Catalog. There’s no separate charge for lineage capture itself—you pay for the underlying Dataplex service based on metadata operations and storage. In practice, lineage costs are minimal compared to your BigQuery compute and storage costs. Most teams find Dataplex catalog pricing negligible in their overall GCP bill. Budget for Dataplex when planning your data governance stack, but don’t expect lineage to be your primary expense. Check current Google Cloud pricing documentation for specific Dataplex rates, as pricing can vary by region and usage volume.

Can I export BigQuery lineage data?

Yes—access BigQuery lineage programmatically through the Data Lineage API using Python, Java, or REST calls. The API lets you query lineage relationships, export them to JSON format, and integrate with your own tools or governance systems. You can build custom reports showing data flow, feed lineage into your data catalog, or automate impact analysis workflows. The API uses a hierarchical model with processes, runs, and events that you query to retrieve lineage links between tables. Google provides Python code examples in their documentation showing common lineage queries like finding all downstream dependencies for a table or identifying the source lineage for a specific dataset.
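As a hedged sketch of the REST route, the helpers below build a `searchLinks` request and flatten a response into (source, target) pairs ready for export. The endpoint path, the `source` / `target` request fields, and the `bigquery:` fully qualified name format reflect the public Data Lineage API as best understood here, so verify them against Google's reference before relying on them. The project, location, table names, and sample response are placeholders, and nothing is sent over the network.

```python
import json

API_ROOT = "https://datalineage.googleapis.com/v1"


def build_search_links_request(project, location, fully_qualified_name,
                               direction):
    """Return (url, json_body) for a searchLinks call.

    direction="downstream" finds links where the asset is the source;
    direction="upstream" finds links where it is the target.
    """
    if direction not in ("downstream", "upstream"):
        raise ValueError("direction must be 'downstream' or 'upstream'")
    key = "source" if direction == "downstream" else "target"
    url = f"{API_ROOT}/projects/{project}/locations/{location}:searchLinks"
    body = json.dumps({key: {"fullyQualifiedName": fully_qualified_name}})
    return url, body


def links_to_edges(response):
    """Flatten a searchLinks-style response into (source, target) pairs."""
    return [
        (link["source"]["fullyQualifiedName"],
         link["target"]["fullyQualifiedName"])
        for link in response.get("links", [])
    ]


url, body = build_search_links_request(
    "my-project", "us", "bigquery:my-project.sales.raw_sales", "downstream"
)
print(url)
print(body)

# Illustrative response shape only, not real API output.
sample_response = {"links": [{
    "source": {"fullyQualifiedName": "bigquery:my-project.sales.raw_sales"},
    "target": {"fullyQualifiedName": "bigquery:my-project.sales.sales_summary"},
}]}
print(json.dumps(links_to_edges(sample_response)))
```

Sending the request requires an authenticated POST, for example with an OAuth access token, and large results are paginated, so a real exporter needs to follow page tokens.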

Does BigQuery lineage work with external tables?

BigQuery captures lineage for external tables—data stored in Cloud Storage, Drive, or Bigtable—when you query them through BigQuery. The system treats the external table as a source dependency in your lineage graph. However, lineage stops at the external table boundary. You won’t see lineage back to the original Cloud Storage bucket, the specific parquet file, or the object that contains your data unless you manually create that relationship via the API. Upstream column-level lineage isn’t collected for external tables either. For complete end-to-end lineage that extends beyond BigQuery to your source storage systems, you need a cross-platform lineage tool that connects multiple systems.

How does BigQuery lineage handle BigQuery routines (stored procedures, functions)?

BigQuery doesn’t directly track lineage for routines. When you use a stored procedure or user-defined function in a query, lineage records relationships between the tables the routine reads and the tables your query writes—but the routine itself won’t appear as a node in the lineage graph. You’ll see a direct connection between source and destination tables without the routine as an intermediary step. This creates lineage gaps in pipelines that rely heavily on stored procedures for business logic. To track routine-level lineage showing how stored procedures transform data, you’d need to parse the routine definitions separately or use a tool that comprehensively analyzes BigQuery metadata including routine code.

When does it make sense to use both native BigQuery lineage and Atlan?

Use both when you want the free, automatic lineage within BigQuery for basic table dependencies, plus comprehensive cross-platform lineage for the full picture. Native lineage gives you BigQuery-internal visibility at no extra cost—you can see which BigQuery tables depend on other BigQuery tables. Atlan then stitches that together with your broader data estate, showing the complete flow from source databases through Fivetran or Airbyte ingestion, into BigQuery, through dbt transformations, and out to Tableau or Looker dashboards. Many teams keep Dataplex enabled and layer Atlan on top specifically for column-level detail and real-time impact analysis. This hybrid approach gives you baseline visibility from BigQuery while filling the gaps in ecosystem coverage and granularity.

What IAM permissions do I need to view BigQuery lineage?

You need three roles to view BigQuery lineage graphs. First, the Data Lineage Viewer role (roles/datalineage.viewer) lets you see lineage graphs and relationships in the Dataplex console. Second, the Data Catalog Viewer role (roles/datacatalog.viewer) gives you access to the catalog metadata that populates lineage details. Third, the BigQuery Data Viewer role (roles/bigquery.dataViewer), which includes the bigquery.tables.get permission, lets you see table details within lineage nodes. Without it, you’ll see generic node names but not the actual table information. Grant these roles in both your active project (where you’re viewing lineage) and your compute project (where jobs run). Without compute project permissions, you’ll see the error “Fetching lineage failed due to missing permissions.” Request these roles from your GCP administrator if you’re blocked.
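Because the roles have to be granted in two projects, it can help to generate the grant commands rather than type them by hand. The sketch below prints one `gcloud projects add-iam-policy-binding` command per project and role. The project IDs and the member are hypothetical placeholders, and `roles/bigquery.dataViewer` is assumed here as the role identifier for BigQuery Data Viewer.

```python
# Roles needed to view lineage, granted in every project involved.
LINEAGE_VIEWER_ROLES = [
    "roles/datalineage.viewer",
    "roles/datacatalog.viewer",
    "roles/bigquery.dataViewer",
]


def grant_commands(member: str, projects: list[str]) -> list[str]:
    """Return one gcloud IAM binding command per (project, role) pair."""
    return [
        f"gcloud projects add-iam-policy-binding {project} "
        f"--member={member} --role={role}"
        for project in projects
        for role in LINEAGE_VIEWER_ROLES
    ]


# Hypothetical project IDs: where lineage is viewed, and where jobs run.
commands = grant_commands(
    member="user:analyst@example.com",
    projects=["example-active-project", "example-compute-project"],
)
for command in commands:
    print(command)
```

The output is meant to be reviewed and run by someone who already holds permission to change IAM policy on both projects.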

Can BigQuery lineage track data quality issues upstream?

BigQuery lineage shows you which tables feed into downstream assets, giving you the map to trace data quality issues back to their source—but it doesn’t automatically flag quality problems. Lineage provides the roadmap; you need data quality tools like dbt tests, Soda, or Monte Carlo to identify the actual issues. The powerful combination is lineage plus quality monitoring: when a data quality test fails on a downstream reporting table, use lineage to identify which upstream source table introduced the bad data. Then you can quickly isolate whether the problem originated in your raw data load, during transformation, or in the source system itself. Platforms that integrate both lineage and quality alerts streamline this investigation.
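The tracing step described above amounts to walking the lineage graph upstream from the table whose test failed. Here is a minimal sketch, assuming you have already exported lineage as a list of (source, target) table pairs; the table names are hypothetical and the function is illustrative, not part of any lineage or quality tool.

```python
from collections import deque


def upstream_tables(edges: list[tuple[str, str]], failing_table: str) -> list[str]:
    """Return every table upstream of failing_table, nearest ancestors first."""
    # Index the edges so each table maps to its direct sources.
    parents: dict[str, list[str]] = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    seen = {failing_table}  # guards against cycles in the lineage graph
    ordered = []
    queue = deque([failing_table])
    while queue:
        table = queue.popleft()
        for parent in parents.get(table, []):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


# Hypothetical lineage edges: (source table, target table).
edges = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.orders"),
    ("staging.orders", "reporting.revenue"),
]

suspects = upstream_tables(edges, "reporting.revenue")
print(suspects)
```

Checking the returned tables in order, nearest first, lets you stop at the first upstream table whose own quality checks also fail.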

When should you extend BigQuery lineage beyond Dataplex?

Start with native lineage to understand your BigQuery dependencies at no separate charge. BigQuery’s native lineage through Dataplex gives you automatic table-level tracking within BigQuery, plus column-level lineage for supported job types, which is a solid foundation for basic compliance and impact visibility. This automatic capture works well for teams whose data pipeline lives entirely within BigQuery and who can work with a lineage delay of 30 minutes to 24 hours.

Modern data teams typically outgrow BigQuery-only, delayed lineage as their stack grows beyond BigQuery to include dbt transformations, BI tools, and diverse upstream sources. When your executive asks “Where does this dashboard number come from?” and the answer spans Salesforce, Fivetran, BigQuery, dbt, and Tableau, native lineage shows only the BigQuery segment of that journey. Answering the full question takes metadata management that spans platforms, the same visibility that data catalogs, lineage, and AI-driven use cases depend on.

Native lineage is your starting point. Extend with a control plane like Atlan when you need column-level detail beyond BigQuery for compliance, real-time impact analysis for production debugging, or cross-platform governance connecting BigQuery to your broader ecosystem. You don’t have to choose between native and third-party lineage. You can use both, letting Dataplex capture BigQuery internals while a metadata platform provides the complete picture.

