How to Set Up Snowflake Data Lineage? Step-by-Step Guide for 2025
Tracking the flow of data in Snowflake has become essential for data teams seeking to improve transparency and accountability across systems. A frequent question from data practitioners is, “How can I implement data lineage in Snowflake to better understand the flow of my data?”
This guide provides a step-by-step breakdown of how to set up Snowflake data lineage, manage roles and permissions, and address key challenges. Whether you’re a data engineer, business analyst, or platform architect, implementing Snowflake data lineage can significantly enhance your operational efficiency and ensure compliance.
Quick Steps to Set Up Snowflake Data Lineage:
- Create a database role for data lineage.
- Create a database user.
- Identify the tables and views for which you’ll track data lineage.
- Assign relevant read permissions to the database role.
- Assign the role to the database user.
- Configure the Snowflake connector and start crawling lineage metadata.
Snowflake offers powerful features to provide full visibility into your virtual warehouses. Alongside the data dictionary and access logs, Snowflake’s query histories also enable you to track data flow and usage across the system. This metadata gives you the ability to search, discover, and activate insights more effectively. To get the most out of Snowflake’s lineage capabilities, consider integrating a third-party data catalog or lineage tool like Atlan for enhanced visualization and discoverability.
Table of contents #
- Prerequisites to setting up data lineage for Snowflake
- Steps to set up data lineage for Snowflake
- How to set up data lineage in Atlan for Snowflake
- Snowflake and Atlan integration: Customer success across industries
- FAQs on Atlan’s Snowflake data lineage capabilities
- Snowflake Data Lineage: Related reads
Prerequisites to setting up data lineage for Snowflake #
When setting up a data lineage tool for Snowflake, you’ll need to tick a few networking, infrastructure, and security checkboxes:
- Reachability: Make sure that the data lineage tool can connect to Snowflake over the network. You might need to handle PrivateLink, NACLs, VPNs, etc., to make this work.
- Encryption: Use Snowflake’s data dictionary to get metadata securely into your data lineage tool. Some of the more detailed lineage sources contain highly sensitive data (such as detailed schema information) that could expose your organization to cyber attacks.
- Infrastructure: Ensure that your data catalog has enough computing power and memory to handle data crawling, previewing, and querying operations.
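As a quick first check on the reachability item above, the sketch below tries to open a TCP connection from the machine that will run your lineage tool. It uses only Python's standard library. The function name `can_reach` is illustrative, and the host you pass in is an assumption that depends on your account (typically of the form `<account_identifier>.snowflakecomputing.com`). It confirms only that a socket can be opened on port 443, not that TLS, proxies, or PrivateLink routing are set up correctly.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("<account_identifier>.snowflakecomputing.com")` with your own account identifier returns `False` when DNS, a firewall, or a network ACL blocks the path. Treat it as a first pass before a fuller diagnostic.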
Steps to set up data lineage for Snowflake #
If you don’t yet have a working connection to Snowflake from your data catalog or lineage tool, let’s briefly walk through the initial setup steps:
Step 1. Create a database role for data lineage #
Snowflake’s access control layer works with users and roles. Whatever permissions you have to grant, you grant them to a role. Then you assign a role to a user. You can also assign roles to other roles, creating role hierarchies. In this case, you’ll create a new role called data_lineage_role, using the following command:
CREATE OR REPLACE ROLE data_lineage_role;
There are many ways to get lineage metadata from Snowflake. You’ll need to grant permissions to this role based on which method(s) you choose. Before going into grants, let’s create a database user to which you’ll assign this role.
Step 2. Create a database user #
If you already have a data_catalog_user (as prescribed in this tutorial), use the same user; otherwise, create a new one. In addition to the data_catalog_role, we’ll also assign the data_lineage_role to the same user. Here are the commands you can use to create a database user in Snowflake:
-- Method 1: With password
CREATE USER data_lineage_user PASSWORD='<password>' DEFAULT_ROLE=data_lineage_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
-- Method 2: With public key
CREATE USER data_lineage_user RSA_PUBLIC_KEY='<rsa_public_key>' DEFAULT_ROLE=data_lineage_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
Alternatively, you can use SSO. Snowflake supports two ways to authenticate with SSO: browser-based SSO or your identity provider’s native SSO (only available for Okta).
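For the public-key method, the value placed in RSA_PUBLIC_KEY is the key body without the PEM delimiter lines. The sketch below is one possible way to prepare that value and assemble the statement. The helper names `pem_body` and `create_user_sql` are made up for illustration, and the sample key is a placeholder, not a real key. The identifiers are inserted as-is, so this assumes they come from a trusted source.

```python
def pem_body(pem: str) -> str:
    """Strip the BEGIN/END delimiter lines and whitespace from a PEM key."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith("-----"))


def create_user_sql(user: str, role: str, warehouse: str, pem: str) -> str:
    """Assemble a CREATE USER statement that uses key-pair authentication."""
    return (
        f"CREATE USER {user} "
        f"RSA_PUBLIC_KEY='{pem_body(pem)}' "
        f"DEFAULT_ROLE={role} "
        f"DEFAULT_WAREHOUSE='{warehouse}';"
    )


# Placeholder key material for illustration only.
sample_pem = """-----BEGIN PUBLIC KEY-----
AAAA
BBBB
-----END PUBLIC KEY-----"""

print(create_user_sql("data_lineage_user", "data_lineage_role", "<warehouse_name>", sample_pem))
```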
Step 3. Identify tables and views you’ll be using for inferring data lineage from Snowflake #
To find out what permissions you need to grant to the data_lineage_role, you need to understand the different methods of fetching lineage metadata from Snowflake. You’ll also need to consider the level of support your data catalog or lineage tool has for these methods, as some of the operations involved in fetching metadata involve advanced SQL parsing, data flattening, and sophisticated querying to infer table-level and column-level lineage.
Here’s a basic comparison of the function and level of detail of three different data sources for lineage metadata in Snowflake:
SCHEMA.OBJECT | FUNCTION | LEVEL OF DETAIL |
---|---|---|
ACCOUNT_USAGE.OBJECT_DEPENDENCIES | Captures how different Snowflake objects depend on one another. | Low |
ACCOUNT_USAGE.ACCESS_HISTORY | Contains queries for DML operations. Helps with column-level lineage. | High |
ACCOUNT_USAGE.QUERY_HISTORY | Logs every query in the last 365 days. | High |
Please note that the QUERY_HISTORY and ACCESS_HISTORY objects are also available in the READER_ACCOUNT_USAGE schema.
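To make the "data flattening" step concrete, here is a minimal sketch of turning one ACCESS_HISTORY row into column-level lineage edges. It assumes the OBJECTS_MODIFIED column has already been fetched as a JSON string. It also assumes each modified column lists its inputs under `directSources`, with `objectName` and `columnName` keys. Verify those field names against the Snowflake documentation for your edition. The sample row and table names are invented for illustration.

```python
import json
from typing import Iterator, NamedTuple


class ColumnEdge(NamedTuple):
    source: str  # fully qualified source column
    target: str  # fully qualified target column


def column_edges(objects_modified_json: str) -> Iterator[ColumnEdge]:
    """Yield one edge per (source column -> written column) pair."""
    for obj in json.loads(objects_modified_json):
        target_table = obj.get("objectName")
        for col in obj.get("columns", []):
            target = f"{target_table}.{col['columnName']}"
            for src in col.get("directSources", []):
                yield ColumnEdge(f"{src['objectName']}.{src['columnName']}", target)


# Hypothetical OBJECTS_MODIFIED value for a single INSERT ... SELECT query.
sample = json.dumps([{
    "objectDomain": "Table",
    "objectName": "ANALYTICS.PUBLIC.ORDERS_CLEAN",
    "columns": [
        {"columnName": "AMOUNT_USD", "directSources": [
            {"objectDomain": "Table", "objectName": "RAW.PUBLIC.ORDERS", "columnName": "AMOUNT"},
            {"objectDomain": "Table", "objectName": "RAW.PUBLIC.FX_RATES", "columnName": "RATE"},
        ]},
        {"columnName": "ORDER_ID", "directSources": [
            {"objectDomain": "Table", "objectName": "RAW.PUBLIC.ORDERS", "columnName": "ORDER_ID"},
        ]},
    ],
}])

edges = list(column_edges(sample))
for edge in edges:
    print(edge.source, "->", edge.target)
```

A catalog or lineage tool does considerably more than this, such as deduplicating edges across queries and resolving views, but the core idea is the same: unnest the nested JSON into flat source-to-target pairs.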
Step 4. Assign relevant read permissions to the database role #
If you want to grant access to all three objects mentioned in the previous section, use the following set of GRANT statements:
-- To access dependencies between Snowflake objects
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role;
-- To get access logs for DML operations, and how columns changed because of the operations AND
-- To get every query run in the past 365 days
GRANT USAGE, MONITOR ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role;
In addition to these, the INFORMATION_SCHEMA has many objects that make up Snowflake’s internal data dictionary, such as TABLES, COLUMNS, etc. You can also use the metadata from those objects to make more sense of data lineage.
Notice that the permissions are granted at the WAREHOUSE level, so you will need to grant the USAGE or MONITOR privilege individually on each virtual warehouse in your Snowflake account. The alternative is to grant the permissions from the ACCOUNTADMIN role to the data_lineage_role. Snowflake highly recommends that you NOT do that, but if you still want to, here’s how you would do it:
USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE data_lineage_role;
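If you take the per-warehouse route described above, writing the statements by hand gets tedious once you have more than a few warehouses. The sketch below generates them from a list of names. The helper `warehouse_grants` and the warehouse names are hypothetical. The identifier check accepts only simple unquoted identifiers, a deliberate simplification.

```python
import re
from typing import Iterable, List

# Simple unquoted identifiers only: a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def warehouse_grants(
    warehouses: Iterable[str],
    role: str = "data_lineage_role",
    privileges: Iterable[str] = ("USAGE", "MONITOR"),
) -> List[str]:
    """Return one GRANT statement per warehouse for the given role."""
    names = list(warehouses)
    for name in [role, *names]:
        if not _IDENT.match(name):
            raise ValueError(f"not a simple unquoted identifier: {name!r}")
    privs = ", ".join(privileges)
    return [f"GRANT {privs} ON WAREHOUSE {wh} TO ROLE {role};" for wh in names]


# Hypothetical warehouse names.
for statement in warehouse_grants(["ETL_WH", "BI_WH"]):
    print(statement)
```

You would still review the generated statements and run them with a role that is allowed to grant privileges on those warehouses.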
Additionally, if you are dealing with cloned accounts in Snowflake, you’ll need to grant permissions for their access too.
Step 5. Assign the database role to the database user #
Once you’re done assigning all the relevant permissions to the role, you’ll need to assign the role to the data_lineage_user using the following GRANT statement:
GRANT ROLE data_lineage_role TO USER data_lineage_user;
You should now be ready to connect to your Snowflake account from your data catalog or lineage tool.
Step 6. Configure the Snowflake connector and start crawling lineage metadata #
To configure the Snowflake connector, log into your data catalog or lineage tool and find the Snowflake connector. Enter the database user credentials into that connector, and you should be all set. If you cannot connect to your Snowflake warehouse, you’ll probably need to check if you missed any networking or security steps. You can use the SnowCD (Snowflake Connectivity Diagnostic) tool to evaluate your network connectivity.
Once you resolve any connectivity issues, you can start crawling lineage metadata. Most data catalog or lineage tools provide you with an option to run the crawler in three different ways:
- Ad-hoc crawl (manual crawl using a CLI command or the data catalog console)
- Scheduled crawl (e.g., based on a cron expression)
- Event-based crawl (e.g., crawl triggered from an event that the data catalog can listen to)
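For a scheduled crawl, each run usually needs only the queries issued since the previous run. The sketch below shows one way a crawler might build such a windowed query against QUERY_HISTORY. The function name, the notion of a stored "last crawl" timestamp, and the selected columns are assumptions for illustration. Your tool may work differently, and the timestamp literal format is worth checking against your account's settings.

```python
from datetime import datetime, timedelta, timezone


def query_history_window_sql(since: datetime, until: datetime) -> str:
    """Build a QUERY_HISTORY query for the half-open window [since, until)."""
    if since.tzinfo is None or until.tzinfo is None:
        raise ValueError("since and until must be timezone-aware")
    if since >= until:
        raise ValueError("since must be earlier than until")
    # Normalise both bounds to UTC ISO 8601 strings with second precision.
    lo = since.astimezone(timezone.utc).replace(microsecond=0).isoformat()
    hi = until.astimezone(timezone.utc).replace(microsecond=0).isoformat()
    return (
        "SELECT query_id, query_text, start_time\n"
        "FROM snowflake.account_usage.query_history\n"
        f"WHERE start_time >= '{lo}'::TIMESTAMP_TZ\n"
        f"  AND start_time < '{hi}'::TIMESTAMP_TZ\n"
        "ORDER BY start_time;"
    )


# Hypothetical watermark: the previous crawl covered everything up to midnight UTC.
last_crawl = datetime(2025, 1, 1, tzinfo=timezone.utc)
sql = query_history_window_sql(last_crawl, last_crawl + timedelta(hours=1))
print(sql)
```

Because Account Usage views lag behind real time, a crawler built this way would typically keep the upper bound somewhat in the past, or re-read an overlapping window, so that late-arriving rows are not missed.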
After putting your Snowflake lineage metadata into your data catalog or lineage tool, you can identify other data sources and connect them with your data catalog to get a fuller picture of the flow of data and its lineage in your data platform.
Business outcomes from Snowflake data lineage #
Setting up a data lineage tool for Snowflake will improve the data development and consumption experience significantly by ensuring that:
- Data developers have more insight into the flow of data to understand the repercussions of data movement, transformation, archival, etc.
- Data developers have more context when they’re writing a new data workload or fixing issues and bugs in an existing one.
- Business users have the context of how data flows from the business applications and third-party integrations into the data platform for better, more meaningful reporting and analytics.
These are just a few examples. There are many more things that data lineage solves. Head over to this article to know more.
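To illustrate the first outcome, understanding the repercussions of a change, here is a minimal impact-analysis sketch. It assumes lineage has already been collected as (upstream, dependent) pairs, for example from object dependency metadata. It then walks the graph breadth-first to list everything downstream of a given object. The object names are invented.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def downstream(edges: Iterable[Tuple[str, str]], start: str) -> Set[str]:
    """Return every object that directly or transitively depends on start."""
    children = defaultdict(set)
    for upstream, dependent in edges:
        children[upstream].add(dependent)
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, ()):
            # Skip visited nodes so cycles cannot loop forever.
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical lineage edges.
edges = [
    ("RAW.ORDERS", "STAGING.ORDERS_CLEAN"),
    ("STAGING.ORDERS_CLEAN", "MART.DAILY_REVENUE"),
    ("RAW.FX_RATES", "MART.DAILY_REVENUE"),
    ("MART.DAILY_REVENUE", "MART.REVENUE_DASHBOARD_V"),
]
impacted = downstream(edges, "RAW.ORDERS")
print(sorted(impacted))
```

A developer planning to alter or archive RAW.ORDERS could use a list like this to see which staging tables, marts, and reporting views need attention first.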
How to set up data lineage in Atlan for Snowflake #
Atlan is an active metadata platform that takes care of data lineage, in addition to data cataloging, search, and discovery for your Snowflake data platform. In addition, it gives you a rich interface to preview and query data from your Snowflake warehouses, making it a one-stop shop for all your data needs. To set up data lineage for Snowflake in Atlan, you can go through the following steps:
- Create a role in Snowflake
- Create a user
- Grant role to the user
- Choose the metadata fetching method
- Grant permissions
- Allowlist the Atlan IP
Snowflake and Atlan integration: Customer success across industries #
Atlan, named a Leader in The Forrester Wave™, has proven its ability to empower organizations across diverse industries. Companies in banking, healthcare, fintech, and manufacturing have successfully integrated Snowflake with Atlan to modernize their data stack, streamline data governance, enable self-service access to data, and get their data AI-ready. Here’s how these leading companies have transformed their data operations using this combination.
Austin Capital Bank (Banking) is a fast-growing, product-centric bank that adopted Snowflake and Atlan to modernize their data stack. The integration provided a seamless way to manage data access while ensuring governance. As Ian Bass, Head of Data & Analytics, put it, “Atlan gave us a simple way to see who has access to what.”
Scripps Health (Healthcare) leveraged the Snowflake-Atlan integration to manage sensitive healthcare data while adhering to HIPAA requirements. With Atlan tapping into Snowflake’s powerful metadata, they gained end-to-end visibility. “Since Atlan is virtualized on Snowflake, security is no longer a concern,” says Victor Wilson, Data Architect.
Tala (FinTech) uses Snowflake as part of their data stack and integrates it with Atlan, dbt, and Looker. By automating the sync of dbt documentation into Snowflake through Atlan, Tala streamlines its data processes. This allows business users to access a unified data dictionary within Atlan, making data easily understandable.
Aliaxis (Manufacturing), a global leader in water solutions, integrated Atlan with their Snowflake-powered data warehouse to enhance data visibility. Atlan serves as their primary point of reference for data-related queries, acting as a “bridge” to understand data within Snowflake. “If there’s any question you have about data in Snowflake, go to Atlan,” shares Nestor Jarquin, Global Data & Analytics Lead.
These stories highlight the transformative power of Snowflake and Atlan for businesses looking to enhance their data capabilities. Want to see how this integration can work for you? [Book a demo today] and discover the impact of Atlan + Snowflake for your organization!
FAQs on Atlan’s Snowflake data lineage capabilities #
What is Snowflake data lineage? #
Snowflake data lineage refers to the process of tracking and visualizing how data moves and transforms within Snowflake. It helps teams understand the complete lifecycle of data—from its source to its final destination.
Why is data lineage important in Snowflake? #
Data lineage is crucial because it provides transparency into data flows, enabling teams to ensure data quality, meet compliance requirements, and troubleshoot issues faster by understanding how data is used and transformed across Snowflake.
How can Atlan help set up Snowflake data lineage? #
Atlan integrates seamlessly with Snowflake to automatically capture, visualize, and track data lineage. With Atlan, users can get a holistic view of data pipelines, transformations, and usage patterns without manual setup or complex configurations.
What benefits does Snowflake data lineage offer to businesses using Atlan? #
By integrating Atlan with Snowflake, businesses can:
- Gain a clear understanding of data transformations and workflows.
- Ensure better data governance and compliance.
- Improve reporting accuracy by ensuring data integrity.
- Empower teams with contextual information for better decision-making and data utilization.
How easy is it to set up Snowflake data lineage with Atlan? #
Setting up data lineage with Atlan in Snowflake is simple. Atlan automatically captures and visualizes lineage across tables, columns, and queries in Snowflake, helping teams effortlessly manage their data environments without requiring extensive configuration.
Can Atlan display column-level lineage for Snowflake? #
Yes, Atlan provides column-level lineage for Snowflake, allowing users to track how data at the granular level is transformed and used across various datasets. This helps data teams ensure precision and accuracy in their data analysis.
How does Atlan handle changes in Snowflake’s data structure for lineage tracking? #
Atlan continuously syncs with Snowflake, ensuring that changes in data pipelines, tables, or transformations are reflected in the data lineage views. This real-time tracking helps maintain an up-to-date lineage view without manual interventions.
Snowflake Data Lineage: Related reads #
- Snowflake Cortex: Everything We Know So Far and Answers to FAQs
- Snowflake Copilot: Here’s Everything We Know So Far About This AI-Powered Assistant
- Polaris Catalog from Snowflake: Everything We Know So Far
- Snowflake Cost Optimization: Typical Expenses & Strategies to Handle Them Effectively
- Snowflake Horizon for Data Governance: Here’s Everything We Know So Far
- Snowflake Data Cloud Summit 2024: Get Ready and Fit for AI
- How to Set Up a Data Catalog for Snowflake: A Step-by-Step Guide
- How to Set Up Snowflake Data Lineage: Step-by-Step Guide
- How to Set Up Data Governance for Snowflake: A Step-by-Step Guide
- Snowflake + AWS: A Practical Guide for Using Storage and Compute Services
- Snowflake X Azure: Practical Guide For Deployment
- Snowflake X GCP: Practical Guide For Deployment
- Snowflake + Fivetran: Data movement for the modern data platform
- Snowflake + dbt: Supercharge your transformation workloads
- Snowflake Metadata Management: Importance, Challenges, and Identifying The Right Platform
- Snowflake Data Governance: Native Features, Atlan Integration, and Best Practices
- Snowflake Data Dictionary: Documentation for Your Database
- Snowflake Data Access Control Made Easy and Scalable
- Glossary for Snowflake: Shared Understanding Across Teams
- Snowflake Data Catalog: Importance, Benefits, Native Capabilities & Evaluation Guide
- Snowflake Data Mesh: Step-by-Step Setup Guide
- Managing Metadata in Snowflake: A Comprehensive Guide
- How to Query Information Schema on Snowflake? Examples, Best Practices, and Tools
- Snowflake Summit 2023: Why Attend and What to Expect
- Snowflake Summit Sessions: 10 Must-Attend Sessions to Up Your Data Strategy