Snowflake Data Lineage: A Step-by-Step How to Guide

Updated September 28th, 2023
header image

Share this article

Snowflake has powerful features to support full visibility into what’s going on in your virtual warehouses. In addition to the data dictionary and access logs, this includes query histories too. The idea of all this metadata is to enable you to search, discover, and, in essence, activate, your metadata. You can set up a third-party data catalog or lineage tool to do that.

In this step-by-step guide, you’ll learn how to set up a data lineage tool for Snowflake.

Modern data problems require modern solutions - Try Atlan, the data catalog of choice for forward-looking data teams! 👉 Book your demo today

To fetch metadata necessary to infer or build data lineage from Snowflake, you’ll need to follow the following steps:

  1. Create a database role for data lineage
  2. Create a database user
  3. Identify tables and views you’ll be using for inferring data lineage from Snowflake
  4. Assign relevant read permissions to the database role
  5. Assign the database role to the database user
  6. Configure the Snowflake connector and start crawling lineage metadata

Tracking the flow of data is one of the most common challenges, even in the most mature of data platforms. Multiple data layers, cloud platforms, processing engines, and storage locations don’t make it easier.

Tracking data lineage has real benefits in making lives easier for business users, data analysts, analytics engineers, and data engineers. If you feel your current Snowflake-based data platform isn’t providing you enough visibility into the flow and lineage of your data 👉 Talk to us.

Table of contents #

  1. Prerequisites to setting up data lineage for Snowflake
  2. DSteps to set up data lineage for Snowflake
  3. How to set up data lineage in Atlan for Snowflake

Prerequisites to setting up data lineage for Snowflake #

When setting up a data lineage tool for Snowflake, you’ll need to tick a few networking, infrastructure, and security checkboxes:

  1. Reachability — Make sure that the data lineage tool can connect Snowflake, i.e., it has proper connectivity. You might need to consider handling PrivateLink, NACLs, VPNs, etc., to make this work.
  2. Encryption — Use Snowflake’s data dictionary to get metadata securely into your data lineage tool. Some detailed methods of fetching data lineage contain highly sensitive data (detailed schema information) that could expose your organization to cyber attacks.
  3. Infrastructure — Ensure that your data catalog has enough computing power and memory to address data crawling, previewing, and querying operations.

Steps to set up data lineage for Snowflake #

If you don’t have a working connection with Snowflake from your data catalog or lineage tool, let’s quickly walk through the initial setup steps in brief:

Step 1. Create a database role for data lineage #

Snowflake’s access control layer works with users and roles. Whatever permissions you have to grant, you grant them to a role. Then you assign a role to a user. You can also assign roles to other roles, making role hierarchies. In this case, you’ll create a new role called data_lineage_role, using the following command:

CREATE OR REPLACE ROLE data_lineage_role;

There are many ways to get lineage metadata from Snowflake. You’ll need to grant permissions to this role based on which method(s) you choose. Before going into grants, let’s create a database user to which you’ll assign this role.

Step 2. Create a database user #

If you already have a data_catalog_user (as prescribed in this tutorial), use the same user; otherwise, create a new one. In addition to the data_catalog_role, we’ll also assign the data_lineage_role to the same user. Here are the commands you can use to create a database user in Snowflake:

# Method 1: With password
CREATE USER data_lineage_user PASSWORD='<password>' DEFAULT_ROLE=data_lineage_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';

# Method 2: With public key
CREATE USER data_lineage_user RSA_PUBLIC_KEY='<rsa_public_key>' DEFAULT_ROLE=data_lineage DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';

Alternatively, you can use SSO. There are two ways to do authentication on Snowflake: using browser-based SSO or your identity provider’s native SSO (only available for Okta).

Step 3. Identify tables and views you’ll be using for inferring data lineage from Snowflake #

To find out what permissions you need to grant to the data_lineage_role, you need to understand the different methods of fetching lineage metadata from Snowflake. You’ll also need to consider the level of support your data catalog or lineage tool has for these methods, as some of the operations involved in fetching metadata involve advanced SQL parsing, data flattening, and sophisticated querying to infer table-level and column-level lineage.

Here’s a basic comparison of the function and level of detail of three different data sources for lineage metadata in Snowflake:

INFORMATION_SCHEMA.OBJECT_DEPENDENCIES Captures how different Snowflake objects are dependent on one another. Low
ACCOUNT_USAGE.ACCESS_HISTORY Contains queries for DML operations. Helps with column-level lineage. High
ACCOUNT_USAGE.QUERY_HISTORY Logs every query in the last 365 days. High

Please note that the QUERY_HISTORY and ACCESS_HISTORY objects are also available in the READER_ACCOUNT_USAGE schema.

Step 4. Assign relevant read permissions to the database role #

If you want to grant access to all three objects mentioned in the previous section, use the following set of GRANT statements:

# To access dependencies between Snowflake objects
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role;

# To get access logs for DML operations, and how columns changed because of the operations AND
# To get every query run in the past 365 days
GRANT USAGE, MONITOR ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role;

In addition to this, the INFORMATION_SCHEMA has a lot of other objects that make Snowflake’s internal data dictionary, such as TABLES, COLUMNS, etc. You can also use the metadata from those objects to make more sense of data lineage.

Notice how the permissions are granted on a WAREHOUSE level. That’s right, and you will need to individually grant the USAGE or MONITOR privilege to each virtual warehouse in your Snowflake account. The alternative is to grant the permissions from the ACCOUNTADMIN role to the data_lineage_role. Snowflake highly recommends that you NOT do that, but if you still want to, here’s how you would do it:


Additionally, if you are dealing with cloned accounts in Snowflake, you’ll need to grant permissions for their access too.

Step 5. Assign the database role to the database user #

Once you’re done assigning all the relevant permissions to the role, you’ll need to assign the role to the data_lineage_user using the following GRANT statement:

GRANT ROLE data_lineage_role TO USER data_lineage_user;

You should now be ready to connect to your Snowflake account from your data catalog or lineage tool.

Step 6. Configure the Snowflake connector and start crawling lineage metadata #

To configure the Snowflake connector, log into your data catalog or lineage tool and find the Snowflake connector. Enter the database user credentials into that connector, and you should be all set. If you cannot connect to your Snowflake warehouse, you’ll probably need to check if you missed any networking or security steps. You can use the SnowCD (Snowflake Connectivity Diagnostic) tool to evaluate your network connectivity.

Once you resolve any connectivity issues, you can start crawling lineage metadata. Most data catalog or lineage tools provide you with an option to run the crawler in three different ways:

  1. Ad-hoc crawl (manual crawl using a CLI command or the data catalog console)
  2. Scheduled crawl (E.g., based on a cron expression)
  3. Event-based crawl (E.g., crawl triggered from an event that the data catalog can listen to)

After putting your Snowflake lineage metadata into your data catalog or lineage tool, you can identify other data sources and connect them with your data catalog to get a fuller picture of the flow of data and its lineage in your data platform.

Business outcomes from Snowflake data lineage #

Setting up a data lineage tool for Snowflake will improve the data development and consumption experience significantly by ensuring that:

  • Data developers have more insight into the flow of data to understand the repercussions of data movement, transformation, archival, etc.
  • Data developers have more context when they’re writing a new data workload or fixing issues and bugs in an existing one.
  • Business users have the context of how data flows from the business applications and third-party integrations into the data platform for better, more meaningful reporting and analytics.

These are just a few examples. There are many more things that data lineage solves. Head over to this article to know more.

How to set up data lineage in Atlan for Snowflake #

Atlan is an active metadata platform that takes care of data lineage, in addition to data cataloging, search, and discovery for your Snowflake data platform. In addition, it gives you a rich interface to preview and query data from your Snowflake warehouses, making it a one-stop shop for all your data needs. To set up data lineage for Snowflake in Atlan, you can go through the following steps:

  1. Create role in Snowflake
  2. Create a user
    1. With a password in Snowflake
    2. With a public key in Snowflake
    3. Managed through your identity provider (IdP)
  3. Grant role to the user
  4. Choose the metadata fetching method
    1. Information schema (recommended)
    2. Account usage (alternative)
  5. Grant permissions
    1. To crawl existing assets
    2. To crawl future assets
    3. To mine query history for lineage
    4. To preview and query existing assets
    5. To preview and query future assets
  6. Allowlist the Atlan IP

Share this article

[Website env: production]