Snowflake Data Lineage: A Step-by-Step How to Guide
Share this article
Snowflake has powerful features to support full visibility into what’s going on in your virtual warehouses. In addition to the data dictionary and access logs, this includes query histories too. The idea of all this metadata is to enable you to search, discover, and, in essence, activate, your metadata. You can set up a third-party data catalog or lineage tool to do that.
In this step-by-step guide, you’ll learn how to set up a data lineage tool for Snowflake.
Modern data problems require modern solutions - Try Atlan, the data catalog of choice for forward-looking data teams! 👉 Book your demo today
To fetch metadata necessary to infer or build data lineage from Snowflake, you’ll need to follow the following steps:
- Create a database role for data lineage
- Create a database user
- Identify tables and views you’ll be using for inferring data lineage from Snowflake
- Assign relevant read permissions to the database role
- Assign the database role to the database user
- Configure the Snowflake connector and start crawling lineage metadata
Tracking the flow of data is one of the most common challenges, even in the most mature of data platforms. Multiple data layers, cloud platforms, processing engines, and storage locations don’t make it easier.
Tracking data lineage has real benefits in making lives easier for business users, data analysts, analytics engineers, and data engineers. If you feel your current Snowflake-based data platform isn’t providing you enough visibility into the flow and lineage of your data 👉 Talk to us.
Table of contents
- Prerequisites to setting up data lineage for Snowflake
- DSteps to set up data lineage for Snowflake
- How to set up data lineage in Atlan for Snowflake
Prerequisites to setting up data lineage for Snowflake
When setting up a data lineage tool for Snowflake, you’ll need to tick a few networking, infrastructure, and security checkboxes:
- Reachability — Make sure that the data lineage tool can connect Snowflake, i.e., it has proper connectivity. You might need to consider handling PrivateLink, NACLs, VPNs, etc., to make this work.
- Encryption — Use Snowflake’s data dictionary to get metadata securely into your data lineage tool. Some detailed methods of fetching data lineage contain highly sensitive data (detailed schema information) that could expose your organization to cyber attacks.
- Infrastructure — Ensure that your data catalog has enough computing power and memory to address data crawling, previewing, and querying operations.
Steps to set up data lineage for Snowflake
If you don’t have a working connection with Snowflake from your data catalog or lineage tool, let’s quickly walk through the initial setup steps in brief:
Step 1. Create a database role for data lineage
Snowflake’s access control layer works with users and roles. Whatever permissions you have to grant, you grant them to a role. Then you assign a role to a user. You can also assign roles to other roles, making role hierarchies. In this case, you’ll create a new role called
data_lineage_role, using the following command:
CREATE OR REPLACE ROLE data_lineage_role;
There are many ways to get lineage metadata from Snowflake. You’ll need to grant permissions to this role based on which method(s) you choose. Before going into grants, let’s create a database user to which you’ll assign this role.
Step 2. Create a database user
If you already have a
data_catalog_user (as prescribed in this tutorial), use the same user; otherwise, create a new one. In addition to the
data_catalog_role, we’ll also assign the
data_lineage_role to the same user. Here are the commands you can use to create a database user in Snowflake:
# Method 1: With password CREATE USER data_lineage_user PASSWORD='<password>' DEFAULT_ROLE=data_lineage_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>'; # Method 2: With public key CREATE USER data_lineage_user RSA_PUBLIC_KEY='<rsa_public_key>' DEFAULT_ROLE=data_lineage DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';
Step 3. Identify tables and views you’ll be using for inferring data lineage from Snowflake
To find out what permissions you need to grant to the
data_lineage_role, you need to understand the different methods of fetching lineage metadata from Snowflake. You’ll also need to consider the level of support your data catalog or lineage tool has for these methods, as some of the operations involved in fetching metadata involve advanced SQL parsing, data flattening, and sophisticated querying to infer table-level and column-level lineage.
Here’s a basic comparison of the function and level of detail of three different data sources for lineage metadata in Snowflake:
|SCHEMA.OBJECT||FUNCTION||LEVEL OF DETAIL|
|Captures how different Snowflake objects are dependent on one another.||Low|
|Contains queries for DML operations. Helps with column-level lineage.||High|
|Logs every query in the last 365 days.||High|
Please note that the
ACCESS_HISTORY objects are also available in the
Step 4. Assign relevant read permissions to the database role
If you want to grant access to all three objects mentioned in the previous section, use the following set of
# To access dependencies between Snowflake objects GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role; # To get access logs for DML operations, and how columns changed because of the operations AND # To get every query run in the past 365 days GRANT USAGE, MONITOR ON WAREHOUSE <warehouse_name> TO ROLE data_lineage_role;
In addition to this, the
INFORMATION_SCHEMA has a lot of other objects that make Snowflake’s internal data dictionary, such as
COLUMNS, etc. You can also use the metadata from those objects to make more sense of data lineage.
Notice how the permissions are granted on a
WAREHOUSE level. That’s right, and you will need to individually grant the
MONITOR privilege to each virtual warehouse in your Snowflake account. The alternative is to grant the permissions from the
ACCOUNTADMIN role to the
data_lineage_role. Snowflake highly recommends that you NOT do that, but if you still want to, here’s how you would do it:
USE ROLE ACCOUNTADMIN; GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE data_lineage_role;
Additionally, if you are dealing with cloned accounts in Snowflake, you’ll need to grant permissions for their access too.
Step 5. Assign the database role to the database user
Once you’re done assigning all the relevant permissions to the role, you’ll need to assign the role to the
data_lineage_user using the following
GRANT ROLE data_lineage_role TO USER data_lineage_user;
You should now be ready to connect to your Snowflake account from your data catalog or lineage tool.
Step 6. Configure the Snowflake connector and start crawling lineage metadata
To configure the Snowflake connector, log into your data catalog or lineage tool and find the Snowflake connector. Enter the database user credentials into that connector, and you should be all set. If you cannot connect to your Snowflake warehouse, you’ll probably need to check if you missed any networking or security steps. You can use the SnowCD (Snowflake Connectivity Diagnostic) tool to evaluate your network connectivity.
Once you resolve any connectivity issues, you can start crawling lineage metadata. Most data catalog or lineage tools provide you with an option to run the crawler in three different ways:
- Ad-hoc crawl (manual crawl using a CLI command or the data catalog console)
- Scheduled crawl (E.g., based on a cron expression)
- Event-based crawl (E.g., crawl triggered from an event that the data catalog can listen to)
After putting your Snowflake lineage metadata into your data catalog or lineage tool, you can identify other data sources and connect them with your data catalog to get a fuller picture of the flow of data and its lineage in your data platform.
Business outcomes from Snowflake data lineage
Setting up a data lineage tool for Snowflake will improve the data development and consumption experience significantly by ensuring that:
- Data developers have more insight into the flow of data to understand the repercussions of data movement, transformation, archival, etc.
- Data developers have more context when they’re writing a new data workload or fixing issues and bugs in an existing one.
- Business users have the context of how data flows from the business applications and third-party integrations into the data platform for better, more meaningful reporting and analytics.
These are just a few examples. There are many more things that data lineage solves. Head over to this article to know more.
How to set up data lineage in Atlan for Snowflake
Atlan is an active metadata platform that takes care of data lineage, in addition to data cataloging, search, and discovery for your Snowflake data platform. In addition, it gives you a rich interface to preview and query data from your Snowflake warehouses, making it a one-stop shop for all your data needs. To set up data lineage for Snowflake in Atlan, you can go through the following steps:
- Create role in Snowflake
- Create a user
- Grant role to the user
- Choose the metadata fetching method
- Grant permissions
- Allowlist the Atlan IP
Share this article