How to Set Up a Data Catalog for Snowflake: A Step-by-Step Guide

Updated June 22nd, 2023
header image

Share this article

Snowflake exposes its internal data dictionary via a custom-built INFORMATION_SCHEMA on top of the SQL-92 ANSI standard. To fully use the internal data dictionary, you’ll need to use an external data catalog tool to make it more accessible and valuable for your organization.

In this step-by-step guide, you’ll learn how to set up a data catalog for Snowflake.

To fetch the metadata necessary to build a data catalog for Snowflake, you’ll need to go through the following steps:

  1. Create a database role in Snowflake
  2. Create a database user
  3. Identify the Snowflake databases, schemas, and objects that you want to crawl
  4. Assign relevant permissions to the role
  5. Assign the role to the database user
  6. Start crawling data using a manual or a scheduled run

Organizations have different needs and demands when it comes to metadata usage. Bigger organizations often face problems of data silos, spread-out metadata, the inability to discover what data exists and where, and more. If you feel your current data platform needs help with solving any of these problems or more with the help of metadata that you already have: Talk to us.


Table of contents

  1. Prerequisites to setting up a data catalog for Snowflake
  2. Steps to set up data catalog for Snowflake
  3. Business outcomes from cataloging Snowflake data
  4. How to deploy Atlan for Snowflake

Prerequisites to setting up a data catalog for Snowflake

When setting up a data catalog for Snowflake, you’ll need to tick a few networking, infrastructure, and security checkboxes:

  1. Reachability — Make sure that the data catalog can reach Snowflake. Both these tools can be in different networks, so you might need to set up VPNs, peering connections, NACLs, etc., to ensure proper function.
  2. Encryption — The data dictionary gives you a deep insight into application and system design. You want the metadata to be securely fetched and stored from your Snowflake account.
  3. Infrastructure — Ensure your data catalog has enough compute and memory to address data crawling, previewing, and querying operations.

Steps to set up data catalog for Snowflake

Step 1: Create a database role in Snowflake


Every database and data warehouse has different implementations of namespaces, roles, and permissions. The steps to set up a data catalog will vary based on the implementation. Snowflake has two access control models (DAC and RBAC) that work side-by-side to provide a very flexible and granular control structure for your data. In Snowflake, you can only assign permissions to roles, not users directly. Here’s how you create a role that has OPERATE and USAGE permissions on a Snowflake warehouse:


CREATE OR REPLACE ROLE data_catalog_role;
GRANT OPERATE, USAGE ON WAREHOUSE "<warehouse-name>" TO ROLE data_catalog_role;

Granting OPERATE on your warehouse enables the role to view all queries previously run or currently running on the warehouse. It also allows the role to start, stop, suspend, and resume a warehouse. Make sure to review that your grants are not overly permissive for the data catalog.

Step 2: Create a database user


Once you’ve created a database role, create a database user and attach the role to it. You can create a database user in one of three ways: a password, public key, or SSO.


# Method 1: With password
CREATE USER data_catalog_user PASSWORD='<password>' DEFAULT_ROLE=data_catalog_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';

# Method 2: With public key
CREATE USER data_catalog_user RSA_PUBLIC_KEY='<rsa_public_key>' DEFAULT_ROLE=data_catalog_role DEFAULT_WAREHOUSE='<warehouse_name>' DISPLAY_NAME='<display_name>';

Using SSO, there are two ways to do authentication on Snowflake: using browser-based SSO or your identity provider’s native SSO (only available for Okta).

Step 3: Identify the Snowflake databases, schemas, and objects that you want to crawl


The process of data movement, cleaning, and transformation is often very messy. You end up creating a lot of playground databases and schemas for running internal experiments and testing queries.

Ideally, all this should be in a development environment, but that’s not always the case. Hence, you must identify which databases and schemas to fetch data assets from.

Before making the initial connection, explore your data assets by using commands like SHOW DATABASES, SHOW SCHEMAS, SHOW OBJECTS, SHOW TABLES, and so on. You can alternatively query the INFORMATION_SCHEMA directly.

Often database and object names have a prefix that signifies a stage in the data pipeline or a level of maturity of the data they contain. You can use these indicators to finalize the assets you want to fetch.

Vet the list of databases, schemas, and objects to be crawled because, in the next step, you’ll need to grant read permissions to your data_catalog_role to enable the data catalog to start fetching metadata from your Snowflake warehouse.

Step 4: Assign relevant permissions to the role


There are two ways you can go about assigning permissions to the data_catalog_role. Your choice should be based on the capabilities of your data catalog. Identify if you only want to:

  1. Catalog data assets and view structural metadata
  2. Catalog data assets, view structural metadata, preview data, and run queries against data assets.

This is where Snowflake offers two other schemas, ACCOUNT_USAGE and READER_ACCOUNT_USAGE, in addition to the INFORMATION_SCHEMA, that contain database metadata. You can choose one or the other depending on what your data catalog supports.

Grant access for crawling metadata

Using the ACCOUNT_USAGE schema, you’ll need to grant additional access to DATABASES, SCHEMATA, TABLES, VIEWS, COLUMNS, and PIPES for the data catalog to be able to fetch metadata. You can also use one or more of Snowflake’s default ACCOUNT_USAGE roles: OBJECT_VIEWER, USAGE_VIEWER, GOVERNANCE_VIEWER, and SECURITY_VIEWER. Similarly, to use the READER_ACCOUNT_USAGE schema for fetching query history, login history, etc., you can assign the READER_USAGE_VIEWER role to your data_catalog_role.

Given that using INFORMATION_SCHEMA is the most common and extensive method for crawling metadata; let’s look at it in more detail. Using the INFORMATION_SCHEMA, you can use the following GRANT statements to provide crawling access for current data assets to the data_catalog_role:


GRANT USAGE ON DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON ALL STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON PIPE "<pipe_name>" TO ROLE data_catalog_role;

The aforementioned GRANT statements assign the data_catalog_role privileges to fetch the metadata for all the current data assets. To be able to bring the metadata for all the future data assets, too, you’ll need to specify the explicitly FUTURE keyword in the GRANT statements, as shown below:


GRANT USAGE ON FUTURE SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON FUTURE PIPES IN DATABASE "<database_name>" TO ROLE data_catalog_role;

If you stop here, you’ll be in a position to list all data assets for the <database_name> database, but you won’t be able to preview or query the data from your data catalog.

Grant access for previewing and querying data

Many data catalogs, such as Atlan, provide you with an integrated IDE so that you can preview the data and even work on it by writing queries, saving them, and sharing them with your team. You can allow access to previewing and querying data from your data catalog by running the same statements that you ran in the previous step with a minor difference. Instead of GRANT REFERENCES, you’ll need to GRANT SELECT on TABLES, EXTERNAL TABLES, VIEWS, and MATERIALIZED VIEWS. Note that you’ll need to do the same thing twice, once for current data assets and once for future data assets. Here’s an example of granting SELECT on TABLES, present and future:


GRANT SELECT ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;

Now, you should be able to preview and query your Snowflake data from the data catalog.

Consider granting other relevant privileges

There are a host of other grants and privileges that you can assign to the data_catalog_role. What you need will eventually depend on the capabilities of your data catalog. For example, if your data catalog has data lineage capabilities, you might need to grant enhanced privileges, such as IMPORTED PRIVILEGES to the SNOWFLAKE database to the data_catalog_role.

Step 5: Assign the role to the database user


Once you’re done assigning all the relevant permissions to the role, you’ll need to assign the role to the data_catalog_role using the following GRANT statement:

GRANT ROLE data_catalog_role TO USER data_catalog_user;

You should now be ready to connect to your Snowflake account from your data catalog.

Step 6: Configure the Snowflake connector and start crawling data


Once you’ve run the GRANT statements, you’ll be in a position to test whether you’re able to fetch data from your data catalog. To configure the Snowflake connector in your data catalog, you must provide the connector with the database credentials.

Most connectors can test the connection before you go on to the next step and start crawling the data from your Snowflake database. If you fail to connect to your Snowflake warehouse, you’ll probably need to check if any networking or security considerations were missed. You can use the SnowCD (Snowflake Connectivity Diagnostic) tool to evaluate your network connectivity.

Once you resolve any connectivity issues, you can start crawling data. Most data catalogs provide you with an option to run the crawler in three different ways:

  1. Ad-hoc crawl (manual crawl using a CLI command or the data catalog console)
  2. Scheduled crawl (E.g., based on a cron expression)
  3. Event-based crawl (E.g., crawl triggered from an event that the data catalog can listen to)

After putting your Snowflake data into your data catalog, you can identify other data sources and connect them with your data catalog.


Business outcomes from cataloging Snowflake data

Setting up a data catalog for Snowflake tremendously helps your business derive value from your data as it will allow you to:

  • Search and discover all your data assets in Snowflake using a visual interface; some catalogs, like Atlan, will also have a rich IDE for you to work with your data.
  • Tag, classify, and govern your data assets and enrich them with business context — all from a central location.
  • Do more with your data with add-on features like data lineage, data quality & profiling, and observability.

There are many more things that implementing a data catalog for Snowflake solves. Head over to this article to know more.


How to deploy Atlan for Snowflake

Atlan is an active metadata platform that solves for data cataloging, search, and discovery for your Snowflake data platform among other data tools. It also provides you with a rich IDE to work with your Snowflake data directly, making it a one-stop shop for all your data needs. To set up a data catalog for Snowflake in Atlan, you can go through the following steps:

  1. Create role in Snowflake
  2. Create a user
    1. With a password in Snowflake
    2. With a public key in Snowflake
    3. Managed through your identity provider (IdP)
  3. Grant role to the user
  4. Choose the metadata fetching method
    1. Information schema (recommended)
    2. Account usage (alternative)
  5. Grant permissions
    1. To crawl existing assets
    2. To crawl future assets
    3. To mine query history for lineage
    4. To preview and query existing assets
    5. To preview and query future assets
  6. Allowlist the Atlan IP

Share this article

[Website env: production]