How to Set Up a Data Catalog for Snowflake? (2024 Guide)
Snowflake exposes its internal data dictionary through an `INFORMATION_SCHEMA` built on top of the ANSI SQL-92 standard. To get the most out of this data dictionary, you'll want an external data catalog tool that makes it more accessible and valuable for your organization.
In this step-by-step guide, you’ll learn how to set up a data catalog for Snowflake.
To fetch the metadata necessary to build a data catalog for Snowflake, you’ll need to go through the following steps:
- Create a database role in Snowflake
- Create a database user
- Identify the Snowflake databases, schemas, and objects that you want to crawl
- Assign relevant permissions to the role
- Assign the role to the database user
- Start crawling data using a manual or a scheduled run
Organizations have different needs and demands when it comes to metadata usage. Bigger organizations often face data silos, spread-out metadata, the inability to discover what data exists and where, and more. If your current data platform needs help solving any of these problems with the metadata you already have: Talk to us.
Table of contents #
- Prerequisites to setting up a data catalog for Snowflake
- Steps to set up a data catalog for Snowflake
- Business outcomes from cataloging Snowflake data
- How to deploy Atlan for Snowflake
- Snowflake Data Catalog Setup: Related reads
Prerequisites to setting up a data catalog for Snowflake #
When setting up a data catalog for Snowflake, you’ll need to tick a few networking, infrastructure, and security checkboxes:
- Reachability — Make sure that the data catalog can reach Snowflake. Both these tools can be in different networks, so you might need to set up VPNs, peering connections, NACLs, etc., to ensure proper function.
- Encryption — The data dictionary gives you a deep insight into application and system design. You want the metadata to be securely fetched and stored from your Snowflake account.
- Infrastructure — Ensure your data catalog has enough compute and memory to address data crawling, previewing, and querying operations.
Steps to set up a data catalog for Snowflake #
Step 1: Create a database role in Snowflake #
Every database and data warehouse implements namespaces, roles, and permissions differently, and the steps to set up a data catalog will vary accordingly. Snowflake has two access control models (DAC and RBAC) that work side by side to provide a flexible, granular control structure for your data. In Snowflake, you can only assign permissions to roles, not directly to users. Here's how to create a role with `OPERATE` and `USAGE` permissions on a Snowflake warehouse:
```sql
CREATE OR REPLACE ROLE data_catalog_role;
GRANT OPERATE, USAGE ON WAREHOUSE "<warehouse-name>" TO ROLE data_catalog_role;
```
Granting `OPERATE` on your warehouse enables the role to view all queries previously run or currently running on the warehouse, and to start, stop, suspend, and resume the warehouse. Review your grants to make sure they are not overly permissive for the data catalog.
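To sanity-check the new role before moving on, you can list everything granted to it (a quick check, assuming you created the role as above):

```sql
-- Confirm the role holds only the warehouse privileges you intended
SHOW GRANTS TO ROLE data_catalog_role;
```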
Step 2: Create a database user #
Once you’ve created a database role, create a database user and attach the role to it. You can create a database user in one of three ways: with a password, with a public key, or with SSO.
```sql
-- Method 1: with a password
CREATE USER data_catalog_user
  PASSWORD = '<password>'
  DEFAULT_ROLE = data_catalog_role
  DEFAULT_WAREHOUSE = '<warehouse_name>'
  DISPLAY_NAME = '<display_name>';

-- Method 2: with a public key (key-pair authentication)
CREATE USER data_catalog_user
  RSA_PUBLIC_KEY = '<rsa_public_key>'
  DEFAULT_ROLE = data_catalog_role
  DEFAULT_WAREHOUSE = '<warehouse_name>'
  DISPLAY_NAME = '<display_name>';
```
With SSO, there are two ways to authenticate to Snowflake: browser-based SSO, or your identity provider’s native SSO (currently available only for Okta).
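For the SSO route, one common pattern is to create the user without any password or key pair, so authentication must go through your identity provider. The sketch below assumes `LOGIN_NAME` matches the user’s IdP identifier:

```sql
-- Method 3 (sketch): SSO-only user — no PASSWORD or RSA_PUBLIC_KEY is set,
-- so sign-in has to happen through browser-based SSO or your IdP
CREATE USER data_catalog_user
  LOGIN_NAME = '<idp_login_or_email>'
  DEFAULT_ROLE = data_catalog_role
  DEFAULT_WAREHOUSE = '<warehouse_name>'
  DISPLAY_NAME = '<display_name>';
```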
Step 3: Identify the Snowflake databases, schemas, and objects that you want to crawl #
The process of data movement, cleaning, and transformation is often very messy. You end up creating a lot of playground databases and schemas for running internal experiments and testing queries.
Ideally, all this should be in a development environment, but that’s not always the case. Hence, you must identify which databases and schemas to fetch data assets from.
Before making the initial connection, explore your data assets using commands like `SHOW DATABASES`, `SHOW SCHEMAS`, `SHOW OBJECTS`, `SHOW TABLES`, and so on. You can alternatively query the `INFORMATION_SCHEMA` directly.
Often database and object names have a prefix that signifies a stage in the data pipeline or a level of maturity of the data they contain. You can use these indicators to finalize the assets you want to fetch.
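As a sketch of that exploration, assuming hypothetical `TMP_` and `SANDBOX_` prefixes mark your playground schemas, you could shortlist crawl candidates like this:

```sql
-- Shortlist schemas and tables worth cataloging, skipping scratch areas.
-- The TMP_/SANDBOX_ prefixes are hypothetical; substitute your own conventions.
SELECT table_schema, table_name, table_type, row_count
FROM "<database_name>".INFORMATION_SCHEMA.TABLES
WHERE table_schema NOT ILIKE 'TMP_%'
  AND table_schema NOT ILIKE 'SANDBOX_%'
ORDER BY table_schema, table_name;
```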
Vet the list of databases, schemas, and objects to be crawled because, in the next step, you’ll grant read permissions on them to your `data_catalog_role` so the data catalog can start fetching metadata from your Snowflake warehouse.
Step 4: Assign relevant permissions to the role #
There are two ways to go about assigning permissions to `data_catalog_role`. Your choice should be based on the capabilities of your data catalog. Identify whether you only want to:
- Catalog data assets and view structural metadata
- Catalog data assets, view structural metadata, preview data, and run queries against data assets.
This is where Snowflake offers two other schemas, `ACCOUNT_USAGE` and `READER_ACCOUNT_USAGE`, in addition to the `INFORMATION_SCHEMA`, that contain database metadata. Choose between them depending on what your data catalog supports.
Grant access for crawling metadata
Using the `ACCOUNT_USAGE` schema, you’ll need to grant additional access to `DATABASES`, `SCHEMATA`, `TABLES`, `VIEWS`, `COLUMNS`, and `PIPES` for the data catalog to be able to fetch metadata. You can also use one or more of Snowflake’s default `ACCOUNT_USAGE` roles: `OBJECT_VIEWER`, `USAGE_VIEWER`, `GOVERNANCE_VIEWER`, and `SECURITY_VIEWER`. Similarly, to use the `READER_ACCOUNT_USAGE` schema for fetching query history, login history, and so on, you can assign the `READER_USAGE_VIEWER` role to your `data_catalog_role`.
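Granting those built-in viewer roles is a one-liner each. A minimal sketch, assuming your catalog only needs object and usage metadata:

```sql
-- ACCOUNT_USAGE lives in the shared SNOWFLAKE database, so grant its
-- built-in database roles to the catalog role rather than raw view access
GRANT DATABASE ROLE SNOWFLAKE.OBJECT_VIEWER TO ROLE data_catalog_role;
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE data_catalog_role;
```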
Since the `INFORMATION_SCHEMA` is the most common and extensive method for crawling metadata, let’s look at it in more detail. Using the `INFORMATION_SCHEMA`, you can run the following `GRANT` statements to give `data_catalog_role` crawling access to current data assets:
```sql
GRANT USAGE ON DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON ALL MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON ALL STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON PIPE "<pipe_name>" TO ROLE data_catalog_role;
```
The aforementioned `GRANT` statements give `data_catalog_role` the privileges to fetch metadata for all current data assets. To bring in metadata for all future data assets too, you’ll need to explicitly specify the `FUTURE` keyword in the `GRANT` statements, as shown below:
```sql
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT REFERENCES ON FUTURE MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE STREAMS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT MONITOR ON FUTURE PIPES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
```
If you stop here, you’ll be in a position to list all data assets for the `<database_name>` database, but you won’t be able to preview or query the data from your data catalog.
Grant access for previewing and querying data
Many data catalogs, such as Atlan, provide an integrated IDE so that you can preview the data and even work on it by writing queries, saving them, and sharing them with your team. You can allow previewing and querying data from your data catalog by running the same statements as in the previous step, with a minor difference: instead of `GRANT REFERENCES`, you’ll need to `GRANT SELECT` on `TABLES`, `EXTERNAL TABLES`, `VIEWS`, and `MATERIALIZED VIEWS`. Note that you’ll need to do this twice, once for current data assets and once for future data assets. Here’s an example of granting `SELECT` on tables, present and future:
```sql
GRANT SELECT ON ALL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
```
Now, you should be able to preview and query your Snowflake data from the data catalog.
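The same pattern extends to the other object types. The full set, mirroring the `REFERENCES` grants from the crawling step, would look like:

```sql
-- SELECT grants for the remaining object types, current and future
GRANT SELECT ON ALL EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE EXTERNAL TABLES IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON ALL VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON ALL MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
GRANT SELECT ON FUTURE MATERIALIZED VIEWS IN DATABASE "<database_name>" TO ROLE data_catalog_role;
```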
Consider granting other relevant privileges
There are a host of other grants and privileges you can assign to `data_catalog_role`; what you need will ultimately depend on the capabilities of your data catalog. For example, if your data catalog has data lineage capabilities, you might need to grant enhanced privileges, such as `IMPORTED PRIVILEGES` on the `SNOWFLAKE` database, to `data_catalog_role`.
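For instance, to let the catalog read the `ACCOUNT_USAGE` views that lineage features typically build on, the grant might look like this (a sketch; check your catalog’s documentation for the exact privileges it expects):

```sql
-- IMPORTED PRIVILEGES is how you grant read access to a shared database
-- such as SNOWFLAKE, whose ACCOUNT_USAGE views back most lineage features
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE data_catalog_role;
```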
Step 5: Assign the role to the database user #
Once you’re done assigning all the relevant permissions to the role, assign the role to `data_catalog_user` using the following `GRANT` statement:
```sql
GRANT ROLE data_catalog_role TO USER data_catalog_user;
```
You should now be ready to connect to your Snowflake account from your data catalog.
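Before handing the credentials to the catalog, it’s worth confirming the user-role wiring:

```sql
-- The output should list data_catalog_role against data_catalog_user
SHOW GRANTS TO USER data_catalog_user;
```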
Step 6: Configure the Snowflake connector and start crawling data #
Once you’ve run the `GRANT` statements, you’ll be in a position to test whether your data catalog can fetch metadata. To configure the Snowflake connector in your data catalog, you must provide the connector with the database credentials.
Most connectors can test the connection before you move on to the next step and start crawling data from your Snowflake database. If the connection to your Snowflake warehouse fails, check whether any networking or security considerations were missed. You can use the SnowCD (Snowflake Connectivity Diagnostic) tool to evaluate your network connectivity.
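If you suspect a network issue, you can generate the allowlist that SnowCD tests against from within Snowflake, save the JSON output to a file, and run `snowcd <file>.json` from the machine hosting the catalog or connector:

```sql
-- Returns the hostnames and ports your clients must be able to reach
SELECT SYSTEM$ALLOWLIST();
```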
Once you resolve any connectivity issues, you can start crawling data. Most data catalogs provide you with an option to run the crawler in three different ways:
- Ad-hoc crawl (manual crawl using a CLI command or the data catalog console)
- Scheduled crawl (E.g., based on a cron expression)
- Event-based crawl (E.g., crawl triggered from an event that the data catalog can listen to)
After putting your Snowflake data into your data catalog, you can identify other data sources and connect them with your data catalog.
Business outcomes from cataloging Snowflake data #
Setting up a data catalog for Snowflake helps your business derive far more value from its data, as it will allow you to:
- Search and discover all your data assets in Snowflake using a visual interface; some catalogs, like Atlan, will also have a rich IDE for you to work with your data.
- Tag, classify, and govern your data assets and enrich them with business context — all from a central location.
- Do more with your data with add-on features like data lineage, data quality & profiling, and observability.
There are many more things that implementing a data catalog for Snowflake solves. Head over to this article to know more.
How to deploy Atlan for Snowflake #
Atlan is an active metadata platform that handles data cataloging, search, and discovery for your Snowflake data platform, among other data tools. It also provides a rich IDE to work with your Snowflake data directly, making it a one-stop shop for all your data needs. To set up a data catalog for Snowflake in Atlan, you can go through the following steps:
- Create role in Snowflake
- Create a user
- Grant role to the user
- Choose the metadata fetching method
- Grant permissions
- Allowlist the Atlan IP
Snowflake Data Catalog Setup: Related reads #
- Snowflake Cortex: Everything We Know So Far and Answers to FAQs
- Snowflake Copilot: Here’s Everything We Know So Far About This AI-Powered Assistant
- Polaris Catalog from Snowflake: Everything We Know So Far
- Snowflake Cost Optimization: Typical Expenses & Strategies to Handle Them Effectively
- Snowflake Horizon for Data Governance: Here’s Everything We Know So Far
- Snowflake Data Cloud Summit 2024: Get Ready and Fit for AI
- How to Set Up a Data Catalog for Snowflake: A Step-by-Step Guide
- How to Set Up Snowflake Data Lineage: Step-by-Step Guide
- How to Set Up Data Governance for Snowflake: A Step-by-Step Guide
- Snowflake + AWS: A Practical Guide for Using Storage and Compute Services
- Snowflake X Azure: Practical Guide For Deployment
- Snowflake X GCP: Practical Guide For Deployment
- Snowflake + Fivetran: Data movement for the modern data platform
- Snowflake + dbt: Supercharge your transformation workloads
- Snowflake Metadata Management: Importance, Challenges, and Identifying The Right Platform
- Snowflake Data Governance: Native Features, Atlan Integration, and Best Practices
- Snowflake Data Dictionary: Documentation for Your Database
- Snowflake Data Access Control Made Easy and Scalable
- Glossary for Snowflake: Shared Understanding Across Teams
- Snowflake Data Catalog: Importance, Benefits, Native Capabilities & Evaluation Guide
- Snowflake Data Mesh: Step-by-Step Setup Guide
- Managing Metadata in Snowflake: A Comprehensive Guide
- How to Query Information Schema on Snowflake? Examples, Best Practices, and Tools
- Snowflake Summit 2023: Why Attend and What to Expect
- Snowflake Summit Sessions: 10 Must-Attend Sessions to Up Your Data Strategy