A Snowflake data catalog can make your data cloud assets easy to search, discover, govern, access, and use.
Modern data catalogs come equipped with a searchable business glossary, cross-system automated data lineage, granular row- and column-level access controls, visual query builders, and more. These capabilities help you democratize your Snowflake data warehouse and make sense of the data.
We’ll explore the need for a Snowflake data catalog, evaluation criteria for modern data catalogs, and essential capabilities that make cataloging a breeze.
What is a Snowflake Data Catalog?
A Snowflake data catalog acts as the access, control and collaboration plane for your Snowflake data assets.
The Snowflake data cloud has made large-scale data compute and storage easy and affordable. However, exploring all that data, profiling it and knowing how to use it isn’t straightforward.
That’s why it’s important to set up a data catalog to make an inventory of all the tables and views within Snowflake, summarize the context behind each asset, and navigate through them effortlessly.
Read more → Data catalog 101
Download → Forrester Wave™: Enterprise Data Catalog for DataOps, Q2 2022
Why is a data catalog important for Snowflake data cloud?
Most organizations use Snowflake to house numerous databases with several thousand tables, columns, and views.
Data pours in every second from various applications. Several data consumers across teams use this data to answer their questions and make decisions.
In such a scenario, it’s crucial to know:
- What data do you have, and where does it come from?
- What does each asset mean?
- Who’s using what data and for what purpose?
- When was a table or a column last updated?
- Which are the most and least used columns?
- How are the various data assets related?
- If you run a query or transform data, which applications, dashboards, or reports will get affected?
The best way to answer these questions is to organize all of your data assets, along with the metadata, under one roof. That’s where a data catalog for the Snowflake data cloud can help.
The benefits of setting up a catalog for your Snowflake data cloud
Here are some of the benefits of setting up a Snowflake data catalog:
- Find the right data and metadata with easy data search and discovery
- Get end-to-end visibility with a 360-degree profile and comprehensive glossaries for every data asset
- Trace and track data flow with automated, cross-system lineage
- Eliminate data silos and improve data collaboration with modern data cataloging capabilities such as chats, notes, READMEs, tags, and shareable SQL queries
- Establish proper data governance with granular role-based access controls, automatic PII classification and tagging, and auto-propagation of policies through lineage
Read more → The many benefits of a data catalog
Does Snowflake come with native data catalog capabilities?
There aren’t any native data cataloging capabilities in Snowflake. Before proceeding, let’s see how Snowflake handles metadata — the building block of a modern data catalog.
State of metadata in Snowflake
Metadata provides more context on data and the most popular types include technical, operational, and business metadata. It’s the glue that brings data teams together.
Various tools in the data stack use different fields to track all that metadata. For Snowflake, these include three popular types of metadata fields:
- Object types: Any asset on Snowflake is called an object. So, object types can include schemas, tables, views, functions, users, roles, tags, accounts, and databases, to name a few.
- Object definitions: These include user-defined and external functions, and policies for masking, row access, or sessions.
- Object properties: These include column names, comments, and tag values.
Snowflake also maintains other metadata, such as queries and schema of files in internal stages.
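To make object properties concrete, here is a minimal sketch that builds the Snowflake SQL for setting a column comment and a column tag. The helper names are hypothetical, and the sketch assumes that table, column, and tag identifiers come from a trusted source, since only the string values are escaped.

```python
def quote_literal(text: str) -> str:
    """Return text as a single-quoted SQL string literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def comment_on_column(table: str, column: str, comment: str) -> str:
    """Build a COMMENT ON COLUMN statement (a comment is an object property)."""
    return f"COMMENT ON COLUMN {table}.{column} IS {quote_literal(comment)}"


def set_column_tag(table: str, column: str, tag: str, value: str) -> str:
    """Build an ALTER TABLE ... MODIFY COLUMN ... SET TAG statement."""
    return (
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET TAG {tag} = {quote_literal(value)}"
    )


if __name__ == "__main__":
    # Hypothetical table, column, and tag names, for illustration only.
    print(comment_on_column("sales.orders", "email", "Customer's contact email"))
    print(set_column_tag("sales.orders", "email", "governance.sensitivity", "pii"))
```

The generated statements can then be run in a worksheet or through any Snowflake client.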
Now let’s see how Snowflake manages metadata to support data discovery and lineage.
Data discovery and lineage in Snowflake
Snowflake supports data discovery using:
- Account Usage Views: Account usage views include metadata such as object type and usage metrics
- Information Schema: A read-only data dictionary with a list of table functions and views for all objects, along with the object type
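While both schemas sound similar, they differ in the objects they cover, data latency, and retention period. Account Usage views include dropped objects, lag by 45 minutes to 3 hours, and retain historical usage data for one year. Information Schema has no latency, omits dropped objects, and retains historical data for 7 days to 6 months, depending on the view or table function.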
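As a rough illustration of the two sources, the sketch below builds one table-inventory query against a database's `INFORMATION_SCHEMA.TABLES` view and one against `SNOWFLAKE.ACCOUNT_USAGE.TABLES`. The function names are hypothetical, and the database name is assumed to be a trusted identifier. The `deleted` column in the Account Usage view is what lets you include or exclude dropped tables.

```python
def information_schema_query(database: str) -> str:
    """Inventory of current tables and views in one database."""
    return (
        "SELECT table_schema, table_name, table_type, last_altered "
        f"FROM {database}.INFORMATION_SCHEMA.TABLES "
        "WHERE table_schema <> 'INFORMATION_SCHEMA' "
        "ORDER BY table_schema, table_name"
    )


def account_usage_query(include_dropped: bool = False) -> str:
    """Account-wide inventory; dropped tables have a non-null DELETED timestamp."""
    sql = (
        "SELECT table_catalog, table_schema, table_name, table_type, "
        "last_altered, deleted "
        "FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES"
    )
    if not include_dropped:
        sql += " WHERE deleted IS NULL"
    return sql


def fetch_inventory(cursor, sql: str):
    """Run a query on any DB-API style cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()
```

With the `snowflake-connector-python` package installed, the cursor would typically come from `snowflake.connector.connect(...).cursor()`. Connection parameters are left out here because they depend on your account setup.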
Snowflake also enables lineage via access history logs and object dependencies (to show the downstream impact of data transformations).
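Downstream impact can be sketched as a graph walk. The example below assumes you have already pulled pairs of (referenced object, referencing object) from the `SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES` view, for example from its `REFERENCED_OBJECT_NAME` and `REFERENCING_OBJECT_NAME` columns. The object names and the function are hypothetical, and the walk is a plain breadth-first search rather than anything Snowflake provides.

```python
from collections import defaultdict, deque


def downstream_objects(dependencies, start):
    """Return every object that directly or indirectly depends on `start`.

    `dependencies` is an iterable of (referenced, referencing) pairs.
    """
    children = defaultdict(set)
    for referenced, referencing in dependencies:
        children[referenced].add(referencing)

    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical dependency rows, for illustration only.
    deps = [
        ("RAW.ORDERS", "STG.ORDERS_V"),
        ("STG.ORDERS_V", "MART.REVENUE_V"),
        ("RAW.CUSTOMERS", "MART.REVENUE_V"),
    ]
    print(downstream_objects(deps, "RAW.ORDERS"))
```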
What’s missing in Snowflake’s existing data discovery capabilities
However, there are two key challenges:
- You need an engineer or a Snowflake expert to query the required metadata from Account Usage views or Information Schema. This approach isn’t self-serve or scalable.
- The interface isn’t user-friendly and doesn’t house all data (i.e., non-Snowflake data) under one roof
That’s why setting up an enterprise data catalog for the Snowflake data cloud is vital to understand and use data properly. So, let’s explore the key tenets of an ideal data catalog for Snowflake.
Essential components of the best data catalogs for Snowflake
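The ideal data catalog for Snowflake would be the home for all kinds of metadata: technical, business, operational, social, and custom. Moreover, the data catalog should support native Snowflake integration, keyword search, a business glossary, automated column-level lineage, embedded collaboration, and active data governance.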
Native Snowflake integration
The catalog should integrate natively with Snowflake and fetch metadata from either the Information Schema or Account Usage Views. The entire setup should take minutes, not months.
Keyword search for data discovery
The catalog should act as a single source of truth for all of your assets. In addition, searching for data assets should be intuitive (think Google Search) and come with recommendations and advanced filtering capabilities. Moreover, you should be able to search through metrics, glossaries, dashboards, READMEs, and more.
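To give a feel for what keyword search over catalog metadata involves, here is a toy sketch. The `Asset` record, the scoring weights, and the sample assets are all assumptions made for illustration. Real catalogs rely on a proper search index with ranking, synonyms, and filters.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def search(assets, query):
    """Rank assets by keyword hits; a hit in the name counts double."""
    terms = query.lower().split()
    scored = []
    for asset in assets:
        name = asset.name.lower()
        body = (asset.description + " " + " ".join(asset.tags)).lower()
        score = sum(2 * (term in name) + (term in body) for term in terms)
        if score:
            scored.append((score, asset.name))
    # Highest score first, then alphabetical for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    catalog = [
        Asset("ORDERS", "Customer orders", ["sales"]),
        Asset("CUSTOMERS", "Customer master data", ["pii"]),
        Asset("WEB_EVENTS", "Clickstream"),
    ]
    print(search(catalog, "customer"))
```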
Business glossary
The catalog should be equipped with a business glossary that offers 360° context for every data asset. A business glossary is a knowledge network for your business, where you can create and interpret relationships between definitions, metrics, and assets.
Automated column-level lineage
The catalog should offer column-level data lineage to trace data flow, transformations, and impact on downstream applications for all data — Snowflake and non-Snowflake assets. You can also propagate policies through the visual lineage map — for instance, a “Critical” tag or a column description from your dashboard to upstream source tables.
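Propagating a tag through lineage can be pictured as walking the lineage graph from a tagged asset back to its sources. The sketch below assumes column-level lineage is available as (source column, target column) pairs. The column names and the function are hypothetical and do not reflect any specific catalog's API.

```python
def propagate_tag_upstream(edges, start, tag, tags=None):
    """Attach `tag` to `start` and to every column upstream of it.

    `edges` is an iterable of (source_column, target_column) pairs.
    Returns a new dict mapping column -> set of tags.
    """
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    # Copy existing tags so the caller's mapping is not mutated.
    result = {column: set(values) for column, values in (tags or {}).items()}
    stack = [start]
    visited = set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        result.setdefault(node, set()).add(tag)
        stack.extend(parents.get(node, ()))
    return result


if __name__ == "__main__":
    # Hypothetical lineage edges, for illustration only.
    lineage = [
        ("raw.orders.amount", "stg.orders.amount_usd"),
        ("stg.orders.amount_usd", "dash.revenue.total"),
        ("raw.users.email", "stg.users.email"),
    ]
    print(propagate_tag_upstream(lineage, "dash.revenue.total", "Critical"))
```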
Embedded collaboration
The best data catalog for Snowflake is one that weaves into your daily workflows, making it easy to share data and request access to critical assets. With embedded collaboration, you can leave notes for your teammates, raise support tickets, and look up metadata at a glance, without leaving the catalog platform.
Active data governance
A decentralized, community-led approach to data governance is the key to making it work. The data catalog should support automatic classification, tagging, and masking of sensitive data assets and auto-propagation of policies through lineage mapping. Moreover, active data governance ensures that you can customize your policies depending on data domains, user roles, and projects.
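Automatic classification can start from something as simple as matching column names against known patterns. The sketch below is a deliberately naive, rule-based version. The pattern set and labels are assumptions, and production classifiers usually also sample the data itself rather than trusting names alone.

```python
import re

# Hypothetical name-based rules; extend or replace to suit your data.
PII_PATTERNS = {
    "EMAIL": re.compile(r"e_?mail", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "SSN": re.compile(r"(^|_)ssn(_|$)|social_security", re.IGNORECASE),
}


def classify_columns(columns):
    """Map each column name that looks sensitive to its sorted list of labels."""
    result = {}
    for column in columns:
        labels = sorted(
            label for label, pattern in PII_PATTERNS.items() if pattern.search(column)
        )
        if labels:
            result[column] = labels
    return result


if __name__ == "__main__":
    print(classify_columns(["CUSTOMER_EMAIL", "order_id", "phone_number", "user_ssn"]))
```

The resulting labels could then drive tagging or masking policies on the flagged columns.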
Snowflake data catalog tools
Different kinds of data catalog tools are available in the market for the Snowflake data cloud. While you can categorize them based on their capabilities, architecture, and more, we’ll look at them through the lens of metadata: active vs. passive metadata management.
Active vs passive catalog tools
- Passive data catalogs: Passive metadata is mostly technical metadata (i.e., schema, data types, models, owner name, and so on). Passive data catalogs bring metadata from various tools and house it in yet another tool, which becomes yet another silo, akin to “expensive shelfware”.
- Active data catalogs: Active metadata tells you everything that happens to a data asset. This includes descriptive metadata — operational, business and social, in addition to technical metadata. Active data catalogs support two-way movement of metadata and send enriched metadata back into every tool in the data stack. So, you don’t have to switch between apps and instead, find context using the tools that are already a part of your daily workflows.
Read more → The future of data catalogs is active
Here’s a table summarizing the differences between active and passive data catalogs.
|Aspect|Passive data catalog|Active data catalog|
|---|---|---|
|Self serve|You need a technical expert to run queries that pull the data you need and grant you access. This process can take days, weeks, or even months in large enterprises.|Any user can search for the data they need via a Google-like interface and get all the context via 360-degree data asset profiles, business glossary, data quality metrics, and more.|
|Business use cases|Passive data catalogs mostly consolidate technical metadata, which can be difficult to interpret for business users. Moreover, the technical experts lack the necessary business context to match the needs of business users, leading to multiple discussions across several teams.|Active data catalogs can be built as per your needs, based on business domains, projects, or even user roles. This approach ensures that the various tools, systems, and teams talk to each other, getting rid of data silos and making sure that every data asset comes with the necessary business context.|
|Collaboration|Sharing data and discussing the various fields within tables involves multiple back-and-forth across various communication channels.|Active data catalogs integrate within your daily workflow. So, you can discuss columns on Slack, share tables with a link, and offer suggestions to enrich the context of each data asset.|
|Automation|Several tasks in passive data catalogs (adding tags, configuring access policies, masking sensitive assets) are manual. This approach isn’t practical or scalable.|Active data catalogs automate data classification, tagging, policy propagation, and compliance requirements with programmable bots and end-to-end lineage mapping. This simplifies data quality checks, data governance, and regulatory compliance.|
How to evaluate a data catalog tool for Snowflake
You should start the evaluation process by developing an evaluation criteria framework that maps your needs and helps you rank the tools available in the market.
A key criterion should be native integration with your Snowflake data cloud.
The next step is to check out demos of your shortlisted solutions and execute proofs of concept (POCs) that test your top use cases.
While running the POCs, you should look at:
- The tool’s architecture
- The setup process
- The tool’s ability to crawl metadata from the Snowflake data cloud and other tools in your modern data stack
- Ease of adoption for technical and business users
It’s best to talk to the service provider as much as possible to clarify all of your concerns.
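One simple way to turn evaluation criteria into a ranking is a weighted scoring matrix. The sketch below assumes you assign each criterion a weight and score each tool from 1 to 5 during the POC. The criteria keys, weights, and tool names are placeholders, not recommendations.

```python
def rank_tools(weights, scores):
    """Rank tools by weighted average score, highest first.

    `weights` maps criterion -> importance.
    `scores` maps tool -> {criterion: score}; missing criteria count as 0.
    """
    total_weight = sum(weights.values())
    ranked = []
    for tool, tool_scores in scores.items():
        weighted = sum(
            weight * tool_scores.get(criterion, 0)
            for criterion, weight in weights.items()
        ) / total_weight
        ranked.append((tool, round(weighted, 2)))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical weights and POC scores, for illustration only.
    weights = {"snowflake_integration": 3, "setup": 1, "metadata_crawling": 2, "adoption": 2}
    scores = {
        "Tool A": {"snowflake_integration": 5, "setup": 3, "metadata_crawling": 4, "adoption": 4},
        "Tool B": {"snowflake_integration": 3, "setup": 5, "metadata_crawling": 4, "adoption": 5},
    }
    print(rank_tools(weights, scores))
```

Weighting native Snowflake integration more heavily than the other criteria reflects its role as a key criterion in the evaluation.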
If you’re looking to implement a data discovery and data catalog solution for your organization, you might want to check out Atlan.
Snowflake data catalog: Related resources
- Snowflake data dictionary — Documentation for your Snowflake data warehouse
- How to manage data governance for Snowflake data warehouse
- Snowflake data access control made easy and scalable
- Glossary for Snowflake — Shared understanding across teams