Snowflake Data Catalog: Expanding Native Functionality with a Unified Metadata Control Plane

Updated October 29th, 2024


Snowflake’s native data cataloging capabilities are centered around the ACCOUNT_USAGE schema, which serves as its primary technical data catalog.

Snowflake extends this functionality through integrations with catalog backends such as AWS Glue, Iceberg REST, object storage, and Polaris.

These integrations manage core technical metadata well, but they do not deliver a unified view across your entire data ecosystem. For a comprehensive metadata strategy, you need a broader solution that unifies metadata from Snowflake and other systems into a centralized control plane.


Table of contents #

  1. Snowflake’s native data cataloging features
  2. Atlan as the control plane for Snowflake
  3. How organizations are making the most of their data using Atlan
  4. Summary
  5. FAQs on Snowflake data catalog
  6. Snowflake data catalog: Related Resources

Snowflake’s native data cataloging features #

Snowflake provides several native data cataloging features, enabling you to leverage collected metadata for searching, discovering, and governing all your data assets. These features focus on the following key areas:

  • A foundational technical data catalog that holds structural and operational metadata
  • Logging and auditing of activity within the Snowflake ecosystem, for example to monitor security and cost
  • Data classification, quality metrics, and object tagging, covering governance, quality, and lineage

Snowflake stores and manages both technical and operational metadata within the ACCOUNT_USAGE schema under the SNOWFLAKE database.

Additional schemas, such as DATA_SHARING_USAGE, MONITORING, and TELEMETRY, capture metadata for specific use cases. Technical metadata encompasses the structure, definitions, and properties of Snowflake objects like tables, views, policies, functions, and stored procedures. In contrast, operational metadata focuses on Snowflake’s internal management of infrastructure, data processing, movement, and other operations.
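
As a rough illustration of how this technical metadata can be read, the sketch below builds a query against the TABLES view in SNOWFLAKE.ACCOUNT_USAGE. The helper name build_table_inventory_query and the selected columns are choices made for this example, and the %s placeholder assumes a driver that uses the pyformat parameter style (the Snowflake Python connector's default). The function only builds the statement; running it requires a connection and a role that can read the SNOWFLAKE database.

```python
from __future__ import annotations

# Fully qualified schema named in the article.
ACCOUNT_USAGE = "SNOWFLAKE.ACCOUNT_USAGE"


def build_table_inventory_query(database: str, limit: int = 100) -> tuple[str, tuple]:
    """Return (sql, params) listing the largest active tables in one database.

    Hypothetical helper for illustration: it only assembles the statement.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        "SELECT table_catalog, table_schema, table_name, table_type,\n"
        "       row_count, bytes, last_altered\n"
        f"FROM {ACCOUNT_USAGE}.TABLES\n"
        # DELETED is NULL for tables that have not been dropped.
        "WHERE deleted IS NULL\n"
        # The database name is passed as a bound parameter, not formatted in.
        "  AND table_catalog = %s\n"
        "ORDER BY bytes DESC NULLS LAST\n"
        f"LIMIT {int(limit)}"
    )
    return sql, (database,)
```

The returned pair can be handed to a database cursor, for example cursor.execute(sql, params). ACCOUNT_USAGE views lag behind real time, so very recent changes may not appear immediately.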

Snowflake also provides fine-grained logging of data access and usage across your organization. Note that Polaris, the open-source version of Snowflake’s catalog, lacks full feature parity with the Snowflake catalog: some features, such as controlling row- and column-level access through an RBAC model, are still in early stages of development within Polaris.
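
To make the access logging concrete, here is a minimal sketch that summarizes rows shaped like those in the ACCOUNT_USAGE.ACCESS_HISTORY view, counting how many queries each user ran against each object. The function name count_object_reads and the sample rows are invented for illustration; the sketch assumes the DIRECT_OBJECTS_ACCESSED column arrives as a JSON string (or an already parsed list) whose entries carry an objectName field.

```python
import json
from collections import Counter


def count_object_reads(rows):
    """Count, per (user, object), how many queries accessed the object."""
    counts = Counter()
    for row in rows:
        accessed = row.get("DIRECT_OBJECTS_ACCESSED") or []
        if isinstance(accessed, str):
            # Semi-structured columns are often returned as JSON text.
            accessed = json.loads(accessed)
        # Use a set so one query counts at most once per object.
        names = {obj["objectName"] for obj in accessed if "objectName" in obj}
        for name in names:
            counts[(row["USER_NAME"], name)] += 1
    return counts


# Made-up sample rows standing in for query results.
orders = {"objectDomain": "Table", "objectName": "SALES.PUBLIC.ORDERS"}
customers = {"objectDomain": "Table", "objectName": "SALES.PUBLIC.CUSTOMERS"}
sample_rows = [
    {"USER_NAME": "ALICE", "DIRECT_OBJECTS_ACCESSED": json.dumps([orders])},
    {"USER_NAME": "ALICE", "DIRECT_OBJECTS_ACCESSED": json.dumps([orders, customers])},
    {"USER_NAME": "BOB", "DIRECT_OBJECTS_ACCESSED": json.dumps([orders])},
]
reads = count_object_reads(sample_rows)
```

A summary like this is one way to spot who touches sensitive tables most often; a production audit would also filter by time window and object domain.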

All metadata captured by Snowflake is stored in a comprehensive metadata database, allowing you to derive query analytics, data lineage, and quality metrics, among other insights. Metadata for Snowflake features such as data classification, data quality and monitoring metrics, object tagging, and masking policies is tracked there as well.
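
One way to see how lineage can be derived from this metadata is to pair, for each logged query, the objects it read with the objects it wrote. The sketch below does that for rows shaped like ACCOUNT_USAGE.ACCESS_HISTORY records. It is a simplified, table-level illustration only: the helper names and sample rows are hypothetical, and it assumes the BASE_OBJECTS_ACCESSED and OBJECTS_MODIFIED columns hold JSON arrays of entries with an objectName field.

```python
import json


def _as_list(value):
    """Normalize a JSON-text or list column value to a Python list."""
    if value is None:
        return []
    return json.loads(value) if isinstance(value, str) else list(value)


def lineage_edges(rows):
    """Return sorted (source, target) pairs: objects read -> objects written."""
    edges = set()
    for row in rows:
        sources = [o["objectName"] for o in _as_list(row.get("BASE_OBJECTS_ACCESSED"))
                   if "objectName" in o]
        targets = [o["objectName"] for o in _as_list(row.get("OBJECTS_MODIFIED"))
                   if "objectName" in o]
        for source in sources:
            for target in targets:
                if source != target:  # ignore self-updates
                    edges.add((source, target))
    return sorted(edges)


# Made-up rows: one write query (two sources, one target) and one pure read.
sample_rows = [
    {
        "BASE_OBJECTS_ACCESSED": json.dumps([
            {"objectName": "RAW.PUBLIC.ORDERS"},
            {"objectName": "RAW.PUBLIC.CUSTOMERS"},
        ]),
        "OBJECTS_MODIFIED": json.dumps([{"objectName": "MART.PUBLIC.ORDER_FACTS"}]),
    },
    {
        "BASE_OBJECTS_ACCESSED": json.dumps([{"objectName": "MART.PUBLIC.ORDER_FACTS"}]),
        "OBJECTS_MODIFIED": None,
    },
]
edges = lineage_edges(sample_rows)
```

Read-only queries contribute no edges here, which is why the second sample row is ignored. Column-level lineage would need the nested column entries as well.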

However, Snowflake’s data cataloging features primarily serve as a technical data catalog or, at best, a data dictionary with basic business descriptions, tagging, and categorization.

What it doesn’t provide is a full-fledged metadata control plane that integrates with all systems in your data stack, offering complete control over the search, discovery, and governance of your data assets. This is where Atlan comes into play as the unified metadata control plane for your entire data ecosystem.

Snowflake’s latest advancement, the Polaris Catalog, represents a significant shift in data cataloging capabilities by enabling open-source, cross-platform data management.

Launched in July 2024, Polaris supports Apache Iceberg, an open table format increasingly popular because the same tables can be read and written by multiple compute engines, including Apache Flink, Spark, Trino, and others.

Snowflake’s goal with Polaris is to create a unified, vendor-neutral catalog that aligns with its commitment to interoperability, allowing organizations to manage data within Snowflake alongside other platforms like AWS, Microsoft Azure, and Google Cloud without traditional data “lock-in”.


Atlan as the control plane for Snowflake #

The rapid growth of data tools and technologies, along with the explosion in data volume, has turned metadata itself into big data. This shift makes a lakehouse approach to metadata essential for gaining full control and maximizing its value.

Such an approach enables the creation of advanced capabilities, including complex automation workflows and end-to-end data ecosystem enablement, all built on top of a lakehouse-driven metadata control plane. Atlan has embraced this approach to handle Snowflake metadata effectively.

Atlan leverages all of Snowflake’s metadata, as discussed in the previous section. While Snowflake captures and manages this core data, Atlan enables you to extract value from it by helping you find relevant, trustworthy data assets, collaborate with peers through shared workspaces, and ensure data sharing is in compliance with your organization’s policies. These capabilities are crucial in today’s data- and AI-driven world.

Bringing cataloging, discovery, and governance together #


Built on top of the crawled metadata, Atlan offers several features that help you seamlessly discover, trust, and govern your AI-ready data:

  • Discovery – Provides an intuitive user interface that allows you to search and discover data assets across all your source and target systems, including Snowflake, by tapping into its internal metadata layer.
  • Governance – Enables data governance based on your organization’s operational model, extending Snowflake’s native governance features. It supports different governance models for various teams, accommodating compliance and regulatory requirements.
  • Classification – Enhances data discovery and trust by applying certification and verification tags to data assets. This feature integrates with Snowflake’s native data classification and tagging, using a two-way syncing mechanism for Snowflake tags.
  • Ownership – Facilitates the design and implementation of a data asset ownership model across your organization, which is particularly useful for data mesh architectures. While Snowflake assigns OWNERSHIP to securable objects, Atlan’s ownership model applies uniformly across your entire data ecosystem.
  • Freshness – Automatically builds trust by signaling the freshness of data. This feature leverages Snowflake’s object metadata and customized queries to infer data freshness in tables or views.
  • Cost Optimization – Uses usage pattern metadata to identify dormant or stale data assets, automating their deprecation to save on storage and compute costs. This feature helps streamline your data ecosystem by reducing unnecessary data complexity.
  • Lineage – Maps the flow of data throughout your platform, providing visibility into how data is transformed from ingestion to consumption. Atlan extracts lineage metadata from Snowflake using both the ACCOUNT_USAGE schema and the INFORMATION_SCHEMA.
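
The freshness and cost-optimization ideas in the list above can be sketched as a simple classification over two timestamps per asset: when it was last changed and when it was last read. This is not Atlan’s algorithm. The function name classify_asset, the one-day and 90-day thresholds, and the sample assets are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def classify_asset(last_altered, last_read, now,
                   fresh_within=timedelta(days=1),
                   dormant_after=timedelta(days=90)):
    """Return (freshness, usage) labels for one table or view.

    freshness: "fresh" if the asset changed within `fresh_within`, else "stale".
    usage: "dormant" if it was never read or not read within `dormant_after`,
           else "active".
    """
    freshness = "fresh" if now - last_altered <= fresh_within else "stale"
    if last_read is None or now - last_read > dormant_after:
        usage = "dormant"
    else:
        usage = "active"
    return freshness, usage


# Made-up assets: (last_altered, last_read) relative to a fixed "now".
now = datetime(2024, 10, 29, tzinfo=timezone.utc)
assets = {
    "SALES.PUBLIC.ORDERS": (now - timedelta(hours=3), now - timedelta(days=2)),
    "SALES.PUBLIC.ORDERS_2019": (now - timedelta(days=400), None),
}
labels = {name: classify_asset(altered, read, now)
          for name, (altered, read) in assets.items()}
```

In practice the two timestamps could come from object metadata (such as a last-altered column) and from access logs; assets labeled stale and dormant would be candidates for review before deprecation.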

Additionally, Atlan supports use cases such as automation, business glossary management, metadata activation, and personalization.


How organizations are making the most of their data using Atlan #

The recently published Forrester Wave report compared all the major enterprise data catalogs and positioned Atlan as the market leader ahead of all others. The comparison was based on 24 different aspects of cataloging, broadly across the following three criteria:

  1. Automatic cataloging of the entire technology, data, and AI ecosystem
  2. Enabling an AI- and automation-first data ecosystem
  3. Prioritizing data democratization and self-service

These criteria made Atlan the ideal choice for a major audio content platform, where the data ecosystem was centered around Snowflake. The platform sought a “one-stop shop for governance and discovery,” and Atlan played a crucial role in ensuring their data was “understandable, reliable, high-quality, and discoverable.”

For another organization, Aliaxis, which also uses Snowflake as its core data platform, Atlan served as “a bridge” between various tools and technologies across the data ecosystem. With its organization-wide business glossary, Atlan became the go-to platform for finding, accessing, and using data. It also significantly reduced the time spent by data engineers and analysts on pipeline debugging and troubleshooting.

A key goal of Atlan is to help organizations maximize the use of their data for AI use cases. As generative AI capabilities have advanced in recent years, organizations can now do more with both structured and unstructured data—provided it is discoverable and trustworthy, or in other words, AI-ready.

Tide’s Story of GDPR Compliance: Embedding Privacy into Automated Processes #


  • Tide, a UK-based digital bank with nearly 500,000 small business customers, sought to improve their compliance with GDPR’s Right to Erasure, commonly known as the “Right to be forgotten”.
  • After adopting Atlan as their metadata platform, Tide’s data and legal teams collaborated to define personally identifiable information in order to propagate those definitions and tags across their data estate.
  • Tide used Atlan Playbooks (rule-based bulk automations) to automatically identify, tag, and secure personal data, turning a 50-day manual process into mere hours of work.
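
To give a feel for what rule-based identification and tagging of personal data can look like, here is a small generic sketch that matches column names against patterns and assigns PII tags. It does not use or represent the Atlan Playbooks API. The rule set, tag names, and sample columns are hypothetical, and real programs would typically also inspect data values and agreed definitions rather than names alone.

```python
import re

# Hypothetical rules: (tag, pattern applied to the column name).
PII_RULES = [
    ("EMAIL", re.compile(r"e[-_]?mail", re.IGNORECASE)),
    ("PHONE", re.compile(r"phone|mobile", re.IGNORECASE)),
    ("NAME", re.compile(r"(first|last|full)_?name", re.IGNORECASE)),
    ("DATE_OF_BIRTH", re.compile(r"(^|_)dob(_|$)|birth", re.IGNORECASE)),
]


def tag_columns(columns):
    """Return {column: [tags]} for every column that matches at least one rule."""
    tagged = {}
    for column in columns:
        tags = sorted(tag for tag, pattern in PII_RULES if pattern.search(column))
        if tags:
            tagged[column] = tags
    return tagged


# Made-up column names for illustration.
columns = ["CUSTOMER_EMAIL", "FIRST_NAME", "ORDER_TOTAL", "DOB", "MOBILE_NUMBER"]
tagged = tag_columns(columns)
```

Columns that match no rule, such as ORDER_TOTAL here, are left untagged, so the rules can be reviewed and extended without touching the matching logic.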



Summary #

The modern data ecosystem is increasingly diverse, often spanning multiple cloud platforms and on-premises environments. Managing governance, compliance, and context across these systems requires a unified control plane—a single go-to platform that addresses all of your organization’s data needs, regardless of the data consumer’s role. Atlan provides this unified control plane, seamlessly integrating with data ecosystems centered around Snowflake or any other SaaS or PaaS data platform. By doing so, Atlan becomes a critical component in helping your organization unlock the full value of its data.


FAQs on Snowflake data catalog #

What is a “Snowflake data catalog”? #


A Snowflake data catalog is a structured collection of metadata within the Snowflake platform that helps users organize, discover, and manage data assets. Utilizing schemas like ACCOUNT_USAGE, the data catalog provides technical and operational insights, supporting data governance, security, and quality control across Snowflake.

How does Snowflake catalog data? #


Snowflake catalogs data through its ACCOUNT_USAGE schema and other dedicated schemas like DATA_SHARING_USAGE, MONITORING, and TELEMETRY. These schemas capture technical metadata about data structure and properties and operational metadata on usage, access, and infrastructure, enhancing data tracking and management.

What benefits does a data catalog provide in Snowflake? #


A data catalog in Snowflake centralizes metadata, making it easier to search and manage data assets. This leads to improved data governance, data quality monitoring, streamlined compliance, and efficient resource usage through better data discoverability and transparency.

How secure is Snowflake’s data catalog? #


Snowflake’s data catalog is secure, providing fine-grained access logging, auditing, and role-based access control (RBAC). These features help maintain strict governance over data access and usage, ensuring sensitive data is only accessible to authorized users.

How does a data catalog enhance data governance in Snowflake? #


Snowflake’s data catalog enhances data governance by organizing metadata and providing tools for classification, logging, and tagging. This structured approach helps in tracking data lineage, auditing data access, and managing compliance, thereby supporting robust data governance practices.

Can I automate data cataloging in Snowflake? #


Yes, data cataloging in Snowflake can be automated using integrations with third-party tools like Atlan, which helps unify and streamline metadata collection across platforms, reducing manual tasks and enhancing data catalog accuracy.


