Snowflake Metadata Management — Discovery, Lineage, and Governance

July 31st, 2022


Activate your Snowflake metadata with Atlan

Identify, discover, search, and access your Snowflake data better with Atlan's active metadata platform. Metadata helps improve the familiarity, findability, understandability, accuracy, and trustworthiness of Snowflake data assets.


What is metadata management?

Metadata management is the key to adding actionable context to the assets in your Snowflake data warehouse.

Metadata management is part of the data governance discipline. It involves processes, policies, workflows, and technology to identify, organize, and surface metadata to data consumers.

Modern metadata management doesn’t stop at defining data and making it accessible. It also solves DataOps use cases such as workflow management, automation, observability, tool integration, and change management.


[Download ebook] What is Active Metadata and Why Does it Matter?


What is Snowflake?

Snowflake is a cloud-native data warehouse primarily used for batch data ingestion and data analytics of both structured and unstructured data from diverse sources.

Quoting Snowflake’s website:

Snowflake is a fully managed service that’s simple to use but can power a near-unlimited number of concurrent workloads. Snowflake is your solution for data warehousing, data lakes, data engineering, data science, data application development, and securely sharing and consuming shared data.

The crux of what makes Snowflake different from other warehouses is that it decouples storage from compute. This means you can scale compute up or down on demand based on the analytics workload.

Snowflake stores data in cloud object storage such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. Snowflake’s customers include Dropbox, DoorDash, HubSpot, Adobe, and Fitbit.

Learn more - Best cloud data warehouse solutions: A comparison and evaluation guide


Snowflake Architecture diagram. Source: Snowflake


Importance of Snowflake metadata management

Faster access to insights: Metadata management makes it easy to search, filter, and find data assets by various criteria.

End-to-end visibility: Metadata gives you complete visibility into the lifecycle of a data asset — its source, where it is used, who uses it, the transformations on the data, etc.

Improved data quality and trust: Metadata helps users evaluate and understand the data, and trust that it is relevant and fit for use.

Improved operational efficiency: Metadata enables engineers to better design workflow automation — crawling, ingestion, and ETL. Metadata not only helps surface data issues but also helps resolve them by enabling root cause analysis.

Compliance and regulations: Metadata management helps audit the implementation of regulatory policies to meet compliance standards (GDPR, HIPAA).

Improves ROI on data: Improved productivity and operational efficiency help create more ROI from data management. The self-serve nature of modern metadata management increases the opportunities to extract more value from the data assets.

Snowflake metadata management: Storage and access

Snowflake stores all the metadata in a centralized component called Cloud Services.

Snowflake automatically creates metadata for data residing both externally (in cloud object storage on AWS, Azure, or GCP) and internally (within Snowflake), stores it as key-value pairs (a dictionary), and makes it available via the Information Schema.

The Information Schema provides metadata that can be broadly classified into:

Views: Provide metadata about the database itself, such as schemas, views, tables, columns, file types, referential constraints, and usage/access privileges.

Table functions: Provide metadata about historical information and usage — Database usage and refresh history, query history, policy history, and warehouse load and metering history.
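As a rough illustration of the two categories above, the sketch below builds one query against an Information Schema view (`COLUMNS`) and one against a table function (`QUERY_HISTORY`). It only produces SQL strings; running them would need a Snowflake client, which is not shown. The helper names and the `analytics` database used in the usage note are hypothetical.

```python
import re

# Accept only plain unquoted identifiers so the generated SQL stays safe
# in this sketch (quoted identifiers are deliberately not supported).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check_ident(name):
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def columns_query(database, schema):
    """SQL for a view lookup: column metadata for every table in one schema."""
    db = _check_ident(database)
    _check_ident(schema)
    # Unquoted identifiers are stored in upper case, hence schema.upper().
    return (
        "SELECT table_name, column_name, data_type, is_nullable, comment\n"
        f"FROM {db}.INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE table_schema = '{schema.upper()}'\n"
        "ORDER BY table_name, ordinal_position"
    )


def query_history_query(database, result_limit=100):
    """SQL for a table-function lookup: the most recent queries."""
    db = _check_ident(database)
    if not 1 <= result_limit <= 10000:
        raise ValueError("result_limit must be between 1 and 10000")
    return (
        "SELECT query_id, query_text, user_name, start_time\n"
        f"FROM TABLE({db}.INFORMATION_SCHEMA.QUERY_HISTORY("
        f"RESULT_LIMIT => {result_limit}))\n"
        "ORDER BY start_time DESC"
    )
```

For example, `print(columns_query("analytics", "public"))` prints a statement that lists the columns of every table in the `PUBLIC` schema of the assumed `analytics` database.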

Snowflake Metadata Management: Challenges

Even though Snowflake exposes metadata via INFORMATION_SCHEMA, the only way to access it is by writing SQL queries. This means the metadata is not accessible to everyone, for example a typical business user.

As the definition of metadata expands, the metadata exposed by Snowflake can be limiting. If you want to bring in custom metadata from ETL logs, quality checks, and pipeline error alerts, you might need a dedicated metadata management solution to tap into the full potential of metadata.

Even though Snowflake restricts access to sensitive data through discretionary access control (DAC) and role-based access control (RBAC), the complex requirements of modern data teams call for more granular and automated ways to classify PII and propagate those classifications downstream.


A Guide to Building a Business Case for a Data Catalog

Download free ebook


Atlan: Active metadata management for Snowflake

Gone are the days when the scope of metadata was limited to information about schemas, tables, views, and models, and when metadata remained passive, siloed, and incomplete.

The crux of active metadata is its openness and interoperability. Atlan makes it possible to effortlessly move metadata across your data stack — data lakes, warehouses, BI tools, pipelines, ETL — and helps embed rich context on the tools you are already familiar with.

Always on: Active metadata is a framework that continuously listens, collects, and processes metadata from ETL logs, SQL query history, quality metrics, and usage statistics. Atlan’s open API infrastructure allows you to expand the scope of what is traditionally considered to be metadata. This opens up a whole lot of metadata use cases such as discovery, lineage, observability, and monitoring.

Intelligent and action-oriented: Atlan automates building and updating lineage by parsing SQL queries. Use the API to build your own bots that update asset descriptions; classify, tag, and propagate sensitive data such as PII and data regulated under HIPAA or GDPR; and alert end users to stale or anomalous data assets.
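To make the idea of deriving lineage from SQL concrete, here is a deliberately naive sketch. It is not Atlan's parser: it uses regular expressions, so it treats CTE names as source tables and ignores quoted identifiers, both of which a real SQL parser would handle. The function name and the table names in the usage note are hypothetical.

```python
import re

# Write target of INSERT INTO or CREATE [OR REPLACE] TABLE|VIEW [IF NOT EXISTS].
_TARGET = re.compile(
    r"\b(?:INSERT\s+INTO"
    r"|CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)(?:\s+IF\s+NOT\s+EXISTS)?)"
    r"\s+([\w.]+)",
    re.IGNORECASE,
)
# Table names that directly follow FROM or JOIN (subqueries are skipped
# because "(" does not match the name pattern).
_SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(sql):
    """Return a set of (source, target) pairs for one SQL statement.

    Statements without a write target, such as a plain SELECT, yield no edges.
    """
    target = _TARGET.search(sql)
    if target is None:
        return set()
    tgt = target.group(1).upper()
    sources = {m.group(1).upper() for m in _SOURCE.finditer(sql)}
    sources.discard(tgt)  # ignore self-references
    return {(src, tgt) for src in sources}
```

Passing a statement such as `CREATE TABLE mart.daily_orders AS SELECT ... FROM raw.orders o JOIN raw.customers c ...` yields one edge from each of the two `raw` tables to `MART.DAILY_ORDERS`. Feeding query history through a function like this is one way a lineage graph could be kept current.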

Learn more - Active metadata: The key building block of a modern data stack

A framework for an active metadata platform. Source: Atlan



Snowflake Metadata Management with Atlan



Snowflake metadata management use cases

Data Catalog

Metadata adds value to your Snowflake data assets by giving structure and meaning and thereby making identification and discovery easier. Metadata is the engine behind the rich search experience. It enables the users to find the right data faster, creating a positive user experience that directly improves trust and user adoption of data catalogs.

Snowflake metadata use case: Data discovery through data catalogs. Source: Atlan


Data Lineage

Data lineage leverages Snowflake metadata to track and visualize the journey of your data assets, from ingestion to BI dashboards. Lineage enables data to be discovered at all points within its lifecycle. Data lineage provides the visibility needed to trace and troubleshoot data quality issues and fix broken pipelines using root cause analysis.

Snowflake metadata use case: Track the data from its source to BI dashboard through Lineage. Source: Atlan


Data Dictionary

Snowflake database storage provides a rich source of metadata. A data dictionary uses metadata to describe the data: it is the documentation for all data assets in Snowflake. It provides information such as table names, table descriptions, and relationships; column names and their referential constraints; and data types, classifications, data profiling, and SQL queries attached to the data asset.

Snowflake metadata use case: Documentation for your database through data dictionary. Source: Atlan
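A data dictionary of the kind described above can be assembled from column-level metadata. The sketch below assumes the rows have already been fetched as `(table, column, data_type, comment)` tuples, for instance from the `INFORMATION_SCHEMA.COLUMNS` view; the function names and sample rows are illustrative only.

```python
from collections import defaultdict

UNDOCUMENTED = "(undocumented)"


def build_data_dictionary(rows):
    """Group (table, column, data_type, comment) rows into per-table entries."""
    dictionary = defaultdict(list)
    for table, column, data_type, comment in rows:
        dictionary[table].append(
            {
                "column": column,
                "type": data_type,
                # Make missing documentation visible instead of hiding it.
                "description": comment or UNDOCUMENTED,
            }
        )
    return dict(dictionary)


def undocumented_columns(dictionary):
    """List 'TABLE.COLUMN' names that still lack a description."""
    return [
        f"{table}.{entry['column']}"
        for table, entries in dictionary.items()
        for entry in entries
        if entry["description"] == UNDOCUMENTED
    ]


# Illustrative rows, not real metadata.
sample_rows = [
    ("ORDERS", "ID", "NUMBER", "Primary key"),
    ("ORDERS", "REGION", "TEXT", None),
    ("CUSTOMERS", "EMAIL", "TEXT", "Contact email"),
]
data_dictionary = build_data_dictionary(sample_rows)
```

Reporting the undocumented columns alongside the dictionary gives data owners a concrete list of gaps to fill.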


Governance

Metadata provides the framework for automating data governance and compliance for the Snowflake warehouse. Atlan helps automate classifying sensitive (PII) and private user data. The classification is then propagated both downstream and upstream using data lineage. Atlan also helps establish data access controls at an individual user and group/role level.

Snowflake metadata use case: Automate classification and access control through data governance. Source: Atlan
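Propagating a classification along lineage can be pictured as a graph traversal. The following is a minimal sketch of the downstream direction only, assuming lineage is available as `(source, target)` pairs and classifications as a mapping from asset name to a set of tags. It does not reflect how Atlan implements propagation, and all asset names are hypothetical.

```python
from collections import defaultdict, deque


def propagate_tags(edges, tagged):
    """Copy tags from each tagged asset to every asset downstream of it.

    edges:  iterable of (source, target) lineage pairs
    tagged: dict mapping asset name -> set of classification tags
    Returns a new dict; the inputs are left unchanged.
    """
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)

    result = {asset: set(tags) for asset, tags in tagged.items()}
    queue = deque(result)
    while queue:
        asset = queue.popleft()
        for child in downstream.get(asset, ()):
            child_tags = result.setdefault(child, set())
            missing = result[asset] - child_tags
            if missing:
                # Re-queue only when something changed, so cycles terminate.
                child_tags |= missing
                queue.append(child)
    return result


# Hypothetical lineage: raw.users feeds stg.users, which feeds mart.report.
edges = [
    ("raw.users", "stg.users"),
    ("stg.users", "mart.report"),
    ("raw.orders", "mart.report"),
]
tags = propagate_tags(edges, {"raw.users": {"PII"}})
```

Here `stg.users` and `mart.report` both inherit the `PII` tag, while `raw.orders` stays untagged because nothing sensitive flows into it. Upstream propagation would use the same traversal over reversed edges.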


Collaboration

No more frustration searching for table context across applications. Link Slack threads and Jira discussions directly to the data asset in contention — let everyone be on the same page. Collaboration establishes trust, encourages participation which directly drives adoption, and encourages finding new opportunities to extract value from data.

Snowflake metadata use case: Embedded collaboration with data assets and team members in the tools you are familiar with. Source: Atlan


DataOps

Snowflake metadata helps DataOps engineers design and architect data pipelines and manage data quality. Atlan uses metadata to help detect and surface data anomalies and alert downstream business users. Orchestration and ETL logs, treated as metadata, are used to optimize and fix data flow issues. Provenance metadata is used to find, retire, and archive unused data sources, thereby reducing storage and compute costs.

Atlan active metadata management use cases. Source: Atlan
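As one small example of the DataOps use case, usage metadata can flag candidates for retirement. The sketch below assumes you already have a mapping from asset name to its last access time (for instance, derived from query history); the 90-day threshold, function name, and sample data are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def stale_assets(last_accessed, now, max_idle_days=90):
    """Return sorted names of assets not accessed within max_idle_days.

    last_accessed maps asset name -> datetime of last access, or None if
    the asset has no recorded access at all.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, accessed_at in last_accessed.items()
        if accessed_at is None or accessed_at < cutoff
    )


# Illustrative usage data, not real measurements.
usage = {
    "mart.daily_orders": datetime(2022, 7, 30),
    "stg.legacy_export": datetime(2022, 1, 15),
    "tmp.scratch_2021": None,
}
candidates = stale_assets(usage, now=datetime(2022, 7, 31))
```

With this data, `stg.legacy_export` and `tmp.scratch_2021` are flagged for review. A list like this is a starting point for review rather than an automatic deletion list.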



Case study: Atlan + Snowflake — Metadata Management at WeWork



Types of metadata supported by Atlan

Technical metadata

It is metadata about the Snowflake database itself. It describes schemas, tables, columns, size, data types, relationships, classification, referential integrity, etc.

Operational metadata

It gives information about transformations, SQL queries, ETL logs, pipeline error notifications, and data quality audit results.

Business metadata

It is a collection of business terms, definitions, KPIs, and metrics that helps associate and understand the physical data assets linked with them.

Social metadata

It is metadata generated by the users of data, which includes chat messages, tasks, tickets, notes, READMEs, upvotes, verifications, and shared SQL queries.

Learn more: Types of metadata: How each helps with faster data discovery and better insights

Bring your own custom metadata from ETL, pipeline orchestration and observability tools. Source: Atlan


Atlan: A Snowflake validated metadata management solution

Atlan is the first data catalog and metadata management solution validated by Snowflake’s technology validation program.

Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers to solve the evolving needs of modern data teams, which include faster discovery, transparent data flow, robust governance, and collaboration, all built on open infrastructure and an easy-to-use user interface.

The deep integration and the open API enable Atlan to solve other modern metadata use cases such as DataOps, workflow management, and pipeline automation.

Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for DataOps, Q2 2022.

The report states,

“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s vision is to create frictionless data product deployment through a single metadata and data automation platform.”


Free Guide: Find the Right Data Catalog in 5 Simple Steps.

This step-by-step guide shows how to navigate existing data cataloging solutions in the market. Compare features and capabilities, create customized evaluation criteria, and execute hands-on Proof of Concepts (POCs) that help your business see value. Download now!