Snowflake Data Governance: Features, Frameworks & Best Practices

Updated August 1st, 2023

Quick answer:

TL;DR? Here are the highlights of this article and what to expect from it:

  • Snowflake offers data governance capabilities such as:
    • Column-level security
    • Row-level access
    • Object tag-based masking
    • Data classification
    • OAuth
  • Data governance in Snowflake can be improved with a Snowflake-validated data governance solution. Such a solution would:
    • Handle governance for data from multiple sources (non-Snowflake)
    • Enable data lineage
    • Enhance data discovery
    • Embed collaboration
    • Empower cross-functional teams
  • This article delves into the specifics of data governance for Snowflake assets and improving it so that you can manage your entire data estate.


Atlan: A modern Snowflake data governance workspace

Atlan is a single data governance plane for all your Snowflake data assets.

Atlan helps build a robust data governance system by:

  • Protecting sensitive data at scale.
  • Automating consistent data access policies across your entire data ecosystem.
  • Providing transparency into data lifecycle through lineage.
  • Establishing trust in data through context and collaboration.

Table of contents

  1. What is Snowflake?
  2. What is data governance in Snowflake?
  3. Benefits of governing Snowflake data assets
  4. Data governance capabilities in Snowflake
  5. 5 key challenges of implementing data governance in Snowflake
  6. Snowflake data governance with Atlan
  7. Atlan: A Snowflake-validated data governance solution

What is Snowflake?

Snowflake’s platform enables a wide variety of workloads and applications on any cloud, including data warehouses, data lakes, data pipelines, and collaboration as well as business intelligence, data science, and data analytics applications.

Quoting Snowflake’s website:

Snowflake is a fully managed service that’s simple to use but can power a near-unlimited number of concurrent workloads. Snowflake is your solution for data warehousing, data lakes, data engineering, data science, data application development, and securely sharing and consuming shared data.

Snowflake stands out because it decouples storage from compute. This means you can scale compute resources up and down on demand, based on the analytics workload, without touching the stored data.

Quoting Snowflake’s S-1 filing:

Our platform solves the decades-old problem of data silos and data governance. Leveraging the elasticity and performance of the public cloud, our platform enables customers to unify and query data to support a wide variety of use cases. It also provides frictionless and governed data access so users can securely share data inside and outside of their organizations, generally without copying or moving the underlying data.

Snowflake is a cloud-agnostic platform that can distribute data across regions and across cloud providers such as AWS, Azure, and GCP. Snowflake’s customers include Dropbox, DoorDash, HubSpot, Adobe, and Fitbit.

Snowflake Architecture diagram. Source: Snowflake



What is data governance in Snowflake?

Data governance is a set of standards, procedures, models, and guidelines around people, processes, and technologies that detail how data is to be properly managed, accessed, and used. Good data governance helps ensure the availability, quality, integrity, and security of organizational data.

Quoting The Data Governance Institute (DGI):

Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.


Benefits of governing Snowflake data assets

Quoting McKinsey & Company:

Leading firms have eliminated millions of dollars in cost from their data ecosystems and enabled digital and analytics use cases worth millions or even billions of dollars. Data governance is one of the top three differences between firms that capture this value and firms that don’t.

Here are the benefits of governing Snowflake data assets:

Data quality


Data governance ensures that data throughout its lifecycle is accurate, consistent, fresh, and complete.

Data access


Data governance helps provide flexible and scalable access policies to data users.

Security and compliance


Data governance monitors and reduces the risk of exposing private data. It makes data assets auditable across their life cycle, which helps businesses comply with regulations such as GDPR and HIPAA.

Life cycle management


Governance helps set policies for data creation, data retention, and data deletion.

Trust


Governance improves trust and confidence in data and thus helps increase the ROI of your data assets.

Data quality and data availability: Key benefits of having a robust data governance program. Source: McKinsey Digital



Data governance capabilities in Snowflake

Snowflake provides the following data governance features out of the box:


  1. Column-level security
  2. Row-level access policies
  3. Object tag-based masking policies
  4. Data classification
  5. Object dependencies
  6. OAuth

1. Column-level governance with data masking


Column-level governance lets you apply masking policies to columns in a table or a view through Dynamic Data Masking and External Tokenization.

Learn more: Column-level data governance for Snowflake tables and views
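As a rough illustration of Dynamic Data Masking, the sketch below defines a masking policy and attaches it to a column. The policy name `email_mask`, the role `PII_ANALYST`, and the table `customers` with its `email` column are hypothetical and only serve the example.

```sql
-- Hypothetical names: email_mask, PII_ANALYST, customers, email.
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val  -- authorized role sees the real value
    ELSE '*********'                                 -- every other role sees a fixed mask
  END;

-- Attach the policy to a column; it is evaluated each time the column is queried.
ALTER TABLE IF EXISTS customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```

Because the policy is evaluated at query time, the stored data is unchanged and the same table can serve both masked and unmasked readers.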

2. Row-level governance


Snowflake uses row access policies to determine which rows are returned by SELECT statements and which rows are selected by UPDATE, DELETE, and MERGE statements.

Learn more: Row access policies for Snowflake data
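A minimal sketch of a row access policy follows, assuming a hypothetical `sales` table with a `region` column and a hypothetical mapping table `region_managers(manager_role, region)` that lists which role may see which region. The role name `SALES_ADMIN` is also an assumption.

```sql
-- Hypothetical names: region_policy, SALES_ADMIN, region_managers, sales.
CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (sales_region VARCHAR) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'SALES_ADMIN'            -- this role sees every row
  OR EXISTS (                               -- other roles see only their mapped regions
    SELECT 1
    FROM region_managers
    WHERE manager_role = CURRENT_ROLE()
      AND region = sales_region
  );

-- Bind the policy to the table; the listed column is passed in as sales_region.
ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);
```

Keeping the role-to-region mapping in a table means access can be changed by updating rows rather than by redefining the policy.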

3. Object tagging


Data stewards use tags to track sensitive data for security, privacy, compliance, discovery, and resource usage. Tags become powerful when you attach access policies to them, which makes data governance in Snowflake easier to manage and scale.

Learn more: Object tag-based access policies
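To make tag-based masking concrete, here is a sketch that attaches a masking policy to a tag and then tags a column. The tag `pii_type`, the policy `pii_string_mask`, the role `PII_ANALYST`, and the `customers.email` column are hypothetical.

```sql
-- Hypothetical names: pii_type, pii_string_mask, PII_ANALYST, customers, email.
CREATE TAG IF NOT EXISTS pii_type;

CREATE OR REPLACE MASKING POLICY pii_string_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag rather than to individual columns.
ALTER TAG pii_type SET MASKING POLICY pii_string_mask;

-- String columns that carry the tag are then protected by the policy.
ALTER TABLE customers MODIFY COLUMN email SET TAG pii_type = 'Email';
```

With this arrangement, protecting a new column is a matter of tagging it, and no per-column policy assignment is needed.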

4. Data classification


Snowflake samples the data in your tables and views, classifies the columns, and records the resulting categories as tags. Broad use cases for classification include identifying PII, defining data access policies, and anonymization.

Learn more: Introduction to Snowflake data classification
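As a sketch of how classification can be invoked, the example below calls the `SYSTEM$CLASSIFY` stored procedure on a hypothetical table `hr.employees.personal_info` and then lists the tags on its columns. Availability of this procedure may depend on your Snowflake edition and release, so treat the call as an assumption to verify against your account.

```sql
-- Hypothetical table: hr.employees.personal_info.
-- Classify the table and let Snowflake apply the suggested system tags.
CALL SYSTEM$CLASSIFY('hr.employees.personal_info', {'auto_tag': true});

-- Review which tags now sit on each column of the table.
SELECT *
FROM TABLE(
  hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('hr.employees.personal_info', 'table')
);
```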

5. Object dependencies


Object dependencies let you track how Snowflake data assets depend on one another. Dependencies are useful for performing impact analysis, staying compliant, and maintaining data integrity.

Learn more: Object dependencies and their use-cases
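For a simple impact analysis, a query along these lines lists the objects that reference a given table through the `OBJECT_DEPENDENCIES` view in the `SNOWFLAKE.ACCOUNT_USAGE` schema. The table name `SALES` is hypothetical, and reading this view requires appropriate privileges on the shared `SNOWFLAKE` database.

```sql
-- Which objects (views, etc.) would be affected if the SALES table changed?
SELECT referencing_database,
       referencing_schema,
       referencing_object_name,
       referencing_object_domain,
       dependency_type
FROM snowflake.account_usage.object_dependencies
WHERE referenced_object_name = 'SALES'      -- hypothetical table name
  AND referenced_object_domain = 'TABLE';
```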

6. OAuth


Snowflake supports both built-in and third-party OAuth for the authentication and authorization of users.

Learn more: Introduction to OAuth in Snowflake
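As an illustration of the built-in option, Snowflake OAuth for a custom client is configured through a security integration. The sketch below assumes a hypothetical integration name and redirect URI; the parameter values are examples, not recommendations.

```sql
-- Hypothetical names and values: my_oauth_client, the redirect URI, the token lifetime.
CREATE SECURITY INTEGRATION my_oauth_client
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https://example.com/oauth/callback'
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 86400;   -- seconds

-- Inspect the integration, including the endpoints a client needs.
DESCRIBE SECURITY INTEGRATION my_oauth_client;
```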


5 key challenges of implementing data governance in Snowflake

The five key challenges of implementing effective governance across Snowflake data assets are:

  1. Metadata scope limitations
  2. Governance for data from multiple sources (non-Snowflake)
  3. Lack of data lineage
  4. Data discovery challenges
  5. Governance management for non-technical / business users

1. Metadata scope limitations


The scope of what is considered and tracked as metadata is expanding, especially with the adoption of the modern data stack. Teams can now use ETL logs, data quality metrics, workflow errors, etc. as metadata to improve and automate data governance. So you’ll need a centralized metadata management tool to track and take action on metadata.

2. Governance for data from multiple sources (non-Snowflake)


Data governance, especially in the cloud, spans multiple tools and processes, ranging from ingestion and ETL to data quality and business intelligence (BI). A centralized metadata and data governance platform is therefore a must-have for complete control over governance.

3. Lack of data lineage


Data governance, in essence, ensures that high-quality data is available for analysis throughout the data lifecycle. Lineage supports this by visualizing the journey of the data from the source to the dashboard.

4. Data discovery challenges


The two core components of a good data governance system are “availability” and “usability” of data. The lack of a data catalog and a business glossary in Snowflake might make it difficult to find, use, and collaborate on data.

5. Governance management for non-technical/business users


Data governance setup and management on Snowflake are done entirely by writing SQL statements. This makes it harder for non-technical/business users to use Snowflake for governance.

-- Snowflake string literals use single quotes (double quotes denote identifiers),
-- so quotes inside each comment are escaped by doubling them.

-- Tag for the overall sensitivity level of a table or view.
CREATE OR REPLACE TAG Classification;
ALTER TAG Classification SET COMMENT =
    'Tag Tables or Views with one of the following classification values:
    ''Confidential'', ''Restricted'', ''Internal'', ''Public''';

-- Tag for general PII.
CREATE OR REPLACE TAG PII;
ALTER TAG PII SET COMMENT = 'Tag Tables or Views with PII with one or more
    of the following values: ''Phone'', ''Email'', ''Address''';

-- Tag for sensitive PII.
CREATE OR REPLACE TAG SENSITIVE_PII;
ALTER TAG SENSITIVE_PII SET COMMENT = 'Tag Tables or Views with Sensitive PII
    with one or more of the following values: ''SSN'', ''DL'', ''Passport'',
    ''Financial'', ''Medical''';

Creating a PII tag classification on Snowflake by writing SQL queries. Source: Snowflake: The Definitive Guide, by Joyce Kay Avila.
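Once tags like these exist, they still have to be applied and checked with more SQL. A minimal sketch, assuming a hypothetical `customers` table:

```sql
-- Hypothetical table: customers.
-- Apply two of the tags defined above to the table.
ALTER TABLE customers SET TAG Classification = 'Confidential', PII = 'Email';

-- Read back the value of one tag on the table.
SELECT SYSTEM$GET_TAG('Classification', 'customers', 'table');
```

Every table or view needs a similar statement, which illustrates why this workflow is hard to hand over to business users.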


Snowflake data governance with Atlan: Flexible and scalable

Atlan helps transform data governance from a complex, bureaucratic process into a simple, community-driven approach. With custom classification, programmable PII bots, and automated classification propagation, Atlan makes data governance management easy at scale.

Listed below are some of the key data governance use cases that Atlan helps solve:

Data discovery and search: Catalog


Atlan crawls and catalogs all the data assets in your Snowflake data warehouse and drives self-service data discovery. Atlan also crawls other data sources such as BigQuery, Redshift, and Databricks.

Atlan’s Google-like search lets you find relevant data assets, documents, BI dashboards, and queries and understand the associated context between different data assets in a business-friendly user interface.

The powerful search filters let you slice and dice your search results based on data sources, tables, columns, business glossary, owners, and classification tags.

Learn more: How to search and discover assets in Atlan

Snowflake Data Governance: Atlan Data catalog facilitates metadata discovery across your Snowflake warehouse. Source: Atlan



Single source of truth: Centralized documentation


With Atlan as the central metadata management system, there is no need for end-users to log into different systems to find and understand the data. This reduces the time to value of any data project.

Atlan’s data dictionary and a business glossary help crowdsource the tribal knowledge and create a unified taxonomy of data assets across your Snowflake warehouse. The business glossary maps physical data elements like databases, tables, columns, and SQL queries to business terms, definitions, metrics, KPIs, calculations, and reports.

Learn more: How to create and manage a business glossary in Atlan

Snowflake Data Governance: A centralized knowledge bank that explains key business terms and concepts. Source: Atlan

Transparency and Traceability: Data Lineage


Atlan’s automated data lineage helps businesses to meet regulatory requirements with ease. Key compliance needs like data provenance, data usage, data access, data transformation, and data archival/deletion can be tracked end-to-end from the source to the BI dashboard using data lineage.

Atlan actively parses SQL queries and builds column-level lineage. This granular visualization helps identify both upstream and downstream dependencies of a data asset. Data users can visually backtrack to identify the source and logic of a data asset, which reduces the total volume of data requests and increases analyst productivity.

Learn more: End-to-end life cycle visualization with Atlan’s data lineage

Snowflake Data Governance: Atlan Data lineage helps you understand the journey of the data from its data source to dashboards. Source: Atlan

DataOps: Agile data science for scale

Data lineage helps DataOps engineers understand the data dependencies better, thus helping with better pipeline workflow design. Tracing data across its lifecycle helps identify and fix issues with root cause analysis (RCA) and impact analysis.

Technical metadata like workflow errors and data quality logs help alert downstream users of data reliability.

Learn more: Solve DataOps use cases with Atlan’s custom metadata



Classification of sensitive data: Be compliant-ready


Atlan auto-classifies PII data like email, name, phone number, and credit card information.

As the number of regulations (GDPR, HIPAA, CCPA, etc.) that companies must comply with grows, protecting sensitive data is becoming ever more challenging. Atlan auto-propagates all classifications downstream so that every table derived from a classified column is tagged with the same classification. Atlan also helps mask sensitive data through hashing, nullifying, and redacting.

Learn more: How to add the classification to a Snowflake data asset

Snowflake Data Governance: Automate classification and access control through data governance. Source: Atlan

Scale governance better: Granular access control


Atlan lets you manage data access governance through role-based access (administrators, members, and guests) and through access policies that control access to specific data assets. Access policies let you define granular access via:

  • Metadata policies: Control who can view, update, add and delete metadata.
  • Data policies: Control what users can do with the data — querying data, hiding data, etc.
  • Glossary policies: Create, update, delete, and add classifications to business terms.

To scale data governance better, Atlan also lets you define access policies and user experience based on:

  • Personas: Personas define policies to control which users can (or cannot) take certain actions on specific assets — examples include marketing, sales, data engineering, etc.
  • Purposes: Purposes control which users and groups can view, edit, and query assets tagged with a given classification. For example, a finance team can tag all the assets related to their analysis as “finance” and create policies around the tag.

Atlan integrates with your existing single sign-on and user management systems like Okta, JumpCloud, OneLogin, Google, and Azure SSO.

Learn more: Identity and access management in Atlan for Snowflake assets

Learn more: Seamlessly scale governance for Snowflake data with Personas and Purposes.

Snowflake Data Governance: Scale access control with Personas and Purposes. Source: Atlan

Embedded collaboration: Context on demand without switching costs


Atlan provides you with a decentralized and community-driven approach to data governance. Instead of being an afterthought, governance becomes part of your daily workflows.

With Atlan, you can learn more about any data asset without ever switching the application and breaking your flow. For example, information about when the column was updated last on your BI dashboard is just a click away.

Atlan also helps democratize governance by integrating with your collaboration and project management tools like Slack and Jira. With Atlan’s Slack bot, you can look up any data asset right within your chat interface. Data users can initiate Slack conversations around any data asset right within Atlan.

Learn more: Seamlessly collaborate around your Snowflake data assets with Slack and Jira integration.

Snowflake Data Governance: Embedded collaboration with data assets and team members in the tools you are familiar with. Source: Atlan

Governance Reporting


Atlan’s reporting center gives you a quick snapshot of important governance metrics like:

  • Total data assets crawled and cataloged — the breakdown of assets by data sources
  • A breakdown of asset categories — SQL assets, BI assets, process assets, etc.
  • Asset drill down by Persona, Owners, and Groups.
  • Total assets that have been verified and certified.

Learn more: Track Snowflake data governance metrics on Atlan’s Reporting Center

Snowflake Data Governance: A snapshot of data governance status through Reporting center. Source: Atlan


Atlan: A Snowflake-validated data governance solution

If you are looking to evaluate and deploy best-in-class data access governance for your Snowflake data warehouse, give Atlan a spin.

Atlan is the first data catalog and metadata management solution approved by the Snowflake Ready Validation Program.

Quoting Bob Muglia, former CEO of Snowflake:

“Atlan’s unique, collaboration-first approach for the modern data stack helps to break down organizational silos and empower cross-functional teams to work together to make better business decisions.”

Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers to solve the evolving needs of modern data teams. Atlan’s capabilities include faster discovery, transparent data flow, robust governance, and collaboration, built on open infrastructure with an easy-to-use user interface.

The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.

Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for DataOps, Q2 2022.

The report states,

“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s vision is to create frictionless data product deployment through a single metadata and data automation platform.”

Getting started with Snowflake data governance with Atlan:



