Snowflake Data Governance — Data Discovery, Security & Access Policies

September 19th, 2022

Atlan: Modern data governance workspace for Snowflake

Atlan is a single data governance plane for all your Snowflake data assets.

Atlan helps build a robust data governance system by:

  • Protecting sensitive data at scale.
  • Automating consistent data access policies across your entire data ecosystem.
  • Providing transparency into data lifecycle through lineage.
  • Establishing trust in data through context and collaboration.

Table of contents

  1. What is Snowflake
  2. What is data governance?
  3. Data governance benefits
  4. Data governance capabilities in Snowflake
  5. Challenges of implementing data governance in Snowflake
  6. Snowflake data governance with Atlan
  7. Atlan: A Snowflake validated data governance solution

What is Snowflake

Snowflake is a cloud-native data warehouse primarily used for batch data ingestion and data analytics of both structured and unstructured data from diverse sources.

Quoting Snowflake’s website:

Snowflake is a fully managed service that’s simple to use but can power a near-unlimited number of concurrent workloads. Snowflake is your solution for data warehousing, data lakes, data engineering, data science, data application development, and securely sharing and consuming shared data.

What sets Snowflake apart from other warehouses is that it decouples storage from compute. This means you can spin compute resources up and down on demand, based on the analytics workload.

Quoting Snowflake's S-1 filing:

Our platform solves the decades-old problem of data silos and data governance. Leveraging the elasticity and performance of the public cloud, our platform enables customers to unify and query data to support a wide variety of use cases. It also provides frictionless and governed data access so users can securely share data inside and outside of their organizations, generally without copying or moving the underlying data.

Snowflake stores data in cloud object storage such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. Its customers include Dropbox, DoorDash, HubSpot, Adobe, and Fitbit.

Snowflake Architecture diagram. Source: Snowflake


What is data governance?

Data governance is a set of procedures, models, and guidelines around people, processes, and technologies that detail how data is to be properly managed, accessed, and used. Good data governance helps ensure the availability, quality, integrity, and security of organizational data.

Quoting The Data Governance Institute (DGI):

Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.


Data governance benefits

Quoting McKinsey & Company:

Leading firms have eliminated millions of dollars in cost from their data ecosystems and enabled digital and analytics use cases worth millions or even billions of dollars. Data governance is one of the top three differences between firms that capture this value and firms that don’t.

Data quality: Governance ensures that data throughout its lifecycle is accurate, consistent, fresh, and complete.

Data access: Governance helps provide flexible and scalable access policies to data users.

Security and compliance: Governance monitors and reduces the risk of exposing private data. It makes data assets auditable across their life cycle, which helps businesses comply with regulations such as GDPR and HIPAA.

Life cycle management: Governance helps set policies for data creation, data retention, and data deletion.

Trust: Governance improves trust and confidence in data, and thus helps increase the ROI of your data assets.

Data quality and data availability: Key benefits of having a robust data governance program. Source: McKinsey Digital



Data governance capabilities in Snowflake

Snowflake provides the following data governance features out of the box:

  1. Column-level security
  2. Row-level access policies
  3. Object tag-based masking policies
  4. Data classification
  5. Object dependencies
  6. OAuth

Column-level governance with data masking

Column-level governance lets you apply masking policies to individual columns of a table or view, through Dynamic Data Masking and External Tokenization.
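
As a rough illustration of dynamic data masking, the sketch below defines a masking policy and attaches it to a column. The table `customers`, the column `email`, the policy name `email_mask`, and the role `PII_ADMIN` are hypothetical names chosen for this example, and masking policies may require a specific Snowflake edition.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS customers (id NUMBER, email STRING);

-- The policy returns the real value only to the (assumed) PII_ADMIN role;
-- every other role sees a fixed mask.
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '*********'
  END;

-- Attach the policy to the column; it is evaluated at query time.
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```

Because the policy is evaluated when a query runs, the stored data is unchanged and the same table can serve both privileged and unprivileged roles.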

Learn more: Column-level data governance for Snowflake tables and views

Row-level governance

Snowflake uses row access policies to determine which rows are visible in the result of a SELECT statement, and which rows can be selected by UPDATE, DELETE, and MERGE statements.
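
A minimal sketch of a row access policy follows, assuming a hypothetical `sales` table with a `region` column and a hypothetical mapping table `sales_manager_regions` that lists which role may see which region. The role name `SALES_EXECUTIVE` is also an assumption.

```sql
-- Hypothetical tables used only for this illustration.
CREATE TABLE IF NOT EXISTS sales (order_id NUMBER, region STRING, amount NUMBER);
CREATE TABLE IF NOT EXISTS sales_manager_regions (sales_manager STRING, region STRING);

-- A row is visible if the current role is the (assumed) executive role,
-- or if the mapping table grants the current role access to the row's region.
CREATE OR REPLACE ROW ACCESS POLICY sales_region_policy
  AS (sales_region STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'SALES_EXECUTIVE'
    OR EXISTS (
      SELECT 1
      FROM sales_manager_regions m
      WHERE m.sales_manager = CURRENT_ROLE()
        AND m.region = sales_region
    );

-- Bind the policy to the table, passing the region column as its argument.
ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region);
```

Keeping the role-to-region mapping in a table means access can be changed by editing rows rather than redefining the policy.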

Learn more: Row access policies for Snowflake data

Object tagging

Data stewards use tags to track sensitive data for security, privacy, compliance, discovery, and resource usage. Tags become more powerful when you attach access policies to them, which makes data governance in Snowflake easier to manage and scale.
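
The sketch below shows one way tag-based masking can be wired up: a masking policy is attached to a tag, so any string column carrying that tag is masked. The tag `pii_type`, the policy `mask_string_pii`, the role `PII_ADMIN`, and the table `customers` are hypothetical names for this example.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS customers (id NUMBER, email STRING);

-- A tag that marks columns holding personal data.
CREATE TAG IF NOT EXISTS pii_type;

-- A masking policy for string columns (the role name is an assumption).
CREATE OR REPLACE MASKING POLICY mask_string_pii AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the tag rather than to individual columns.
ALTER TAG pii_type SET MASKING POLICY mask_string_pii;

-- Tagging a column now also protects it with the policy.
ALTER TABLE customers MODIFY COLUMN email SET TAG pii_type = 'Email';
```

With this arrangement, protecting a new column is a matter of tagging it, which is what lets the approach scale.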

Learn more: Object tag-based access policies

Data classification

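Snowflake's classification feature samples the data in a table's columns and assigns system-defined tags (a semantic category and a privacy category) to columns that appear to contain personal data. Typical use cases include identifying PII, informing data access policies, and anonymization.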
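
As a hedged example, the statements below classify a table and then list the tags applied to its columns. The table name `my_db.my_schema.customers` is hypothetical, and the availability of `SYSTEM$CLASSIFY` depends on your Snowflake edition and release, so treat this as a sketch rather than a definitive recipe.

```sql
-- Classify the columns of a (hypothetical) table and apply the suggested
-- system tags automatically.
CALL SYSTEM$CLASSIFY('my_db.my_schema.customers', {'auto_tag': true});

-- Inspect the tags now set on the table's columns, including the
-- system-defined SEMANTIC_CATEGORY and PRIVACY_CATEGORY tags.
SELECT *
FROM TABLE(
  my_db.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
    'my_db.my_schema.customers', 'table'
  )
);
```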

Learn more: Introduction to Snowflake data classification

Object dependencies

Object dependencies enable you to track dependencies within Snowflake data assets. Dependencies are useful for performing an impact analysis, being compliant, and maintaining data integrity.
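
For instance, an impact analysis before changing a table could start with a query like the one below against the `OBJECT_DEPENDENCIES` Account Usage view. The database, schema, and table names are hypothetical, and Account Usage views are populated with some latency.

```sql
-- List the objects (for example, views) that reference a given table.
SELECT
  referencing_database,
  referencing_schema,
  referencing_object_name,
  referencing_object_domain,
  dependency_type
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database    = 'MY_DB'       -- hypothetical names
  AND referenced_schema      = 'MY_SCHEMA'
  AND referenced_object_name = 'CUSTOMERS';
```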

Learn more: Object dependencies and their use-cases

OAuth

Snowflake supports both its built-in OAuth service (Snowflake OAuth) and third-party authorization servers (External OAuth) for authenticating and authorizing users.
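
A minimal sketch of enabling Snowflake OAuth for a custom client is shown below. The integration name and redirect URI are placeholders invented for this example, and the token validity is an arbitrary choice.

```sql
-- Register a custom OAuth client with Snowflake's built-in OAuth service.
CREATE SECURITY INTEGRATION my_oauth_client          -- hypothetical name
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https://example.com/oauth/callback'  -- placeholder
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 86400;              -- one day, in seconds

-- Review the integration's settings.
DESCRIBE SECURITY INTEGRATION my_oauth_client;
```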

Learn more: Introduction to OAuth in Snowflake


Challenges of implementing data governance in Snowflake

Metadata scope limitations

The scope of what counts as metadata is expanding, especially with the adoption of the modern data stack. Teams can now use ETL logs, data quality metrics, workflow errors, and more as metadata to improve and automate data governance, so you need a centralized metadata management tool to track and act on that metadata.

Governance for data from multiple sources (non-Snowflake)

Data governance, especially in the cloud, spans multiple tools and processes, from ingestion and ETL to data quality and business intelligence (BI). A centralized metadata and data governance platform is therefore a must-have for complete control over governance.

Lack of data lineage

Data governance, in essence, ensures that high-quality data exists for analysis throughout the life cycle of the data. Lineage helps track this by visualizing the journey of the data from the source to the dashboard.

Data discovery challenges

The two core components of a good data governance system are “availability” and “usability” of data. The lack of a data catalog and a business glossary in Snowflake might make it difficult to find, use, and collaborate on data.

Governance management for non-technical/business users

Data governance on Snowflake is set up and managed by writing SQL statements. This makes it harder for non-technical and business users to use Snowflake for governance.

-- Comments use dollar-quoted strings so the embedded single quotes need no escaping.

-- Tag for the overall sensitivity level of a table or view.
CREATE OR REPLACE TAG Classification;
ALTER TAG Classification SET COMMENT =
    $$Tag Tables or Views with one of the following classification values:
    'Confidential', 'Restricted', 'Internal', 'Public'$$;

-- Tag for personally identifiable information (PII).
CREATE OR REPLACE TAG PII;
ALTER TAG PII SET COMMENT = $$Tag Tables or Views with PII with one or more
    of the following values: 'Phone', 'Email', 'Address'$$;

-- Tag for sensitive PII.
CREATE OR REPLACE TAG SENSITIVE_PII;
ALTER TAG SENSITIVE_PII SET COMMENT = $$Tag Tables or Views with Sensitive PII
    with one or more of the following values: 'SSN', 'DL', 'Passport',
    'Financial', 'Medical'$$;

Creating a PII tag classification on Snowflake by writing SQL queries. Source: Snowflake: The Definitive Guide, by Joyce Kay Avila.
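
Creating tags is only the first step: they still have to be attached to objects, again through SQL. A minimal sketch, assuming a hypothetical table `my_db.my_schema.customers` with an `email` column, and assuming the tags above live in the current schema:

```sql
-- Label the whole table with a sensitivity level.
ALTER TABLE my_db.my_schema.customers
  SET TAG Classification = 'Confidential';

-- Label a single column as PII.
ALTER TABLE my_db.my_schema.customers
  MODIFY COLUMN email SET TAG PII = 'Email';

-- Read back the tag value assigned to the table.
SELECT SYSTEM$GET_TAG('Classification', 'my_db.my_schema.customers', 'table');
```

Each new table or column needs statements like these, which illustrates why the workflow is hard to hand over to non-technical users.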


Snowflake data governance with Atlan: Flexible and scalable

Atlan helps transform data governance from a complex, bureaucratic process into a simple, community-driven approach. With custom classification, programmable PII bots, and automated classification propagation, Atlan makes data governance management easy at scale.

Listed below are some of the key data governance use cases that Atlan helps solve:

Data discovery and search: Catalog

Atlan crawls and catalogs all the data assets in your Snowflake data warehouse and drives self-service data discovery. It also crawls other data sources such as BigQuery, Redshift, and Databricks.

Atlan’s Google-like search lets you find relevant data assets, documents, BI dashboards, and queries and understand the associated context between different data assets in a business-friendly user interface.

The powerful search filters let you slice and dice your search results based on data sources, tables, columns, business glossary, owners, and classification tags.

Learn more: How to search and discover assets in Atlan

Snowflake Data Governance: Atlan Data catalog facilitates metadata discovery across your Snowflake warehouse. Source: Atlan



Single source of truth: Centralized documentation

With Atlan as the central metadata management system, end-users no longer need to log into different systems to find and understand the data, which reduces the time to value of any data project.

Atlan’s data dictionary and business glossary help crowdsource tribal knowledge and create a unified taxonomy of data assets across your Snowflake warehouse. The business glossary maps physical data elements like databases, tables, columns, and SQL queries to business terms, definitions, metrics, KPIs, calculations, and reports.

Learn more: How to create and manage a business glossary in Atlan

Snowflake Data Governance: A centralized knowledge bank that explains key business terms and concepts. Source: Atlan

Transparency and Traceability: Data Lineage

Atlan’s automated data lineage helps businesses to meet regulatory requirements with ease. Key compliance needs like data provenance, data usage, data access, data transformation, and data archival/deletion can be tracked end-to-end from the source to the BI dashboard using data lineage.

Atlan actively parses SQL queries and builds column-level lineage. This granular visualization helps identify both the upstream and downstream dependencies of a data asset. Data users can visually backtrack to identify the source and logic of a data asset, which reduces the total volume of data requests and increases analyst productivity.

Learn more: End-to-end life cycle visualization with Atlan’s data lineage

Snowflake Data Governance: Atlan Data lineage helps you understand the journey of the data from its data source to dashboards. Source: Atlan

DataOps: Agile data science for scale

Data lineage helps DataOps engineers understand data dependencies better, which leads to better pipeline workflow design. Tracing data across its life cycle helps identify and fix issues through root cause analysis (RCA) and impact analysis.

Technical metadata like workflow errors and data quality logs help alert downstream users of data reliability.

Learn more: Solve DataOps use cases with Atlan’s custom metadata


A Demo of Atlan Data Governance for Snowflake


Classification of sensitive data: Be compliant-ready

Atlan auto-classifies PII data like email, name, phone number, and credit card information.

As the number of regulations companies must adhere to (GDPR, HIPAA, CCPA, and others) keeps growing and changing, protecting sensitive data is becoming ever more challenging. Atlan auto-propagates classifications downstream, so that every table derived from a classified column is tagged with the same classification. Atlan also helps mask sensitive data through hashing, nullifying, and redacting.

Learn more: How to add classification to a Snowflake data asset

Snowflake Data Governance: Automate classification and access control through data governance.

Scale governance better: Granular access control

Atlan lets you manage data access governance through role-based access (administrators, members, and guests) and access policies that control access to certain data assets. Access policies let you define granular access via:

  • Metadata policies: Control who can view, update, add and delete metadata.
  • Data policies: Control what users can do with the data — querying data, hiding data, etc.
  • Glossary policies: Create, update, delete, and add classifications to business terms.

To scale data governance better, Atlan also lets you define access policies and user experience based on:

  • Personas: Personas define policies that control which users can (or cannot) take certain actions on specific assets. Examples include marketing, sales, and data engineering.
  • Purposes: Purposes define which users and groups can view, edit, and query assets tagged with a given classification. For example, a finance team can tag all the assets related to their analysis as “finance” and create policies around the tag.

Atlan integrates with your existing single sign-on and user management systems like Okta, JumpCloud, OneLogin, Google, and Azure SSO.

Learn more: Identity and access management in Atlan for Snowflake assets

Learn more: Seamlessly scale governance for Snowflake data with Personas and Purposes.

Snowflake Data Governance: Scale access control with Personas and Purposes.

Embedded collaboration: Context on demand without switching costs

Atlan provides a decentralized, community-driven approach to data governance: instead of being an afterthought, governance becomes part of your daily workflows.

With Atlan, you can learn more about any data asset without ever switching the application and breaking your flow. For example, information about when the column was updated last on your BI dashboard is just a click away.

Atlan also helps democratize governance by integrating with your collaboration and project management tools like Slack and Jira. With Atlan’s Slack bot, you can look up any data asset right within your chat interface. Data users can also initiate Slack conversations around any data asset right within Atlan.

Learn more: Seamlessly collaborate around your Snowflake data assets with Slack and Jira integration.

Snowflake Data Governance: Embedded collaboration with data assets and team members in the tools you are familiar with.

Governance Reporting

Atlan’s reporting center gives you a quick snapshot of important governance metrics like:

  • Total data assets crawled and cataloged — the breakdown of assets by data sources
  • A breakdown of asset categories — SQL assets, BI assets, process assets, etc.
  • Asset drill down by Persona, Owners, and Groups.
  • Total assets that have been verified and certified.

Learn more: Track Snowflake data governance metrics on Atlan’s Reporting Center

Snowflake Data Governance: A snapshot of data governance status through Reporting center.


Atlan: A Snowflake validated data governance solution

If you are looking to deploy best-in-class data access governance for your Snowflake data warehouse without compromising on data democratization, give Atlan a spin.

Atlan is the first data catalog and metadata management solution validated by Snowflake’s technology validation program.

Quoting Bob Muglia, former CEO of Snowflake:

"Atlan’s unique, collaboration-first approach for the modern data stack helps to break down organizational silos and empower cross-functional teams to work together to make better business decisions."

Atlan is more than a metadata management and data cataloging tool. It is built by data engineers to meet the evolving needs of modern data teams: faster discovery, transparent data flow, robust governance, and collaboration, all built on open infrastructure and an easy-to-use user interface.

The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.

Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for DataOps, Q2 2022.

The report states,

“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s vision is to create frictionless data product deployment through a single metadata and data automation platform.”



Free Guide: Find the Right Data Catalog in 5 Simple Steps.

This step-by-step guide shows how to navigate existing data cataloging solutions in the market. Compare features and capabilities, create customized evaluation criteria, and execute hands-on Proof of Concepts (POCs) that help your business see value. Download now!