Snowflake Data Governance — Data Discovery, Security & Access Policies
September 19th, 2022
Atlan: A modern data governance workspace for Snowflake
Atlan is a single data governance plane for all your Snowflake data assets.
Atlan helps build a robust data governance system by:
- Protecting sensitive data at scale.
- Automating consistent data access policies across your entire data ecosystem.
- Providing transparency into data lifecycle through lineage.
- Establishing trust in data through context and collaboration.
Table of contents
- What is Snowflake?
- What is data governance?
- Data governance benefits
- Data governance capabilities in Snowflake
- 5 key challenges of implementing data governance in Snowflake
- Snowflake data governance with Atlan
- Atlan: A Snowflake-validated data governance solution
What is Snowflake?
Snowflake is a cloud-native data warehouse primarily used for batch data ingestion and data analytics of both structured and unstructured data from diverse sources.
Snowflake is a fully managed service that’s simple to use but can power a near-unlimited number of concurrent workloads. Snowflake is your solution for data warehousing, data lakes, data engineering, data science, data application development, and securely sharing and consuming shared data.
The crux of what makes Snowflake different from other warehouses is that it decouples storage from compute. This means you can spin compute resources up and down on demand based on the analytics workload, without touching the stored data.
Snowflake describes its platform this way:
“Our platform solves the decades-old problem of data silos and data governance. Leveraging the elasticity and performance of the public cloud, our platform enables customers to unify and query data to support a wide variety of use cases. It also provides frictionless and governed data access so users can securely share data inside and outside of their organizations, generally without copying or moving the underlying data.”
What is data governance?
Data governance is a set of procedures, models, and guidelines around people, processes, and technologies that detail how data is to be properly managed, accessed, and used. Good data governance helps ensure the availability, quality, integrity, and security of organizational data.
Quoting the Data Governance Institute (DGI):
Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.
Benefits of governing Snowflake data assets
Leading firms have eliminated millions of dollars in cost from their data ecosystems and enabled digital and analytics use cases worth millions or even billions of dollars. Data governance is one of the top three differences between firms that capture this value and firms that don’t.
Here are the benefits of governing Snowflake data assets:
Data governance ensures that data throughout its lifecycle is accurate, consistent, fresh, and complete.
Data governance helps provide flexible and scalable access policies to data users.
Security and compliance
Data governance monitors and reduces the risk of exposing private data. It makes data assets auditable across their life cycle and thus helps businesses comply with regulations like GDPR and HIPAA.
Life cycle management
Governance helps set policies for data creation, data retention, and data deletion.
Governance improves trust and confidence in data and thus helps increase the ROI of your data assets.
Data governance capabilities in Snowflake
Snowflake provides the following data governance features out of the box:
- Column-level security
- Row-level access policies
- Object tag-based masking policies
- Data classification
- Object dependencies
1. Column-level governance with data masking
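As a rough illustration of column-level masking, the sketch below creates a masking policy and attaches it to a column. The table `customers`, the policy `email_mask`, and the role `PII_ADMIN` are hypothetical names chosen for this example, and masking policies may not be available on every Snowflake edition.

```sql
-- Hypothetical table used only for illustration.
CREATE OR REPLACE TABLE customers (
    id    NUMBER,
    name  STRING,
    email STRING
);

-- Return the real value to one privileged role and a fixed mask to everyone else.
-- The role name PII_ADMIN is an assumption.
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
        ELSE '***MASKED***'
    END;

-- Attach the policy to the column; existing queries need no changes.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```

With this in place, a `SELECT email FROM customers` run under any role other than `PII_ADMIN` would return the mask string instead of the stored value.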
2. Row-level governance
Snowflake uses row access policies to control which rows are visible in query results. The policies apply to SELECT statements and to the rows selected by UPDATE, DELETE, and MERGE statements.
Learn more: Row access policies for Snowflake data
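A minimal sketch of a row access policy is shown below, assuming a mapping table that lists which role may see which region. All object and role names (`sales`, `region_access`, `sales_region_policy`, `SALES_ADMIN`) are hypothetical.

```sql
-- Hypothetical fact table to protect.
CREATE OR REPLACE TABLE sales (
    region STRING,
    amount NUMBER(12, 2)
);

-- Hypothetical mapping table: which role may see which region.
CREATE OR REPLACE TABLE region_access (
    role_name STRING,
    region    STRING
);

-- A row is visible if the current role is the admin role (an assumption),
-- or if the mapping table grants the current role access to the row's region.
CREATE OR REPLACE ROW ACCESS POLICY sales_region_policy
    AS (sales_region STRING) RETURNS BOOLEAN ->
        CURRENT_ROLE() = 'SALES_ADMIN'
        OR EXISTS (
            SELECT 1
            FROM region_access
            WHERE role_name = CURRENT_ROLE()
              AND region = sales_region
        );

-- Bind the policy to the table, passing the region column as the policy argument.
ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region);
```

Keeping the role-to-region mapping in a table rather than hard-coding it in the policy means access can be changed with an INSERT or DELETE instead of a policy rewrite.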
3. Object tagging
Data stewards use tags to track sensitive data for security, privacy, compliance, discovery, and resource usage. Tags become powerful when you attach access policies to them, which makes data governance in Snowflake easier to manage and scale.
Learn more: Object tag-based access policies
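To make the idea of attaching a policy to a tag concrete, here is a small sketch. The tag `pii_type`, the policy `pii_string_mask`, the table `contacts`, and the role `PII_ADMIN` are all hypothetical names for illustration.

```sql
-- Hypothetical tag that marks columns holding personal data.
CREATE OR REPLACE TAG pii_type;

-- Masking policy for string columns; the role name PII_ADMIN is an assumption.
CREATE OR REPLACE MASKING POLICY pii_string_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val
        ELSE '***MASKED***'
    END;

-- Attach the policy to the tag instead of to individual columns.
ALTER TAG pii_type SET MASKING POLICY pii_string_mask;

-- Hypothetical table: tagging a string column is now enough to protect it.
CREATE OR REPLACE TABLE contacts (
    id    NUMBER,
    email STRING,
    phone STRING
);

ALTER TABLE contacts MODIFY COLUMN email SET TAG pii_type = 'Email';
ALTER TABLE contacts MODIFY COLUMN phone SET TAG pii_type = 'Phone';
```

The design point is that the policy is defined once per tag and data type, so protecting a new column becomes a tagging step rather than a policy assignment.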
4. Data classification
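Snowflake's classification feature samples the column values in a table and assigns system-defined tags that indicate whether a column is likely to hold personal data. These tags can then support use cases such as tracking PII, defining access policies, and anonymizing data.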
Learn more: Introduction to Snowflake data classification
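One way to run classification is sketched below. It assumes an existing, populated table with the hypothetical name `my_db.my_schema.hr_employees`, and it assumes your account's Snowflake version and edition provide the `SYSTEM$CLASSIFY` procedure; check the current documentation before relying on it.

```sql
-- Classify the columns of a hypothetical table and apply the suggested
-- system tags automatically ('auto_tag': true).
CALL SYSTEM$CLASSIFY('my_db.my_schema.hr_employees', {'auto_tag': true});

-- Inspect the tags that now exist on the table's columns.
SELECT *
FROM TABLE(
    my_db.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
        'my_db.my_schema.hr_employees', 'table'
    )
);
```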
5. Object dependencies
Object dependencies enable you to track how Snowflake data assets depend on one another. Dependencies are useful for performing impact analysis, staying compliant, and maintaining data integrity.
Learn more: Object dependencies and their use-cases
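For example, an impact analysis before changing a table could start from a query like the one below. The database, schema, and table names (`ANALYTICS`, `PUBLIC`, `CUSTOMERS`) are hypothetical, and Account Usage views are populated with some latency, so very recent objects may not appear yet.

```sql
-- List every object (view, and so on) that references a given table.
SELECT
    referencing_database,
    referencing_schema,
    referencing_object_name,
    referencing_object_domain,
    dependency_type
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database    = 'ANALYTICS'   -- hypothetical database
  AND referenced_schema      = 'PUBLIC'      -- hypothetical schema
  AND referenced_object_name = 'CUSTOMERS'   -- hypothetical table
ORDER BY referencing_object_name;
```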
Snowflake supports both built-in and third-party OAuth for authentication and authorization of users.
Learn more: Introduction to OAuth in Snowflake
5 key challenges of implementing data governance in Snowflake
The five key challenges of implementing effective governance across Snowflake data assets are:
- Metadata scope limitations
- Governance for data from multiple sources (non-Snowflake)
- Lack of data lineage
- Data discovery challenges
- Governance management for non-technical / business users
1. Metadata scope limitations
The scope of what is considered and tracked as metadata is expanding, especially with the adoption of the modern data stack. Teams can now use ETL logs, data quality metrics, workflow errors, etc. as metadata to improve and automate data governance. So you’ll need a centralized metadata management tool to track and take action on metadata.
2. Governance for data from multiple sources (non-Snowflake)
Data governance, especially in the cloud, spans multiple tools and processes, including ingestion, ETL, data quality, and business intelligence (BI). A centralized metadata and data governance platform is therefore a must-have for getting a complete hold on governance.
3. Lack of data lineage
Data governance, in essence, ensures that high-quality data exists for analysis throughout the life cycle of the data. Lineage helps track this by visualizing the journey of the data from the source to the dashboard.
4. Data discovery challenges
The two core components of a good data governance system are “availability” and “usability” of data. The lack of a data catalog and a business glossary in Snowflake might make it difficult to find, use, and collaborate on data.
5. Governance management for non-technical/business users
Data governance setup and management on Snowflake are done entirely by writing SQL statements. This makes it harder for non-technical/business users to use Snowflake for governance.
-- Tag for the overall sensitivity level of a table or view.
CREATE OR REPLACE TAG Classification;
ALTER TAG Classification SET COMMENT = 'Tag Tables or Views with one of the following classification values: ''Confidential'', ''Restricted'', ''Internal'', ''Public''';
-- Tag for PII.
CREATE OR REPLACE TAG PII;
ALTER TAG PII SET COMMENT = 'Tag Tables or Views with PII with one or more of the following values: ''Phone'', ''Email'', ''Address''';
-- Tag for sensitive PII.
CREATE OR REPLACE TAG SENSITIVE_PII;
ALTER TAG SENSITIVE_PII SET COMMENT = 'Tag Tables or Views with Sensitive PII with one or more of the following values: ''SSN'', ''DL'', ''Passport'', ''Financial'', ''Medical''';
Creating a PII tag classification on Snowflake by writing SQL queries. Source: Snowflake: The Definitive Guide, by Joyce Kay Avila.
Snowflake data governance with Atlan: Flexible and scalable
Atlan helps transform data governance from a complex, bureaucratic process into a simple, community-driven approach. With custom classification, programmable PII bots, and automated classification propagation, Atlan makes data governance management easy at scale.
Listed below are some of the key data governance use cases that Atlan helps solve:
Data discovery and search: Catalog
Atlan crawls and catalogs all the data assets in your Snowflake data warehouse and drives self-service data discovery. Atlan also crawls other data sources such as BigQuery, Redshift, and Databricks.
Atlan’s Google-like search lets you find relevant data assets, documents, BI dashboards, and queries and understand the associated context between different data assets in a business-friendly user interface.
The powerful search filters let you slice and dice your search results based on data sources, tables, columns, business glossary, owners, and classification tags.
Learn more: How to search and discover assets in Atlan
Single source of truth: Centralized documentation
With Atlan as the central metadata management system, there is no need for end-users to log into different systems to find and understand the data. This reduces the time to value of any data project.
Atlan’s data dictionary and a business glossary help crowdsource the tribal knowledge and create a unified taxonomy of data assets across your Snowflake warehouse. The business glossary maps physical data elements like databases, tables, columns, and SQL queries to business terms, definitions, metrics, KPIs, calculations, and reports.
Transparency and Traceability: Data Lineage
Atlan’s automated data lineage helps businesses to meet regulatory requirements with ease. Key compliance needs like data provenance, data usage, data access, data transformation, and data archival/deletion can be tracked end-to-end from the source to the BI dashboard using data lineage.
Atlan actively parses SQL queries and builds column-level lineage. This granular visualization helps identify both upstream and downstream dependencies of a data asset. Data users can visually backtrack to identify the source and logic of a data asset, which reduces the total volume of data requests and increases analyst productivity.
DataOps: Agile data science for scale
Data lineage helps DataOps engineers understand the data dependencies better, thus helping with better pipeline workflow design. Tracing data across its lifecycle helps identify and fix issues with root cause analysis (RCA) and impact analysis.
Technical metadata like workflow errors and data quality logs help alert downstream users of data reliability.
Classification of sensitive data: Be compliant-ready
Atlan auto-classifies PII data like email, name, phone number, and credit card information.
As the number of regulations (GDPR, HIPAA, CCPA, etc.) that companies must comply with grows, protecting sensitive data is becoming ever more challenging. Atlan auto-propagates all classifications downstream so that every table derived from a classified column is tagged with the same classification. Atlan also helps mask sensitive data through hashing, nullifying, and redacting.
Scale governance better: Granular access control
Atlan lets you manage data access governance through role-based access — Administrators, members, and guests — and access policies that control access to certain data assets. Access policies let you define granular access via:
- Metadata policies: Control who can view, update, add and delete metadata.
- Data policies: Control what users can do with the data — querying data, hiding data, etc.
- Glossary policies: Create, update, delete, and add classifications to business terms.
To scale data governance better, Atlan also lets you define access policies and user experience based on:
- Personas: Personas define policies to control which users can (or cannot) take certain actions on specific assets — examples include marketing, sales, data engineering, etc.
- Purposes: Purposes define which users and groups can view, edit, and query assets tagged with a given classification. For example, a finance team can tag all the assets related to their analysis as “finance” and create policies around the tag.
Atlan integrates with your existing single sign-on and user management systems like Okta, JumpCloud, OneLogin, Google, and Azure SSO.
Embedded collaboration: Context on demand without switching costs
Atlan provides you with a decentralized and community-driven approach to data governance. Instead of being an afterthought, governance is now part of your daily workflows.
With Atlan, you can learn more about any data asset without ever switching the application and breaking your flow. For example, information about when the column was updated last on your BI dashboard is just a click away.
Atlan also helps democratize governance by integrating with your collaboration and project management tools like Slack and Jira. With Atlan’s Slack bot, you can look up any data asset right within your chat interface. Data users can initiate Slack conversations around any data asset right within Atlan.
Atlan’s reporting center gives you a quick snapshot of important governance metrics like:
- Total data assets crawled and cataloged — the breakdown of assets by data sources
- A breakdown of asset categories — SQL assets, BI assets, process assets, etc.
- Asset drill down by Persona, Owners, and Groups.
- Total assets that have been verified and certified.
Learn more: Track Snowflake data governance metrics on Atlan’s Reporting Center
Atlan: A Snowflake-validated data governance solution
If you are evaluating and looking to deploy best-in-class data access governance for your Snowflake data warehouse, give Atlan a spin.
Atlan is the first data catalog and metadata management solution validated by Snowflake’s technology validation program.
Quoting Bob Muglia, former CEO of Snowflake:
“Atlan’s unique, collaboration-first approach for the modern data stack helps to break down organizational silos and empower cross-functional teams to work together to make better business decisions.”
Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers for solving the evolving needs of modern data teams. Atlan’s capabilities include faster discovery, transparent data flow, robust governance, and collaboration built on open infrastructure and an easy-to-use user interface.
The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.
Atlan has been named a leader in The Forrester Wave™: Enterprise Data Catalogs for DataOps, Q2 2022.
The report states,
“Atlan is the tool of choice for DataOps and data product deployment. Atlan’s vision is to create frictionless data product deployment through a single metadata and data automation platform.”
Getting started with Snowflake data governance with Atlan:
- How to crawl Snowflake metadata
- How to mine Snowflake metadata
- What does Atlan crawl from Snowflake?
- How to attach a classification for Snowflake data assets?
- How do I control access to Snowflake metadata and data?
Snowflake data governance: Related reads
- Snowflake metadata management — Discovery, lineage, and governance
- Data governance and its importance in the modern data stack
- 6 commonly referenced data governance frameworks in 2023
- Snowflake data dictionary — Documentation for your Snowflake data warehouse
- Snowflake data access control made easy and scalable
- Snowflake data catalog: Enabling active metadata management for your data cloud