Snowflake Data Dictionary: Documentation for Your Database

Updated August 01st, 2023
header image

Share this article

Finding the right data is the hardest and the most time-consuming task in a data analyst’s/scientist’s typical work day. Data discovery and search have now become business-critical, influencing the success/failure of a data analytics project.

In addition to data catalog and data lineage that help make search and discovery easier and faster, data dictionary plays a crucial role in the process.

In the context of the Snowflake platform, how important is a data dictionary? How does a data dictionary help in establishing trust in snowflake data assets? Here, we will explore answers to these questions.

Table of contents

  1. What is a data dictionary?
  2. Benefits of a data dictionary
  3. Snowflake Data Dictionary: The Information Schema
  4. Key components of a Snowflake data dictionary
  5. Challenge using Snowflake data dictionary
  6. Atlan: The Modern data dictionary for Snowflake

What is a data dictionary?

A data dictionary holds the metadata for the database assets. It is the documentation for your database which describes the structure and content of the database objects.

Quoting the Harvard Business Review,

Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust.

It becomes extremely difficult to understand data without metadata which helps explain the source, nature, structure, quality, and usefulness of data.

A data dictionary is a collection of read-only tables and views that gets automatically created with every database. So one doesn’t have to probe source code, SQL queries, BI reports and depend on the tribal knowledge of the few to get more context about the data.

Learn more - Built-in data dictionary in PostgreSQL | Oracle Database | MySQL

Accessing PostgreSQL data dictionary with Amazon RDS

Accessing PostgreSQL data dictionary with Amazon RDS. Source: Amazon RDS.

A Guide to Building a Business Case for a Data Catalog

Download free ebook

Top 4 benefits of a data dictionary

Without a data dictionary for your Snowflake assets, one has to rely on siloed, undocumented tribal knowledge or spend lots of time digging through codebases, SQL queries, and logs.

Unified source of truth

Information about databases, schemas, and tables is managed in a single source of truth — a data dictionary that is editable, shareable, auditable, and self-serve, as compared to knowledge spread across various teams, different tools, and chaotic access policies. A data dictionary makes new analysts’ onboarding and training much easier and more productive.

Data discovery

The self-serve nature of the data dictionary helps analysts search and understand data needed for analysis and reporting. Understanding table relationships and foreign key constraints help analysts write better-performing SQL queries. It helps remove dependencies and bottlenecks and thereby enabling better governance and stewardship.

Trust data better

Data profiling gives a quick snapshot of the nature and quality of data. The usability of data assets can be ascertained through: Data types, min-max values, frequency distribution, and unique, duplicate, and null values.

Database optimization

A data dictionary is a rich source of technical metadata such as table relationships, query history, foreign key constraints, and database usage and performance. This helps DataOps engineers to monitor and tune databases for optimal performance and costs (storage and processing)

Snowflake Data Dictionary: The Information Schema

Does Snowflake have a data dictionary? Yes, it does.

Snowflake has a built-in data dictionary known as and can be accessed through INFORMATION_SCHEMA. The Information Schema is always included within every database created in Snowflake.

The Snowflake data dictionary has the following two components:

1. Schema views: Metadata about all data assets/objects stored in the database.

2. Table functions: Metadata about historical information on storage, tasks, query history, and account-level usage.

Learn more: Using Snowflake Information Schema as a data dictionary

Data dictionary in Snowflake is accessed through Information Schema

The data dictionary in Snowflake is accessed through Information Schema. Source: Snowflake: The Definitive Guide - Joyce Kay Avila.

Building blocks of a Snowflake data dictionary

  • Table schema, names, and descriptions
  • Table owners and relationships
  • Number of columns, rows, column names, and descriptions
  • Permissible values and validation rules for a column
  • Data types
  • Referential constraints — foreign keys and primary keys
  • Data profiling with descriptive statistics — missing values, min-max values, and histogram distribution
  • SQL query logs
  • Warehouse storage, load, and a metering history

Learn more: Reference documentation of Snowflake data dictionary

5 key challenges of using a Snowflake Data Dictionary

The 5 key challenges of relying just on Snowflake’s native data dictionary include:

  1. Accessibility for non-technical users
  2. Absence of an intuitive search experience
  3. Support for data stored at non-snowflake databases
  4. Lack of custom metadata support
  5. Lack of data collaboration

Accessibility for non-technical/business users

Managing and accessing a data dictionary in Snowflake is only possible by running SQL queries. This makes it harder for business users to use dictionaries for data discovery.

Writing SQL queries to access data dictionary in Snowflake

Writing SQL queries to access data dictionary in Snowflake. Source: Snowflake.

For accessing a data dictionary in Snowflake, one still needs to have a general idea of where a particular data asset comes from. This makes it difficult when you have thousands of tables spread over different databases.

Multiple data sources support

Snowflake provides a data dictionary only for databases stored within the Snowflake warehouse. When you have data stored at non-Snowflake databases, you’ll need a centralized data dictionary tool to assimilate all data sources.

Lack of custom metadata support

Snowflake data dictionary supports only metadata exposed through the API. It is not possible to add custom metadata — ETL logs, Airflow status, and data quality checks — to the data dictionary.

Lack of data collaboration

As the volume of data on your Snowflake warehouse increases, access to the data dictionary alone would not be sufficient to find and trust data. Bringing in the human element through embedded collaboration — chats, crowdsourced data stewardship, and integrations with IT/task management tools — helps make data reliable and usable.

Atlan: A modern data dictionary for Snowflake

The problem with traditional data dictionaries (that come built-in within databases) is that it is driven by passive metadata: Metadata that is the default, limiting, and doesn’t give you a complete picture of a data asset.

On the contrary, the crux of active metadata is its openness and interoperability. Atlan makes it possible to effortlessly move metadata across your data stack — data lakes, warehouses, BI tools, pipelines, ETL — and helps embed this rich context on data catalog, lineage, and data dictionary.

Atlan’s data dictionary helps you build a 360° data profile to understand your data better, search and discover data faster, and collaborate in and around data assets.

Now, let’s have a look at the key data dictionary capabilities of Atlan:

Auto-generated data dictionary

Apart from Snowflake, Atlan’s bots automatically crawl and pull data from all the commonly used data sources like Bigquery, Databricks, Redshift, MySQL, and PostgreSQL. Atlan also connects with BI tools like Looker, Tableau, and Power BI and so it gives your data dictionary an end-to-end view of a data asset.

Learn more: How Atlan connects with multiple data sources

Snowflake data dicionary: Atlan — Automated data dictionary for Snowflake data warehouse

Atlan — Automated data dictionary for Snowflake data warehouse. Source: Atlan.

Search and discovery on an easy-to-use interface

Atlan’s primary use case is a data catalog. It provides a Google-like search experience that lets anyone on your team — from a data engineer to a business user/analyst — search and filter data assets on an easy-to-use user interface.

Learn more: Atlan data catalog for data discovery and search

Snowflake data dicionary: Atlan data catalog provides Google-like search to discover and find the right data for your analysis

Atlan data catalog provides a Google-like search to discover and find the right data for your analysis. Source: Atlan.

Data dictionary enriched with custom metadata

Atan’s APIs let you track all kinds of metadata — technical, administrative, business, and social — from your warehouse, ingestion, ETL, and business intelligence tools. Questions like, “when was the last time this data was updated?”, and “what are the transformations the data has gone through?” are answered right there in the data dictionary.

Learn more: Custom metadata support in Atlan

Everyone is on the same page with a business glossary

Atlan’s data dictionary also has a business glossary — a centralized knowledge bank that lists all of the key business terms, concepts, KPIs, and metrics associated with a data asset.

Learn more: Build a business knowledge base with a data glossary

Snowflake data dicionary: A centralized repository that lists all of the key business terms and concepts used in the day-to-day operation of an organization

A centralized repository that lists all of the key business terms and concepts used in the day-to-day operation of an organization. Source: Atlan

End-to-end visibility with data lineage

Atlan’s automated data lineage crawls data sources and parses SQL queries to build column-level lineage visualization. Data lineage helps understand relationships and dependencies between various data assets across the entire data life cycle.

Learn more: Visualize your data asset from the source to the BI dashboard

Snowflake data dicionary: Get end-to-end data visiblity from the data sources to BI dashboard with data lineage

Get end-to-end data visibility from the data sources to the BI dashboard with data lineage. Source: Atlan.

Democratize data with embedded collaboration

Atlan seamlessly integrates with your team’s communication and project management tools. This lets you initiate conversations around a data asset right within Atlan. Crowdsource data stewardship tasks like updating documentation/READMEs, adding owners, certifying terms, and adding linked assets.

Learn more: Democratize your data team with embedded collaboration

Snowflake data dicionary: With Atlan, search and find data right from the tools you are familiar with.

With Atlan, search and find data right from the tools you are familiar with. Source: Atlan.

A Demo of the Atlan data dictionary for Snowflake data warehouse

Atlan: A Snowflake Ready Technology partner for data discovery and governance

If you are evaluating and looking to deploy a best-in-class data catalog and data dictionary for the Snowflake data warehouse? Do give Atlan a spin.

Atlan is the first data catalog and metadata management solution approved by the Snowflake Ready Technology Validation Program.

Quoting Tarik Dwiek, Head of Technology Alliances at Snowflake,

As Snowflake continues to help companies mobilize their data, Atlan makes it possible to find, use, document, and collaborate on this data. Our partnership with Atlan was a natural fit, given how its open approach, pay-as-you-go model, and delightful user experience align with Snowflake’s own ethos.

Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers for solving the evolving needs of modern data teams which include faster discovery, transparent data flow, robust governance, and collaboration built on open infrastructure and an easy-to-use user interface.

The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.

Share this article

resource image

Free Guide: Find the Right Data Catalog in 5 Simple Steps.

This step-by-step guide shows how to navigate existing data cataloging solutions in the market. Compare features and capabilities, create customized evaluation criteria, and execute hands-on Proof of Concepts (POCs) that help your business see value. Download now!

[Website env: production]