Snowflake Data Dictionary — Documentation for Your Database

September 14th, 2022

header image for Snowflake Data Dictionary — Documentation for Your Database

Finding the right data is the hardest and the most time-consuming task in a data analyst’s/scientist’s typical work day. Data discovery and search have now become a business-critical factor influencing the success/failure of a data analytics project.

In addition to data catalog and data lineage that help make search and discovery easier and faster. Let’s explore, in the context of Snowflake warehouse, the importance and role of a data dictionary in helping establish trust in a data asset.

What is a data dictionary?

A data dictionary holds the metadata for the database assets. It is the documentation for your database which describes the structure and content of the database objects.

Quoting the Harvard Business Review,

Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust.

It becomes extremely difficult to understand data without metadata which helps explain the source, nature, structure, quality, and usefulness of data.

A data dictionary is a collection of read-only tables and views that gets automatically created with every database. So one doesn’t have to probe source code, SQL queries, BI reports and depend on the tribal knowledge of the few to get more context about the data.

Learn more - Built-in data dictionary in PostgreSQL | Oracle Database | MySQL

Accessing PostgreSQL data dictionary with Amazon RDS

Accessing PostgreSQL data dictionary with Amazon RDS. Source: Amazon RDS

A Guide to Building a Business Case for a Data Catalog

Download free ebook

Benefits of a data dictionary

Unified source of truth

Information about databases, schemas, and tables is managed in a single source of truth — a data dictionary that is editable, sharable, auditable, and self-serve, as compared to knowledge spread across various teams, different tools, and chaotic access policies. A data dictionary makes new analysts’ onboarding and training much more easier and productive.

Data discovery

The self-serve nature of the data dictionary helps analysts search and understand data needed for analysis and reporting. Understanding table relationships and foreign key constraints help analysts write better-performing SQL queries.  It helps remove removes dependencies and bottlenecks and hence better governance and stewardship.

Trust data better

Data profiling gives a quick snapshot of the nature and quality of data. The usability of data assets can be ascertained through: Data types, min-max values, frequency distribution, unique, duplicate, and null values.

Database optimization

A data dictionary is a rich source of technical metadata such as table relationships, query history, foreign key constraints, and database usage and performance. This helps DataOps engineers to monitor and tune databases for optimal performance and costs (storage and processing)

Snowflake Data Dictionary: The Information Schema

Snowflake has a built-in data dictionary known as and can be accessed through INFORMATION_SCHEMA.  The Information Schema is always included within every database created in Snowflake.

The Snowflake data dictionary has the following two components:

Schema views: Metadata about all data assets/objects stored in the database.

Table functions: Metadata about historical information on storage, tasks, query history, and account-level usage.

Learn more: Using Snowflake Information Schema as a data dictionary

Data dictionary in Snowflake is accessed through Information Schema

Data dictionary in Snowflake is accessed through Information Schema. Source: Snowflake: The Definitive Guide - Joyce Kay Avila

Key components of a Snowflake data dictionary

  • Table schema, names, and descriptions
  • Table owners and relationships
  • Number of columns, rows, column name, and descriptions
  • Permissible values and validation rules for a column
  • Data types
  • Referential constraints — foreign keys and primary keys
  • Data profiling with descriptive statistics — missing values, min-max values, and histogram distribution
  • SQL query logs
  • Warehouse storage, load, and a metering history

Learn more: Reference documentation of Snowflake data dictionary

Challenges using Snowflake Data Dictionary

Accessibility for non-technical/business users

Managing and accessing a data dictionary in Snowflake is only possible by running SQL queries. This makes it harder for business users to use dictionaries for data discovery.

Writing SQL queries to access data dictionary in Snowflake

Writing SQL queries to access data dictionary in Snowflake. Source: Snowflake

For accessing a data dictionary in Snowflake, one still needs to have a general idea of where a particular data asset comes from, this makes it difficult when you have thousands of tables spread over different databases.

Multiple data sources support

Snowflake provides data dictionary only for databases stored within the Snowflake warehouse. When you have data stored at non-snowflake databases, you’ll need a centralized data dictionary tool to assimilate all data sources.

Lack of custom metadata support

Snowflake data dictionary supports only metadata exposed through the API. It is not possible to add custom metadata — ETL logs, Airflow status, data quality checks — to the data dictionary.

Lack of data collaboration

As the volume of data on your Snowflake warehouse increases, access to the data dictionary alone not would be sufficient to find and trust data. Bringing in the human element through embedded collaboration — chats, crowdsourced data stewardship, and integrations with IT/task management tools — helps make data reliable and usable.

[Download] → Forrester Wave™: Enterprise Data Catalog for DataOps, Q2 2022

Atlan: Modern data dictionary for Snowflake

The problem with traditional data dictionaries (that come built-in within databases) is that it is driven by passive metadata: Metadata that is default, limiting, and doesn’t give you a complete picture of a data asset.

On the contrary, the crux of active metadata is its openness and interoperability. Atlan makes it possible to effortlessly move metadata across your data stack — data lakes, warehouses, BI tools, pipelines, ETL — and helps embed this rich context on data catalog, lineage, and data dictionary.

Atlan’s data dictionary helps you build a 360° data profile to understand your data better, search and discover data faster, and to collaborate in and around data assets.

Let’s now look at the key data dictionary capabilities of Atlan:

Auto-generated data dictionary

Apart from Snowflake, Atlan’s bots automatically crawl and pulls data from all the commonly used data sources like Bigquery, Databricks, Redshift, MySQL, and PostgreSQL. Atlan also connects with BI tools like Looker, Tableau, and Power BI and thus giving your data dictionary an end-to-end view of a data asset.

Learn more: How Atlan connects with multiple data sources

Snowflake data dicionary: Atlan — Automated data dictionary for Snowflake data warehouse

Atlan — Automated data dictionary for Snowflake data warehouse. Source: Atlan

Search and discovery on an easy-to-use interface

Atlan’s primary use case is a data catalog. It provides a Google-like search experience that lets anyone on your team — from a data engineer to a business user/analyst — to search and filter data assets on an easy-to-use user interface.

Learn more: Atlan data catalog for data discovery and search

Snowflake data dicionary: Atlan data catalog provides Google-like search to discover and find the right data for your analysis

Atlan data catalog provides Google-like search to discover and find the right data for your analysis. Source: Atlan

Data dictionary enriched with custom metadata

Atan’s APIs lets you track all kinds of metadata — technical, administrative, business, and social — from your warehouse, ingestion, ETL, and business intelligence tools. Questions like, “when was the last time this data was updated”, and “what are the transformations the data has gone through” are answered right there on the data dictionary.

Learn more: Custom metadata support in Atlan

Everyone is on the same page with a business glossary

Atlan’s data dictionary also has a business glossary — a centralized knowledge bank that lists all of the key business terms, concepts, KPIs, and metrics associated with a data asset.

Learn more: Build a business knowledge base with a data glossary

Snowflake data dicionary: A centralized repository that lists all of the key business terms and concepts used in the day-to-day operation of an organization

A centralized repository that lists all of the key business terms and concepts used in the day-to-day operation of an organization. Source: Atlan

End-to-end visibility with data lineage

Atlan’s automated data lineage crawls data sources and parses SQL queries to build column-level lineage visualization. Data lineage helps understand relationships and dependencies between various data assets across the entire data life cycle.

Learn more: Visualize your data asset from the source to the BI dashboard

Snowflake data dicionary: Get end-to-end data visiblity from the data sources to BI dashboard with data lineage

Get end-to-end data visiblity from the data sources to BI dashboard with data lineage. Source: Atlan

Democratize data with embedded collaboration

Atlan seamlessly integrates with your team’s communication and project management tools, this lets you initiate conversations around a data asset right within Atlan. Crowdsource data stewardship tasks like updating documentation/READMEs, adding owners, certifying terms, and adding linked assets.

Learn more: Democratize your data team with embedded collaboration

Snowflake data dicionary: With Atlan, search and find data right from the tools you are familiar with. Source: Atlan

With Atlan, search and find data right from the tools you are familiar with. Source: Atlan

A Demo of Atlan data dictionary for Snowflake data warehouse

Atlan: A Snowflake validated partner for data discovery and governance

If you are evaluating and looking to deploy a best-in-class data catalog and data dictionary for the Snowflake data warehouse? Do give Atlan a spin.

Atlan is the first data catalog and metadata management solution validated by Snowflake’s technology validation program.

Quoting Tarik Dwiek, Head of Technology Alliances at Snowflake,

As Snowflake continues to help companies mobilize their data, Atlan makes it possible to find, use, document, and collaborate on this data. Our partnership with Atlan was a natural fit, given how its open approach, pay-as-you-go model, and delightful user experience align with Snowflake’s own ethos.

Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers solving for the evolving needs of the modern data teams which include faster discovery, transparent data flow, robust governance, and collaboration built on open infrastructure and an easy-to-use user interface.

The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.

Free Guide: Find the Right Data Catalog in 5 Simple Steps.

This step-by-step guide shows how to navigate existing data cataloging solutions in the market. Compare features and capabilities, create customized evaluation criteria, and execute hands-on Proof of Concepts (POCs) that help your business see value. Download now!