Snowflake Data Dictionary: Documentation for Your Database
Share this article
Finding the right data is the hardest and the most time-consuming task in a data analyst’s/scientist’s typical work day. Data discovery and search have now become business-critical, influencing the success/failure of a data analytics project.
In the context of the Snowflake platform, how important is a data dictionary? How does a data dictionary help in establishing trust in snowflake data assets? Here, we will explore answers to these questions.
Table of contents
- What is a data dictionary?
- Benefits of a data dictionary
- Snowflake Data Dictionary: The Information Schema
- Key components of a Snowflake data dictionary
- Challenge using Snowflake data dictionary
- Atlan: The Modern data dictionary for Snowflake
What is a data dictionary?
A data dictionary holds the metadata for the database assets. It is the documentation for your database which describes the structure and content of the database objects.
Quoting the Harvard Business Review,
Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust.
It becomes extremely difficult to understand data without metadata which helps explain the source, nature, structure, quality, and usefulness of data.
A data dictionary is a collection of read-only tables and views that gets automatically created with every database. So one doesn’t have to probe source code, SQL queries, BI reports and depend on the tribal knowledge of the few to get more context about the data.
A Guide to Building a Business Case for a Data Catalog
Top 4 benefits of a data dictionary
Without a data dictionary for your Snowflake assets, one has to rely on siloed, undocumented tribal knowledge or spend lots of time digging through codebases, SQL queries, and logs.
Unified source of truth
Information about databases, schemas, and tables is managed in a single source of truth — a data dictionary that is editable, shareable, auditable, and self-serve, as compared to knowledge spread across various teams, different tools, and chaotic access policies. A data dictionary makes new analysts’ onboarding and training much easier and more productive.
The self-serve nature of the data dictionary helps analysts search and understand data needed for analysis and reporting. Understanding table relationships and foreign key constraints help analysts write better-performing SQL queries. It helps remove dependencies and bottlenecks and thereby enabling better governance and stewardship.
Trust data better
Data profiling gives a quick snapshot of the nature and quality of data. The usability of data assets can be ascertained through: Data types, min-max values, frequency distribution, and unique, duplicate, and null values.
A data dictionary is a rich source of technical metadata such as table relationships, query history, foreign key constraints, and database usage and performance. This helps DataOps engineers to monitor and tune databases for optimal performance and costs (storage and processing)
Snowflake Data Dictionary: The Information Schema
Does Snowflake have a data dictionary? Yes, it does.
Snowflake has a built-in data dictionary known as and can be accessed through INFORMATION_SCHEMA. The Information Schema is always included within every database created in Snowflake.
The Snowflake data dictionary has the following two components:
1. Schema views: Metadata about all data assets/objects stored in the database.
2. Table functions: Metadata about historical information on storage, tasks, query history, and account-level usage.
Building blocks of a Snowflake data dictionary
- Table schema, names, and descriptions
- Table owners and relationships
- Number of columns, rows, column names, and descriptions
- Permissible values and validation rules for a column
- Data types
- Referential constraints — foreign keys and primary keys
- Data profiling with descriptive statistics — missing values, min-max values, and histogram distribution
- SQL query logs
- Warehouse storage, load, and a metering history
5 key challenges of using a Snowflake Data Dictionary
The 5 key challenges of relying just on Snowflake’s native data dictionary include:
- Accessibility for non-technical users
- Absence of an intuitive search experience
- Support for data stored at non-snowflake databases
- Lack of custom metadata support
- Lack of data collaboration
Accessibility for non-technical/business users
Managing and accessing a data dictionary in Snowflake is only possible by running SQL queries. This makes it harder for business users to use dictionaries for data discovery.
Absence of a universal search
For accessing a data dictionary in Snowflake, one still needs to have a general idea of where a particular data asset comes from. This makes it difficult when you have thousands of tables spread over different databases.
Multiple data sources support
Snowflake provides a data dictionary only for databases stored within the Snowflake warehouse. When you have data stored at non-Snowflake databases, you’ll need a centralized data dictionary tool to assimilate all data sources.
Lack of custom metadata support
Snowflake data dictionary supports only metadata exposed through the API. It is not possible to add custom metadata — ETL logs, Airflow status, and data quality checks — to the data dictionary.
Lack of data collaboration
As the volume of data on your Snowflake warehouse increases, access to the data dictionary alone would not be sufficient to find and trust data. Bringing in the human element through embedded collaboration — chats, crowdsourced data stewardship, and integrations with IT/task management tools — helps make data reliable and usable.
Atlan: A modern data dictionary for Snowflake
The problem with traditional data dictionaries (that come built-in within databases) is that it is driven by passive metadata: Metadata that is the default, limiting, and doesn’t give you a complete picture of a data asset.
On the contrary, the crux of active metadata is its openness and interoperability. Atlan makes it possible to effortlessly move metadata across your data stack — data lakes, warehouses, BI tools, pipelines, ETL — and helps embed this rich context on data catalog, lineage, and data dictionary.
Atlan’s data dictionary helps you build a 360° data profile to understand your data better, search and discover data faster, and collaborate in and around data assets.
Now, let’s have a look at the key data dictionary capabilities of Atlan:
Auto-generated data dictionary
Apart from Snowflake, Atlan’s bots automatically crawl and pull data from all the commonly used data sources like Bigquery, Databricks, Redshift, MySQL, and PostgreSQL. Atlan also connects with BI tools like Looker, Tableau, and Power BI and so it gives your data dictionary an end-to-end view of a data asset.
Learn more: How Atlan connects with multiple data sources
Search and discovery on an easy-to-use interface
Atlan’s primary use case is a data catalog. It provides a Google-like search experience that lets anyone on your team — from a data engineer to a business user/analyst — search and filter data assets on an easy-to-use user interface.
Learn more: Atlan data catalog for data discovery and search
Data dictionary enriched with custom metadata
Atan’s APIs let you track all kinds of metadata — technical, administrative, business, and social — from your warehouse, ingestion, ETL, and business intelligence tools. Questions like, “when was the last time this data was updated?”, and “what are the transformations the data has gone through?” are answered right there in the data dictionary.
Learn more: Custom metadata support in Atlan
Everyone is on the same page with a business glossary
Atlan’s data dictionary also has a business glossary — a centralized knowledge bank that lists all of the key business terms, concepts, KPIs, and metrics associated with a data asset.
End-to-end visibility with data lineage
Atlan’s automated data lineage crawls data sources and parses SQL queries to build column-level lineage visualization. Data lineage helps understand relationships and dependencies between various data assets across the entire data life cycle.
Democratize data with embedded collaboration
Atlan seamlessly integrates with your team’s communication and project management tools. This lets you initiate conversations around a data asset right within Atlan. Crowdsource data stewardship tasks like updating documentation/READMEs, adding owners, certifying terms, and adding linked assets.
A Demo of the Atlan data dictionary for Snowflake data warehouse
Atlan: A Snowflake Ready Technology partner for data discovery and governance
If you are evaluating and looking to deploy a best-in-class data catalog and data dictionary for the Snowflake data warehouse? Do give Atlan a spin.
Atlan is the first data catalog and metadata management solution approved by the Snowflake Ready Technology Validation Program.
Quoting Tarik Dwiek, Head of Technology Alliances at Snowflake,
As Snowflake continues to help companies mobilize their data, Atlan makes it possible to find, use, document, and collaborate on this data. Our partnership with Atlan was a natural fit, given how its open approach, pay-as-you-go model, and delightful user experience align with Snowflake’s own ethos.
Atlan is more than a metadata management and data cataloging tool. Atlan is built by data engineers for solving the evolving needs of modern data teams which include faster discovery, transparent data flow, robust governance, and collaboration built on open infrastructure and an easy-to-use user interface.
The deep integration and the open API enable Atlan to solve other modern data governance use cases across DataOps, workflow management, and pipeline automation.
Snowflake data dictionary: Related reads
- Data catalog for Snowflake data warehouse
- How to manage data governance for Snowflake data warehouse
- Snowflake data access control made easy and scalable
- Visualize data lineage for Snowflake data objects
- How to manage metadata for Snowflake data assets
- Data dictionary: Definition, examples, purpose, and why do you need one?
Share this article