Data Dictionary: Definition, Examples, Purpose, and Why Do You Need One?

April 25th, 2022

What is a data dictionary: The definition

A data dictionary describes the data stored in a database. In simple terms, it is the documentation for all the data assets in a database: what each one contains, what it means, and how it relates to the others.

An example of what a data dictionary looks like. Source: Atlan

Traditional data dictionaries usually only make sense to engineering, operations or IT, leaving business people in the dark.

Often, the humans of data (aka folks like you) spend an insane amount of time figuring out what data means and whether or not it’s credible. As per HBR, “80% of a data scientist’s valuable time is spent simply finding, cleaning, and organizing data, leaving only 20% to actually perform analysis”.

A traditional data dictionary cannot solve this problem. Then what will? A modern data dictionary. It is a repository of all column descriptions, along with metrics that describe each column's characteristics, such as mean, median, and missing values.

Data dictionary example: Key components

A data dictionary should become the go-to tool for the humans of data in your organization to understand everything about a data set and check data quality at a glance. It will have information such as:

  • Table names and descriptions
  • Table relationships
  • Number of columns, column name, and descriptions
  • Permissible values and validation rules for a field
  • Data types
  • Column nullability
  • Referential constraints — foreign keys and primary keys
  • Date and time when the property was created or changed
  • Data profiling with descriptive statistics: missing values, min-max values, and histogram distribution
  • Owners and editors of data sets that contain these variables
  • Social metadata associated with each data asset stored as tags, notes, and chat transcripts
  • Auto-classification of PII and other sensitive data assets

Most importantly, a data dictionary should be right next to your data table with all information easily accessible.
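Several of the structural components listed above (column names, data types, nullability, primary keys, and foreign keys) can be read straight from the database. The following is a minimal sketch, assuming a SQLite database; the `describe_table` helper and the `customers` and `orders` tables are hypothetical and exist only for illustration. Descriptions, owners, and tags still have to come from people.

```python
import sqlite3


def describe_table(conn, table):
    """Return one dictionary entry per column of `table` (hypothetical helper)."""
    # The table name is interpolated directly, so pass trusted names only.
    # foreign_key_list rows are (id, seq, table, from, to, ...).
    foreign_keys = {
        row[3]: f"{row[2]}.{row[4]}"
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    }
    entries = []
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        entries.append({
            "column": name,
            "data_type": col_type,
            # Reports only whether NOT NULL was declared on the column.
            "nullable": not notnull,
            "primary_key": bool(pk),
            "references": foreign_keys.get(name),
        })
    return entries


# Example schema, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        note TEXT
    );
""")
orders_entries = describe_table(conn, "orders")
for entry in orders_entries:
    print(entry)
```

Other databases expose the same facts through their own catalog views, so the queries would differ while the shape of each entry could stay the same.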

Data dictionary example: Collection of information about various database objects. Source: Atlan

The purpose and importance of a data dictionary

As per the state of data 2018 report, “The estimated global annual spend on data initiatives by companies in 2018 was $114 billion”. Despite significant investments in data lakes, most organizations don’t have an easy way for humans to discover, access, and share data. Collecting vast amounts of data is useless if you can’t interpret or analyze it.

Usually, the database administrator or engineer handles transforming and storing this data in warehouses or databases for further analysis. Now imagine if this person were to suddenly disappear tomorrow. Is there documentation somewhere that will explain everything that you need to know to take over the reins?

If you have a data dictionary in place, this won’t be a problem. A data dictionary can help team members learn everything about a data set.

But this isn’t the only reason that you should care about a data dictionary.

Here are the four biggest benefits of a modern data dictionary:

  1. Detect anomalies quickly
  2. Evaluate data quality
  3. Get more trustworthy data
  4. Build transparency within data teams

Detect anomalies quickly

Identifying anomalies in data or missing data is easier with a dictionary since it displays the results of data checks such as minimum and maximum values or the count of distinct values. Spot duplicate, inaccurate or questionable data at a glance.
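As a rough illustration of the checks described above, here is a minimal sketch that profiles a single column and flags values outside a permitted range. The `profile_column` function, the `ages` sample, and the 0 to 120 range are assumptions made up for this example, not part of any particular tool.

```python
from collections import Counter


def profile_column(values, minimum=None, maximum=None):
    """Summarize one column; None stands for a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    # Values that break the permitted range recorded in the dictionary.
    out_of_range = [
        v for v in present
        if (minimum is not None and v < minimum)
        or (maximum is not None and v > maximum)
    ]
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "out_of_range": out_of_range,
    }


# Hypothetical sample: two missing values, one repeat, one implausible age.
ages = [34, 29, None, 41, 29, 230, None]
report = profile_column(ages, minimum=0, maximum=120)
print(report)
```

Here the maximum of 230 and the non-empty `out_of_range` list stand out immediately, which is the kind of at-a-glance signal a profiled data dictionary is meant to give. Whether a repeated value is a problem depends on the column, so `duplicates` is only a prompt to look closer.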

Modern data dictionaries help enable best-in-class data profiling and quality audits. Image by Atlan

Evaluate data quality

Data dictionaries make it easier to create a standard set of variable names and descriptions across an organization. When everyone agrees on what a field should contain, it is easier to see whether the data in it measures up, so you can evaluate data quality quickly and speed up your analysis.

Get more trustworthy data

With all of the information about a data set (sources, owners, descriptions, discussions, etc.) recorded in one place, data becomes more reliable. Now you can truly say, “In data we trust!”

Build transparency within data teams

When the entire organization understands what every detail within a data set means, it gets everyone on the same page, reduces dependencies, helps everyone use the data in the same way, and makes onboarding a breeze.

Modern data dictionaries foster collaboration and transparency. Image by Atlan

Well, now that you know how handy a data dictionary can be, let’s see how to create one.

How to create a data dictionary

To create a data dictionary, you should have answers to these six questions:

  1. What does each variable/element/field/attribute within a data set mean? What is it describing?
  2. How did you collect each variable? How did you measure it?
  3. If there are numeric values, are these values raw or are they calculated using a formula?
  4. What are the tests or checks you need to run to determine whether your data is trustworthy?
  5. Who collected your data? Are they still the owners or is it somebody else? Who has interacted with your data, and what are the changes that they made? Who oversees the changes made to your data?
  6. How can you reach out to the owners, admins, and editors of your data?

You might notice that it’s harder to find these answers once your data’s already modeled, prepped, and being actively used for analysis.

That’s why it’s a best practice to start building a data dictionary right when you’re modeling your data—it makes it a lot easier to define what each variable stands for, how it is being measured or calculated, who can make changes, and who is responsible for monitoring the changes made.
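One way to capture those answers while you model is to keep a small record per field. The sketch below is only an illustration: the `FieldEntry` class, its attribute names, and the `order_total` example are hypothetical, and it simplifies question 5 to a single owner rather than a full change history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldEntry:
    name: str
    meaning: str = ""                 # Q1: what the field describes
    collection_method: str = ""       # Q2: how it was collected or measured
    formula: Optional[str] = None     # Q3: None means the value is raw
    quality_checks: List[str] = field(default_factory=list)  # Q4
    owner: str = ""                   # Q5: who is responsible for it
    contact: str = ""                 # Q6: how to reach that person

    def unanswered(self) -> List[str]:
        """List the questions that still have no answer."""
        required = {
            "meaning": self.meaning,
            "collection_method": self.collection_method,
            "quality_checks": self.quality_checks,
            "owner": self.owner,
            "contact": self.contact,
        }
        return [key for key, value in required.items() if not value]


# Hypothetical entry written while the model is still being designed.
entry = FieldEntry(
    name="order_total",
    meaning="Amount charged for the order, in US dollars",
    collection_method="Recorded by the checkout service at payment time",
    formula="sum(line_item_price) + tax",
    quality_checks=["order_total >= 0"],
)
gaps = entry.unanswered()
print(gaps)  # ['owner', 'contact']
```

Listing the gaps for every field makes it obvious which answers are still missing before the data goes into active use.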

Data dictionary: Next steps

If you are looking to build a modern data dictionary, do take Atlan for a spin. Atlan is more than a standard data dictionary: it is a third-generation modern data catalog built on a framework of embedded collaboration, which is key in today’s workplace. It borrows principles from GitHub, Figma, Slack, Notion, Superhuman, and other tools that are commonplace today.
