What Is a Data Glossary? And Four Ways to Create a Useful One.

November 25th, 2020


What is a data glossary?

A data glossary is a collection of all terms that define your data's key characteristics, organized in a way that is easy to search.

A glossary is a list of terms and their definitions that gives context and helps organize knowledge. A data glossary serves the same purpose for all the data assets in an organization. It contains business terms, phrases, and concepts that help define the data.

Apart from providing context, a data glossary can help organize data assets and thus make them easier to discover. For example, terms like “cost”, “profit”, and “revenue” can be used to define and group all financial data assets.
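To make the grouping idea concrete, here is a minimal sketch in Python. The term definitions, the asset names, and the helper group_assets_by_term are all hypothetical and chosen only for illustration; a real glossary tool would store this in its own metadata model.

```python
from collections import defaultdict

# Hypothetical glossary: term -> business definition.
glossary = {
    "cost": "Money spent to produce or acquire goods and services.",
    "profit": "Revenue remaining after costs are subtracted.",
    "revenue": "Income generated from normal business operations.",
}

# Hypothetical data assets, each tagged with glossary terms.
asset_terms = {
    "finance.quarterly_pnl": ["cost", "profit", "revenue"],
    "finance.vendor_invoices": ["cost"],
    "sales.orders_2020": ["revenue"],
}


def group_assets_by_term(asset_terms):
    """Invert the asset -> terms mapping into term -> sorted asset names."""
    groups = defaultdict(list)
    for asset, terms in asset_terms.items():
        for term in terms:
            groups[term].append(asset)
    return {term: sorted(assets) for term, assets in groups.items()}


groups = group_assets_by_term(asset_terms)
print(groups["revenue"])  # ['finance.quarterly_pnl', 'sales.orders_2020']
```

Looking up a single term now returns every asset that carries it, which is the "group all financial data assets" behavior described above.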

A data glossary is more commonly referred to as a “business glossary”, and these two terms can be used interchangeably. Why? Because the terminology used in a data glossary is consistent with business concepts. A useful business glossary will help prevent confusion and create a common language to communicate about data across the organization.

Curious how a data glossary is different from a data dictionary? Read about it here: What is a Data Dictionary?

Why do you need a data glossary?

A data glossary, or business glossary, is the bridge between IT and business teams: those who create and maintain data, and those who use data to drive actions. If you do not understand the data or cannot locate it quickly, you cannot use it effectively.

A well-maintained data glossary can become a single source of truth and thus increase an organization’s overall trust in data.

Here are five ways in which a data glossary can help your organization:
  1. Improves understanding of data
  2. Makes data visible
  3. Enables collaboration
  4. Powers search
  5. Promotes data governance

Improves understanding of data

By linking the right data glossary terms to data, you can figure out what is inside the data without even opening it. For example, say that a data table has glossary terms like “region”, “sales”, “quantity”, “brand”, and “year” attached to it. You can easily infer that this table has data related to sales, and the quantity is probably given brand-wise at a regional level.

Makes data visible

A useful data glossary lets all business users see what data exists, even if they do not have full access to it. It promotes awareness of existing data and makes the organization more data-driven.

Enables collaboration

The first step in overcoming data communication and collaboration challenges is to create a business glossary. It creates a common ground of contextual knowledge that is accessible to everyone. As the chances of misunderstanding decrease, data scientists and analysts will be able to communicate better with other teams.

Powers search

Glossary terms linked to data assets improve data discovery. A data glossary provides additional metadata that helps generate more accurate search results. This makes searching for data faster and easier.
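As a rough illustration of how linked terms can act as search metadata, the sketch below ranks assets by how many query words match either the asset name or its glossary terms. The function search_assets, the scoring weights, and the asset names are assumptions made for this example, not how any particular catalog implements search.

```python
def search_assets(query, assets):
    """Rank asset names by query-word matches against names and linked terms."""
    words = set(query.lower().split())
    scored = []
    for name, terms in assets.items():
        name_tokens = set(name.lower().replace(".", "_").split("_"))
        term_tokens = {term.lower() for term in terms}
        # Assumption: a glossary-term match counts double a name match.
        score = len(words & name_tokens) + 2 * len(words & term_tokens)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical assets: asset name -> linked glossary terms.
assets = {
    "tbl_0417": ["region", "sales", "quantity", "brand", "year"],
    "sales_targets": ["region", "year"],
    "hr_headcount": ["employee", "year"],
}

results = search_assets("sales by region", assets)
print(results)  # ['tbl_0417', 'sales_targets']
```

Note that the cryptically named tbl_0417 ranks first purely because of its linked terms; without the glossary metadata it would not match the query at all.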

Promotes data governance

A good data glossary can pave the way for a successful data governance initiative in your organization. Standardizing data terms and definitions helps improve the quality of both data assets and the organization’s data knowledge. An ideal data glossary can even help an organization to maintain access policies using the glossary terms.
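One way to picture term-based access policies is sketched below. The policy table, the role names, and the can_read helper are hypothetical; treating terms without a policy as unrestricted is also an assumption of this example rather than a recommendation.

```python
# Hypothetical policy: glossary term -> roles allowed to read tagged assets.
term_policies = {
    "salary": {"hr", "finance"},
    "revenue": {"finance", "analyst"},
}


def can_read(role, asset_terms, term_policies):
    """Allow access only if every restricted term on the asset permits the role.

    Terms that have no policy entry are treated as unrestricted (assumption).
    """
    for term in asset_terms:
        allowed_roles = term_policies.get(term)
        if allowed_roles is not None and role not in allowed_roles:
            return False
    return True


print(can_read("analyst", ["revenue", "region"], term_policies))  # True
print(can_read("analyst", ["salary", "revenue"], term_policies))  # False
```

Because the policy is attached to the term rather than to each table, tagging a new asset with “salary” would restrict it automatically under this scheme.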


Four ways to build a useful data glossary

To reap all the benefits listed above, a data glossary has to be useful. Your data colleagues should be able to use it quickly and easily.

Here are some practical tips for creating a useful data glossary:

1. Follow industry best practices

Instead of starting from scratch or inventing new terminology, follow existing industry standards. This keeps your data glossary consistent across your organization, rather than changing for each new type of data or use case. For example, you can use the Financial Industry Business Ontology (FIBO) as a standard glossary for financial data. You can always tweak the terminology based on your requirements, but starting a business glossary from a blank page can be a daunting task.

2. Link your data glossary with your data ecosystem

Bring your data glossary and data together. The glossary terms should be linked to your data. This helps a data steward or admin see how the glossary helps them in their daily work; otherwise, they will stop maintaining it. You can even use intelligent bots to auto-suggest glossary terms for your data assets.
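Auto-suggestion can be far more sophisticated than this, but a simple rule-based sketch shows the idea: split column names into tokens and match them against known synonyms for each glossary term. The synonym table, the column names, and suggest_terms are hypothetical, and this keyword matching is a stand-in for whatever a real suggestion bot would do.

```python
import re

# Hypothetical synonym table: glossary term -> tokens that hint at it.
glossary_synonyms = {
    "region": {"region", "territory", "area"},
    "sales": {"sales", "sold"},
    "quantity": {"quantity", "qty", "units"},
    "brand": {"brand"},
    "year": {"year", "yr"},
}


def suggest_terms(column_names, glossary_synonyms):
    """Suggest glossary terms whose synonyms appear in any column name."""
    tokens = set()
    for column in column_names:
        # Split on anything that is not a lowercase letter or digit.
        tokens.update(t for t in re.split(r"[^a-z0-9]+", column.lower()) if t)
    return sorted(
        term for term, synonyms in glossary_synonyms.items() if tokens & synonyms
    )


columns = ["sales_region", "brand_name", "qty_sold", "fiscal_yr"]
suggestions = suggest_terms(columns, glossary_synonyms)
print(suggestions)  # ['brand', 'quantity', 'region', 'sales', 'year']
```

In practice the suggestions would be shown to a steward or data user for confirmation rather than applied automatically.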

Data glossary and data together

3. Assign a business owner to enrich the data glossary

Make sure someone (like a data steward) is responsible for regularly updating your data glossary. It’s also important that it is easy to update and add glossary terms. Ideally, the data users should be able to suggest glossary terms (i.e. crowdsource them) for their data assets, since they have the full context on that data.
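The suggest-then-approve workflow described here could be modeled roughly as follows. The Glossary class, its method names, and the user names are all hypothetical; the sketch only assumes that anyone may suggest a term and that a single steward approves it.

```python
from dataclasses import dataclass, field


@dataclass
class Glossary:
    steward: str
    terms: dict = field(default_factory=dict)    # approved term -> definition
    pending: dict = field(default_factory=dict)  # term -> (definition, suggester)

    def suggest(self, term, definition, suggested_by):
        """Queue a suggestion; return False if the term is already approved."""
        key = term.strip().lower()
        if key in self.terms:
            return False
        self.pending[key] = (definition, suggested_by)
        return True

    def approve(self, term, approver):
        """Move a pending term into the glossary; only the steward may do so."""
        if approver != self.steward:
            raise PermissionError("only the steward can approve terms")
        key = term.strip().lower()
        definition, _suggester = self.pending.pop(key)
        self.terms[key] = definition


g = Glossary(steward="dana")
g.suggest("Churn", "Share of customers lost in a period.", suggested_by="sam")
g.approve("churn", approver="dana")
print(g.terms)  # {'churn': 'Share of customers lost in a period.'}
```

Keeping suggestions in a separate pending queue lets data users contribute context freely while the steward remains accountable for what becomes official.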

Crowdsource data glossary

4. Maintain a hierarchical glossary structure

A hierarchical glossary structure will allow data glossaries from multiple domains to co-exist. For example, an enterprise may have data related to both finances and retail. The terminology of each will be different; hence, it will need a folder structure to nest glossary terms for each separately. Apache Atlas can be a useful tool to create a data glossary with a folder-like hierarchical structure.
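A folder-like hierarchy can be sketched with plain nested dictionaries, as below. This is not the Apache Atlas API; the tree contents, the slash-separated paths, and the lookup and list_terms helpers are assumptions made for illustration only.

```python
# Hypothetical glossary tree: folders are dicts, terms map to definitions.
glossary_tree = {
    "finance": {
        "margin": "Profit expressed as a percentage of revenue.",
        "accounts": {
            "receivable": "Money owed to the company by customers.",
        },
    },
    "retail": {
        "margin": "Difference between the shelf price and the wholesale cost.",
    },
}


def lookup(tree, path):
    """Follow a slash-separated path such as 'finance/margin'."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return node


def list_terms(tree, prefix=""):
    """Return the full path of every term, walking folders depth-first."""
    paths = []
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            paths.extend(list_terms(child, path))
        else:
            paths.append(path)
    return paths


print(lookup(glossary_tree, "retail/margin"))
print(list_terms(glossary_tree))
# ['finance/margin', 'finance/accounts/receivable', 'retail/margin']
```

The same word, “margin”, lives in both domains with a different definition in each, which is exactly the situation a flat glossary cannot represent cleanly.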

Hierarchical glossary structure

Read more

A useful data glossary can be a significant step to drive good data governance in your organization. Check out the resources below to learn more about how to create a great data glossary.

And that’s it! Time to go forth, jumpstart your data governance strategy, and create one source of truth for your data.



Data Catalog Primer - Everything You Need to Know About Data Catalogs.

Adopting a data catalog is the first step towards data discovery. In this guide, we explore the evolution of the data management ecosystem, the challenges created by traditional data catalog solutions, and what an ideal, modern-day data catalog should look like. Download now!