What is a Data Glossary?
A data glossary is a collection of all terms that define your data's key characteristics, organized in a way that is easy to search.
A glossary is a list of terms and their definitions that gives context and helps organize knowledge. A data glossary serves the same purpose for all the data assets in an organization. It contains business terms, phrases, and concepts that help define the data.
Apart from providing context, a data glossary helps organize data assets and thus makes them easier to discover. For example, terms like “cost”, “profit”, and “revenue” can be used to define and group all financial data assets.
A data glossary is more commonly referred to as a “business glossary”, and these two terms can be used interchangeably. Why? Because the terminology used in a data glossary is consistent with business concepts. A useful business glossary will help prevent confusion and create a common language to communicate about data across the organization.
Curious how a data glossary is different from a data dictionary? Read about it here: What is a Data Dictionary?
Why do you need a data glossary?
A data glossary or a business glossary is the bridge between the IT and the business teams — those who maintain and create data, and those who use data to drive actions. If you do not understand the data or cannot locate it quickly, you can never use it effectively.
A well-maintained data glossary can become a single source of truth and thus increase an organization’s overall trust in data.
Here are five ways in which a data glossary can help your organization:
- Improves understanding of data
- Makes data visible
- Enables collaboration
- Powers search
- Promotes data governance
Improves understanding of data
By linking the right data glossary terms to data, you can figure out what is inside the data without even opening it. For example, say that a data table has glossary terms like “region”, “sales”, “quantity”, “brand”, and “year” attached to it. You can easily infer that this table has data related to sales, and the quantity is probably given brand-wise at a regional level.
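To make the idea concrete, here is a minimal sketch of a table tagged with glossary terms. The `DataAsset` class, the `describe` helper, the table name, and the definitions are all hypothetical and only illustrate the concept; they are not the API of any particular glossary tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A data asset (for example, a table) tagged with glossary terms."""
    name: str
    terms: set[str] = field(default_factory=set)


# Hypothetical glossary: term -> business definition (illustrative wording).
glossary = {
    "region": "Geographic area in which a sale is recorded",
    "sales": "Revenue from goods sold",
    "quantity": "Number of units sold",
    "brand": "Product brand name",
    "year": "Calendar year of the transaction",
}


def describe(asset: DataAsset, glossary: dict[str, str]) -> list[str]:
    """Return 'term: definition' lines for every known term on the asset."""
    return [f"{t}: {glossary[t]}" for t in sorted(asset.terms) if t in glossary]


# Hypothetical table carrying the five terms from the example above.
table = DataAsset("sales_by_brand", {"region", "sales", "quantity", "brand", "year"})
summary = describe(table, glossary)
for line in summary:
    print(line)
```

Reading the printed summary gives a business user a rough picture of the table without opening the data itself.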
Makes data visible
A useful data glossary lets all business users see what data exists, even if they do not have full access to it. It promotes awareness about existing data and makes the organization more data-driven.
Enables collaboration
The first step to overcome data communication and collaboration challenges is to create a business glossary. It creates a common ground of contextual knowledge that is accessible to everyone. As the chances of misunderstanding decrease, data scientists and analysts will be able to communicate better with other teams.
Powers search
Glossary terms linked to data assets improve data discovery. A data glossary provides additional metadata that helps generate more accurate search results. This makes searching for data faster and easier.
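As a rough illustration of how linked terms can feed search, the sketch below ranks assets by how many query words match their glossary terms. The catalog contents and the `search` function are assumptions made up for this example; real catalog tools use far richer ranking.

```python
# Hypothetical catalog: asset name -> glossary terms linked to it.
catalog = {
    "orders_2023": {"sales", "quantity", "region"},
    "hr_payroll": {"cost", "employee"},
    "quarterly_pnl": {"revenue", "cost", "profit"},
}


def search(query: str, catalog: dict[str, set[str]]) -> list[str]:
    """Rank assets by how many query words match their glossary terms."""
    words = set(query.lower().split())
    scored = [(len(words & terms), name) for name, terms in catalog.items()]
    # Best match first, ties broken by name; assets with no match are dropped.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


results = search("cost profit", catalog)
print(results)
```

Here the glossary terms act as extra metadata: none of the asset names contain the word "cost", yet two of them are still found.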
Promotes data governance
A good data glossary can pave the way for a successful data governance initiative in your organization. Standardizing data terms and definitions helps improve the quality of both data assets and the organization’s data knowledge. An ideal data glossary can even help an organization to maintain access policies using the glossary terms.
A well-maintained data glossary can become a single source of truth and thus increase trust in data across the enterprise.
Four ways to build a useful data glossary
To reap all the benefits listed above, a data glossary has to be useful. Your data colleagues should be able to use it quickly and easily.
Here are some practical tips for creating a useful data glossary:
1. Follow industry best practices
Instead of starting from scratch or inventing new terminology, follow existing industry standards. This keeps your data glossary consistent across your organization, rather than changing it for each new type of data or use case. For example, you can use the Financial Industry Business Ontology (FIBO) as a standard glossary for financial data. You can always tweak the terminology based on your requirements, but starting a business glossary from a blank page can be a daunting task.
2. Link data glossary with your data ecosystem
Bring your data glossary and data together. The glossary terms should be linked to the data assets they describe. This lets a data steward or admin see how the glossary helps in their daily work; without that link, they are likely to stop maintaining it. You can even use intelligent bots to auto-suggest glossary terms for your data assets.
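Auto-suggestion can be as simple or as sophisticated as you like. The sketch below is a deliberately naive version that matches tokens in column names against a hypothetical synonym list. `GLOSSARY_SYNONYMS` and `suggest_terms` are invented for illustration and do not reflect how any specific product implements its suggestions.

```python
import re

# Hypothetical glossary terms with synonyms that may appear in column names.
GLOSSARY_SYNONYMS = {
    "revenue": {"revenue", "rev", "sales"},
    "cost": {"cost", "expense"},
    "region": {"region", "territory"},
}


def suggest_terms(columns: list[str]) -> set[str]:
    """Suggest glossary terms whose synonyms appear as tokens in column names."""
    tokens: set[str] = set()
    for col in columns:
        # Split names like "total_sales" or "Region-Code" into lowercase tokens.
        tokens.update(t for t in re.split(r"[^a-z0-9]+", col.lower()) if t)
    return {term for term, synonyms in GLOSSARY_SYNONYMS.items() if tokens & synonyms}


suggested = suggest_terms(["total_sales", "Region-Code", "order_id"])
print(sorted(suggested))
```

Suggestions like these are best treated as a starting point that a data steward confirms or rejects, not as final links.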
3. Assign a business owner to enrich the data glossary
Make sure someone (like a data steward) is responsible for regularly updating your data glossary. It’s also important that it is easy to update and add glossary terms. Ideally, the data users should be able to suggest glossary terms (i.e. crowdsource them) for their data assets, since they have the full context on that data.
4. Maintain a hierarchical glossary structure
A hierarchical glossary structure will allow data glossaries from multiple domains to co-exist. For example, an enterprise may have data related to both finances and retail. The terminology of each will be different; hence, it will need a folder structure to nest glossary terms for each separately. Apache Atlas can be a useful tool to create a data glossary with a folder-like hierarchical structure.
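The folder idea can be sketched with a plain nested dictionary. This is not the Apache Atlas API; the `glossary_tree` contents and the `term_paths` helper are hypothetical and only show how the same term can live under different domains without clashing.

```python
# Hypothetical nested glossary: domain -> category -> terms.
glossary_tree = {
    "Finance": {
        "Income": ["revenue", "profit"],
        "Spending": ["cost"],
    },
    "Retail": {
        "Inventory": ["stock level", "cost"],
    },
}


def term_paths(tree: dict, prefix: tuple[str, ...] = ()) -> list[str]:
    """Flatten the tree into 'Domain/Category/term' paths."""
    paths: list[str] = []
    for key, child in tree.items():
        if isinstance(child, dict):
            # Descend into a sub-folder, remembering the path so far.
            paths.extend(term_paths(child, prefix + (key,)))
        else:
            # Leaf folder: child is a list of terms.
            paths.extend("/".join(prefix + (key, term)) for term in child)
    return paths


paths = term_paths(glossary_tree)
for path in paths:
    print(path)
```

Because each term is addressed by its full path, "cost" in Finance and "cost" in Retail stay separate and can carry different definitions.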
A useful data glossary can be a significant step to drive good data governance in your organization. Check out the resources below to learn more about how to create a great data glossary.
- Business glossary basics - Dataversity
- Open business glossary community - OpenSource
- A step-by-step guide on building, managing, and sharing a business glossary - DGPO
- Selection criteria for business glossary tools - Dataversity
- You probably don’t need a data dictionary - LocallyOptimistic
- Data glossary - Atlan
And that’s it! Time to go forth, jumpstart your data governance strategy, and create one source of truth for your data.