Metadata is a collection of structured information that describes the source, context, purpose, characteristics, and other details of a data asset. Metadata helps answer the "what, where, when, how, who" of data.
There are many different types of metadata, all of which are useful in unlocking the value of your data assets. These include:
- Technical metadata
- Business metadata
- Operational metadata
- Structural metadata
- Administrative metadata
- Social metadata
Gartner says, “It is metadata that turns information into an asset. Generally speaking, the more valuable the information asset, the more critical it is to manage the metadata about it.”
Let’s explore these various types of metadata with examples, and recognize how they help you understand your data better.
According to the University of Warwick, “technical metadata provides information on the technical properties of a digital file or the particular hardware and software environments required in order to render or process digital information.”
Technical metadata includes information such as:
- File name
- File format
- Data source
- Geographic location
Technical Metadata Usage Example
Data scientists at a granola bar company are analyzing sales data for its line of products. Technical metadata within the data management platform helps identify exactly where sales originated, such as specific 7-Eleven convenience stores in the U.S. They can use the specificity the technical metadata provides to compare sales across different states.
Without technical metadata, however, the data workers at the granola bar company would lack visibility into the source and geographic location of sales, hindering their ability to run analysis across different states and convenience stores to compare where performance is strongest and weakest.
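The state-by-state comparison above can be sketched in a few lines of Python. This is only an illustration: the field names (`source`, `store_id`, `state`) and the sample figures are hypothetical, not taken from any particular data management platform.

```python
from collections import defaultdict

# Hypothetical sales records; the "meta" dict holds the technical metadata
# (data source and geographic location) attached to each sale.
sales = [
    {"amount": 120.0, "meta": {"source": "pos_export.csv", "store_id": "S-001", "state": "TX"}},
    {"amount": 80.0, "meta": {"source": "pos_export.csv", "store_id": "S-002", "state": "TX"}},
    {"amount": 150.0, "meta": {"source": "pos_export.csv", "store_id": "S-101", "state": "CA"}},
]

def sales_by_state(records):
    """Sum sale amounts per state, using the location stored in each record's metadata."""
    totals = defaultdict(float)
    for record in records:
        totals[record["meta"]["state"]] += record["amount"]
    return dict(totals)

totals = sales_by_state(sales)
```

Without the `state` field in the metadata, the grouping step would have nothing to group on, which is the visibility gap described above.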
Business metadata adds business context to data. From Wikipedia: “It provides information authored by business people and/or used by business people. It is in contrast to technical metadata, which is data used in the storage and structure of the data in a database or system.”
Business metadata includes:
- Business requirements and models
- Business process flows
- Business terminology
Business Metadata Usage Example
A natural baby food maker stores its business metadata in a metadata repository. Inside, current and new employees can find information related to the company’s goal of selling $1M worth of product to purchasers within the next twelve months.
According to the business terminology at this company, "purchasers" are defined as the buyers of the baby food, while “consumers” is used when referencing the babies who eat the food. Defining the two words eliminates confusion when talking about the two groups of people who interact with the product.
Business metadata keeps an organization well aligned. It becomes particularly useful as an organization scales, growing from a small group of workers to a vast enterprise with thousands of people and sites around the world. The information cuts down the costs of misunderstandings and unaligned workflows by ensuring team members are speaking the same common company language and operating with the same understanding.
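One simple way to picture shared business terminology is a glossary lookup. The sketch below assumes a hypothetical in-memory glossary holding the two terms from the baby food example; a real metadata repository would store and serve these definitions differently.

```python
# Hypothetical business glossary, using the two terms from the example above.
GLOSSARY = {
    "purchaser": "A buyer of the baby food.",
    "consumer": "A baby who eats the food.",
}

def define(term):
    """Look up a business term, ignoring case and a trailing plural 's'."""
    key = term.strip().lower()
    if key not in GLOSSARY and key.endswith("s"):
        key = key[:-1]  # naive singular form, good enough for this sketch
    return GLOSSARY.get(key, "No definition recorded.")
```

Calling `define("Purchasers")` and `define("consumer")` returns the two distinct definitions, so anyone reading a report can check which group a term refers to.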
Operational metadata helps ensure data is accurate, current, and error-free. According to IBM, “Operational metadata describes the events and processes that occur and the objects that are affected.”
It adds detail to data repositories and ETL processes, which improves data management and, in turn, data use. Outdated data is no better than bad data, so it’s important that data practitioners use the most current version of data in their work.
Types of operational metadata can include information on:
- Load date
- Update date(s)
Operational metadata also has two subsets: process metadata and provenance metadata.
Process metadata is a subset of operational metadata found within data warehouses or data lakes. It provides information on the details of the process that loads data into the data repositories. This information can be useful in debugging and root cause analysis in the event of a problem.
Process metadata may include information on:
- Job execution logs
- Error logs
- Audit results
Provenance metadata is another subset of operational metadata that tracks the origin of a data value and its changes over time. It provides information data workers can use in data traceability, so bad or inaccurate data can be located and removed, enhancing data quality and trust.
Provenance metadata may include information on:
- Ownership records
- Change logs
- Versioning records
Operational Metadata Usage Example
Data engineers at an insurance company must have access to current patient data. With operational metadata, the team will be able to see the date the information was originally ingested, when and how the data was used, and in which form (raw, semi-structured, structured) the data currently exists.
This is critical when it comes to data analysis. If data scientists suspect that analyses were built on bad data, the lineage captured in the operational metadata lets data engineers find, understand, and correct the source of the errors.
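A freshness check is one common use of operational metadata. The following is a minimal sketch, assuming each dataset carries a hypothetical `load_date`, `last_updated`, and `form` field; the dataset names and the 30-day threshold are illustrative choices, not a recommendation.

```python
from datetime import datetime, timedelta

# Hypothetical operational metadata for two datasets.
datasets = [
    {"name": "claims_raw", "load_date": datetime(2022, 1, 1),
     "last_updated": datetime(2022, 1, 2), "form": "raw"},
    {"name": "patients", "load_date": datetime(2021, 6, 1),
     "last_updated": datetime(2022, 3, 1), "form": "structured"},
]

def stale_datasets(records, as_of, max_age_days=30):
    """Return the names of datasets last updated more than max_age_days before as_of."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r["name"] for r in records if r["last_updated"] < cutoff]

stale = stale_datasets(datasets, as_of=datetime(2022, 3, 15))
```

Passing `as_of` explicitly, rather than reading the system clock inside the function, keeps the check reproducible when it is rerun later.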
Structural metadata provides technical information on the physical organization of a data set. Data workers can use structural metadata in the creation and maintenance of data dictionaries.
Some types of structural metadata include:
- Data element types
- Table names
- Record size
Structural Metadata Usage Example
An ecommerce shop wishes to create a data dictionary that will be included in its data catalog. After forming a data governance committee, the team decides to structure the data dictionary with fields for attribute name, data type (integer, real, character), and ownership.
Structural metadata serves as the backbone for the data dictionary, informing which fields the incoming data should populate. Leveraging structural metadata helps the governance committee create the data dictionary efficiently and effectively.
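To illustrate how structural metadata can feed a data dictionary, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`, which reports each column's name and declared type. The `orders` table, the `build_data_dictionary` helper, and the owner value are hypothetical; ownership is not stored in the schema, so it is assumed to be supplied by the governance committee.

```python
import sqlite3

def build_data_dictionary(conn, table, owner):
    """Return one dictionary entry per column of `table`.

    `table` is interpolated into the statement, so it must be a trusted name.
    """
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"attribute_name": name, "data_type": col_type, "owner": owner}
        for _cid, name, col_type, *_rest in rows
    ]

# Hypothetical table standing in for the ecommerce shop's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total REAL, status TEXT)")

dictionary = build_data_dictionary(conn, "orders", owner="governance-committee")
```

Each entry carries the three fields chosen in the example above: attribute name, data type, and ownership.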
Administrative metadata provides information used in data governance, helping manage and establish the credibility of data assets. Its subset — rights management metadata — refers to intellectual property rights.
Administrative metadata includes:
- Technical data on rights management
- Copyright information and license agreements
- Access control information
- User restrictions
Administrative Metadata Usage Example
An art director at a museum wants to catalog its collection of digital prints for safekeeping and rights management. The director assigns administrative metadata on who can access certain prints, and how specific files might be used (for commercial, personal, or educational use), while restricting who can alter or change the files.
Administrative metadata provides controls on how the files can be used, who can use them, and for what purpose. In the future, data workers who want to comb through historical data assets can reference administrative metadata and confirm the assets’ integrity hasn’t been compromised, engendering trust in the data.
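The access and usage controls in the museum example can be sketched as a simple permission check. Everything here is an assumption for illustration: the record layout, the role names, and the `is_permitted` helper are hypothetical and do not reflect any particular rights management system.

```python
# Hypothetical administrative metadata for one digital print.
print_record = {
    "file": "print_001.tif",
    "allowed_uses": {"educational", "personal"},
    "can_view": {"curator", "researcher", "public"},
    "can_edit": {"curator"},
}

def is_permitted(record, role, action, use=None):
    """Check a role's request against the record's administrative metadata."""
    if action == "edit":
        return role in record["can_edit"]
    if action == "view":
        # A view request may also state its intended use (e.g. "commercial").
        return role in record["can_view"] and (use is None or use in record["allowed_uses"])
    return False  # unknown actions are denied by default
```

With this record, a curator may edit the file, while a member of the public may view it for personal use but not for commercial use.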
Social metadata provides essential information on how data is being utilized by humans. It includes information such as:
- Author information
- Frequency of use
- Most queried tables
Social Metadata Usage Example
A media company with a robust data science department realizes that it is collecting far more information than it needs for insightful analysis. The team decides to free up space in its data warehouse by eliminating data that is little used. With the right social metadata in place, the team can query to see which assets the scientists use most and least frequently.
Equipped with the context that social metadata provides, organizations can make strategic decisions on whether to increase, maintain, or decrease the use of specific assets to promote greater efficiency, productivity, and cost savings.
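A usage report of this kind could be sketched as follows, assuming a hypothetical query log with one entry per query and a list of tables known to the warehouse. The table names and counts are made up for illustration.

```python
from collections import Counter

# Hypothetical query log: one entry per query, naming the table it read.
query_log = ["orders", "customers", "orders", "sessions", "orders", "customers"]
# Hypothetical list of every table in the warehouse, queried or not.
known_tables = ["orders", "customers", "sessions", "legacy_clicks"]

def usage_report(log, tables):
    """Return (table, query_count) pairs, most queried first."""
    counts = Counter(log)  # tables missing from the log count as 0
    return sorted(((t, counts[t]) for t in tables), key=lambda pair: pair[1], reverse=True)

report = usage_report(query_log, known_tables)
unused = [table for table, count in report if count == 0]
```

Tables that land in `unused` are candidates for review, though a real cleanup decision would also need retention and compliance checks that this sketch ignores.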
Passive Metadata vs Active Metadata
The different types of metadata listed above can be thought of as either being active or passive. So what’s the difference?
- Passive metadata refers to traditional metadata that provides basic information regarding schema, data types, models, owner name, etc. It is considered to be static, and requires human effort to curate or document. This passive state limits its ability to provide complete visibility into what’s happening inside sophisticated data pipelines.
- Active metadata refers to metadata that has been “activated” using machine learning, AI, and automation. Active metadata is “built on the premise of actively finding, enriching, inventorying, and using all of this metadata, taking a traditionally ‘passive’ technology and making it truly action-oriented.”
In the era of Big Data, passive metadata does not scale and is ineffective for data management because it relies on manual action, which cannot keep up with the volume of data being ingested.
Active metadata, however, is captured from sources in real time, so data practitioners and business leaders can easily identify, track, manage, trust, and understand data assets.
Active Metadata Platform: Open API and collaboration focused
An active metadata platform is an always-on, intelligence-driven, action-oriented system that provides a 360-degree view into the latest version of a data asset. It’s constantly collecting metadata without having to rely on manual entries. It does so by leveraging machine learning to process metadata and create actionable intelligence so teams can extract the greatest value from their data.
If you are evaluating a metadata management solution for your business, do take Atlan for a spin.
Atlan is built for the modern data stack, shifting from traditional metadata management to a new era that is “collaboration-focused, intelligent, and open by default.”