What is metadata?
In simple terms, metadata is “data about data”: information that describes other data. It helps us understand the structure, nature, and context of that data.
Metadata makes data easier to search and retrieve, and it helps keep a check on data quality and reliability. In short, metadata is the key to unlocking the value of your data.
Let's look at other definitions of metadata:
“A set of data that describes and gives information about other data.”
One useful definition of metadata is “any data which conveys knowledge about an item without requiring examination of the item itself.” Because metadata derives its value from saving human time and attention, it must be effective at distinguishing relevant content from irrelevant or redundant content.
What the word “meta” itself means:
“Meta is a word which, like so many other things, we have the ancient Greeks to thank for. When they used it, meta meant ‘beyond,’ ‘after,’ or ‘behind.’ The ‘beyond’ sense of meta still lingers in words like metaphysics or meta-economy.”
Examples of metadata
Let’s take an example of an image. To the naked eye, a rose is just a rose.
But to the more discerning “meta” eye, a rose is so much more. It’s the sum total of its meta.
You might be surprised by the amount of metadata that goes into describing an image.
Some of the metadata stored with an image includes:
- The make of the camera
- Lenses used
- Time at which the picture was taken
- Focal length
- GPS coordinates of the location
- Image resolution
- Color profiles
Image metadata gives technical insights that are helpful during image processing. Metadata also facilitates easy search, retrieval, and backups and hence helps increase productivity.
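To make the search-and-retrieval point concrete, here is a minimal Python sketch that filters a photo collection using only metadata records, without opening any image files. The `ImageMetadata` class, its field names, and the sample values are hypothetical stand-ins for EXIF-style tags, not the schema of any real library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ImageMetadata:
    """Hypothetical EXIF-style record; field names are illustrative only."""
    path: str
    camera_make: str
    lens: str
    taken_at: datetime
    focal_length_mm: float
    gps: tuple[float, float]     # (latitude, longitude)
    resolution: tuple[int, int]  # (width, height) in pixels
    color_profile: str


def find_images(catalog, camera_make=None, taken_after=None, min_width=None):
    """Return paths of images whose metadata matches every given filter.

    Only the metadata records are inspected; the image files are never opened.
    """
    matches = []
    for meta in catalog:
        if camera_make is not None and meta.camera_make != camera_make:
            continue
        if taken_after is not None and meta.taken_at <= taken_after:
            continue
        if min_width is not None and meta.resolution[0] < min_width:
            continue
        matches.append(meta.path)
    return matches


catalog = [
    ImageMetadata("rose.jpg", "Canon", "EF 100mm", datetime(2022, 5, 1, 9, 30),
                  100.0, (51.5, -0.12), (6000, 4000), "sRGB"),
    ImageMetadata("tulip.jpg", "Nikon", "50mm f/1.8", datetime(2021, 4, 12, 16, 5),
                  50.0, (52.37, 4.89), (4000, 3000), "Adobe RGB"),
]

print(find_images(catalog, camera_make="Canon"))  # ['rose.jpg']
print(find_images(catalog, taken_after=datetime(2021, 1, 1), min_width=4000))
```

Because the filters run over small records rather than multi-megabyte files, the same approach scales to large libraries, which is where the productivity gain comes from.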
Let’s take another example: the metadata of an MP3 audio file.
Key metadata includes:
- Audio format
- Bit rate
- Album release date
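Some MP3 metadata lives in an ID3 tag. The older ID3v1 format is a fixed 128-byte block at the end of the file: the marker `TAG`, then 30-byte title, artist, and album fields, a 4-byte year, a 30-byte comment, and a 1-byte genre number. The sketch below parses that block with Python's `struct` module. Note that ID3v1 stores only the release year, and that audio format and bit rate come from the MPEG frame headers rather than the tag, so they are not covered here. The sample bytes are synthetic, and the helper names are hypothetical.

```python
import struct

# ID3v1 layout (128 bytes at the end of the file):
# "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
ID3V1_FORMAT = "<3s30s30s30s4s30sB"
ID3V1_SIZE = struct.calcsize(ID3V1_FORMAT)  # 128


def _text(field: bytes) -> str:
    """Decode a fixed-width text field padded with NUL bytes or spaces."""
    return field.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data: bytes):
    """Return the ID3v1 fields stored at the end of `data`, or None if absent."""
    if len(data) < ID3V1_SIZE:
        return None
    tag, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_FORMAT, data[-ID3V1_SIZE:]
    )
    if tag != b"TAG":
        return None  # the file has no ID3v1 tag
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre_id": genre,  # index into the standard ID3v1 genre list
    }


def _pad(text: str, width: int) -> bytes:
    """Hypothetical helper: build a NUL-padded fixed-width field."""
    return text.encode("latin-1").ljust(width, b"\x00")


# Synthetic file: placeholder audio bytes followed by an ID3v1 tag.
sample = (b"\x00" * 64
          + b"TAG" + _pad("Song", 30) + _pad("Artist", 30) + _pad("Album", 30)
          + b"1999" + _pad("", 30) + bytes([17]))
print(read_id3v1(sample))
```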
Metadata in spreadsheets and databases:
Getting closer to home for the humans of data, here’s an example of something we use on a daily basis: the mighty Excel sheet.
While the data in an Excel sheet is the actual information (numbers or text) contained in its rows and columns, the metadata describes that data:
- Table/column names, sources, descriptions, and relationships
- Validation rules for a data asset
- Data types
- Column statistics: missing values, min/max values, and histogram distributions
- Data owner
This gives you better context on the data itself, much like an explainer.
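For data stored in a database, much of this technical metadata can be read from the database itself. The sketch below uses Python's built-in `sqlite3` module: `PRAGMA table_info` supplies column names and declared types, and simple SQL aggregates give missing-value counts and min/max values. The `orders` table and the `column_profile` helper are hypothetical examples, and histograms are left out to keep it short.

```python
import sqlite3

# Hypothetical example table with one missing value in `amount`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 120.0), ("US", 75.5), ("EU", None), ("APAC", 300.0)],
)


def column_profile(conn, table):
    """Collect declared type, NOT NULL flag, missing count, and min/max per column.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    profile = {}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    for _cid, name, col_type, notnull, _default, _pk in columns:
        missing, lowest, highest = conn.execute(
            f'SELECT SUM("{name}" IS NULL), MIN("{name}"), MAX("{name}") FROM "{table}"'
        ).fetchone()
        profile[name] = {
            "type": col_type,
            "not_null": bool(notnull),
            "missing": missing,
            "min": lowest,
            "max": highest,
        }
    return profile


for column, stats in column_profile(conn, "orders").items():
    print(column, stats)
```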
Why is metadata so important?
Data is nothing but the sum total of its metadata. It is what helps us create a complete picture of our data and understand it in its entirety.
For instance, after the COVID-19 pandemic, medical/pharma research became increasingly collaborative. Researchers needed an effective system to search, share, understand, peer review, and replicate experiments.
The foundation of such a system is robust, readily available metadata.
Metadata for scientific research includes information about test design, test population details, the definition of terms, measurement methods, and data collection schedules.
Given that enterprises are increasingly investing in and betting on data to make better decisions, the amount of data we use is only set to increase. To extend the shelf life and longevity of that data, it’s important for companies to invest in managing their metadata as well.
The need of the hour is to break down data silos, let analytics flow at the speed of thought, and create a single source of truth for your entire team. Doing that well starts with understanding the different types of metadata.
Types of metadata
Today, metadata is everywhere. Every component of the modern data stack and every user interaction on it generates metadata. Apart from traditional forms like technical metadata (e.g. schemas) and business metadata (e.g. taxonomy, glossary), our data systems now create entirely new forms of metadata.
Technical metadata is information about the data itself. It is the documentation for the database. This includes the design and structure of schemas, table and column details, column sizes, validation rules, and data quality profiles for data assets.
Structural metadata provides information that helps establish object-to-object relationships and hierarchical structure between different data assets. This includes table names, data types, data sources, foreign key cardinality, and referential integrity.
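Relationships between tables can likewise be pulled from the schema. This minimal sketch, again using SQLite through Python's `sqlite3` module, lists foreign-key edges with `PRAGMA foreign_key_list`. The `customers`/`orders` schema and the `relationships` helper are hypothetical, and cardinality analysis is not covered.

```python
import sqlite3

# Hypothetical two-table schema with one foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
""")


def relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    edges = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()]
    for table in tables:
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f"PRAGMA foreign_key_list('{table}')").fetchall():
            edges.append((table, row[3], row[2], row[4]))
    return edges


print(relationships(conn))  # [('orders', 'customer_id', 'customers', 'id')]
```

Collected across a whole database, these edges form the object-to-object graph that structural metadata describes.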
Operational metadata tracks all the information related to the flow of data throughout its lifecycle. This includes information about the data source, data transformations, lineage, and logs of ETL and orchestration jobs.
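Lineage recorded as operational metadata can answer questions such as “what does this dashboard ultimately depend on?” The sketch below assumes lineage is stored as a simple mapping from each asset to the assets it is built from (a hypothetical format; real tools model lineage in richer ways) and walks it breadth-first to collect every upstream dependency. All asset names are made up.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is built from.
LINEAGE = {
    "revenue_dashboard": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["raw.orders"],
    "stg_customers": ["raw.customers"],
}


def upstream(asset, lineage):
    """Return every asset that `asset` depends on, directly or transitively."""
    seen = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        queue.extend(lineage.get(node, []))
    return sorted(seen)


print(upstream("revenue_dashboard", LINEAGE))
```

Reversing the mapping gives the downstream view, which is what impact analysis uses before changing a source table.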
Business metadata is a glossary of terms/definitions that helps business users understand a particular data asset.
For instance, the answer to a question like “Does the annual recurring revenue (ARR) metric on the dashboard include one-time discounts and initial setup costs?” could be documented in the glossary for reference.
Business metadata enables collaboration to validate, verify, and attach terms to the right data assets.
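A glossary entry is most useful when it is attached to the data assets it describes and marked as verified. The following sketch models that with a hypothetical `GlossaryTerm` record. The asset names are made up, and the ARR definition is only an illustrative answer to the question above, not a standard definition.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Hypothetical business-glossary entry linked to data assets."""
    name: str
    definition: str
    linked_assets: set[str] = field(default_factory=set)
    verified: bool = False  # set once a domain owner has reviewed the definition


glossary = [
    GlossaryTerm(
        name="ARR",
        # Illustrative definition only; each team sets its own rule.
        definition=("Annual recurring revenue from subscriptions, excluding "
                    "one-time discounts and initial setup costs."),
        linked_assets={"revenue_dashboard", "fct_subscriptions"},
        verified=True,
    ),
    GlossaryTerm(
        name="Active customer",
        definition="Draft: definition still under review.",
        linked_assets={"revenue_dashboard"},
    ),
]


def verified_terms_for(asset, glossary):
    """Return names of verified glossary terms attached to `asset`, sorted."""
    return sorted(t.name for t in glossary if asset in t.linked_assets and t.verified)


print(verified_terms_for("revenue_dashboard", glossary))  # ['ARR']
```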
Administrative metadata provides information related to governance, privacy, security, and access controls. This includes rights management, copyright, and license agreement details, along with access control information and user restrictions.
As more and more businesses democratize their data, a new kind of valuable metadata, social metadata, is emerging from these collaborative efforts.
Examples of social metadata include ratings, chat transcripts, notes, tags, comments, glossary, and bookmarks.
Provenance metadata is information about the origin of a data asset. It informs about data sources, ownerships, transformations, freshness, usage, and archival.
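Provenance fields such as ownership and freshness lend themselves to automated checks. Here is a minimal sketch of a staleness check over a hypothetical `Provenance` record; the field names, sample values, and the 24-hour threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Provenance:
    """Hypothetical provenance record for one data asset."""
    asset: str
    source: str
    owner: str
    transformations: list[str]
    last_updated: datetime  # timezone-aware timestamp of the last refresh


def is_stale(record, max_age, now=None):
    """Return True if the asset has not been refreshed within `max_age`."""
    now = now or datetime.now(timezone.utc)
    return now - record.last_updated > max_age


record = Provenance(
    asset="fct_orders",
    source="raw.orders",
    owner="data-eng@example.com",
    transformations=["dedupe", "currency_conversion"],
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_stale(record, timedelta(hours=24), now=check_time))  # True (30 hours old)
```

Passing `now` explicitly keeps the check deterministic in tests; in production it defaults to the current UTC time.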
What are the biggest challenges in metadata management?
One of the biggest problems businesses face is that although they recognize the value of metadata and have invested in managing it, they have yet to see enough ROI.
Sadly, companies have traditionally relied on manual, ad hoc processes to manage metadata. Departments would share information, including metadata, either verbally or by maintaining Excel sheets and documents. This approach leads to familiar problems:
- No one knows where the documents are located, so information goes missing
- No one updates the documents, especially when people move on, so information goes out of date
- No one knows how data sets are related or how to propagate changing values across them, because there is no data lineage or data quality checking
- There is no way to maintain all revisions or versions of the data
- There is no way to keep metadata alongside the data, which leads to even more data silos and competing versions of the truth
That’s why simply plugging an isolated metadata management tool or metadata catalog into your data lake may not solve your data woes. Today’s businesses need data to be available to whoever needs it, wherever and whenever they need it, along with all of its context.
Robust metadata management is the key for data-driven teams to discover, understand, trust, and collaborate on data assets across your data universe. If you are looking to implement a metadata management tool for your organization, do take Atlan for a spin.