What is metadata?
In simple terms, metadata is “data/information about data”. Metadata helps us understand the structure, nature, and context of the data.
Metadata facilitates easy search and retrieval of data. Metadata also helps keep a check on the quality and reliability of data. Metadata is the key to unlocking the value of your data.
Let's look at other definitions of metadata:
“A set of data that describes and gives information about other data.”
One useful definition of metadata is “any data which conveys knowledge about an item without requiring examination of the item itself.” Because metadata derives its value from saving human time and attention, it must be effective at distinguishing relevant and irrelevant or redundant content.
What meta itself means:
“Meta is a word which, like so many other things, we have the ancient Greeks to thank for. When they used it, meta meant “beyond,” “after,” or “behind.” The “beyond” sense of meta still lingers in words like metaphysics or meta-economy.”
Examples of metadata
Let’s take an example of an image. To the naked eye, a rose is just a rose.
But to the more discerning “meta” eye, a rose is so much more. It’s the sum total of its meta.
You might be surprised by the amount of metadata that goes into describing an image.
Some of the metadata stored with an image includes:
- The make of the camera
- Lenses used
- Time at which the picture was taken
- Focal length
- GPS coordinates of the location
- Image resolution
- Color profiles
Image metadata gives technical insights that are helpful during image processing. Metadata also facilitates easy search, retrieval, and backups and hence helps increase productivity.
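To make this concrete, here is a minimal sketch of reading one item from the list above, the image resolution, straight from a file header without decoding any pixels. It assumes a PNG file, and the function name `png_resolution` and the 640x480 sample header are made up for illustration. Camera-specific fields such as make, lens, or GPS coordinates live in EXIF data and are not covered here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_resolution(data: bytes) -> tuple[int, int]:
    """Return (width, height) from a PNG header without decoding pixels."""
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", then width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file, or the header is truncated")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height

# Hand-built header for a hypothetical 640x480 image (no pixel data needed).
header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)          # IHDR data length
    + b"IHDR"
    + struct.pack(">II", 640, 480)   # width, height
)
print(png_resolution(header))
```

The point of the sketch is that 24 bytes are enough to answer "how big is this image?", which is what makes metadata-driven search and filtering cheap.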
Let’s take another example and look at the metadata of an MP3 audio file.
The key pieces of metadata are:
- Audio format
- Bit rate
- Album release date
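As a rough illustration, the sketch below pulls the album and release year out of an ID3v1 tag, a fixed 128-byte block that some MP3 files carry at the very end. The function name `read_id3v1` and the sample tag contents are assumptions for this example. Many files use the newer ID3v2 format instead, and the bit rate is stored in the audio frame headers rather than in the tag, so neither is handled here.

```python
from typing import Optional

def read_id3v1(data: bytes) -> Optional[dict]:
    """Parse the 128-byte ID3v1 tag at the end of an MP3 file, if present."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }

# Fake file: placeholder audio bytes followed by a made-up tag.
fake_tag = (
    b"TAG"
    + b"Rose".ljust(30, b"\x00")
    + b"Some Artist".ljust(30, b"\x00")
    + b"Garden".ljust(30, b"\x00")
    + b"2022"
    + b"\x00" * 31                   # empty comment (30) + genre byte (1)
)
fake_mp3 = b"\xff\xfb" * 100 + fake_tag
print(read_id3v1(fake_mp3))
```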
Metadata in a database:
Getting closer to home for the humans of data, here’s an example of something we use on a daily basis: the mighty Excel sheet.
While the data in an Excel sheet refers to the actual information (numbers or text) contained in its rows and columns, the metadata refers to the description of that information:
- Table and column names, source, descriptions, and relationships
- Validation rules for a data asset
- Data types
- Column statistics: missing values, min-max values, and histogram distribution
- Data owner
Together, these give you better context on the data itself, much like an explainer.
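Column-level metadata like this can be derived from the data itself. Here is a minimal sketch, using a small made-up CSV in place of a real sheet, that infers a data type and computes missing-value counts and min-max values per column. The `profile` function, the sample columns, and the two-way "number"/"text" typing are simplifying assumptions, not how any particular tool does it.

```python
import csv
import io

# Hypothetical sheet contents; blank cells are missing values.
SAMPLE = """order_id,amount,region
1,120.50,EMEA
2,,APAC
3,87.00,
4,310.25,EMEA
"""

def profile(csv_text: str) -> dict:
    """Return per-column metadata: inferred type, missing count, min, max."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row[column]]
        try:
            typed = [float(v) for v in values]
            dtype = "number"
        except ValueError:
            # At least one value is not numeric, so treat the column as text.
            typed, dtype = values, "text"
        report[column] = {
            "type": dtype,
            "missing": len(rows) - len(values),
            "min": min(typed) if typed else None,
            "max": max(typed) if typed else None,
        }
    return report

for name, stats in profile(SAMPLE).items():
    print(name, stats)
```

Note that the report says nothing about individual orders. It describes the shape and quality of the data, which is exactly the job of metadata.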
Why is metadata so important?
Data is only as useful as the metadata that describes it. Metadata is what helps us create a complete picture of our data and understand it in its entirety.
For instance, after the COVID-19 pandemic, medical/pharma research became increasingly collaborative. Researchers needed an effective system to search, share, understand, peer review, and replicate experiments.
The fundamental aspect of such a system is the availability of robust metadata.
Metadata for scientific research includes information about test design, test population details, the definition of terms, measurement methods, and data collection schedules.
Given that enterprises are increasingly investing in and betting on data to make better decisions, the amount of data we use is only set to increase. To increase the shelf life and longevity of that data, it’s important for companies to invest in managing their metadata as well.
The need of the hour is to remove data silos, let analytics flow at the speed of thought and create a single source of truth for your entire team, which brings us to an important point.
Types of metadata
Today, metadata is everywhere. Every component of the modern data stack and every user interaction on it generates metadata. Apart from traditional forms like technical metadata (e.g. schemas) and business metadata (e.g. taxonomy, glossary), our data systems now create entirely new forms of metadata.
Technical metadata is information about the data itself. It is the documentation for the database. This includes information about the design and structure of schema, table and column information, column size, validation rules, and data quality profiles on data assets.
Operational metadata tracks all the information related to the flow of data throughout its lifecycle. This includes information about the data source, data transformations, lineage, and logs of ETL jobs.
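Lineage is the easiest piece of operational metadata to picture as code. The sketch below stores lineage as a simple mapping from each asset to the assets it is built from, then walks it to find everything a dashboard depends on. The asset names and the `upstream` helper are hypothetical, and a real system would also record transformations and job logs.

```python
# Hypothetical lineage: each asset maps to the assets it is built from.
LINEAGE = {
    "revenue_dashboard": ["arr_metrics"],
    "arr_metrics": ["clean_orders", "clean_customers"],
    "clean_orders": ["raw_orders"],
    "clean_customers": ["raw_customers"],
}

def upstream(asset: str, lineage: dict) -> set:
    """Return every asset that `asset` depends on, directly or indirectly."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream("revenue_dashboard", LINEAGE)))
```

With this in place, a question like "which sources feed this dashboard?" becomes a lookup instead of a round of emails.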
Business metadata is a glossary of terms/definitions that helps business users understand a particular data asset.
For instance, a question such as “Does the annual recurring revenue (ARR) metric on the dashboard include one-time discounts and initial setup costs?” could be answered in the glossary for reference.
Business metadata enables collaboration to validate, verify, and attach terms to the right data assets.
As more and more businesses embrace democratizing their data, a new set of valuable metadata is emerging from these collaborative efforts: social metadata, which captures how people interact with and talk about data assets.
Examples of social metadata include ratings, chat transcripts, notes, tags, comments, glossary, and bookmarks.
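One way to picture how the four types fit together is a single catalog entry per data asset. The sketch below is an assumed, simplified shape for such an entry. The `AssetMetadata` class, the `search` helper, and every field value (including the ARR definition) are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AssetMetadata:
    name: str
    technical: dict = field(default_factory=dict)    # schema, types, validation rules
    operational: dict = field(default_factory=dict)  # source, lineage, ETL job logs
    business: dict = field(default_factory=dict)     # glossary terms and definitions
    social: dict = field(default_factory=dict)       # ratings, tags, comments

# Made-up entry for a hypothetical "orders" table.
orders = AssetMetadata(
    name="orders",
    technical={"columns": {"order_id": "integer", "amount": "decimal"}},
    operational={"source": "billing_db", "last_etl_run": "succeeded"},
    business={"ARR": "Annual recurring revenue; excludes one-time setup fees"},
    social={"tags": ["finance", "verified"], "rating": 4.5},
)

def search(assets, keyword):
    """Find assets whose name, tags, or glossary terms mention the keyword."""
    keyword = keyword.lower()
    hits = []
    for asset in assets:
        haystack = [asset.name, *asset.social.get("tags", []), *asset.business.keys()]
        if any(keyword in item.lower() for item in haystack):
            hits.append(asset.name)
    return hits

print(search([orders], "arr"))
```

Searching across business and social metadata, not just table names, is what lets a business user find the right asset by the term they already know.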
What are the biggest challenges in metadata management?
One of the biggest problems facing businesses is that though they are aware of the value of metadata and have invested in managing it, they are yet to see enough ROI.
Sadly, companies have traditionally relied on manual, ad-hoc processes to manage their metadata. Departments would share information, including metadata, either verbally or by maintaining Excel/doc files to document data. This approach breaks down in predictable ways:
- No one knows where the documents are located—missing information
- No one bothers to update the documents, especially when people move on—outdated data
- No one knows how data sets are related—and how to fix changing values across all of them—no data lineage or data quality checks
- No way to maintain all revisions or versions of data
- No way to keep metadata along with the data—leading to even more data silos and versions of the truth
That’s why simply plugging in an isolated metadata management tool or metadata catalog within your data lake may not be the answer to your data woes. Today’s business mandates that data be available for whoever needs it, wherever and whenever they need it—with all the context they need.
Learn more: What is Metadata Management? Benefits, Tools, and Best Practices
Finally, at the metalevel…
You need to implement a metadata management strategy that boosts your team’s productivity and agility and puts data at your fingertips. Because at the end of the day, it’s all about the meta!
Take Atlan for a spin — Atlan is a modern metadata management platform built on the premise of embedded collaboration that is key in today’s modern workplace.
What is metadata: Related reads
- What is metadata management and why is it so important?
- What is the difference between data catalog and metadata management?
- Metadata management 101: Benefits, tools, and best practices
- 6 metadata management best practices to follow in 2022
- Data vs. Metadata: Understand the differences
- Enterprise metadata management and its importance in the modern data stack