DataOps brings together people, processes, and technology to enable agile, automated, and secure management of data. A data catalog has emerged as a critical starting point for building a functional DataOps ecosystem, as pointed out by Andy Palmer, who popularized the concept of DataOps.
Traditionally, data catalogs were just inventories of data, equipped to solve for intelligence and consumption use cases. A data catalog for DataOps has to be much more than that. It’s an open and interoperable platform that empowers data teams to optimize data flow and performance, automate governed data access, and maintain data and analytics products, among other things.
As Forrester explains in its latest report on Enterprise Data Catalogs for DataOps:
Conceptual data understanding is viable for consumption use cases like analytics, governance, and business processes that data governance catalogs provide. But data engineers need detailed telemetry and logical data of the data environment to make the right trade-offs to architect and build data-driven applications and address data flow and performance, which is why they need enterprise data catalogs for the DataOps environment to understand and activate data.
7 Essential Capabilities of Data Catalogs for DataOps
- Easy discovery of data assets
- Automated data lineage
- Governance as part of daily workflows
- Embedded collaboration
- Intuitive UI/UX
- Agile CI/CD
- Extensible and customizable
Easy discovery of data assets
Businesses are continuously generating an inconceivable amount of data. More often than not, the data is locked in silos and belongs to disparate distributed ecosystems.
A data catalog serves as a centralized repository of data from diverse data sources, including notes on a data set’s structure, quality, definitions, and usage. It serves as a single access layer for data producers and consumers to query for available data.
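To make the idea of a single access layer concrete, here is a minimal sketch of an in-memory catalog that registers assets from different sources and lets users search across them. The `CatalogEntry` fields, the `DataCatalog` class, and the sample assets are all hypothetical and chosen for illustration; a real catalog would persist metadata and harvest it automatically from source systems.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data asset (hypothetical schema for illustration)."""
    name: str
    source: str
    description: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory catalog: one access layer over assets from many sources."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return sorted names of assets whose name, description, or tags mention the keyword."""
        kw = keyword.lower()
        hits = [
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        ]
        return sorted(hits)


catalog = DataCatalog()
catalog.register(CatalogEntry("orders", "warehouse", "Customer orders, one row per order", {"sales"}))
catalog.register(CatalogEntry("web_events", "event_stream", "Raw clickstream events", {"marketing"}))
catalog.register(CatalogEntry("revenue_daily", "warehouse", "Daily revenue derived from orders", {"sales", "finance"}))

print(catalog.search("orders"))  # ['orders', 'revenue_daily']
```

Searching descriptions and tags as well as names is what lets a consumer find `revenue_daily` without knowing it exists, which is the discovery behavior described above.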
One of the ultimate goals of DataOps is to deliver high-quality data to consumers efficiently. Secure access to, and visibility into, available data through a data catalog is the starting point for that.
Automated data lineage
With a focus on data lineage and processing tracking in place across the data ecosystem, reproducibility goes up, and confidence in data increases.
- Key Principles of a DataOps Ecosystem, Andy Palmer
Data catalogs for DataOps automate the tracing of data lineage across various sources, pipelines, transformations, and dashboards — both upstream and downstream.
Automating data lineage ensures that everyone, not just technical users, has the same understanding of where data originated and how it has evolved through its life cycle. This is a step toward building transparency and traceability of data flows for all stakeholders, a critical driver in a DataOps environment.
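Tracing lineage upstream and downstream can be pictured as walking a directed graph of assets. The sketch below assumes lineage has already been captured as (upstream, downstream) pairs; the asset names and the `build_graphs` and `trace` helpers are hypothetical, and real catalogs typically derive these edges by parsing queries, pipeline definitions, and dashboard metadata.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "marts.revenue"),
    ("staging.customers", "marts.revenue"),
    ("marts.revenue", "dashboard.weekly_sales"),
]


def build_graphs(edges):
    """Index the edges in both directions so either kind of trace is a simple lookup."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def trace(graph, start):
    """Breadth-first walk returning every asset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_graphs(EDGES)

# Where did marts.revenue come from?
print(sorted(trace(UPSTREAM, "marts.revenue")))
# What is affected if raw.orders changes?
print(sorted(trace(DOWNSTREAM, "raw.orders")))
```

The upstream trace answers "where did this number come from?", while the downstream trace supports impact analysis before a change is made to a source table.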
Governance as part of daily workflows
Another key component of the DataOps ecosystem is automated data governance. Data catalogs for DataOps focus on automating data access policies and ensure governance by design: governance is a consideration in every user interaction with data rather than an afterthought.
Some ways in which data catalogs establish governance across a DataOps ecosystem:
- Codifying a data access policy in line with organizational, sectoral, national, and international regulations, e.g. HIPAA and GDPR
- Automatically granting or restricting access to databases, schemas, or even tag-based groups of data assets based on users, groups, and teams
- Auto-propagating governance policies and access controls throughout the lifecycle of the data asset via lineage
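The points above can be sketched as a small tag-based access check in which restrictions propagate through lineage. Everything here is an assumption made for illustration: the `Asset` class, the `POLICY` mapping, the group names, and the rule that a derived asset inherits every tag of its upstream assets. Real catalogs expose their own policy models.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A data asset with its own tags and links to its upstream assets."""
    name: str
    tags: set[str] = field(default_factory=set)
    parents: list["Asset"] = field(default_factory=list)

    def effective_tags(self) -> set[str]:
        """Own tags plus every tag inherited from upstream assets (assumes acyclic lineage)."""
        tags = set(self.tags)
        for parent in self.parents:
            tags |= parent.effective_tags()
        return tags


# Hypothetical codified policy: tag -> groups allowed to read assets carrying that tag.
POLICY = {"pii": {"compliance", "data_engineering"}}


def can_read(user_groups: set[str], asset: Asset) -> bool:
    """Allow access only if the user belongs to an allowed group for every restricted tag."""
    for tag in asset.effective_tags():
        allowed = POLICY.get(tag)
        if allowed is not None and not (user_groups & allowed):
            return False
    return True


patients = Asset("raw.patients", tags={"pii"})
visits = Asset("marts.patient_visits", parents=[patients])  # derived, not tagged directly
weather = Asset("raw.weather")

print(can_read({"marketing"}, visits))   # False: the pii tag is inherited via lineage
print(can_read({"compliance"}, visits))  # True
print(can_read({"marketing"}, weather))  # True: no restricted tags
```

Because `marts.patient_visits` is never tagged directly, the restriction on it comes entirely from its upstream asset, which is the auto-propagation behavior described in the last bullet.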
Embedded collaboration
The only constant reality of data teams is diversity. Engineers, scientists, analysts, product people, marketers, and executives – all need to find, understand, trust and use data. All of these personas have their preferences when it comes to tooling and workflows.
Embedded collaboration is about work happening where you are, with the least amount of friction. The best data catalogs for DataOps are inclusive to all data practitioners and seamlessly integrate into their workflows.
They help foster cross-functional collaboration, a key driver of DataOps, through social features like in-app chats, comments, annotations, quality ratings, and integration with project management apps, and they make it easy to share data assets.
Intuitive UI/UX
Since data catalogs are a unified source of data interaction, the user experience must cater to a wide range of data users from data engineers to business analysts.
DataOps borrows from Agile principles, and one of them dictates:
Continuous attention to technical excellence and good design enhances agility.
Data catalogs for DataOps are designed with easy-to-use and intuitive user interfaces that reduce complexity and redundancy in the workflows of data producers and consumers.
Agile CI/CD
Data catalogs for DataOps are built on an open API architecture, and the intuitive user interface layer creates a self-serve ecosystem in which anyone can search, integrate, optimize, and deploy data from various sources using a variety of data management tools. This helps DataOps teams prototype, deploy, and test new applications faster.
Extensible and customizable
The best data catalogs for DataOps environments are open by default.
This enables data engineers to scale and optimize the existing data management setup faster and more easily. The ability to integrate with all parts of the modern data stack turns a data catalog from just another tool that passively stores metadata into a platform that data engineers can innovate on to enable automation, interoperability, and more.
In its report, Enterprise Data Catalogs For DataOps, Q2 2022, Forrester mentions that enterprise data catalog customers should look for solution providers that:
- Address the diversity, granularity, and dynamic nature of data and metadata
- Generate deep transparency of the nature and path of data flow and delivery
- Deliver a UI/UX that reinforces modern DataOps and engineering best practices
Checking off the capabilities mentioned above will help ensure that the data catalog solution you are evaluating for your DataOps environment is aligned with these expectations.