Egeria: A Comprehensive Guide on This Open-Source Data Governance Project

Updated August 18th, 2023

Egeria, an open-source project, is dedicated to making metadata and governance more accessible and effective.

This guide covers Egeria’s origins, architecture, and features, and explains its role in metadata management and data governance.

Table of contents #

  1. What is Egeria?
  2. Egeria’s architecture
  3. Egeria features for data governance
  4. Getting started with Egeria
  5. Egeria alternatives for data governance
  6. Final thoughts
  7. Related reads

What is Egeria? #

Egeria is an open-source project that enables the easy exchange of metadata between tools and platforms in a vendor-agnostic manner.

It can connect to and manage metadata from various sources, such as data warehouses, databases, files (CSV, JSON, etc.), data catalogs, and data science tools.

Note: Egeria defines the open metadata standard schema for over 800 types of metadata.

Egeria: IBM’s universal travel adapter for metadata - Source: Twitter.

Egeria offers various components to help you capture, store, and query metadata, as well as define and implement data governance policies.

It is designed to work with multiple metadata repositories and with data tools that support data discovery, quality, lineage, and more.

Egeria’s integration daemon manages metadata exchange across various data sources - Source: Egeria.

Data cataloging, lineage, and governance with Egeria #

One of Egeria’s key features is data cataloging, which supports data governance by enabling structured classification and alignment with compliance requirements.

For example, Egeria can be used to classify data assets according to their sensitivity levels, such as personally identifiable information (PII) or protected health information (PHI).
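To make the idea of sensitivity classification concrete, here is a minimal sketch in Python. It does not use Egeria's APIs or type names. The `Sensitivity` levels, the `Asset` class, and the sample column names are all hypothetical, and the rule that an asset inherits the level of its most sensitive column is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    PII = 2
    PHI = 3


@dataclass
class Asset:
    """A hypothetical catalog entry whose columns carry classifications."""
    name: str
    columns: dict[str, Sensitivity] = field(default_factory=dict)

    def classify(self, column: str, level: Sensitivity) -> None:
        """Attach a sensitivity level to one column."""
        self.columns[column] = level

    def sensitivity(self) -> Sensitivity:
        """Assumed rule: an asset is as sensitive as its most sensitive column."""
        return max(
            self.columns.values(),
            key=lambda level: level.value,
            default=Sensitivity.PUBLIC,
        )


patients = Asset("patients.csv")
patients.classify("patient_id", Sensitivity.INTERNAL)
patients.classify("email", Sensitivity.PII)
patients.classify("diagnosis", Sensitivity.PHI)

print(patients.name, patients.sensitivity().name)
```

Once classifications like these are recorded in a catalog, compliance rules can be written against the classification rather than against individual column names.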

The hierarchy of asset types in Egeria for data cataloging - Source: Egeria.

Egeria’s governance capabilities include lineage mapping. It tracks the flow and transformation of data, detailing its origin, changes, and consumption points. Lineage views can be horizontal (i.e., visualize data flow from origin to destination) or vertical (i.e., see how business concepts such as glossaries and terms are mapped to data assets).

By offering a transparent view of a data asset’s journey, lineage mapping helps with regulatory compliance, data quality assurance, data process monitoring, and data traceability.
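Horizontal lineage can be pictured as a directed graph of data flows. The sketch below is not Egeria's lineage model. The `LineageGraph` class and the asset names are hypothetical, and it only shows how origins and consumption points can be found by walking such a graph breadth-first.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical horizontal lineage: edges point from source to target."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that data moves from source to target."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, edges):
        """Breadth-first traversal that returns every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, asset):
        """Everything the asset was derived from, directly or indirectly."""
        return self._walk(asset, self.upstream)

    def consumers(self, asset):
        """Everything that depends on the asset, directly or indirectly."""
        return self._walk(asset, self.downstream)


graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("staging.customers", "warehouse.dim_customer")
graph.add_flow("billing.invoices", "warehouse.dim_customer")
graph.add_flow("warehouse.dim_customer", "dashboard.churn")

print(sorted(graph.origins("warehouse.dim_customer")))
print(sorted(graph.consumers("crm.customers")))
```

The same traversal answers both compliance questions ("where did this data come from?") and impact questions ("what breaks if this source changes?").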

The lineage mappings linking the graph together - Source: Egeria.

The origins of Egeria #

Egeria is an Open Data Platform Initiative (ODPi) project. ODPi is a nonprofit organization committed to simplification and standardization within the big data ecosystem. Currently, the ODPi foundation is part of the Linux Foundation.

Here’s how John Mertic (Director of Program Management, ODPi) highlights the benefits of using Egeria for metadata management and governance:

“By adopting ODPi Egeria standards and implementation as the core of your metadata management and governance program, an organization is able to future-proof their investments and be able to adopt the best-of-breed tools for their business.”

An overview of Egeria’s architecture #

Egeria’s architecture consists of the following key components:

  1. Open Metadata and Governance (OMAG)
  2. Open Metadata Repository Services (OMRS)
  3. Open Metadata Access Services (OMAS)
  4. Governance Action Framework (GAF)

Let’s delve into the specifics of each component and the role they play in data and metadata governance.

1. Open Metadata and Governance (OMAG) #

OMAG defines a set of services for managing metadata and governance information.

The OMAG Server Platform provides a runtime process for all of Egeria’s services.

The OMAG Server Platform can support multi-tenant cloud services and host several OMAG Servers.

The various ways of setting up an OMAG Server Platform - Source: Egeria.

How OMAG supports data governance

The OMAG Server Platform has an open metadata and governance services layer that integrates tools, platforms, other open metadata repositories, etc.

This helps in setting up a common layer for all metadata from various data sources and documenting data flow across your data estate.

2. Open Metadata Repository Services (OMRS) #

OMRS provides the underlying services for managing metadata repositories that support the OMAG Server Platform.

It connects with and syncs metadata repositories using APIs or events, facilitating metadata sharing.
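As a rough illustration of event-based metadata sharing, the sketch below shows two repositories exchanging metadata through a shared topic. This is not OMRS code. `EventTopic`, `Repository`, and the event fields are hypothetical names, and an in-memory list stands in for a real event bus.

```python
class EventTopic:
    """Hypothetical in-memory stand-in for a shared event topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, repository):
        self.subscribers.append(repository)

    def publish(self, sender, event):
        # Deliver the event to every repository except the one that sent it
        for repository in self.subscribers:
            if repository is not sender:
                repository.on_event(event)


class Repository:
    """Hypothetical metadata repository that shares changes through a topic."""

    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.home = {}       # metadata this repository owns
        self.reference = {}  # copies learned from other repositories
        topic.subscribe(self)

    def save(self, guid, properties):
        """Store metadata locally, then announce it to the other repositories."""
        self.home[guid] = dict(properties)
        event = {"guid": guid, "properties": dict(properties), "origin": self.name}
        self.topic.publish(self, event)

    def on_event(self, event):
        """Keep a copy of remote metadata, remembering where it came from."""
        self.reference[event["guid"]] = {**event["properties"], "origin": event["origin"]}

    def get(self, guid):
        if guid in self.home:
            return self.home[guid]
        return self.reference.get(guid)


topic = EventTopic()
catalog = Repository("catalog", topic)
lake = Repository("lake", topic)

catalog.save("asset-1", {"name": "orders"})
print(lake.get("asset-1"))
```

Keeping the origin on every shared copy is one way to let each tool see all metadata while still knowing which repository owns a given element.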

OMRS supports connectors for audit logs, registry stores, event topics, metadata repositories, event mappers, and more.

Note: A Connector is a Java class that supports the standard Open Connector Framework (OCF) APIs.

Open Metadata Repository Services configuration in Egeria - Source: Egeria.

How OMRS supports data governance

OMRS can support data governance by providing a centralized repository for metadata and being interoperable with various metadata repositories.

This metadata can be used to track data flow, identify data quality issues, and enforce data security policies.

3. Open Metadata Access Services (OMAS) #

OMAS offer several interfaces for accessing metadata and governance information from the OMRS. OMAS can be of various types, such as Asset Catalog OMAS, Data Privacy OMAS, Glossary View OMAS, and more.

OMAS structure - Source: Egeria.

For example, the Governance Engine OMAS includes APIs and events that fetch and manage metadata for governance engines — a collection of governance services that provide pluggable governance functions. These can include provisioning resources, triaging, verifying asset properties, etc.

The governance engine in Egeria - Source: Egeria.

How OMAS supports data governance

The Governance Program OMAS offers an interface to set up subject area definitions — glossary terms, reference data definitions, data quality rules.

It also supports measuring the effectiveness of governance programs via the GovernanceMetricsManager client that supports the GovernanceMetricsInterface.

Other capabilities include defining data asset classification levels, certification types, security tags, etc.

4. Governance Action Framework (GAF) #

Egeria has several frameworks that define the interfaces for pluggable components. The GAF is one of them. Its purpose is to detect and report issues that put data security, integrity, or privacy at risk, and to improve metadata quality.

The GAF acts as the foundation for governance action services — types of connectors that monitor metadata changes, triage issues, validate metadata, and perform assessments and remediation if requested.
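The flow of monitoring, validating, triaging, and remediating can be sketched as a small pipeline. The code below does not implement the GAF interfaces (real governance action services are connectors). The function names, the two validation rules, and the default owner are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: only a missing owner is safe to fix automatically
AUTO_FIXABLE = {"missing owner"}


@dataclass
class MetadataElement:
    """Hypothetical metadata element with free-form properties."""
    guid: str
    properties: dict = field(default_factory=dict)


def verify(element):
    """Validation step: report problems without changing anything."""
    issues = []
    if not element.properties.get("owner"):
        issues.append("missing owner")
    if not element.properties.get("description"):
        issues.append("missing description")
    return issues


def triage(issues):
    """Triage step: fix automatically only if every issue is auto-fixable."""
    return "remediate" if set(issues) <= AUTO_FIXABLE else "escalate"


def remediate(element, default_owner="data-governance-team"):
    """Remediation step: assign a fallback owner."""
    element.properties["owner"] = default_owner


def on_metadata_change(element):
    """Monitoring entry point, called when an element is created or updated."""
    issues = verify(element)
    if not issues:
        return "ok"
    decision = triage(issues)
    if decision == "remediate":
        remediate(element)
    return decision


orders = MetadataElement("asset-1", {"description": "Daily orders"})
first = on_metadata_change(orders)     # owner is missing, so it gets filled in
second = on_metadata_change(orders)    # nothing left to fix

untitled = MetadataElement("asset-2")
third = on_metadata_change(untitled)   # missing description needs a human

print(first, second, third)
```

Splitting the steps this way means each one can be replaced independently, which is the point of making governance functions pluggable.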

Governance action services - Source: Egeria.

Egeria features for data governance #

Egeria is designed around data governance needs, so its features focus on ensuring compliance, building trust in data, and maintaining data integrity. These include:

  • Automated metadata capture for better data consistency
  • Schema registry for consistency and better data quality
  • Data lineage visualization to track data flow
  • Data quality monitoring
  • RBAC for better access control
  • Encryption and data masking for data security
  • Data classification and tagging for better context
  • Policy propagation for PII data
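Three of the features above (policy propagation, data masking, and role-based access) can be combined in one short sketch. This is not how Egeria implements them. The `propagate`, `mask`, and `read_row` functions, the `steward` role, and the column names are hypothetical, and the sketch assumes a PII tag should follow data along every recorded flow.

```python
# Assumption: only this role may read PII in the clear
PRIVILEGED_ROLES = {"steward"}


def propagate(flows, seeds):
    """Spread a tag from the seed columns along every (source, target) flow."""
    tagged = set(seeds)
    changed = True
    while changed:  # repeat until no new column gets tagged
        changed = False
        for source, target in flows:
            if source in tagged and target not in tagged:
                tagged.add(target)
                changed = True
    return tagged


def mask(value, visible=2):
    """Replace all but the last `visible` characters with asterisks."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def read_row(table, row, tagged, role):
    """Return the row as the given role is allowed to see it."""
    if role in PRIVILEGED_ROLES:
        return dict(row)
    return {
        column: mask(str(value)) if f"{table}.{column}" in tagged else value
        for column, value in row.items()
    }


flows = [
    ("crm.email", "staging.email"),
    ("staging.email", "mart.contact"),
]
pii_columns = propagate(flows, {"crm.email"})

row = {"contact": "alice@example.com", "country": "DE"}
print(read_row("mart", row, pii_columns, role="analyst"))
print(read_row("mart", row, pii_columns, role="steward"))
```

Because the tag is derived from lineage, a column created downstream is protected without anyone having to classify it by hand.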

Getting started with Egeria #

Here are six steps to getting started with Egeria:

  1. Install Egeria
  2. Configure Egeria
  3. Capture metadata from sources
  4. Track data pipelines
  5. Monitor data quality
  6. Enable data security and compliance

Alternative installation: Kubernetes #

You can also use Kubernetes to run Egeria via the published builds. This method doesn’t require building Egeria on your machine.

Building with Gradle or Maven #

Egeria supports building with Gradle or Maven. Gradle is the primary tool, while Maven is being phased out.

Egeria alternatives for data governance #

Egeria is not the only open-source option for metadata management and governance. There are other tools that offer similar or complementary functionalities, such as Amundsen, DataHub, Apache Atlas, Magda, and OpenMetadata.

Final thoughts #

In this guide, we explored Egeria and how it manages metadata and provides governance frameworks for data across diverse sources. Its key features include data cataloging, lineage mapping, and automated metadata capture.

When it comes to data governance, Egeria helps improve data quality, build trust in data, and support compliance.

While Egeria offers an open-source solution, it may have some technical shortcomings related to customization complexity, integration challenges, and scalability issues. As an alternative, an off-the-shelf active data governance solution like Atlan provides ready-to-use features, enhanced integration capabilities, scalable architecture, and dedicated support.
