Federated Data Catalog: Complete Guide for 2024

Updated September 28th, 2024


A federated data catalog is a natural extension of the data catalog model. It lets you control access to data using the access control policies already defined in the source systems, rather than duplicating them inside the catalog.

Federated data catalogs are particularly useful for enterprises dealing with complex data environments.

In this article, we’ll cover the basics of a federated data catalog, when to use federation, and the advantages and disadvantages of centralized versus decentralized (federated) data catalogs.

Table of contents #

What is a federated data catalog?
What is federation?
How federation affects the core functions of a data catalog
How federated authentication (authn) and authorization (authz) streamline data security and governance
When should you use a federated data catalog?
Summary

What is a federated data catalog? #

A federated data catalog is a central portal for data discovery and collaboration that also enforces common, organization-wide standards for:

Authentication: Validating a user’s identity
Authorization: Determining what access rights should be granted

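Rather than managing its own copies of user accounts and permissions, it takes a decentralized approach to data access management. It relies on shared identity providers and on the access policies of the source systems.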

Before we delve into the specifics of a federated data catalog, let’s understand the concept of federation.

What is federation? #

In security, federation refers to linking a single digital identity across multiple platforms, so that each platform recognizes the same user.

Non-federated systems duplicate access credentials (like usernames and passwords) and access permissions across multiple software systems. This results in fragmentation and inconsistency, opening the door to data leaks.

In a federated identity system, sites and toolsets integrate with one or more identity providers. The identity provider validates the user’s identity and determines the scope of the user’s rights and permissions.

Every system that integrates with a given federated identity provider is part of the same federated domain. As a result, a user’s login credentials are valid across all systems in that domain.

Federation vs. SSO: What’s the difference? #

Federation is similar to single sign-on (SSO) but not synonymous.

In SSO, all sites within a company integrate with a single identity management system (for example, Microsoft’s Active Directory).

Federation takes this a step further by establishing a set of standards so that sites and toolsets can integrate with multiple identity providers. This enables flexibility in organizations that use disparate tools from several vendors.
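The difference can be sketched in a few lines of Python. This is a simplified model, not a real protocol. The Assertion type, the TRUSTED_ISSUERS registry, and the provider URLs are hypothetical, and the sketch skips the signature verification that SAML or OpenID Connect would perform on every assertion. With plain SSO, each service’s set of trusted issuers would hold a single provider. With federation, it can hold several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Simplified identity assertion a service receives after the user logs in."""
    issuer: str    # identity provider that vouched for the user
    subject: str   # the user's identifier at that provider
    audience: str  # service the assertion was issued for


# Hypothetical trust registry: each service lists the identity providers it accepts.
# Under plain SSO every set would contain one provider; federation allows several.
TRUSTED_ISSUERS = {
    "data-catalog": {"https://idp.corp.example", "https://idp.partner.example"},
    "bi-dashboard": {"https://idp.corp.example"},
}


def accept(service: str, assertion: Assertion) -> bool:
    """Accept a login only if it targets this service and comes from a trusted provider."""
    if assertion.audience != service:
        return False
    return assertion.issuer in TRUSTED_ISSUERS.get(service, set())


if __name__ == "__main__":
    partner_login = Assertion("https://idp.partner.example", "jdoe", "data-catalog")
    print(accept("data-catalog", partner_login))  # the catalog trusts the partner IdP
```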

How federation affects the core functions of a data catalog #

A fully federated data catalog affects every core service a data catalog offers, including:

Search
Lineage
Tagging and classification
Governance

Let’s explore this further.

Search #

A federated data catalog provides a single interface for searching across your data ecosystem, regardless of where the data originates or is stored. However, what each user can find, and what they can do with it, depends on their access rights.

For example, a user may be able to see and read certain data, but not edit it once they find it.

Moreover, they may only be able to find certain sensitive data (for example, customers’ personally identifiable information, or PII) if they have sufficient privileges.
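Here is a minimal Python sketch of permission-aware search. Everything in it is hypothetical: the Asset and User types, the sample catalog, and the rule that PII assets are hidden from users who lack a pii_access privilege. In a real federated catalog, these rights would be resolved from the identity provider and the source systems’ policies rather than stored on the user object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    source: str                 # system the data lives in
    contains_pii: bool = False


@dataclass(frozen=True)
class User:
    name: str
    can_edit: frozenset = frozenset()  # names of assets this user may edit
    pii_access: bool = False           # privilege needed to even discover PII assets


def search(catalog, user, term):
    """Return (asset name, allowed actions) for matches this user is permitted to see."""
    results = []
    for asset in catalog:
        if term.lower() not in asset.name.lower():
            continue
        if asset.contains_pii and not user.pii_access:
            continue  # hidden entirely, not merely read-only
        actions = ["read"]
        if asset.name in user.can_edit:
            actions.append("edit")
        results.append((asset.name, actions))
    return results


CATALOG = [
    Asset("customer_orders", source="warehouse"),
    Asset("customer_contacts", source="crm_db", contains_pii=True),
    Asset("customer_churn_dashboard", source="bi_tool"),
]

if __name__ == "__main__":
    analyst = User("analyst")
    steward = User("steward", can_edit=frozenset({"customer_contacts"}), pii_access=True)
    print(search(CATALOG, analyst, "customer"))
    print(search(CATALOG, steward, "customer"))
```

The same query returns different results for different people. The analyst sees two read-only assets. The steward also sees the PII table and can edit it.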

Lineage #

Data lineage tracks the flow of your data through your organization as it is created, consumed, and modified over time.

In a federated data catalog, distributed authorization controls who can access the lineage history. Because each user carries the same federated identity across every tool, the catalog can also attribute each change to a dataset to the specific person who made it.
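Both ideas in this paragraph can be sketched in Python: filtering lineage by access rights, and attributing changes to federated identities. The lineage graph, the READ_GRANTS table, and the user IDs are hypothetical. In practice, the grants would come from the federated authorization layer. One design choice is worth noting. The traversal walks through datasets the user cannot read, so readable datasets further upstream still appear, while the hidden datasets themselves are omitted.

```python
# Hypothetical lineage graph: each dataset maps to the datasets it was derived from.
UPSTREAM = {
    "revenue_report": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
}

# Hypothetical read grants keyed by federated user ID (e.g. an IdP subject or email).
READ_GRANTS = {
    "alice@corp.example": {"revenue_report", "orders_clean", "orders_raw"},
    "bob@corp.example": {"revenue_report", "orders_clean"},
}

CHANGE_LOG = []  # entries of (user ID, dataset, description), in recording order


def record_change(user_id, dataset, description):
    """Attribute a change to the federated identity that made it."""
    CHANGE_LOG.append((user_id, dataset, description))


def changes_to(dataset):
    """Return (user ID, description) pairs for a dataset, in recording order."""
    return [(user, desc) for user, ds, desc in CHANGE_LOG if ds == dataset]


def visible_lineage(user_id, dataset):
    """Walk upstream from dataset and return only the nodes this user may read."""
    allowed = READ_GRANTS.get(user_id, set())
    seen, stack, visible = set(), [dataset], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in allowed:
            visible.append(node)
        stack.extend(UPSTREAM.get(node, []))
    return visible


if __name__ == "__main__":
    print(visible_lineage("bob@corp.example", "revenue_report"))
    record_change("alice@corp.example", "orders_clean", "dropped duplicate rows")
    print(changes_to("orders_clean"))
```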

Tagging and classification #

With a federated authorization model, data stewards can decide who in the company has the domain knowledge needed to clean, tag, and classify data properly, and grant those people the rights to do so.

This opens up and democratizes classification while also maintaining governance controls.
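As a rough illustration, the Python sketch below lets stewards grant tagging rights per data domain and checks those rights before a tag is applied. The TAGGING_GRANTS table and the domain and group names are hypothetical. The sketch also assumes that a user’s groups arrive as claims from the identity provider.

```python
# Hypothetical steward-defined grants: which groups may classify assets in each domain.
TAGGING_GRANTS = {
    "finance": {"finance-analysts", "finance-stewards"},
    "marketing": {"marketing-stewards"},
}


def apply_tag(asset, tag, user_groups):
    """Add tag to asset if any of the user's groups may classify the asset's domain."""
    allowed = TAGGING_GRANTS.get(asset["domain"], set())
    if not allowed & set(user_groups):
        raise PermissionError(f"no tagging rights in domain {asset['domain']!r}")
    asset.setdefault("tags", set()).add(tag)
    return asset


if __name__ == "__main__":
    invoices = {"name": "invoices", "domain": "finance"}
    apply_tag(invoices, "contains-pii", ["finance-analysts"])  # permitted
    print(invoices["tags"])
    try:
        apply_tag(invoices, "deprecated", ["marketing-stewards"])  # rejected
    except PermissionError as err:
        print(err)
```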

Governance #

Federation simplifies data governance by propagating policies defined by data stewards and owners across the organization.
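Propagation can be sketched as a policy that is written once and pushed to every connected system. In this Python sketch, the Policy fields, the SourceSystem class, and its enforce method are hypothetical stand-ins for real connectors. A real connector would translate the rule into each tool’s native access controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A governance rule defined once by a data owner or steward."""
    name: str
    tag: str                  # applies to every asset carrying this tag
    allowed_groups: frozenset


class SourceSystem:
    """Hypothetical connector for one tool (warehouse, BI tool, data lake, ...)."""

    def __init__(self, name, assets_by_tag):
        self.name = name
        self.assets_by_tag = assets_by_tag  # tag -> asset names in this system
        self.enforced = {}                  # asset name -> groups allowed to access it

    def enforce(self, policy):
        """Apply the policy to matching assets; return the names of assets it now covers."""
        covered = self.assets_by_tag.get(policy.tag, [])
        for asset in covered:
            self.enforced[asset] = policy.allowed_groups
        return list(covered)


def propagate(policy, systems):
    """Push one policy to all systems and report per-system coverage."""
    return {system.name: system.enforce(policy) for system in systems}


if __name__ == "__main__":
    pii_policy = Policy("restrict-pii", tag="pii", allowed_groups=frozenset({"privacy-team"}))
    systems = [
        SourceSystem("warehouse", {"pii": ["customers", "contacts"]}),
        SourceSystem("bi_tool", {"pii": ["customer_dashboard"], "public": ["kpis"]}),
    ]
    print(propagate(pii_policy, systems))
```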

How federated authentication (authn) and authorization (authz) streamline data security and governance #

Let’s explore federated authentication and authorization in a federated data catalog in more detail.

Federated authentication (authn) #

What is federated authentication?

Authentication (often abbreviated to authn) determines that a user is who they say they are.

In federated authentication, a user’s identity and authentication information are stored in a trusted identity provider (IdP). Other systems or services that support the same authentication protocol can then rely on the IdP to authenticate that user.

Let’s say that a site integrates with a federated authentication provider. Instead of handling authentication itself, the site redirects the user to log in through the authentication provider’s interface. After a successful login, the provider returns a security token, which the site validates before granting access.
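The token exchange can be modeled in a short Python sketch. It is deliberately simplified and is not SAML, OAuth, or a real JWT. The token format, the shared HMAC secret, and the function names are all assumptions made for illustration. The sketch does show the key steps: the IdP signs a short-lived statement saying who the user is and which site the token is for, and the site checks the signature, audience, and expiry before trusting it.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: the IdP and the site share this secret and sign with HMAC-SHA256.
# SAML and many OpenID Connect deployments use public-key signatures instead.
SHARED_SECRET = b"demo-secret-do-not-use-in-production"


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    digest = hmac.new(SHARED_SECRET, payload.encode("ascii"), hashlib.sha256).digest()
    return _b64encode(digest)


def idp_issue_token(user_id, audience, ttl_seconds=300, now=None):
    """IdP side: called only after the user has logged in successfully."""
    now = time.time() if now is None else now
    claims = {"sub": user_id, "aud": audience, "exp": now + ttl_seconds}
    payload = _b64encode(json.dumps(claims).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"


def site_validate_token(token, audience, now=None):
    """Site side: check signature, intended audience, and expiry; return the user ID."""
    now = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        raise ValueError("invalid signature")
    claims = json.loads(_b64decode(payload))
    if claims["aud"] != audience:
        raise ValueError("token was issued for a different site")
    if claims["exp"] < now:
        raise ValueError("token has expired")
    return claims["sub"]
```

In this model the site only verifies tokens. It never sees or stores the user’s password.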

When do you need federated authentication?

Companies rely on federated authentication when they need to integrate systems and tools from many different vendors.

Federation supports a more distributed, decentralized model of information management, which can speed up time to market and encourage innovation.

How can you implement federated authentication?

Identity providers can use several standards to implement federated authentication. Some of the most widely used are Security Assertion Markup Language (SAML), OAuth 2.0, and OpenID Connect (OIDC), an identity layer built on top of OAuth 2.0.
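With OpenID Connect, the login redirect is an ordinary URL that carries standard query parameters. The Python sketch below builds an authorization code flow request and checks the returned state value. The endpoint, client ID, and redirect URI are hypothetical placeholders; a real IdP publishes its endpoints in its discovery document. The follow-up step of exchanging the code for tokens is omitted.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: a real deployment uses those assigned by its identity provider.
AUTHORIZE_ENDPOINT = "https://idp.example.com/authorize"
CLIENT_ID = "data-catalog"
REDIRECT_URI = "https://catalog.example.com/callback"


def build_login_redirect():
    """Build the URL a site redirects to when starting an OIDC authorization code flow."""
    state = secrets.token_urlsafe(16)  # echoed back by the IdP; guards against CSRF
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "openid email",       # "openid" marks this as an OpenID Connect request
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


def check_callback(callback_url, expected_state):
    """When the IdP redirects back, confirm the state and extract the authorization code."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    return query["code"][0]


if __name__ == "__main__":
    url, state = build_login_redirect()
    print(url)
```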

However, instead of implementing these standards themselves, many companies integrate with an established identity provider, such as Okta, or use an authentication library that already supports these protocols, such as Passport.js.

This approach leaves authentication in the hands of security experts and frees up your IT resources to focus on the domain-specific challenges relevant to your business.

Federated authorization (authz) #

What is federated authorization?

Authorization (abbreviated to authz) is the natural complement to authn. It determines which rights an authenticated user has to which parts of a system.

In non-federated systems, authorization can end up being a nightmare. This is because every system makes its own policy decisions, often in proprietary ways that aren’t portable to other systems or toolsets.

With federated authorization, all tools rely on a common policy language to express authorization decisions around data and resources.
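The Python sketch below illustrates what a common policy format buys you. It keeps one policy as JSON data and evaluates every request with the same function, so a catalog and a BI tool that load the same file reach the same decision. The rule schema (action, classification, groups) and the deny-by-default behavior are assumptions. This is a toy stand-in to show the idea, not a real policy language.

```python
import json

# Hypothetical shared policy, stored as data so every tool can load the same file.
POLICY_JSON = """
{
  "rules": [
    {"action": "read",  "classification": "public",    "groups": ["*"]},
    {"action": "read",  "classification": "sensitive", "groups": ["finance", "stewards"]},
    {"action": "write", "classification": "sensitive", "groups": ["stewards"]}
  ]
}
"""


def decide(policy, request):
    """Evaluate one request against the shared policy; return a JSON-serializable decision."""
    for rule in policy["rules"]:
        if rule["action"] != request["action"]:
            continue
        if rule["classification"] != request["resource"]["classification"]:
            continue
        if "*" in rule["groups"] or set(rule["groups"]) & set(request["user"]["groups"]):
            return {"allow": True, "matched_rule": rule}
    return {"allow": False, "matched_rule": None}  # deny when no rule matches


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    request = {
        "action": "write",
        "user": {"groups": ["finance"]},
        "resource": {"classification": "sensitive"},
    }
    # The catalog and any other tool calling decide() get the same answer.
    print(json.dumps(decide(policy, request)))
```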

How can you implement federated authorization?

One of the most widely used tools for policy-based authorization is Open Policy Agent (OPA), an open-source, general-purpose policy engine.

OPA policies are written in Rego, a declarative language that evaluates arbitrary structured data (typically JSON) describing a request or resource. The output is a set of policy decisions, also represented as arbitrary structured JSON.

OPA is domain-agnostic, i.e., services can define their preferred input and output formats for themselves.

More importantly, they can ship the Rego code that recognizes and produces these structures along with their service. This decouples policy decisions from the service code, enabling other services to interoperate easily with a service’s policy framework.
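From a service’s point of view, asking OPA for a decision is a JSON request over HTTP. The Python sketch below targets OPA’s REST Data API, which evaluates the policy document at a path under /v1/data and returns the decision in a "result" field. It assumes an OPA server on its default port, 8181, with a Rego policy that defines an allow rule in a package named catalog.authz. That package name and the input fields are hypothetical. Treating a missing result (an undefined rule) as a denial is a design choice of this sketch, not OPA behavior.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"    # assumption: OPA's default listen address
POLICY_PATH = "catalog/authz/allow"  # hypothetical package path plus rule name


def build_opa_request(input_doc):
    """Build a POST to OPA's Data API, which evaluates the rule at POLICY_PATH."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{POLICY_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_opa_response(raw):
    """OPA wraps the decision in "result"; anything other than true is treated as deny."""
    return json.loads(raw).get("result") is True


def is_allowed(input_doc):
    """Send the query to a running OPA server and return the allow decision."""
    with urllib.request.urlopen(build_opa_request(input_doc), timeout=5) as resp:
        return parse_opa_response(resp.read())
```

A call such as is_allowed({"user": "alice", "action": "read", "resource": "orders"}) then returns True only when the policy evaluates allow to true for that input. Because the decision logic lives in Rego, the service code stays the same when the policy changes.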

With federated authorization, services no longer need to roll their own authorization to interoperate with other services. All services, including a federated data catalog, can use a single policy language and framework, and consume other services’ policy frameworks as-is with little additional coding.

When should you use a federated data catalog? #

Moving to a federated data catalog does involve some lift. It requires services that produce data to onboard to federated authentication and authorization models.

While that work is less intensive — and, ultimately, more time-saving — than every service rolling its own authz and authn, it does take an investment of time and personnel.

So when do you move to a decentralized, federated data catalog instead of using a more centralized model with strict specifications for inbound data?

Investing in a federated data catalog makes the most sense when:

Your business is large and generates data from a large number of different toolsets: Moving everyone in a large organization to a single toolset is often impossible. With a federated data catalog, you need only establish a set of standards and protocols for federated authentication and authorization that every service follows.
You source tools from several vendors, each using its own method of access management: In this case, your federation standards become an interoperability requirement that vendors must meet to license their tools within your organization.

Federated vs. centralized data catalog: What’s the difference? #

Depending on your company’s size, culture, and data maturity, you may opt for a federated or centralized data catalog.

Here are some factors to consider when weighing the decision.

Attribute	Federated (decentralized) data catalog	Centralized data catalog
Architecture	Decentralized data mesh architecture; divisions within a company can make their own service/toolset decisions	Centralized data fabric; specification of data formats and API integrations to which all organizations must adhere
Speed of integration of external data sources	Faster and more flexible, as it uses interoperable standards for authentication and authorization	Slower and more rigid, requiring adherence to a centralized set of standards
Data accessibility vs. quality	Emphasizes data accessibility and collaboration	Emphasizes data quality and governance controls

Summary #

The future of data is decentralized and human-oriented. A modern data stack encourages data democratization, community support, and a non-bureaucratic, decentralized approach to data governance. That’s where a federated data catalog can help.

In this article, we explored how a federated data catalog can play an important role in data democratization.

Using federated authentication (authn) and federated authorization (authz), a federated data catalog can more easily support data management across a wide range of disparate tools and services.
