Apache Atlas Alternatives — Amundsen, DataHub, and Metacat

March 15th, 2022

What are some alternatives to Apache Atlas?

Apache Atlas is a popular open-source data catalog. It enjoys an active community of committers from businesses like Hortonworks, Aetna, Merck, IBM, and Target, who keep developing and expanding the project year on year.

Yet, it can be a bit clunky to use and navigate. Here are some Apache Atlas alternatives to consider while researching an open-source data catalog tool best suited to your organization’s needs.

3 open source Apache Atlas alternatives

  1. Lyft’s Amundsen
  2. LinkedIn’s DataHub
  3. Netflix’s Metacat


Amundsen

Built by the Lyft engineering team, Amundsen is a popular open source data discovery platform and metadata engine.

It was introduced to the world in April 2019 and open sourced later that year for adoption outside Lyft. It was primarily built to improve the productivity of data scientists, engineers, and analysts at Lyft.

Amundsen enjoys high adoption at Lyft and has an open-source community of 750+ members, with 37+ organizations officially using it.

Typical use cases of Amundsen include:

  • Simple text search powering easy data discovery
  • More context on data with automated and curated metadata
  • Ease of sharing context with others
  • Learning more about data usage


DataHub

DataHub is an open-source metadata search and discovery tool that was built at LinkedIn.

DataHub, open-sourced in 2020, is LinkedIn’s second attempt at solving the problem of data discovery and cataloging. Their first attempt was WhereHows, in 2016.

DataHub has the following main capabilities:

  • Ease of data discovery via searching and browsing a data asset
  • Understanding data with context
  • Automated metadata ingestion from diverse data sources





Metacat

Metacat is an open source federated metadata management platform that powers data discovery and metadata interoperability at Netflix.

It is used to catalog, discover, process, and manage data. It forms a single access layer for data residing across the diverse mesh of data sources operating at Netflix.

Metacat is primarily known for the following capabilities:

  • Common abstraction layer
  • Storage for user-defined and business-defined metadata
  • Easy data discovery
  • Notifications related to data changes




Deploying data catalog software is often the first step in enabling a collaborative and efficient data culture in your organization. But it requires answering multiple questions at once:

  • Should we build it? Should we buy it?
  • Will it support all our primary use cases?
  • Will the platform work in case we change our data stack in a couple of years?
  • Will all our users be comfortable using it?
  • How do we make a case for the money that we’re asking for?

We understand this can be a bit overwhelming and it always helps to think through your options out loud. Let us help!


It would take six or seven people up to two years to build what Atlan gave us out of the box. We needed a solution on day zero, not in a year or two.

Akash Deep Verma

Director of Data Engineering

Build vs Buy: Delhivery’s Learnings from Implementing a Data Catalog