Apache Atlas Alternatives — Amundsen, DataHub, Metacat, Databook

March 15th, 2022

What are some alternatives to Apache Atlas?

Apache Atlas is a popular open-source data catalog. It has an active community of committers from businesses like Hortonworks, Aetna, Merck, IBM, and Target, who keep developing and expanding the project year on year.

Yet it can be a bit clunky to use and navigate. Here are Apache Atlas alternatives to consider while researching the open-source data catalog tool best suited to your organization's needs.

4 open source Apache Atlas alternatives

  1. Lyft's Amundsen
  2. LinkedIn's DataHub
  3. Netflix's Metacat
  4. Uber's Databook

Amundsen

Built by the Lyft engineering team, Amundsen is a popular open source data discovery platform and metadata engine.

It was introduced to the world in April 2019 and open sourced later that year for adoption outside Lyft. It was primarily built to improve the productivity of data scientists, engineers, and analysts at Lyft.

Amundsen enjoys high adoption at Lyft and has an open-source community of 750+ members and 37+ organizations that officially use it.

Typical use cases of Amundsen include:

  • Simple text search powering easy data discovery
  • More context on data with automated and curated metadata
  • Ease of sharing context with others
  • Learning more about data usage


DataHub

DataHub is an open-source metadata search and discovery tool that was built at LinkedIn.

DataHub, open-sourced in 2020, is LinkedIn's second attempt at solving data discovery and cataloging. Its first attempt, WhereHows, dates to 2016.

DataHub has the following main capabilities:

  • Easy data discovery by searching and browsing data assets
  • Understanding data with context
  • Automated metadata ingestion from diverse data sources


Metacat

Metacat is an open source federated metadata management platform that powers data discovery and metadata interoperability at Netflix.

It is used to catalog, discover, process, and manage data. It forms a single access layer for data residing across the diverse mesh of data sources operating at Netflix.

Metacat is primarily known for the following capabilities:

  • Common abstraction layer
  • Storage for user-defined and business-defined metadata
  • Easy data discovery
  • Notifications related to data changes


Databook

Databook is Uber's in-house data catalog tool that was first developed in 2016 when their data had not reached the current scale.

It was later revamped to suit their evolving needs. Uber is known to support more than 400,000 queries a day on its infrastructure, most of them with zero engineering dependencies. Databook has made that possible.

Its main capabilities include:

  • Discovering data - Databook is the single destination for searching data at Uber
  • Understanding data - Databook provides users with maximum context about the data
  • Managing data - Databook enables crowdsourcing useful information about data and organizing this information



Deploying a data catalog is often the first step in enabling a collaborative and efficient data culture in your organization, but it requires answering multiple questions at once:

  • Should we build it? Should we buy it?
  • Will it support all our primary use cases?
  • Will the platform work in case we change our data stack in a couple of years?
  • Will all our users be comfortable using it?
  • How do we make a case for the money that we're asking for?

We understand this can be a bit overwhelming, and it always helps to think through your options out loud. Let us help!


It would take six or seven people up to two years to build what Atlan gave us out of the box. We needed a solution on day zero, not in a year or two.

Akash Deep Verma

Director of Data Engineering

Delhivery: Leading fulfilment platform for digital commerce.

Build vs Buy: Delhivery’s Learnings from Implementing a Data Catalog
