What are some alternatives to Apache Atlas?
Apache Atlas is a popular open source data catalog. It enjoys an active community of committers from organizations like Hortonworks, Aetna, Merck, IBM, and Target, who keep developing and expanding the project year on year.
Yet, it can be a bit clunky to use and navigate. Here are 4 Apache Atlas alternatives to consider while researching an open source data catalog tool that best suits your organization's needs.
4 open source Apache Atlas alternatives
Built by the Lyft engineering team, Amundsen is a popular open source data discovery platform and metadata engine.
It was introduced to the world in April 2019 and open sourced later that year for adoption outside Lyft. It was primarily built to improve the productivity of data scientists, engineers, and analysts at Lyft.
Amundsen enjoys high adoption at Lyft and has an open source community of 750+ members and 37+ organizations that officially use it.
Typical use cases of Amundsen include:
- Simple text search powering easy data discovery
- More context on data with automated and curated metadata
- Ease of sharing context with others
- Learning more about data usage
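To make the first and last use cases concrete, here is a minimal sketch of keyword search over table metadata, with results ordered by a usage count. It is a toy in-memory illustration, not Amundsen's implementation or API: `TableMetadata`, `search`, the `query_count` field, and the sample tables are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical catalog entry; not Amundsen's actual data model."""
    name: str
    description: str
    owner: str
    query_count: int = 0  # assumed usage signal, e.g. recent query volume
    tags: list[str] = field(default_factory=list)


def search(catalog: list[TableMetadata], text: str) -> list[TableMetadata]:
    """Return entries that contain every search term, most-used first."""
    terms = text.lower().split()
    if not terms:
        return []

    def haystack(entry: TableMetadata) -> str:
        # Searchable text: name, description, and tags, lowercased.
        return " ".join([entry.name, entry.description, *entry.tags]).lower()

    matches = [e for e in catalog if all(t in haystack(e) for t in terms)]
    return sorted(matches, key=lambda e: e.query_count, reverse=True)


# Sample data, invented for illustration only.
catalog = [
    TableMetadata("rides_daily", "Daily ride counts per city", "analytics", 950, ["rides"]),
    TableMetadata("rides_raw", "Raw ride events", "platform", 120, ["rides", "events"]),
    TableMetadata("drivers", "Driver profile dimension", "analytics", 400),
]

for entry in search(catalog, "rides"):
    print(entry.name, entry.owner, entry.query_count)
```

A production catalog would typically back this with a dedicated search index rather than a linear scan, but the idea is the same: match on curated metadata, then rank by how heavily the data is actually used.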
Check out Amundsen as an Apache Atlas alternative
DataHub is an open source metadata search and discovery tool that was built at LinkedIn.
DataHub, open sourced in 2020, is LinkedIn's second attempt at solving the data discovery and cataloging problem. Their first attempt was WhereHows, in 2016.
DataHub has the following main capabilities:
- Ease of data discovery via searching and browsing a data asset
- Understanding data with context
- Automated metadata ingestion from diverse data sources
Check out DataHub as an Apache Atlas alternative
Metacat is an open source federated metadata management platform that powers data discovery and metadata interoperability at Netflix.
It is used to catalog, discover, process and manage data. It forms a single access layer for data residing across the diverse mesh of data sources operating at Netflix.
Metacat is primarily known for the following capabilities:
- Common abstraction layer
- Provision for user and business defined metadata storage
- Easy data discovery
- Notifications related to data changes
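The capabilities above can be pictured with a small sketch of a federated metadata layer: one facade routes lookups to per-source connectors, keeps user-defined metadata in its own store, and notifies subscribers when that metadata changes. This is an illustrative toy under stated assumptions, not Metacat's code or API: `Connector`, `InMemoryConnector`, `FederatedCatalog`, the `source/table` naming scheme, and the sample schemas are all hypothetical.

```python
from typing import Callable, Protocol


class Connector(Protocol):
    """Hypothetical interface that each backing data source implements."""
    def get_schema(self, table: str) -> dict[str, str]: ...


class InMemoryConnector:
    """Stand-in for a real source such as a warehouse or a relational database."""
    def __init__(self, schemas: dict[str, dict[str, str]]) -> None:
        self._schemas = schemas

    def get_schema(self, table: str) -> dict[str, str]:
        return self._schemas[table]


class FederatedCatalog:
    """Single access layer over several sources, addressed as 'source/table'."""
    def __init__(self) -> None:
        self._connectors: dict[str, Connector] = {}
        self._user_metadata: dict[str, dict[str, str]] = {}
        self._listeners: list[Callable[[str, dict[str, str]], None]] = []

    def register(self, source: str, connector: Connector) -> None:
        self._connectors[source] = connector

    def subscribe(self, listener: Callable[[str, dict[str, str]], None]) -> None:
        self._listeners.append(listener)

    def get_schema(self, qualified_name: str) -> dict[str, str]:
        # Route the lookup to whichever connector owns the source prefix.
        source, table = qualified_name.split("/", 1)
        return self._connectors[source].get_schema(table)

    def set_user_metadata(self, qualified_name: str, key: str, value: str) -> None:
        # User-defined metadata lives in the catalog, not in the source system.
        self._user_metadata.setdefault(qualified_name, {})[key] = value
        for listener in self._listeners:
            listener(qualified_name, {key: value})

    def get_user_metadata(self, qualified_name: str) -> dict[str, str]:
        return dict(self._user_metadata.get(qualified_name, {}))


# Sample sources and schemas, invented for illustration only.
catalog = FederatedCatalog()
catalog.register("hive", InMemoryConnector({"plays": {"title_id": "bigint"}}))
catalog.register("mysql", InMemoryConnector({"accounts": {"id": "int"}}))

events: list[tuple[str, dict[str, str]]] = []
catalog.subscribe(lambda name, change: events.append((name, change)))
catalog.set_user_metadata("hive/plays", "owner", "content-analytics")

print(catalog.get_schema("hive/plays"))
print(events)
```

The design point is that callers only ever talk to the facade, so a new data source means adding a connector rather than changing every consumer.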
Check out Metacat as an Apache Atlas alternative
Databook is Uber's open source data catalog tool. It was first developed in 2016, before their data had reached its current scale, and was later revamped to suit their evolving needs.
Uber is known to support more than 400,000 queries a day on its infrastructure, most of them with zero engineering dependencies. Databook has made that possible.
Its main capabilities include:
- Easy to add new metadata, storage, and entities
- Services can access metadata programmatically
- It can support an enormous volume of queries
- Cross-data centre read and write
Check out Databook as an Apache Atlas alternative
Deploying a data catalog that enables teams to discover, understand, and use data is often the first step toward a collaborative and efficient data culture in your organization. But getting there means answering multiple questions at once.
Should we build it? Should we buy it? Will it support all our primary use cases? Will the platform work in case we change our data stack in a couple of years? Will all our users be comfortable using it? How do we make a case for the money that we're asking for?
We understand this can be a bit overwhelming and it always helps to think through your options out loud. Let us help!