Alation vs. Collibra vs. OpenMetadata vs. Atlan: Choose the Best Solution for Your Business

Updated May 31st, 2024


While much progress has been made in data integration, transformation, and visualization, areas like governance, security, and data discovery are still actively evolving.

Organizations are now recognizing the value of adding data catalogs, search and discovery engines to their stacks. When considering solutions, they typically face a choice: building on top of open-source tools or deploying a fully managed enterprise solution.

Although Alation and Collibra were early players in the data catalog space, open-source tools like OpenMetadata and open-by-default modern tools like Atlan offer a fresh and promising perspective.

When evaluating a data catalog for your business, you can assess tools from open-source and enterprise domains and choose the one that’s a closer fit for your business objectives and use cases.

Working with a variety of organizations and technology partners, we’ve seen the same critical factors recur in data leaders’ RFPs and evaluation checklists:

  • Architecture
  • Data cataloging, search, and discovery
  • Data lineage
  • Data quality
  • Pricing and support

With these themes in mind, this article compares Alation, Collibra, OpenMetadata, and Atlan. Let’s dive right in!

Short on time? Jump to the at-a-glance summary near the end of this article.



Looking for a data catalog with an ROI you can present to your CDO? Atlan is designed for adoption and embedded with automation. It helps you save time, cut cloud costs, and make faster, better decisions that lead to revenue.


Architecture

OpenMetadata’s architecture consists of a backend database that stores all metadata in relational and graph data models, an extensible, source-agnostic, Python-based framework for metadata ingestion, and Elasticsearch, which powers full-text search on the front end. OpenMetadata supports both pull- and push-based metadata consumption but only pull-based ingestion. The architecture is flexible and modular, so some components can be replaced with drop-in alternatives. For example, you can use PostgreSQL in place of MySQL.

Alation’s infrastructure is divided into two layers: a set of core products like the Alation Data Catalog, Alation Data Governance, etc., and a set of platform services like Intelligent Search, Active Metadata Graph, etc. These two layers are connected to data sources via API, OCF, and ODQF connectors. Aside from this, Alation’s architecture is pretty much a black box. It also lags in cloud readiness: fewer than half of Alation customers run it in the cloud.

Collibra has a complicated architecture, the result of meshing together the different services that each of its products brings to the table. Understanding, implementing, and maintaining this architecture is painful for the teams that deploy it, which is one of the main reasons it can take up to a year to set Collibra up. Given this rigidity from legacy design, it is hard to justify the cost and administrative burden that a Collibra implementation brings.

Atlan’s support for all three major cloud platforms is well documented on its architecture page, which lists the components that power the various aspects of Atlan’s active metadata platform. Like OpenMetadata, Atlan runs on open-source technologies such as PostgreSQL, Kafka, Zookeeper, Argo, Kibana, Kong, and Cassandra, among others. Atlan differs from OpenMetadata and other open-source platforms in its enterprise-grade support for cloud platforms, its privacy and security controls, and its superior product capabilities.


Data cataloging, search, and discovery

OpenMetadata simplifies the search, cataloging, and discovery experience with keyword search, data associations, tags, and more. Combined with a holistic view of data evolution through lineage and metadata versioning, this is available to data consumers across the board. The full-text search is augmented by a structured, filter-based search with filters on owners, tags, tiers, services, databases, schemas, columns, usage, associations, and relationships, among other things.
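To make the idea of full-text search augmented by structured filters concrete, here is a minimal sketch of how such a request could be expressed in Elasticsearch's query DSL. The function name and the field names (`name`, `description`, `columns.name`, `owner`, `tier`) are assumptions chosen for illustration; they are not OpenMetadata's actual index schema or API.

```python
from typing import Any


def build_search_query(text: str, filters: dict[str, str]) -> dict[str, Any]:
    """Combine a scored full-text clause with exact-match filters."""
    return {
        "query": {
            "bool": {
                # Full-text part: relevance-scored match across several fields.
                # Field names are hypothetical, not a real catalog's schema.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["name", "description", "columns.name"],
                        }
                    }
                ],
                # Structured part: unscored exact-match filters, sorted by
                # field name so the output is deterministic.
                "filter": [
                    {"term": {field: value}}
                    for field, value in sorted(filters.items())
                ],
            }
        }
    }


query = build_search_query("orders", {"tier": "Tier1", "owner": "analytics-team"})
```

The resulting dictionary could be sent as the body of a search request; keeping the filters in the `filter` clause means they narrow the results without affecting relevance scoring.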

Alation’s Universal Search feature allows you to search all your assets from one place. It also allows you to perform multi-layered searches that involve data asset types, data quality scores, and more. That said, some users have found the search filters neither intuitive nor useful.

Collibra allows tag-based filters on the data catalog search interface. It supports full-text search, including wildcard searches. Business users have complained that the search process at times involves a lot of manual work, and have found the search experience a bit clunky as a result.

Atlan provides an Amazon-like search and filtering experience that isn’t limited to tabular data assets but includes all kinds of other database objects like columns, saved queries, reports, dashboards, and more. Atlan’s search covers all data sources and all the assets they house. Atlan also allows you to link business glossary terms to your assets, which can be later used to search for them. It offers a variety of search filters to make a business user’s life easy. Finally, you can sort your search results by relevance, name, and popularity, among other things. All of these features make Atlan’s search holistic and effective.


Data lineage

Data lineage lets users look at the data flow from source to consumption. It also serves as an interface to discover data assets by looking at relationships between various data assets. Lineage is most useful when it is the most granular, i.e., at the column level.
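As a rough illustration of why column-level granularity matters, the sketch below models lineage as a mapping from each column to the columns it is derived from and walks it upstream. The table and column names, the `UPSTREAM` mapping, and `trace_upstream` are all hypothetical; none of the tools compared here is claimed to store or expose lineage this way.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to its direct sources.
UPSTREAM = {
    "dashboard.revenue": ["mart.orders.total"],
    "mart.orders.total": ["staging.orders.amount", "staging.orders.tax"],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.orders.tax": ["raw.orders.tax"],
}


def trace_upstream(column: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every column that feeds into `column`, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            # Skip columns already visited so shared sources appear once.
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


sources = trace_upstream("dashboard.revenue", UPSTREAM)
```

With table-level lineage, the same question could only be answered as "this dashboard depends on the orders tables"; at the column level, the traversal pinpoints exactly which raw fields would affect the revenue figure if they changed.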

OpenMetadata supports automatic data lineage ingestion from dbt, BigQuery, Snowflake, Redshift, etc. It also supports column-level lineage. However, OpenMetadata requires work to configure and maintain the lineage integration.

Alation’s column-level lineage comes with conditions. Its native lineage isn’t effective or reliable, so it partners with Manta, a third-party data lineage tool (acquired by IBM), to provide a better alternative. Users have also found the lineage UI “not ready for primetime,” citing complexity and visual difficulty in tracking down specific lineage paths.

Collibra uses the Lineage Harvester to extract and build lineage. Many users have questioned the harvester’s performance and accuracy. It is also worth noting that the data lineage feature is not supported in the self-hosted version of Collibra. Collibra scores a 7.9 on G2’s data catalog software ratings.

Atlan’s lineage checks all the boxes, offering the finest granularity of all the tools while requiring no extra effort to set up. Its lineage API is architected to be open, making it easy for organizations to use and to create custom connectors based on their requirements. Atlan scores the highest on G2’s data lineage score with 9.1. Atlan’s data lineage feature, support, and troubleshooting documentation are public and easily accessible to all.


Data quality

OpenMetadata offers native data quality testing and alerting features, along with a health dashboard you can use to track test results in real time. One of the recent features also allows you to create workflows around test resolutions to notify relevant data consumers. If you require more advanced features, OpenMetadata also allows you to use third-party data quality and observability tools to get quality and profiling metadata.
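To show what data quality testing with alerting amounts to in practice, here is a small, tool-agnostic sketch: two checks run over sample rows, and failed checks are turned into alert messages. The names `CheckResult`, `check_not_null`, `check_unique`, and `failed_alerts`, along with the sample data, are assumptions made for illustration and do not reflect OpenMetadata's test definitions or API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_not_null(rows: list[dict], column: str) -> CheckResult:
    """Pass only if no row has a missing value in `column`."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"{column}_not_null", nulls == 0, f"{nulls} null value(s)")


def check_unique(rows: list[dict], column: str) -> CheckResult:
    """Pass only if every value in `column` appears once."""
    values = [row.get(column) for row in rows]
    duplicates = len(values) - len(set(values))
    return CheckResult(f"{column}_unique", duplicates == 0, f"{duplicates} duplicate(s)")


def failed_alerts(results: list[CheckResult]) -> list[str]:
    """Turn failed checks into messages a notifier could send."""
    return [f"ALERT {r.name}: {r.detail}" for r in results if not r.passed]


# Hypothetical sample data with one null amount and one repeated order_id.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 15.5},
]
results = [check_not_null(rows, "amount"), check_unique(rows, "order_id")]
alerts = failed_alerts(results)
```

A catalog with native testing would typically schedule checks like these, store each result next to the asset, and route the alerts to the asset's owners.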

Alation’s Open Data Quality Framework allows it to integrate with a range of data quality and observability tools like Soda, Monte Carlo, LightUp, and others to surface data quality metadata in data catalogs.

Collibra’s focus has been on pushing data quality and observability checks down to the cloud platforms, databases, and file systems, so that the checks run where the data resides and no data needs to be transferred to the DQ engine. It’s been built with platforms like Databricks, Snowflake, and BigQuery in mind. Collibra offers AI-based rule creation through the AdaptiveRules engine and also allows you to create custom rules. Some users have found that the Collibra DQ tool still needs to mature.

Atlan supports native integrations with industry-leading data quality platforms like Monte Carlo and Soda. It also lets you automate data profiling for various data sources, and you have the flexibility to use Atlan’s REST API to send and query data quality metadata. Together with native support for data contracts for reliability, this lets Atlan cover a wide range of data quality and observability requirements for businesses.


Pricing and support

OpenMetadata doesn’t seem pricey from a distance. It is well maintained and under active development, and it has an active community on channels like Slack, YouTube, GitHub, and Medium. However, like all other open-source data catalogs, OpenMetadata requires significant engineering effort and can significantly delay time-to-value. Delhivery spent seven months with two FTEs building an open-source catalog on top of Amundsen, and it failed. With open-source tools, you’re on your own: there is no dedicated customer success team to help drive value for your organization.

Both Alation and Collibra suffer from poor ROI, as they typically fail at end-user adoption. Despite having been in the data catalog business for a long time, Alation and Collibra have G2 ratings of 8.7 and 8.2, respectively, for quality of support. This is because they suffer from slow product innovation, hard-to-navigate documentation, and complicated procedures for getting product support involved in troubleshooting.

Atlan embraces a partner-not-vendor approach, putting customer value and support first. Atlan has an experienced customer success team, on-demand knowledge resources, and data-driven success metrics to make customers successful. Atlan’s documentation is easy to navigate, and support is easily accessible when you’re stuck with an issue. This is why Atlan has a 9.5 on G2’s quality of support rating.


Alation vs Collibra vs OpenMetadata vs Atlan: At a glance

Search, discovery, and cataloging

  • Alation: Universal Search; multi-filter search interface, but not intuitive for some users
  • Collibra: Lots of search and discovery features, but they require a lot of manual work at times
  • OpenMetadata: Simplified search interface with keyword search, filters, tags, etc.; overall intuitive user experience
  • Atlan: Intuitive, Amazon-like search and cataloging experience with support for extensive filters

Data lineage

  • Alation: Depends on a third-party, closed-source vendor for data lineage
  • Collibra: Uses an internal Lineage Harvester, whose performance and accuracy have been doubted by users
  • OpenMetadata: Integrates with modern data stack tools for column-level lineage ingestion, but does need a bit of effort to set up
  • Atlan: Built-in column-level lineage that doesn't require any extra effort to set up; highest score on G2's data lineage

Data quality

  • Alation: Integrates with third-party data quality tools to gather quality and profiling metadata
  • Collibra: Pushdown data quality checks are a positive, but the product needs to mature overall
  • OpenMetadata: Built-in data quality testing features with alerting and notification capabilities
  • Atlan: Partner-led native integrations with data quality and observability tools like Monte Carlo and Soda; flexible API, support for data contracts

Pricing and support

  • Alation: Slow product innovation and support, along with a complicated pricing structure
  • Collibra: Can take up to a year to set up, and the support is often not great
  • OpenMetadata: Significant development time and effort required for implementation and maintenance
  • Atlan: Leading product innovation in the data cataloging space; easily accessible support and user-friendly documentation


Conclusion

It is crucial to evaluate both open-source and enterprise tools against the factors that determine how suitable and valuable they would be for your organization. Architecture, data cataloging, search and discovery, lineage, and quality are among the key things to look at. You should also consider the true cost of a data catalog, including licensing, implementation, and maintenance costs. Lastly, the quality of support matters greatly, because it determines how efficiently you can do everything mentioned above.
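One way to turn these criteria into a comparable number is a weighted scorecard. The sketch below is only an illustration: the weights, the 1 to 5 scale, the placeholder names `tool_a` and `tool_b`, and their scores are all hypothetical and do not represent ratings of any product discussed in this article.

```python
# Hypothetical weights for the five evaluation criteria; they sum to 1.0.
WEIGHTS = {
    "architecture": 0.20,
    "search_discovery": 0.25,
    "lineage": 0.20,
    "data_quality": 0.15,
    "pricing_support": 0.20,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of per-criterion scores, rounded to 2 decimals."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return round(sum(scores[name] * weight for name, weight in weights.items()), 2)


# Placeholder scores on a 1-5 scale, filled in by your own evaluation team.
candidates = {
    "tool_a": {"architecture": 4, "search_discovery": 4, "lineage": 4,
               "data_quality": 4, "pricing_support": 4},
    "tool_b": {"architecture": 3, "search_discovery": 5, "lineage": 3,
               "data_quality": 4, "pricing_support": 2},
}

# Rank candidates from highest to lowest weighted score.
ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], WEIGHTS),
    reverse=True,
)
```

Adjusting the weights to match your priorities, for example raising `pricing_support` when total cost of ownership dominates the decision, changes the ranking without re-scoring each tool.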



