Working With Apache Iceberg on Databricks: A Complete Guide [2025]

Updated March 7th, 2025


Databricks’ support for Apache Iceberg allows enterprises to build flexible and scalable data lakehouses, leveraging Iceberg’s open table format for managing large-scale analytical workloads.

This article introduces you to the scope of open table formats in Databricks. You’ll see how Apache Iceberg integrates with Databricks, especially with the Unity Catalog and UniForm format conversion feature of Delta Lake. Toward the end, we’ll explore the need for a broader and horizontal metadata control plane across your data ecosystem.


Table of Contents #

  1. Databricks and Apache Iceberg: An overview
  2. UniForm: The key to reading Delta Lake tables with Apache Iceberg clients
  3. Databricks-Iceberg interoperability: Choosing the right catalog
  4. Working with Databricks and Apache Iceberg: The need for a metadata control plane to extract full value
  5. Databricks and Apache Iceberg: Summing up
  6. Working with Apache Iceberg on Databricks: Related reads

Databricks and Apache Iceberg: An overview #

Databricks has been at the forefront of open-source and open standard development for several years. The platform is built on many open-source projects, primarily Apache Spark.

Databricks’ open-source philosophy has led to innovation in, and support for, several projects, including MLflow, Apache Spark, Redash, TensorFlow, PyTorch, and Keras.

While the projects above relate to data processing, machine learning, and business intelligence, Databricks has also focused on open table formats: creating Delta Lake and supporting others such as Apache Iceberg and Apache Hudi.

Together with Databricks, projects like Apache Iceberg bring the features of traditional relational databases and data warehouses, such as ACID transactions, schema evolution, and time travel, to the data lake, essentially creating a data lakehouse.

Before going any further, let’s quickly explore the background on how Databricks came to support table formats like Apache Iceberg.

The table format landscape in Databricks #


Delta Lake was born when Databricks formalized and popularized the idea of a transactional data lake, i.e., a data lakehouse, in 2017. By then, organizations that had moved on from the Hadoop-Hive ecosystem were facing the next set of problems: slow metadata management, expensive directory listings, and too many small files.

Delta Lake was a native solution to these problems. Databricks later donated the project to the Linux Foundation.

Around the same time, two other companies, Uber and Netflix, were working on bringing transactional guarantees to their near real-time and batch data processing use cases, which led to the creation of Apache Hudi (at Uber) and Apache Iceberg (at Netflix).

Databricks added support for Apache Hudi and Apache Iceberg with limited functionality in mid-2023.

Because these open table formats work at the fundamental metadata collection and management layer, Databricks decided to integrate them with its native Unity Catalog. It also created an open table format conversion feature called UniForm that allows interoperability between Delta Lake, Apache Hudi, and Apache Iceberg.

Next, let’s look at UniForm – the key to Databricks-Iceberg interoperability.


UniForm: The key to reading Delta Lake tables with Apache Iceberg clients #

With many table formats, including Apache Paimon (which Databricks currently doesn’t support), entering the toolchest of data engineers, Databricks saw the need for interoperability between these formats. This led to the development of UniForm (Universal Format), which allows Apache Iceberg clients to read Delta Lake tables without format conversions or data copying.

This novel approach doesn’t come without limitations. Currently, UniForm only converts metadata, only for tables whose underlying data files are stored in Parquet, and only to the Iceberg and Hudi formats.

This interoperability matters because it lets you completely decouple your data processing layer and use the query engine of your choice for each use case: real-time streaming, batch, read-heavy, write-heavy, and so on.
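To make this concrete, here is a minimal sketch of enabling UniForm on a Delta table from a Databricks notebook. It relies on the table properties documented by Databricks (delta.universalFormat.enabledFormats and delta.enableIcebergCompatV2); the catalog, schema, and table names are placeholders, so adapt them to your own Unity Catalog setup.

```python
# Minimal sketch: enabling UniForm (Iceberg metadata generation) on a Delta table.
# On Databricks, `spark` is already defined in notebooks; getOrCreate() is a no-op there.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a new Delta table with Iceberg metadata generation enabled
# (main.sales.orders is a placeholder for your own Unity Catalog table).
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        ts       TIMESTAMP
    )
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Or enable UniForm on an existing Delta table.
spark.sql("""
    ALTER TABLE main.sales.orders SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```

Once enabled, Delta writes generate Iceberg metadata alongside the Delta log, so Iceberg clients can read the same Parquet data files without any copying.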

UniForm for Databricks-Iceberg interoperability - Source: Databricks.


Databricks-Iceberg interoperability: Choosing the right catalog #

Another aspect of interoperability concerns the choice of Iceberg catalog when working with Databricks. Two officially supported options are the AWS Glue Data Catalog and Unity Catalog. There are many others, and Iceberg also lets you write your own catalog using its JDBC and REST catalog options.

But first, let’s look at how the AWS Glue Data Catalog works as an Iceberg Catalog in Databricks.

AWS Glue Data Catalog as the REST Catalog for Apache Iceberg in Databricks #


Many organizations that use Databricks on AWS also run other data infrastructure, either on AWS-native services or on other cloud platforms. Apache Iceberg gives these organizations the option to standardize on a single table format. However, Iceberg still needs a backend catalog to manage metadata for all of its tables.

While the Hive Metastore remains the default option for the Iceberg catalog, the AWS Glue Data Catalog is another key option. The Databricks AWS Glue connector helps set up the AWS Glue Data Catalog as the REST catalog for Iceberg. This is extremely useful when your organization’s data is stored as Parquet-backed Iceberg tables in Amazon S3, as it gives you the flexibility to use a variety of query engines, such as Spark, Presto, and Trino.
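For illustration, this is roughly what the Iceberg-on-Glue wiring looks like for a standalone Spark session, following Iceberg’s documented Glue catalog configuration. The catalog name, S3 warehouse path, and package versions are placeholders; on Databricks itself, the Glue connector is configured at the cluster level rather than in code.

```python
# Minimal sketch: a Spark session that uses the AWS Glue Data Catalog as its
# Iceberg catalog. Catalog name, bucket, and versions are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-on-glue")
    # Iceberg Spark runtime and AWS bundle (pick versions matching your Spark)
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,"
        "org.apache.iceberg:iceberg-aws-bundle:1.5.0",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    # Register a Spark catalog named `glue_iceberg`, backed by AWS Glue
    .config("spark.sql.catalog.glue_iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config(
        "spark.sql.catalog.glue_iceberg.catalog-impl",
        "org.apache.iceberg.aws.glue.GlueCatalog",
    )
    .config("spark.sql.catalog.glue_iceberg.warehouse", "s3://my-bucket/iceberg-warehouse/")
    .config("spark.sql.catalog.glue_iceberg.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

# Query an Iceberg table registered in Glue (the database and table names are placeholders)
spark.sql("SELECT * FROM glue_iceberg.analytics.events LIMIT 10").show()
```

Because Glue holds the table metadata, any engine configured this way, whether Spark, Presto, or Trino, sees the same Iceberg tables over the same S3 data.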

Unity Catalog as the REST Catalog for Iceberg reads #


Like the AWS Glue Data Catalog, Unity Catalog can also be plugged into Iceberg as the REST catalog. For a native Databricks setup, this is the more seamless and beneficial option. That said, Unity Catalog also integrates with other data platforms, such as Amazon Redshift, Google BigQuery, and Snowflake.

“Unity Catalog has implemented the Iceberg REST Catalog APIs since the launch of Universal Format (UniForm) in 2023. [Its] Iceberg REST Catalog endpoints allow external systems to access tables… and extend governance via vended credentials.” - Databricks

Using Unity Catalog can also reduce catalog-lookup latency, because requests won’t depend on AWS Glue’s capacity and bandwidth to respond.
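As a rough sketch, an external Iceberg client such as PyIceberg can point at Unity Catalog’s Iceberg REST endpoint to read UniForm-enabled tables. The workspace URL, access token, catalog name, and table identifier below are placeholders, and the endpoint path follows Databricks’ documentation at the time of writing, so verify it for your workspace.

```python
# Minimal sketch: reading a UniForm-enabled table from an external Iceberg client
# (PyIceberg) through Unity Catalog's Iceberg REST endpoint.
# Requires: pip install "pyiceberg[pyarrow,pandas]"
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "unity",
    **{
        "type": "rest",
        # Unity Catalog exposes an Iceberg REST endpoint under the workspace URL
        "uri": "https://<workspace-url>/api/2.1/unity-catalog/iceberg",
        "token": "<databricks-personal-access-token>",
        # The Unity Catalog catalog name acts as the Iceberg warehouse
        "warehouse": "main",
    },
)

# Load a table by <schema>.<table> within that catalog and read it into pandas
table = catalog.load_table("sales.orders")
df = table.scan().to_pandas()
print(df.head())
```

The same REST endpoint works for other Iceberg readers (Spark, Trino, DuckDB, and so on), with Unity Catalog continuing to enforce governance through vended credentials.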

Unity Catalog for Iceberg reads - Source: Databricks.


Working with Databricks and Apache Iceberg: The need for a metadata control plane to extract full value #

Bringing together a variety of file formats, table formats, catalogs, and query engines offers limited benefit unless it is handled sustainably and consistently for everyone in your organization, not just the engineers.

This is where the need for a metadata control plane arises. A metadata control plane comprises aspects of cataloging, governance, business glossary, lineage, and more.

A unified metadata control plane for your data stack - Source: Atlan.

A control plane for metadata sits horizontally across your organization’s data ecosystem. It integrates with Databricks and non-Databricks tools, whether cloud-based or on-premises, to bring all data assets into one place, not just to be cataloged, but also to be governed, profiled, analyzed, and put to use.

This is exactly what Atlan does. It takes an organization’s data discovery, cataloging, lineage, collaboration, governance, and documentation needs and brings them all under a single roof, acting as the metadata control plane.

Read more → What is a unified control plane for data?


Databricks and Apache Iceberg: Summing up #

Open table formats have not been around for long, but they have had significant implications for storing, processing, and consuming data. This makes the conversation about Apache Iceberg and its integration with various catalogs and query engines very important.

For the Databricks ecosystem, Apache Iceberg-Databricks interoperability becomes seamless with Unity Catalog and UniForm. However, to maximize the value of this integration, you need a metadata control plane that unifies discovery, governance, and collaboration across your data ecosystem.

Despite the availability of a variety of technical catalogs, such as the AWS Glue Data Catalog and Unity Catalog, delivering a consistent experience for both technical and business users remains a challenge. A control plane for metadata, agnostic of the data tools and technologies you use, gives you that consistent experience across your full data landscape.

Read on to learn more about Atlan’s integration with Databricks and Apache Iceberg.



