Working With Apache Iceberg on Databricks: A Complete Guide [2025]

Updated March 7th, 2025


Databricks’ support for Apache Iceberg allows enterprises to build flexible and scalable data lakehouses, leveraging Iceberg’s open table format for managing large-scale analytical workloads.

This article introduces you to the scope of open table formats in Databricks. You’ll see how Apache Iceberg integrates with Databricks, especially with the Unity Catalog and UniForm format conversion feature of Delta Lake. Toward the end, we’ll explore the need for a broader and horizontal metadata control plane across your data ecosystem.


Table of Contents #

  1. Databricks and Apache Iceberg: An overview
  2. UniForm: The key to reading Delta Lake tables with Apache Iceberg clients
  3. Databricks-Iceberg interoperability: Choosing the right catalog
  4. Working with Databricks and Apache Iceberg: The need for a metadata control plane to extract full value
  5. Databricks and Apache Iceberg: Summing up
  6. Working with Apache Iceberg on Databricks: Related reads

Databricks and Apache Iceberg: An overview #

Databricks has been at the forefront of open-source and open standard development for several years. The platform is built on many open-source projects, primarily Apache Spark.

Databricks’ open-source philosophy has led to innovation in, and support for, several projects, including MLflow, Apache Spark, Redash, TensorFlow, PyTorch, and Keras.

While the projects above relate to data processing, machine learning, and business intelligence, Databricks has also focused on open table formats: creating Delta Lake and supporting others such as Apache Iceberg and Apache Hudi.

Together with Databricks, projects like Apache Iceberg bring the features of traditional relational databases and data warehouses, such as ACID transactions, schema evolution, and time travel, to the data lake, essentially creating a data lakehouse.

Before going any further, let’s quickly explore the background on how Databricks came to support table formats like Apache Iceberg.

The table format landscape in Databricks #


Delta Lake was born when Databricks formalized and popularized the idea of a transactional data lake, i.e., a data lakehouse, in 2017. By then, organizations that had moved on from the Hadoop-Hive ecosystem were facing the next set of problems: slow metadata management, expensive directory listings, and too many small files.

Delta Lake was a native solution to these problems. Databricks later donated the project to the Linux Foundation.

Around the same time, two other companies, Uber and Netflix, were working on bringing transactional guarantees to their near real-time and batch data processing use cases, which led to the creation of Apache Hudi (at Uber) and Apache Iceberg (at Netflix).

Databricks added support for Apache Hudi and Apache Iceberg with limited functionality in mid-2023.

Because these open table formats work at the fundamental metadata collection and management layer, Databricks decided to integrate them with its native Unity Catalog. It also created an open table format conversion feature called UniForm that allows interoperability between Delta Lake, Apache Hudi, and Apache Iceberg.

Next, let’s look at UniForm – the key to Databricks-Iceberg interoperability.


UniForm: The key to reading Delta Lake tables with Apache Iceberg clients #

With many table formats, including Apache Paimon (which Databricks currently doesn’t support), entering the toolchest of data engineers, Databricks saw the need for interoperability between these formats. This led to the development of UniForm (Universal Format), which allows Apache Iceberg clients to read Delta Lake tables without format conversions or data copying.

This novel approach doesn’t come without limitations. Currently, UniForm only converts metadata, only for tables whose underlying data files are stored in Parquet, and only to the Iceberg and Hudi formats.

This interoperability matters because it lets you completely decouple your data processing layer and use the query engine of your choice for each use case: real-time streaming, batch, read-heavy, write-heavy, and so on.
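To make this concrete, here is a minimal sketch of enabling UniForm on a Delta table from a Databricks notebook. It relies on the table properties documented by Databricks (delta.universalFormat.enabledFormats and delta.enableIcebergCompatV2); the catalog, schema, and table names are placeholders, so adapt them to your own Unity Catalog setup.

```python
# Minimal sketch: enabling UniForm (Iceberg metadata generation) on a Delta table.
# On Databricks, `spark` is already defined in notebooks; getOrCreate() is a no-op there.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a new Delta table with Iceberg metadata generation enabled
# (main.sales.orders is a placeholder for your own Unity Catalog table).
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        amount   DOUBLE,
        ts       TIMESTAMP
    )
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Or enable UniForm on an existing Delta table.
spark.sql("""
    ALTER TABLE main.sales.orders SET TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```

Once enabled, Delta writes generate Iceberg metadata alongside the Delta log, so Iceberg clients can read the same Parquet data files without any copying.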

UniForm for Databricks-Iceberg interoperability - Source: Databricks.


Databricks-Iceberg interoperability: Choosing the right catalog #

Another aspect of interoperability concerns the choice of Iceberg catalog when working with Databricks. Two officially supported options are the AWS Glue Data Catalog and Unity Catalog. There are many others, and Iceberg also lets you write your own catalog using its JDBC and REST catalog options.

But first, let’s look at how the AWS Glue Data Catalog works as an Iceberg Catalog in Databricks.

AWS Glue Data Catalog as the REST Catalog for Apache Iceberg in Databricks #


Many organizations that use Databricks on AWS also run other data infrastructure, either on AWS-native services or on other cloud platforms. Apache Iceberg gives these organizations the option to standardize on a single table format. However, Iceberg still needs a backend catalog to manage metadata for all of its tables.

While the Hive Metastore remains the default option for the Iceberg catalog, the AWS Glue Data Catalog is another key option. The Databricks AWS Glue connector helps set up the AWS Glue Data Catalog as the REST catalog for Iceberg. This is extremely useful when your organization’s data is stored as Parquet-backed Iceberg tables in Amazon S3, as it gives you the flexibility to use a variety of query engines, such as Spark, Presto, and Trino.
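For illustration, this is roughly what the Iceberg-on-Glue wiring looks like for a standalone Spark session, following Iceberg’s documented Glue catalog configuration. The catalog name, S3 warehouse path, and package versions are placeholders; on Databricks itself, the Glue connector is configured at the cluster level rather than in code.

```python
# Minimal sketch: a Spark session that uses the AWS Glue Data Catalog as its
# Iceberg catalog. Catalog name, bucket, and versions are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-on-glue")
    # Iceberg Spark runtime and AWS bundle (pick versions matching your Spark)
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,"
        "org.apache.iceberg:iceberg-aws-bundle:1.5.0",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    # Register a Spark catalog named `glue_iceberg`, backed by AWS Glue
    .config("spark.sql.catalog.glue_iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config(
        "spark.sql.catalog.glue_iceberg.catalog-impl",
        "org.apache.iceberg.aws.glue.GlueCatalog",
    )
    .config("spark.sql.catalog.glue_iceberg.warehouse", "s3://my-bucket/iceberg-warehouse/")
    .config("spark.sql.catalog.glue_iceberg.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

# Query an Iceberg table registered in Glue (the database and table names are placeholders)
spark.sql("SELECT * FROM glue_iceberg.analytics.events LIMIT 10").show()
```

Because Glue holds the table metadata, any engine configured this way, whether Spark, Presto, or Trino, sees the same Iceberg tables over the same S3 data.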

Unity Catalog as the REST Catalog for Iceberg reads #


Like the AWS Glue Data Catalog, Unity Catalog can also be plugged into Iceberg as the REST catalog. For a native Databricks setup, this is the more seamless and beneficial option. That said, Unity Catalog also integrates with other data platforms, such as Amazon Redshift, Google BigQuery, and Snowflake.

“Unity Catalog has implemented the Iceberg REST Catalog APIs since the launch of Universal Format (UniForm) in 2023. [Its] Iceberg REST Catalog endpoints allow external systems to access tables… and extend governance via vended credentials.” - Databricks

Using Unity Catalog can also reduce catalog-lookup latency, because requests won’t depend on AWS Glue’s capacity and bandwidth to respond.
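As a rough sketch, an external Iceberg client such as PyIceberg can point at Unity Catalog’s Iceberg REST endpoint to read UniForm-enabled tables. The workspace URL, access token, catalog name, and table identifier below are placeholders, and the endpoint path follows Databricks’ documentation at the time of writing, so verify it for your workspace.

```python
# Minimal sketch: reading a UniForm-enabled table from an external Iceberg client
# (PyIceberg) through Unity Catalog's Iceberg REST endpoint.
# Requires: pip install "pyiceberg[pyarrow,pandas]"
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "unity",
    **{
        "type": "rest",
        # Unity Catalog exposes an Iceberg REST endpoint under the workspace URL
        "uri": "https://<workspace-url>/api/2.1/unity-catalog/iceberg",
        "token": "<databricks-personal-access-token>",
        # The Unity Catalog catalog name acts as the Iceberg warehouse
        "warehouse": "main",
    },
)

# Load a table by <schema>.<table> within that catalog and read it into pandas
table = catalog.load_table("sales.orders")
df = table.scan().to_pandas()
print(df.head())
```

The same REST endpoint works for other Iceberg readers (Spark, Trino, DuckDB, and so on), with Unity Catalog continuing to enforce governance through vended credentials.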

Unity Catalog for Iceberg reads - Source: Databricks.


Working with Databricks and Apache Iceberg: The need for a metadata control plane to extract full value #

Bringing together a variety of file formats, table formats, catalogs, and query engines offers limited benefit unless it is handled sustainably and consistently for everyone in your organization, not just the engineers.

This is where the need for a metadata control plane arises. A metadata control plane comprises aspects of cataloging, governance, business glossary, lineage, and more.

A unified metadata control plane for your data stack - Source: Atlan.

A control plane for metadata sits horizontally across your organization’s data ecosystem. It integrates with Databricks and non-Databricks tools, whether cloud-based or on-premises, to bring all data assets into one place, not just to be cataloged, but also to be governed, profiled, analyzed, and put to use.

This is exactly what Atlan does. It takes an organization’s data discovery, cataloging, lineage, collaboration, governance, and documentation needs and brings them all under a single roof, acting as the metadata control plane.

Read more → What is a unified control plane for data?


Databricks and Apache Iceberg: Summing up #

Open table formats have not been around for long, but they have had significant implications for storing, processing, and consuming data. This makes the conversation about Apache Iceberg and its integration with various catalogs and query engines very important.

For the Databricks ecosystem, Apache Iceberg-Databricks interoperability becomes seamless with Unity Catalog and UniForm. However, to maximize the value of this integration, you need a metadata control plane that unifies discovery, governance, and collaboration across your data ecosystem.

Despite the availability of a variety of technical catalogs, such as the AWS Glue Data Catalog and Unity Catalog, delivering a consistent experience for both technical and business users remains a challenge. A control plane for metadata, agnostic of the data tools and technologies you use, gives you that consistent experience across your full data landscape.

Read on to learn more about Atlan’s integration with Databricks and Apache Iceberg.



