The Apache Iceberg Resource Hub
16 in-depth guides covering architecture, format comparisons, cloud integrations, and governance. Everything you need to evaluate, implement, and govern Apache Iceberg.
Quick answer
What is Apache Iceberg?
Apache Iceberg is an open table format for massive analytic datasets, not a file format. It is a metadata specification that sits above storage formats like Parquet and ORC, adding ACID transactions, schema evolution, time travel, and hidden partitioning to data lakes and lakehouses. Originally developed at Netflix in 2017, it graduated to a top-level Apache project in 2020.
- Open table format: Manages table metadata above Parquet, ORC, and Avro data files, not the files themselves.
- ACID transactions: Safe concurrent reads and writes across multiple query engines.
- Time travel: Query any prior snapshot or roll back a table to a previous state.
- Multi-engine: Spark, Trino, Flink, Snowflake, BigQuery, and Databricks can all read Iceberg tables natively.
- Hidden partitioning: Iceberg derives partition values from column transforms and prunes partitions automatically, so queries do not need to filter on separate partition columns.
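The snapshot and partitioning ideas in this list can be illustrated with a small, standard-library-only sketch. It is a toy model, not the Iceberg specification or any Iceberg client API: `ToyTable`, `DataFile`, `Snapshot`, and the file paths are hypothetical names invented for illustration. Each write produces a new immutable snapshot listing the visible data files, the writer derives a day partition from each row's timestamp, and a scan prunes files by that derived value.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DataFile:
    path: str    # hypothetical path of a Parquet file
    day: date    # partition value derived from the rows' timestamps
    rows: tuple  # (event_ts, payload) pairs standing in for file contents


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple  # every data file visible in this table state


class ToyTable:
    """Toy model of a table as a chain of immutable snapshots."""

    def __init__(self):
        self.snapshots = {0: Snapshot(0, ())}  # snapshot 0 is the empty table
        self.current_id = 0

    def append(self, rows):
        # The writer groups rows by day(event_ts); callers never pass a partition.
        by_day = {}
        for ts, payload in rows:
            by_day.setdefault(ts.date(), []).append((ts, payload))
        new_id = len(self.snapshots)
        new_files = tuple(
            DataFile(f"data/day={d}/snap-{new_id}.parquet", d, tuple(r))
            for d, r in sorted(by_day.items())
        )
        base = self.snapshots[self.current_id].files
        self.snapshots[new_id] = Snapshot(new_id, base + new_files)
        self.current_id = new_id
        return new_id

    def scan(self, start, end, snapshot_id=None):
        """Return rows with start <= event_ts < end, plus the number of files read."""
        chosen = self.current_id if snapshot_id is None else snapshot_id
        snap = self.snapshots[chosen]
        # Pruning: skip files whose derived day cannot overlap the time range.
        candidates = [f for f in snap.files if start.date() <= f.day <= end.date()]
        rows = [r for f in candidates for r in f.rows if start <= r[0] < end]
        return rows, len(candidates)

    def rollback(self, snapshot_id):
        # Rolling back only moves the current pointer; no snapshot is deleted.
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id


if __name__ == "__main__":
    table = ToyTable()
    first = table.append([(datetime(2024, 1, 1, 9), "a"), (datetime(2024, 1, 2, 10), "b")])
    table.append([(datetime(2024, 1, 2, 11), "c"), (datetime(2024, 1, 3, 12), "d")])
    window = (datetime(2024, 1, 2), datetime(2024, 1, 2, 23, 59, 59))
    print(table.scan(*window))                     # current table state
    print(table.scan(*window, snapshot_id=first))  # time travel to the first write
```

The query filters only on the timestamp column, yet files for other days are never opened, which is the essence of hidden partitioning. Real Iceberg tracks snapshots through metadata files, manifest lists, and manifests rather than in-memory objects.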
Browse All 16 Apache Iceberg Guides
Foundations
Start here. Covers what Iceberg is, how its 3-layer metadata model works, and why it was built to replace Hive-era table management.
Apache Iceberg 101
Definition, history, and why Netflix created Iceberg in 2017. Start here if you're new to the open table format.
Read the Iceberg 101 guide
Apache Iceberg Architecture
The 3-layer metadata model (catalog, metadata files, and data files), explained with the full spec.
Explore the architecture guide
Apache Iceberg Benefits
Six core benefits including ACID transactions, time travel, and hidden partitioning, and how they solved Hive-era pain.
See all Iceberg benefits
Format Comparisons
Evaluating table formats? These guides compare Iceberg against Delta Lake, Hudi, Paimon, and Parquet across architecture, ecosystem fit, and governance.
Apache Iceberg vs. Delta Lake
Architecture, ecosystem, and governance differences between the two most popular open table formats.
Compare Iceberg vs. Delta Lake
Apache Hudi vs. Apache Iceberg
When to use Hudi's streaming-first, record-level upsert model versus Iceberg's broader engine compatibility.
Compare Iceberg vs. Hudi
Apache Paimon vs. Apache Iceberg
When the newer streaming-first format makes sense versus Iceberg's wider ecosystem adoption.
Compare Iceberg vs. Paimon
Apache Parquet vs. Apache Iceberg
File format vs. table format: why you need both and how they complement each other in production.
Compare Iceberg vs. Parquet
Apache Iceberg Alternatives
Full decision guide for choosing your open table format, including when Iceberg is not the right choice.
View all Iceberg alternatives
How Apache Iceberg compares
Quick reference. See linked guides below for full analysis.
| Dimension | Apache Iceberg | Delta Lake | Apache Hudi | Apache Paimon | Apache Parquet |
|---|---|---|---|---|---|
| Type | Open table format | Open table format | Open table format | Streaming table format | Columnar file format |
| ACID transactions | Yes | Yes | Yes | Yes | No |
| Multi-engine support | Broad (Spark, Trino, Flink, Snowflake, BigQuery) | Databricks-first; growing | Narrower than Iceberg | Flink-first; growing | N/A (file format) |
| Time travel | Yes (snapshot-based) | Yes | Yes | Yes | No |
| Schema evolution | Yes (metadata-only) | Yes | Yes | Yes | Limited |
| Hidden partitioning | Yes | No | No | No | N/A |
| Primary strength | Multi-engine interoperability | Databricks ecosystem | CDC / streaming upserts | Flink-native streaming | Storage efficiency |
Cloud & Ecosystem Integrations
Running Iceberg in production? Find integration patterns for the six most common cloud platforms and query engines.
Apache Iceberg on AWS
Using Iceberg with Amazon Athena, EMR, S3, and AWS Glue Catalog in a cloud-native lakehouse.
Read the AWS Iceberg guide
Apache Iceberg & AWS Glue
ETL pipelines and REST catalog specifics for Iceberg on AWS Glue, including native integration patterns.
Read the AWS Glue integration guide
Apache Iceberg on Azure
Iceberg with Azure Synapse Analytics, Data Factory, and Microsoft OneLake integration.
Read the Azure Iceberg guide
Apache Iceberg with BigQuery
BigLake Managed Tables and Iceberg metadata pattern options for Google Cloud analytics.
Read the BigQuery Iceberg guide
Apache Iceberg in Snowflake
Open Catalog (Polaris) integration and Iceberg table support inside Snowflake's managed platform.
Read the Snowflake Iceberg guide
Databricks & Apache Iceberg
Unity Catalog, UniForm, and how Databricks bridges Delta Lake and Iceberg workloads.
Read the Databricks Iceberg guide
Governance & Management
Once Iceberg tables exist, you need a catalog and a governance layer. Start with your catalog options, then understand where Iceberg's built-in governance ends.
Apache Iceberg Data Catalog Options
AWS Glue, Apache Polaris, Project Nessie, and Hive Metastore: compare the options and find the right fit for your stack.
Explore Iceberg catalog options
Apache Iceberg Table Governance
Governance primitives built into Iceberg, where they end, and where Polaris and Atlan pick up.
Read the Iceberg governance guide
Iceberg in production: real-world use cases
How teams are using Apache Iceberg with Atlan across industries.
The challenge
A global payments company needed governance across Iceberg tables running on two separate compute engines, Snowflake and Databricks, with no unified metadata view between them.
How Atlan helped
Atlan connected to both engines via the Iceberg REST catalog, providing a single governance layer with scheduled incremental metadata syncs. Compliance teams got one consistent view of lineage, ownership, and classification across both platforms.
The challenge
A Fortune 500 manufacturer stored Iceberg tables in cloud object storage managed by a Git-based catalog. Enterprise policy required documented ownership, completeness scores, and audit trails. None of that came with the catalog.
How Atlan helped
Atlan cataloged the Iceberg assets, applied automated tagging and ownership assignment, and generated metadata completeness scores tied directly to the governance team's compliance dashboard.
The challenge
A cloud software company wanted to power internal AI applications with metadata context but could not expose production data to external LLMs.
How Atlan helped
Atlan surfaced curated Iceberg metadata via its API to a private AI assistant, giving the LLM business context (asset descriptions, lineage, quality signals) without any raw data leaving the environment.
Watch: Atlan's Metadata Lakehouse on Apache Iceberg
See how Atlan's Metadata Lakehouse uses Apache Iceberg to deliver real-time AI context by querying live metadata through Polaris, Snowflake, and Databricks without moving data.
Frequently Asked Questions about Apache Iceberg
Common questions from data engineers and architects evaluating Iceberg.
Govern Apache Iceberg with Atlan
Track lineage, manage data quality, apply classifications, and govern Iceberg assets across every catalog and query engine.