Apache Iceberg Resources

The Apache Iceberg Resource Hub

16 in-depth guides covering architecture, format comparisons, cloud integrations, and governance. Everything you need to evaluate, implement, and govern Apache Iceberg.

16 in-depth guides
4 topic areas

Quick answer

What is Apache Iceberg?

Apache Iceberg is an open table format for massive analytic datasets, not a file format. It is a metadata specification that sits above storage formats like Parquet and ORC, adding ACID transactions, schema evolution, time travel, and hidden partitioning to data lakes and lakehouses. Originally developed at Netflix in 2017, it graduated to a top-level Apache project in 2020.

  • Open table format: Defines table metadata layered over Parquet, ORC, and Avro data files; it does not replace those file formats.
  • ACID transactions: Safe concurrent reads and writes across multiple query engines.
  • Time travel: Query any prior snapshot or roll back a table to a previous state.
  • Multi-engine: Spark, Trino, Flink, Snowflake, BigQuery, and Databricks all read Iceberg tables natively.
  • Hidden partitioning: Iceberg derives partition values from column data, so queries that filter on the source column are pruned automatically, without users referencing partition columns.
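
To make the hidden partitioning idea concrete, here is a minimal toy model in Python. It is a sketch, not the Iceberg API: it assumes a table partitioned by a day transform of an event timestamp, and the names `day_transform`, `DataFile`, `plan_scan`, and the file paths are all hypothetical. The point it illustrates is that the query filters on the timestamp column only, and the planner maps that filter onto partition values recorded in metadata.

```python
from dataclasses import dataclass
from datetime import date, datetime


def day_transform(ts: datetime) -> date:
    """Toy day transform: derive the partition value from the source column."""
    return ts.date()


@dataclass(frozen=True)
class DataFile:
    path: str        # hypothetical data file path
    partition: date  # partition value recorded in table metadata


def plan_scan(files: list[DataFile], start: datetime, end: datetime) -> list[str]:
    """Return paths of files that may hold rows with start <= ts <= end.

    The caller filters on the timestamp only; the partition column
    never appears in the query.
    """
    first, last = day_transform(start), day_transform(end)
    return [f.path for f in files if first <= f.partition <= last]


# Three daily files; the query asks for a time range inside 2024-01-02.
files = [
    DataFile("data/day=2024-01-01/a.parquet", date(2024, 1, 1)),
    DataFile("data/day=2024-01-02/a.parquet", date(2024, 1, 2)),
    DataFile("data/day=2024-01-03/a.parquet", date(2024, 1, 3)),
]
pruned = plan_scan(files, datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 17, 0))
print(pruned)
```

In this sketch only the 2024-01-02 file survives planning, so the other two files are never opened. A real engine does the same kind of mapping using the partition spec stored in the table metadata.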

Browse All 16 Apache Iceberg Guides

Format Comparisons

Evaluating table formats? These guides compare Iceberg against Delta Lake, Hudi, Paimon, and Parquet across architecture, ecosystem fit, and governance.

How Apache Iceberg compares

Quick reference. See linked guides below for full analysis.

Dimension | Apache Iceberg | Delta Lake | Apache Hudi | Apache Paimon | Apache Parquet
--- | --- | --- | --- | --- | ---
Type | Open table format | Open table format | Open table format | Streaming table format | Columnar file format
ACID transactions | Yes | Yes | Yes | Yes | No
Multi-engine support | Broad (Spark, Trino, Flink, Snowflake, BigQuery) | Databricks-first; growing | Limited vs. Iceberg | Flink-first; growing | N/A (file format)
Time travel | Yes (snapshot-based) | Yes | Yes | Yes | No
Schema evolution | Non-destructive | Yes | Yes | Yes | Limited
Hidden partitioning | Yes | No | No | No | N/A
Primary strength | Multi-engine interoperability | Databricks ecosystem | CDC / streaming upserts | Flink-native streaming | Storage efficiency

Cloud & Ecosystem Integrations

Running Iceberg in production? Find integration patterns for the six most common cloud platforms and query engines.

Iceberg in production: real-world use cases

How teams are using Apache Iceberg with Atlan across industries.

Financial Services

The challenge

A global payments company needed governance across Iceberg tables running on two separate compute engines, Snowflake and Databricks, with no unified metadata view between them.

How Atlan helped

Atlan connected to both engines via the Iceberg REST catalog, providing a single governance layer with scheduled incremental metadata syncs. Compliance teams got one consistent view of lineage, ownership, and classification across both platforms.

Unified multi-engine Iceberg governance

Automotive & Manufacturing

The challenge

A Fortune 500 manufacturer stored Iceberg tables in cloud object storage managed by a Git-based catalog. Enterprise policy required documented ownership, completeness scores, and audit trails. None of that came with the catalog.

How Atlan helped

Atlan cataloged the Iceberg assets, applied automated tagging and ownership assignment, and generated metadata completeness scores tied directly to the governance team's compliance dashboard.

Policy compliance achieved across Iceberg estate

Technology

The challenge

A cloud software company wanted to power internal AI applications with metadata context but could not expose production data to external LLMs.

How Atlan helped

Atlan surfaced curated Iceberg metadata via its API to a private AI assistant, giving the LLM business context (asset descriptions, lineage, quality signals) without any raw data leaving the environment.

AI applications grounded in governed Iceberg metadata

Watch: Atlan's Metadata Lakehouse on Apache Iceberg

See how Atlan's Metadata Lakehouse uses Apache Iceberg to deliver real-time AI context by querying live metadata through Polaris, Snowflake, and Databricks without moving data.

Frequently Asked Questions about Apache Iceberg

Common questions from data engineers and architects evaluating Iceberg.

Apache Iceberg is an open table format, not a file format. It is a metadata specification that sits above storage file formats like Parquet, ORC, and Avro. Iceberg manages table snapshots, schema evolution, partition metadata, and statistics, enabling multiple query engines to safely read and write the same tables concurrently. Think of Iceberg as the organizational layer and Parquet as the storage layer. Production implementations typically use both.
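
The snapshot mechanism described above can be sketched as a toy Python model. This is an illustration under stated assumptions, not Iceberg's implementation: the `Snapshot` and `Table` classes, the sequential snapshot IDs, and the file names are hypothetical, and a real catalog performs the commit as an atomic swap of the table's metadata pointer and typically retries on conflict rather than simply rejecting. The sketch shows two ideas: a writer's commit succeeds only if the snapshot it started from is still current (optimistic concurrency), and time travel is a lookup in the snapshot history.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: tuple  # data files visible in this snapshot


@dataclass
class Table:
    snapshots: list = field(default_factory=list)

    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def commit(self, base_id: Optional[int], added_files: list, timestamp_ms: int) -> bool:
        """Append a snapshot only if base_id is still the current snapshot."""
        cur = self.current()
        cur_id = cur.snapshot_id if cur else None
        if base_id != cur_id:
            return False  # another writer committed first; caller must rebase
        prev_files = cur.files if cur else ()
        new_id = (cur_id or 0) + 1  # toy sequential IDs
        self.snapshots.append(
            Snapshot(new_id, timestamp_ms, prev_files + tuple(added_files))
        )
        return True

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Time travel: newest snapshot committed at or before timestamp_ms."""
        older = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return older[-1] if older else None


table = Table()
first_ok = table.commit(None, ["a.parquet"], timestamp_ms=1000)
second_ok = table.commit(1, ["b.parquet"], timestamp_ms=2000)
stale_ok = table.commit(1, ["c.parquet"], timestamp_ms=3000)  # based on an old snapshot
print(first_ok, second_ok, stale_ok)
print(table.as_of(1500).files, table.current().files)
```

Because old snapshots are never modified, a reader that pinned snapshot 1 keeps seeing a consistent table while writers add snapshot 2, which is what lets several engines share one table safely.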

Govern Apache Iceberg with Atlan

Track lineage, manage data quality, apply classifications, and govern Iceberg assets across every catalog and query engine.

 

