Apache Iceberg v3: What Snowflake Data Teams Need to Know

Emily Winks, Data Governance Expert
Updated: 06/03/2026 | Published: 06/03/2026 | 17 min read

Key takeaways

  • Deletion vectors deliver up to 10x faster DML operations vs copy-on-write. GA on Snowflake May 7, 2026.
  • Row lineage: native _row_id and _last_updated_sequence_number fields enable CDC without external tooling.
  • Upgrading from v2 to v3 is irreversible on Snowflake-managed tables. Trino is not v3-ready as of June 2026.
  • Iceberg v3 is the format layer. Atlan is the context plane: 100+ connectors, column-level lineage, AI-ready metadata.

What is Apache Iceberg v3?

Apache Iceberg v3 is the third major version of the open table format specification, generally available on Snowflake as of May 7, 2026 and on Databricks Runtime 18.0+. After four years of community development, v3 adds seven capabilities: deletion vectors (up to 10x faster DML), row lineage for native CDC, VARIANT type for semi-structured data, default column values, geometry and geography types, nanosecond timestamps, and multi-argument partition transforms. Compatibility runs one way: v3 readers can read v2 tables, but v2 readers cannot read v3 tables.

The seven new capabilities

  • Deletion vectors: binary bitmaps replacing positional delete files; O(1) read-time lookup vs. O(log n) in v2; up to 10x faster DML
  • Row lineage: _row_id and _last_updated_sequence_number fields per row; enables native CDC without external tooling
  • VARIANT type: native semi-structured column with shredding and filter pushdown; replaces JSON-as-STRING workaround
  • Default column values: recorded in schema metadata; applied transparently at read time, no backfill required
  • Geometry + Geography types: planar and WGS84 spatial types with bounding box pushdown
  • Nanosecond timestamps: timestamp_ns and timestamptz_ns; matches native Snowflake precision
  • Multi-argument partition transforms: bucketing on composite columns for more granular pruning

Apache Iceberg v3 became generally available on Snowflake on May 7, 2026, delivering up to 10x faster DML operations through deletion vectors, native row lineage for change data capture, and a VARIANT type for semi-structured data, all built directly into the format spec. This page covers every new capability and what Snowflake teams need to know before upgrading. The context layer above Iceberg v3 is where AI-readiness begins.

| Property | Value |
| --- | --- |
| Version | Apache Iceberg v3 (Format Version 3) |
| GA on Snowflake | May 7, 2026 |
| GA on Databricks | Databricks Runtime 18.0+ |
| Latest library | Iceberg 1.11.0 (May 19, 2026) |
| Key features | Deletion vectors, row lineage, VARIANT type, default column values, geometry/geography types, nanosecond timestamps |
| Backward compatible | Yes, v3 readers can read v2 tables; v2 readers cannot read v3 tables |
| Snowflake default | New tables default to v2; opt-in to v3 required |
| Trino support | Not ready for production as of early 2026 |

The seven new capabilities in Apache Iceberg v3

Apache Iceberg v3 is the third major version of the open table format specification, generally available on Snowflake as of May 7, 2026 and on Databricks Runtime 18.0+. Released after four years of community development, v3 adds seven capabilities (deletion vectors, row lineage, VARIANT type, default column values, geometry and geography types, nanosecond timestamps, and multi-argument partition transforms) that make Iceberg production-grade for data manipulation workloads, change data capture, and semi-structured data without external workarounds. Context engineering is the layer that makes these tables AI-queryable.

Apache Iceberg is an open table format for large analytic tables: it defines how data files are organized on object storage and how metadata tracks schema, partitioning, and snapshots. If you need the Iceberg three-layer metadata architecture as context, see the foundation guide first. This page assumes you know Iceberg v2 and focuses on what changed.

GA timeline:

  • Snowflake preview: March 4, 2026
  • Snowflake GA: May 7, 2026
  • Databricks GA: Runtime 18.0+
  • Iceberg library 1.11.0: May 19, 2026 (latest stable)

What is new in Apache Iceberg v3?

Apache Iceberg v3 ships seven new capabilities. The most impactful for data manipulation workloads is deletion vectors, up to 10x faster DML than copy-on-write, followed by row lineage, which enables native change data capture without external CDC infrastructure. All seven features are generally available on Snowflake as of May 7, 2026.

| Feature | What it does | Performance / impact | Snowflake GA status |
| --- | --- | --- | --- |
| Deletion vectors | Binary bitmaps stored in Puffin files; mark deleted rows at O(1) per row at read time, one vector per data file per snapshot | Up to 10x faster DML vs. copy-on-write (v2 baseline); AWS EMR 7.11 benchmarks confirm faster, cheaper compaction | GA May 7, 2026 |
| Row lineage | Two persistent fields per row: _row_id (unique long, never changes) and _last_updated_sequence_number (commit that last modified the row) | Eliminates full-file reprocessing for CDC; query _last_updated_sequence_number to identify exact rows changed per commit | GA May 7, 2026 |
| VARIANT type | Native semi-structured column type for JSON-like payloads; supports shredding (columnar substructure) and filter pushdown | Eliminates full-payload parse on every query vs. v2 STRING workaround; COPY, Snowpipe, and Snowpipe Streaming support auto-subcolumnarization | GA May 7, 2026 |
| Default column values | Default recorded in schema metadata at column-add time; applied transparently at read time for pre-existing rows | Eliminates backfill cost and NULL-vs-default distinction in downstream queries | GA May 7, 2026 |
| Geography + Geometry types | geometry (planar, projected CRS) and geography (WGS84, Earth curvature); bounding boxes for spatial filter pushdown | Eliminates WKT/WKB string workarounds; enables engine-level spatial indexing | GA May 7, 2026 |
| Nanosecond timestamps | timestamp_ns and timestamptz_ns; v2 supported only microsecond precision | Matches native Snowflake table precision; eliminates precision mismatch data loss on Snowflake-to-Iceberg conversion | GA May 7, 2026 |
| Multi-argument partition transforms | Bucketing on composite columns or date functions, not single-column only | More granular partition pruning for complex strategies without workarounds | GA May 7, 2026 |

Deletion vectors: the 10x DML improvement explained

In v2, row-level deletes used positional delete files: separate files the engine merge-joins with data files at read time (O(log n) per row), growing more expensive as deletes accumulate across snapshots.

In v3, each data file has at most one deletion vector per snapshot: a binary bitmap stored in a Puffin file and applied at read time. A bit set to 1 means the row at that position is deleted, so each check is an O(1) lookup. AWS benchmarks on EMR 7.11 with Spark 3.5.6 and Iceberg 1.9.1 confirmed the performance improvement. Databricks encodes Iceberg v3 deletion vectors in the same binary format as Delta Lake’s deletion vectors, enabling cross-format interoperability without data movement.
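
The difference between the two lookup styles can be sketched in a few lines of Python. This is a toy illustration only: the DeletionVector class and the helper names are hypothetical, a plain bytearray stands in for the compressed bitmap an engine would actually store, and a sorted list with binary search stands in for a v2 positional delete file.

```python
from bisect import bisect_left


class DeletionVector:
    """Toy bitmap (hypothetical): bit i set means row position i is deleted."""

    def __init__(self, num_rows: int) -> None:
        self.bits = bytearray((num_rows + 7) // 8)

    def delete(self, pos: int) -> None:
        self.bits[pos // 8] |= 1 << (pos % 8)

    def is_deleted(self, pos: int) -> bool:
        # Constant-time check: one byte index plus one bit mask.
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))


def is_deleted_positional(sorted_deletes: list[int], pos: int) -> bool:
    # v2-style stand-in: binary search over sorted deleted positions.
    i = bisect_left(sorted_deletes, pos)
    return i < len(sorted_deletes) and sorted_deletes[i] == pos


def live_rows(rows: list[str], dv: DeletionVector) -> list[str]:
    # Read path: keep only the rows whose bit is not set.
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


rows = ["a", "b", "c", "d", "e"]
dv = DeletionVector(len(rows))
dv.delete(1)
dv.delete(3)
print(live_rows(rows, dv))  # ['a', 'c', 'e']
```

Both checks return the same answer; the sketch only shows why the bitmap check does not get slower as the number of deleted rows grows.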

Practical Snowflake implication: deletion vectors accelerate UPDATE and DELETE on Snowflake-managed Iceberg tables, particularly valuable for CDC workloads and frequent MERGE patterns. See the AWS blog on deletion vectors and row lineage for benchmark details.

Row lineage: native change data capture in the format spec

Row lineage adds two fields to every row: _row_id (a unique long, assigned at insert and never mutated) and _last_updated_sequence_number (the sequence number of the commit that last modified the row). The next-row-id field in table metadata is mandatory in v3; it tracks the next available row ID across all writers.

In v2, detecting changed rows required external CDC tooling (Debezium, Kafka connectors) or full-file reprocessing. In v3, a SQL query on _last_updated_sequence_number identifies exact changed rows per commit window. On Snowflake, row lineage powers Dynamic Iceberg Tables with declarative syntax for INSERT, UPDATE, DELETE, and MERGE operations. Use cases include CDC without external tooling, incremental processing without full-partition scans, and audit trails in regulated environments. See Snowflake data lineage for how lineage is tracked and surfaced. AI agent observability built on row lineage gives regulated enterprises the traceable evidence trail they need.
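
As a rough illustration of the incremental pattern, the Python sketch below filters rows by _last_updated_sequence_number. The two lineage field names come from the spec; the changed_since helper, the sample rows, and the status column are hypothetical, and a real pipeline would express the same filter as an engine query rather than in application code.

```python
def changed_since(rows, last_seen_seq):
    """Return rows modified by any commit after last_seen_seq."""
    return [r for r in rows if r["_last_updated_sequence_number"] > last_seen_seq]


# Hypothetical table contents after three commits.
rows = [
    {"_row_id": 0, "_last_updated_sequence_number": 1, "status": "new"},
    {"_row_id": 1, "_last_updated_sequence_number": 3, "status": "shipped"},
    {"_row_id": 2, "_last_updated_sequence_number": 2, "status": "new"},
]

# A consumer that already processed commit 1 asks only for later changes.
changed = changed_since(rows, last_seen_seq=1)
print([r["_row_id"] for r in changed])  # [1, 2]
```

Because _row_id never changes, a consumer can match each changed row to its earlier version without relying on a business key.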

VARIANT type: semi-structured data without the JSON workarounds

In v2, JSON was stored as a STRING column, losing type information and forcing full-payload parsing on every query, with no predicate pushdown into the payload. The v3 VARIANT type uses high-performance binary encoding and supports shredding: flattening variant substructures into columnar format so SQL predicates can push down into the payload without full-parse.
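
The idea behind shredding can be sketched in Python. This is a simplified model, not the VARIANT binary encoding: JSON strings stand in for the payload, a plain list stands in for a shredded column, and the function names and sample payloads are hypothetical.

```python
import json

# Hypothetical sensor payloads, stored as raw JSON strings (the v2 workaround).
payloads = [
    '{"device": "a1", "temp": 21.5, "tags": ["lab"]}',
    '{"device": "b2", "temp": 38.0}',
    '{"device": "c3", "temp": 40.2, "tags": []}',
]


def filter_by_parsing(raw_rows, threshold):
    # STRING-style read: every payload is parsed in full to test one field.
    return [i for i, raw in enumerate(raw_rows) if json.loads(raw)["temp"] > threshold]


def shred(raw_rows, field):
    # Write-time step: extract one field into its own column.
    return [json.loads(raw).get(field) for raw in raw_rows]


def filter_shredded(column, threshold):
    # Read-time step: the predicate touches only the column, never the payload.
    return [i for i, v in enumerate(column) if v is not None and v > threshold]


temp_column = shred(payloads, "temp")
print(filter_by_parsing(payloads, 30.0))  # [1, 2]
print(filter_shredded(temp_column, 30.0))  # [1, 2]
```

The two filters agree on the result. The shredded version pays the parsing cost once at write time instead of on every query.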

On Snowflake, COPY, Snowpipe, and Snowpipe Streaming all support VARIANT with automatic subcolumnarization. The VARIANT type is aligned with the type introduced in Apache Spark 4.0, ensuring consistent semantics across the Spark + Iceberg stack. Jacob Leverich, Co-Founder and CTO of Observe, described it: “Iceberg v3 support for the variant data type is a major unlock for the industry.” Use cases include IoT sensor data, API event logs, observability telemetry, and NoSQL-to-lakehouse migration.

Default values, nanosecond timestamps, and geometry types

Default column values: New column added to an existing table? The default value is recorded in schema metadata and applied transparently at read time for rows written before the column existed. No backfill. No NULL-vs-default logic in every downstream query.
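
A minimal Python sketch of read-time defaults, assuming a simplified schema representation: the schema dictionary, the read_rows helper, and the region column are hypothetical and only model the behavior described above.

```python
# Hypothetical schema: "region" was added later with a default of "unknown".
schema = {
    "id": {"default": None},
    "region": {"default": "unknown"},
}


def read_rows(data_file, table_schema):
    # Fill columns missing from older files using the schema default.
    # The files themselves are never rewritten.
    return [
        {col: row.get(col, spec["default"]) for col, spec in table_schema.items()}
        for row in data_file
    ]


old_file = [{"id": 1}, {"id": 2}]  # written before "region" existed
new_file = [{"id": 3, "region": "emea"}, {"id": 4, "region": None}]

print(read_rows(old_file, schema))
# [{'id': 1, 'region': 'unknown'}, {'id': 2, 'region': 'unknown'}]
print(read_rows(new_file, schema))
# [{'id': 3, 'region': 'emea'}, {'id': 4, 'region': None}]
```

Row 4 keeps its explicit None, which shows the distinction the format preserves: a value that was never written receives the default, while a value written as NULL stays NULL.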

Nanosecond timestamps: timestamp_ns and timestamptz_ns. Required for high-frequency trading, precision IoT, and network telemetry. Native Snowflake TIMESTAMP_NTZ and TIMESTAMP_TZ columns now map to v3 nanosecond types, eliminating the precision mismatch that caused data loss when converting native Snowflake tables to Iceberg format.
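
The precision mismatch is easy to reproduce with integer arithmetic. The sketch below assumes the conversion truncates rather than rounds, and the two event timestamps are made-up values.

```python
def to_micros(ts_ns: int) -> int:
    """Truncate a nanosecond epoch timestamp to microseconds."""
    return ts_ns // 1_000


# Two hypothetical events 210 nanoseconds apart.
event_a_ns = 1_700_000_000_123_456_789
event_b_ns = 1_700_000_000_123_456_999

print(event_a_ns < event_b_ns)  # True: ordering is visible at nanosecond precision
print(to_micros(event_a_ns) == to_micros(event_b_ns))  # True: both collapse to one microsecond
```

Once both events land in the same microsecond, their relative order cannot be recovered from the stored value, which is the data loss that nanosecond types avoid.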

Geometry and Geography: geometry (planar, projected CRS) and geography (WGS84). Bounding boxes for spatial filter pushdown. Eliminates WKT/WKB string workarounds that v2 geospatial users relied on.
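
Bounding-box pushdown amounts to an overlap test per data file. The Python sketch below assumes planar coordinates and hypothetical file names; geography boxes on a sphere need extra handling (for example, boxes that cross the antimeridian) that is not shown here.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: BBox, b: BBox) -> bool:
    # Two boxes overlap only if their ranges overlap on both axes.
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def files_to_scan(file_bounds: dict[str, BBox], query: BBox) -> list[str]:
    # Skip any file whose bounding box cannot contain a matching geometry.
    return [name for name, box in file_bounds.items() if intersects(box, query)]


# Hypothetical per-file bounds, as a planner might read them from metadata.
file_bounds = {
    "f1.parquet": BBox(0, 0, 10, 10),
    "f2.parquet": BBox(20, 20, 30, 30),
    "f3.parquet": BBox(5, 5, 25, 25),
}
print(files_to_scan(file_bounds, BBox(8, 8, 12, 12)))  # ['f1.parquet', 'f3.parquet']
```

The pruning is conservative: a file that passes the box test may still contain no matching geometry, but a file that fails it can be skipped safely.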


Apache Iceberg v3 on Snowflake: what is supported?

All major Iceberg v3 features are generally available on Snowflake as of May 7, 2026, including deletion vectors, row lineage, VARIANT type, default column values, geography and geometry types, and nanosecond timestamps. Two capabilities are not yet available: external engine writes to v3 tables via Horizon, and in-place version upgrades for Snowflake-managed tables.

GA features (May 7, 2026):

  • Deletion vectors (improved UPDATE/DELETE/MERGE performance)
  • Row lineage (_row_id, _last_updated_sequence_number, Dynamic Iceberg Tables)
  • VARIANT data type (with auto-subcolumnarization in COPY, Snowpipe, Snowpipe Streaming)
  • Default column values
  • Geography + Geometry types
  • Nanosecond timestamps
  • External engine reads via Horizon Iceberg REST Catalog (Apache Polaris)
  • GET_DDL returns ICEBERG_VERSION property for all Snowflake-managed Iceberg tables

Default behavior note: New Snowflake-managed Iceberg tables default to v2. Opt-in to v3 is required.

What is NOT yet supported:

  • External engine writes to v3 tables via Horizon (Iceberg REST Scan Plan API is read-only for v3 as of GA)
  • In-place version upgrade for Snowflake-managed tables (no ALTER TABLE ... SET ICEBERG_VERSION = 3)

Upgrade path for external engines (Spark):

ALTER TABLE db.tbl SET TBLPROPERTIES ('format-version'='3')

Note: Requires a new snapshot post-upgrade (a DML operation); upgrade is irreversible; verify all downstream reader engines support v3 before executing on production tables.

Snowflake Storage for Apache Iceberg Tables (GA April 15, 2026): Managed storage on Snowflake infrastructure; data readable as standard Iceberg by external engines. Provides a 7-day managed recovery window, cross-region/cross-cloud replication, automatic compaction, and no manual VACUUM. Announced at Snowflake Summit June 2, 2026 as part of the open interoperability framework.

CREATE ICEBERG TABLE my_iceberg_table_internal (col1 int)
CATALOG = SNOWFLAKE
EXTERNAL_VOLUME = SNOWFLAKE_MANAGED;

For how Atlan connects to Apache Polaris and the Snowflake Horizon Catalog to surface Iceberg table metadata, see the dedicated guide. For the broader Snowflake Horizon Context picture, see that companion page. Snowflake CoWork and Snowflake CoCo both depend on the Iceberg data layer for their agentic queries.


Apache Iceberg v2 vs v3: should you upgrade?

The upgrade from Iceberg v2 to v3 is irreversible and requires every engine in your stack to support v3 before you proceed. Snowflake and Databricks Runtime 18.0+ are production-ready; Trino is not. If your workload involves frequent row-level deletes, CDC without external tooling, or semi-structured payloads, upgrade now. If Trino is a primary query engine, wait.

| Dimension | v2 | v3 | Migration note |
| --- | --- | --- | --- |
| Row-level deletes | Positional delete files; O(log n) at read | Deletion vectors (Puffin); O(1) at read | In-place upgrade for external engines (Spark ALTER TABLE); Snowflake-managed requires new table |
| Change data capture | External tooling required or full-file rescan | Native via _row_id + _last_updated_sequence_number | New v3 tables get row lineage automatically; v2 tables cannot retroactively add it |
| Semi-structured data | STRING column for JSON; full-payload parse; no pushdown | VARIANT type; shredding; filter pushdown | Schema migration needed to retype existing STRING JSON columns to VARIANT |
| Default column values | Application logic or backfill required | Recorded in schema metadata; applied at read time | Transparent after upgrade; no backfill needed for new defaults |
| Timestamp precision | Microsecond | Nanosecond (timestamp_ns, timestamptz_ns) | Eliminates Snowflake-to-Iceberg precision mismatch |
| Spatial types | WKT/WKB string workarounds | Native geometry + geography types | Existing spatial data requires retyping |

Ecosystem readiness:

| Engine | v3 support status | Notes |
| --- | --- | --- |
| Apache Spark 4.0 | Full support | Reference implementation for VARIANT; Spark 3.5.x has partial support |
| Databricks Runtime 18.0+ | GA (all clouds: AWS, Azure, GCP) | Unity Catalog required; deletion vectors, row lineage, VARIANT |
| Apache Flink 2.0 + Iceberg 1.10 | Strong and accelerating | Nanosecond timestamps, VARIANT, row lineage for streaming CDC |
| Amazon EMR 7.11 | Production-ready | Spark 3.5.6 + Iceberg 1.9.1; AWS benchmarks confirmed |
| Trino | Not ready (early 2026) | Wait; monitor Trino v3 compatibility changelog before upgrading |
| Google BigLake Metastore | Read support | Google contributed to Iceberg 1.10; BigLake reads v3 tables |

When to upgrade now vs. wait:

  • Upgrade now if: frequent UPDATE/DELETE/MERGE workloads; building CDC pipelines; ingesting IoT/API/JSON data, including payloads that feed advanced RAG techniques; high-frequency trading or precision telemetry requiring nanosecond timestamps; geospatial workloads
  • Wait if: Trino is a primary query engine; any engine in your stack has not confirmed v3 support; you have active cross-engine writes to Snowflake-managed tables via Horizon
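
Because the upgrade is irreversible, the decision rules above can be encoded as a pre-flight check. The Python sketch below is hypothetical: the ENGINE_V3_READY map simply mirrors the readiness statuses reported in this article (with made-up key names), and it should be refreshed from each engine's own release notes before use.

```python
# Assumed readiness map, copied from this article's ecosystem table.
ENGINE_V3_READY = {
    "snowflake": True,
    "databricks-runtime-18.0": True,
    "spark-4.0": True,
    "emr-7.11": True,
    "trino": False,
}


def blockers(engines):
    # Unknown engines count as blockers: no confirmation means no upgrade.
    return [e for e in engines if not ENGINE_V3_READY.get(e, False)]


def safe_to_upgrade(engines):
    return not blockers(engines)


print(safe_to_upgrade(["snowflake", "databricks-runtime-18.0"]))  # True
print(blockers(["snowflake", "trino", "in-house-reader"]))  # ['trino', 'in-house-reader']
```

Treating unlisted engines as blockers is a deliberate choice here: the cost of waiting is small compared with a v3 table that a downstream reader can no longer open.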

How Atlan’s context layer works with Iceberg v3 tables

Iceberg v3 is the format layer: it defines how data is stored and tracked on object storage. Atlan is the context layer for AI agents above it: 100+ connectors pull column-level lineage, certified definitions, and quality signals from Iceberg tables and every system upstream and downstream, then deliver that context to AI agents. Atlan reports that agents grounded in this context are 5x more accurate than agents operating without a context layer.

Column-level lineage for Iceberg tables

Atlan connects to Iceberg catalogs (Polaris, Glue, Nessie, Hive Metastore) and query engines. SQL parsing extracts transformation logic at the column level from Snowflake, Databricks, BigQuery, and Redshift. OpenLineage events from Airflow, Spark, dbt Cloud, and Astronomer are natively consumed, capturing runtime inputs, outputs, and column-level transformations as pipelines execute.

Result: a data team can see which columns in a v3 Iceberg table flow to which downstream dashboards and models. See column-level lineage and Iceberg table governance for how this is surfaced in the enterprise data graph.

Row lineage signals in the Enterprise Data Graph

Iceberg v3’s _row_id and _last_updated_sequence_number operate at the format layer. Atlan operates at the metadata layer above: it surfaces pipeline activity and data quality freshness signals for AI, derived from row lineage. The practical outcome: agents and analysts accessing an Iceberg v3 table through Atlan’s MCP server for Snowflake can see “which rows in this table changed as part of this pipeline run,” governance visibility that no other tool currently covers for this format.

Context Lakehouse: Iceberg-native architecture

Atlan’s own context store is built on Iceberg-native formats: open, graph-plus-file architecture, vector-native AI search. When Iceberg v3 tables are the data layer, Atlan’s context layer stores metadata about those tables in the same open format. No vendor lock-in at the context layer.

Making Iceberg v3 tables AI-ready

An AI agent querying a Snowflake Iceberg v3 table without context gets raw data. The same agent grounded in Atlan’s Enterprise Data Graph gets: certified column definitions, cross-system lineage, quality signals, ownership, classification tags, related glossary terms, usage patterns, and row-level freshness signals.

Atlan AI Labs measures a 5x accuracy improvement in agents grounded in the Enterprise Data Graph. 83% of AI pilots never reach production (Atlan research); the gap is almost always context, not model capability. Context agents auto-generate descriptions for Iceberg tables at scale: 690K+ descriptions generated across 50+ enterprise customers, 87% rated on par or better than human writing.

See how to implement an enterprise context layer for AI for the architecture with Iceberg as the data layer and Atlan as the context plane above it. For the broader agent context layer picture, see the dedicated guide. The context catalog is the governance layer that makes these tables trustworthy for AI queries.

Real stories: Iceberg and enterprise data teams

“We just launched our first use case to our executive dashboards. The value of Atlan overall exceeds their expectation in the whole data journey we are launching. Now it gives them the fingertip definitions and all those things.” (Data platform lead at a UK telecom company)

“Atlan is much more than a catalog of catalogs. It’s more of a context operating system. Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models.” (Sridher Arumugham, Chief Data and Analytics Officer, DigiKey)


What Apache Iceberg v3 means for your data stack

Apache Iceberg v3 is a backward-compatible format upgrade that closes the longstanding gap between Iceberg’s architectural promise and production DML performance. The four most impactful changes for data teams on Snowflake:

  1. Deletion vectors - up to 10x faster UPDATE, DELETE, and MERGE operations; a direct improvement for CDC and MERGE-heavy workloads
  2. Row lineage - native _row_id + _last_updated_sequence_number fields enable CDC without external tooling; now GA on Snowflake
  3. VARIANT type - semi-structured payloads with filter pushdown, without a separate document store
  4. Default column values - schema evolution without backfill cost

The format layer is now stronger. The remaining work is the context layer above it: lineage across systems, certified definitions, AI-ready metadata. That is what Atlan’s Enterprise Data Graph provides, across Snowflake, Databricks, and every other system in your stack. The context store for AI in Atlan’s Context Lakehouse is built natively on this Iceberg layer.

To understand the context layer your AI agents need, see the guide to context engineering. The knowledge graph model underlying the Enterprise Data Graph is what makes cross-system entity resolution possible.


FAQs about Apache Iceberg v3

1. What is Apache Iceberg v3?

Apache Iceberg v3 is the third major version of the open table format specification. Released after four years of community development, it adds deletion vectors (up to 10x faster DML), row lineage for native CDC, a VARIANT type for semi-structured data, default column values, geometry and geography types, nanosecond timestamps, and multi-argument partition transforms. It became generally available on Snowflake on May 7, 2026.

2. What is the difference between Apache Iceberg v2 and v3?

The most significant difference is deletion vectors. In v2, row-level deletes use positional delete files requiring an O(log n) merge at read time. In v3, deletion vectors are binary bitmaps applied at O(1) per row, up to 10x faster. v3 also adds native row lineage (no external CDC tooling), a VARIANT type for semi-structured data, and nanosecond timestamps. The upgrade is irreversible.

3. Is Apache Iceberg v3 supported in Snowflake?

Yes. All major Iceberg v3 features (deletion vectors, row lineage, VARIANT type, default column values, geometry and geography types, and nanosecond timestamps) are generally available on Snowflake as of May 7, 2026. New Snowflake-managed Iceberg tables default to v2; v3 requires opt-in. External engine writes to v3 tables via Horizon are not yet supported.

4. What are deletion vectors in Apache Iceberg v3?

Deletion vectors are binary bitmaps stored in Puffin files alongside Iceberg data files. Each bit marks a row as deleted or active. At read time, the engine applies the bitmap at O(1) per row, compared to v2’s positional delete files which required an O(log n) merge-join per file. There can be at most one deletion vector per data file per snapshot. AWS benchmarks on EMR 7.11 confirm up to 10x faster DML.

5. What is row lineage in Apache Iceberg v3?

Row lineage in Iceberg v3 attaches two persistent fields to every row: _row_id (a unique long identifier assigned at insert that never changes) and _last_updated_sequence_number (the sequence number of the commit that last modified the row). These fields enable change data capture without external tooling. Query _last_updated_sequence_number to identify exactly which rows changed in any commit window.

6. How do I upgrade from Apache Iceberg v2 to v3 in Snowflake?

For Snowflake-managed Iceberg tables, in-place upgrade is not supported as of May 2026; you must create new v3 tables. For external engine tables (Spark): run ALTER TABLE db.tbl SET TBLPROPERTIES ('format-version'='3'), then execute a DML operation to create the first v3 snapshot. The upgrade is irreversible. Before upgrading any production table, confirm every reader engine in your stack supports v3. Trino is not v3-ready as of early 2026.

7. Does Apache Spark support Apache Iceberg v3?

Yes. Apache Spark 4.0 is the reference implementation for Iceberg v3, with full support for deletion vectors, row lineage, and VARIANT type (read + basic write; shredded VARIANT write is not yet supported). Spark 3.5.x has partial v3 support via Iceberg 1.9.1+. Databricks Runtime 18.0+ supports all v3 features on AWS, Azure, and GCP with Unity Catalog enabled.

8. Is Apache Iceberg v3 production-ready?

On Snowflake and Databricks, yes: both platforms declared GA in 2026. Apache Flink 2.0 plus Iceberg 1.10 support is strong for streaming CDC workloads. The dependency is your full engine stack: Trino does not support v3 as of early 2026. If Trino is a primary query engine, wait. For Snowflake-only or Snowflake plus Databricks stacks, v3 is production-ready.


Sources

  1. Apache Iceberg Official Specification. Apache Software Foundation. 2026.
  2. Apache Iceberg v3 GA on Snowflake - Release Note. Snowflake. May 7, 2026.
  3. Announcing Apache Iceberg v3 Support on Snowflake. Snowflake Blog. 2026.
  4. Snowflake Storage for Apache Iceberg Tables. Snowflake Blog. 2026.
  5. Apache Iceberg v3: Moving the Ecosystem Towards Unification. Databricks Blog. 2026.
  6. Accelerate Data Lake Operations with Apache Iceberg v3 Deletion Vectors and Row Lineage. AWS Big Data Blog. 2025.
  7. Apache Iceberg v2 vs v3: What Changed and What It Means for Your Tables. Dremio. 2026.
  8. What Is New in Apache Iceberg v3? Google Open Source Blog. August 2025.
  9. Snowflake Summit 2026: Context, Custom Model Training, Iceberg V3. Constellation Research. June 2026.
