Working with Apache Iceberg in Snowflake: A Complete Guide for 2025
Snowflake’s default storage is a proprietary format that uses micro-partitions and clustering to store and organize data efficiently. However, in a move toward supporting open standards for data architecture, Snowflake decided to integrate with the widely adopted open table format, Apache Iceberg.
This article will introduce you to the function and scope of Apache Iceberg in Snowflake and the Snowflake Open Catalog, which is a managed version of Apache Polaris, a REST-based catalog for Apache Iceberg. We’ll also discuss the need for a metadata control plane despite having internal catalogs like Snowflake Open Catalog or Snowflake Horizon Catalog.
Table of Contents #
- Iceberg tables in Snowflake: How does it work?
- Catalog options for Iceberg tables in Snowflake
- Working with Apache Iceberg in Snowflake: The need for a metadata control plane
- Working with Apache Iceberg in Snowflake: Bottom line
- FAQs about working with Apache Iceberg in Snowflake
- Working with Apache Iceberg in Snowflake: Related reads
Iceberg tables in Snowflake: How does it work? #
Snowflake announced support for Iceberg tables at Snowflake Summit 2022, and a preview of Iceberg tables arrived with Snowflake’s 7.42 release. Since then, there have been significant developments on both Iceberg and Snowflake.
To ensure that it supports a wide variety of use cases and workload types, Snowflake broadly offers four types of tables:
- Temporary and transient tables: Used for storing short-lived data for staging-level processing and other temporary use cases
- External tables: Used to query data that lives in an external storage backend, such as Amazon S3, via an external stage
- Hybrid tables: Used to store data in both row and columnar formats to address transactional and analytical workloads
- Apache Iceberg tables: Used to store and manage metadata via the Iceberg table format with the Parquet file format
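As a rough sketch of what the fourth option looks like in practice, the statement below creates a Snowflake-managed Iceberg table. The table name, columns, external volume, and base location are all illustrative, and the external volume is assumed to have been set up beforehand:

```sql
-- Sketch: a Snowflake-managed Iceberg table (names are placeholders).
-- CATALOG = 'SNOWFLAKE' tells Snowflake to act as the Iceberg catalog itself.
CREATE ICEBERG TABLE customer_events (
  event_id   BIGINT,
  event_type STRING,
  event_ts   TIMESTAMP_NTZ
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_ext_vol'
  BASE_LOCATION = 'customer_events/';
```

With this setup, Snowflake writes the table’s Parquet data files and Iceberg metadata to the location defined by the external volume, while the table remains fully readable and writable from Snowflake.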
Snowflake decided to support Apache Iceberg as long as the underlying file format is Apache Parquet. This was due to the widespread use and demand for both formats, especially since Iceberg brings features like ACID guarantees, schema evolution, hidden partitioning, and snapshots to existing Parquet-based data lakes.
Companies like Branch migrated their petabyte-scale Parquet data lake to an Iceberg-based data lakehouse. AWS also published guidance for their customers on migrating S3 data lakes to transactional lakehouses using Parquet as the underlying file format and Iceberg as the table format.
In June 2024, Iceberg tables in Snowflake went GA with the release of Snowflake 8.20. Let’s look at some of the prominent features of the Snowflake + Iceberg integration.
- Storage integration: When using Iceberg, you can store data and metadata in a cloud-based object store like Amazon S3, Google Cloud Storage, or Azure Storage. You can do this by using external volumes via Snowflake’s storage integration.
- Catalog integration: Snowflake needs a catalog to interact with Iceberg tables. This catalog manages and organizes Iceberg tables and maintains current metadata pointers for tables, snapshots for point-in-time queries, and other related tasks.
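The storage integration described above is configured through an external volume. The sketch below shows the general shape of the DDL for an S3-backed volume; the bucket name, location name, and IAM role ARN are placeholders you would replace with your own values:

```sql
-- Sketch: an external volume pointing at an S3 bucket (values are placeholders).
CREATE EXTERNAL VOLUME iceberg_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'us-east-1-iceberg'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-iceberg-bucket/tables/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-iceberg-role'
    )
  );
```

Snowflake then assumes the named IAM role to read and write Iceberg data and metadata files under the base URL.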
One of the key architectural components of Apache Iceberg is its internal catalog, which tracks the current metadata for any given table. Iceberg allows you to use various types of catalogs. Let’s look at your catalog options when using Iceberg with Snowflake.
Catalog options for Iceberg tables in Snowflake #
Currently, there are two catalog options in Snowflake that you can choose from to manage Iceberg tables:
- Snowflake Open Catalog: This is a Snowflake-managed implementation of Apache Polaris based on a REST API implementation of the Iceberg catalog specification. It stores all data and metadata in an external location, such as Amazon S3, which can be accessed using an external volume-based storage integration.
- External catalog: Iceberg offers a range of catalog options, including the AWS Glue Data Catalog, a JDBC-based catalog, and a REST-based catalog (such as a self-managed Apache Polaris). The JDBC and REST options also mean you could implement your own in-house catalog for Iceberg, though this is not recommended for most organizations given the complexity and maintenance burden involved.
Snowflake Open Catalog, which is an internal catalog, works in the following way:
“… an Iceberg table is registered in Open Catalog but read and written via query engines. The table data and metadata is stored in your external cloud storage. The table uses Open Catalog as the Iceberg catalog.”
Achieving the same using an external catalog would involve setting up something like AWS Glue Data Catalog or Apache Polaris instead of Snowflake Open Catalog.
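To make the external-catalog path concrete, the sketch below wires Snowflake to an AWS Glue Data Catalog and then creates an Iceberg table that reads through it. The account ID, role ARN, database, and table names are all placeholders, and the external volume is assumed to exist:

```sql
-- Sketch: a catalog integration for the AWS Glue Data Catalog
-- (account ID, role ARN, and namespace are placeholders).
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_glue_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  ENABLED = TRUE;

-- An externally cataloged Iceberg table then references the integration:
CREATE ICEBERG TABLE sales_orders
  CATALOG = 'glue_catalog_int'
  EXTERNAL_VOLUME = 'iceberg_ext_vol'
  CATALOG_TABLE_NAME = 'sales_orders';
```

Here Glue, not Snowflake, remains the source of truth for the table’s current metadata pointer, which is why such tables come with the limitations discussed next.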
The key difference when choosing an external catalog is that Snowflake treats those tables as read-only and does not provide platform support for them. You also give up the invisible table maintenance work, such as compaction, that Snowflake performs under the hood for tables it manages – something to keep in mind when evaluating your options.
Working with Apache Iceberg in Snowflake: The need for a metadata control plane #
Regardless of the option you choose for cataloging Apache Iceberg tables, a key problem remains – your organization would still need a complete view of your data estate in one place to leverage metadata for data governance, quality, lineage, and observability purposes.
For instance, a data engineer needs a complete view of the data estate to avoid duplicating data assets when addressing a new integration request. A data analyst needs a business glossary to ensure reports align with organizational language and metric definitions. Similar scenarios apply to anyone else in the organization consuming data.
Meeting these needs calls for a unified control plane for metadata, and that’s where Atlan comes into the picture.
By putting your metadata to use with a metadata control plane, your organization can maximize its investments in all the different pieces of the data puzzle, such as storage, orchestration, and query engines. Bringing Iceberg and Snowflake together with Atlan’s control plane for metadata will activate data and AI governance across your data ecosystem.
Working with Apache Iceberg in Snowflake: Bottom line #
The coming together of Iceberg and Snowflake is a boon for organizations, but to maximize the impact of adopting Iceberg on Snowflake, you need to manage all your data from a single place: a metadata control plane.
This article walked you through the scope of Iceberg within Snowflake and the Iceberg catalog options that Snowflake offers. It also introduced you to the idea of and need for a metadata control plane like Atlan. Atlan is essential for bringing out the most value from your data assets, irrespective of the tools you use, but especially with tools like Iceberg and Snowflake, with which Atlan has close integrations. To learn more about this integration, check out Atlan’s official documentation.
FAQs about working with Apache Iceberg in Snowflake #
What is Apache Iceberg, and why is it important for Snowflake users? #
Apache Iceberg is an open table format designed for managing large analytic datasets. For Snowflake users, Iceberg provides advanced capabilities such as ACID transactions, schema evolution, and time travel. Integrating Iceberg with Snowflake allows users to leverage these features while taking advantage of Snowflake’s efficient cloud infrastructure and its ability to work with external cloud storage. The use of Apache Parquet as the underlying file format in Iceberg tables ensures seamless integration with existing data lakes.
How do Iceberg tables in Snowflake differ from traditional Snowflake tables? #
Traditional Snowflake tables rely on Snowflake’s proprietary micro-partition format, while Iceberg tables use the Apache Iceberg format with Parquet files for storage. Iceberg tables bring features like hidden partitioning, snapshot isolation, and schema flexibility, which are not natively available in Snowflake’s standard tables. This makes Iceberg tables particularly suitable for large-scale data lakes where versioning, time travel, and schema evolution are required.
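As a small illustration of the versioning capabilities mentioned above, a Snowflake-managed Iceberg table can be queried at an earlier point in time using Snowflake’s standard `AT` clause; the table name below is illustrative:

```sql
-- Sketch: a point-in-time query on a Snowflake-managed Iceberg table
-- (table name is a placeholder).
SELECT *
FROM customer_events
AT (OFFSET => -3600);  -- state of the table as of one hour ago
```

Under the hood, this maps onto Iceberg’s snapshot mechanism: each write produces a new snapshot, and older snapshots remain queryable within the retention window.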
What catalog options are available for managing Iceberg tables in Snowflake? #
Snowflake offers two primary catalog options for Iceberg tables:
- Snowflake Open Catalog: A managed implementation based on Apache Polaris, this catalog is tightly integrated with Snowflake and stores metadata in external cloud storage like Amazon S3.
- External Catalogs: Options like AWS Glue, JDBC, or self-managed Apache Polaris can also be used. However, using an external catalog means losing Snowflake’s native platform support and benefits like automated table maintenance and performance optimizations.
Do I need a metadata control plane if I am using Snowflake Open Catalog for Iceberg tables? #
Yes, while Snowflake Open Catalog helps manage metadata for Iceberg tables, a metadata control plane is crucial for a holistic view of your data assets across multiple platforms and tools. A control plane like Atlan allows you to manage metadata, data lineage, data governance, and quality across your entire data ecosystem, ensuring that all teams have consistent and accurate data insights, regardless of the underlying infrastructure.
What are the benefits of using Apache Iceberg with Snowflake over other lakehouse architectures like Delta Lake or Hudi? #
Apache Iceberg offers several advantages, including support for complex schema evolution, hidden partitioning, and ACID transactions without the need for proprietary formats. Unlike Delta Lake and Hudi, Iceberg is designed with broader compatibility for multiple query engines and cloud platforms. Snowflake’s integration with Iceberg allows organizations to utilize these advanced features while leveraging Snowflake’s high-performance compute and external storage capabilities.
Working with Apache Iceberg in Snowflake: Related reads #
- Apache Iceberg: All You Need to Know About This Open Table Format in 2025
- Apache Iceberg Data Catalog: What Are Your Options in 2025?
- Apache Iceberg Tables Data Governance: Here Are Your Options in 2025
- Apache Iceberg Alternatives: What Are Your Options for Lakehouse Architectures?
- Apache Parquet vs. Apache Iceberg: Understand Key Differences & Explore How They Work Together
- Apache Hudi vs. Apache Iceberg: 2025 Evaluation Guide on These Two Popular Open Table Formats
- Apache Iceberg vs. Delta Lake: A Practical Guide to Data Lakehouse Architecture
- Working with Apache Iceberg on Databricks: A Complete Guide for 2025
- Working with Apache Iceberg on AWS: A Complete Guide [2025]
- Working with Apache Iceberg and AWS Glue: A Complete Guide [2025]
- Polaris Catalog from Snowflake: Everything We Know So Far
- Polaris Catalog + Atlan: Better Together
- Snowflake Horizon for Data Governance
- What does Atlan crawl from Snowflake?
- Snowflake Cortex for AI & ML Analytics: Here’s Everything We Know So Far
- Snowflake Copilot: Here’s Everything We Know So Far About This AI-Powered Assistant
- How to Set Up Data Governance for Snowflake: A Step-by-Step Guide
- How to Set Up a Data Catalog for Snowflake: A Step-by-Step Guide
- Snowflake Data Catalog: What, Why & How to Evaluate
- AI Data Catalog: Exploring the Possibilities That Artificial Intelligence Brings to Your Metadata Applications & Data Interactions
- What Is a Data Catalog? & Do You Need One?