Snowflake Cost Optimization: Typical Expenses & Strategies to Handle Them Effectively

Updated September 27th, 2024


Setting up a data warehouse or a data lake has become easier with SaaS and purpose-built PaaS platforms. While a SaaS data warehousing platform like Snowflake is great for your business, following the recommended architecture, design, and modeling best practices is crucial. Failing to do so can result in a cost spiral.


This article will explore Snowflake costs, what causes them to spiral, and how to approach cost optimization in Snowflake. We’ll also look at Snowflake’s native tooling and when an external tool is needed to manage costs.


Table of contents #

  1. Snowflake cost optimization: How does it work?
  2. Understanding Snowflake cost
  3. Typical causes for high Snowflake cost
  4. Snowflake-native cost optimization options
  5. How Atlan can help with Snowflake cost optimization
  6. Summary
  7. Related Reads

Snowflake cost optimization: How does it work? #

A small company on Reddit expressed frustration: “… how can Snowflake be comparable to our main AWS bill?” This is not uncommon.

To fix cost-related issues with Snowflake, you should follow its recommended best practices and take a metadata-driven approach. You can use Snowflake’s metadata and tooling to identify common data modeling, engineering, movement, and management mistakes that inflate your Snowflake costs.

Alternatively, you can use external tools that connect with Snowflake and leverage the wealth of metadata stored in the ACCOUNT_USAGE schema.
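
As a quick illustration of that metadata-driven approach, the sketch below (which assumes a role with access to the SNOWFLAKE database, such as ACCOUNTADMIN) sums daily credits by service type from the ACCOUNT_USAGE schema:

```sql
-- Daily credits by service type (warehouses, serverless features,
-- cloud services, and so on) over the last 30 days.
SELECT
    usage_date,
    service_type,
    SUM(credits_used) AS total_credits
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date, service_type
ORDER BY usage_date, total_credits DESC;
```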

Before exploring either solution, let’s dig deeper into the root cause by understanding Snowflake costs and the typical reasons they grow exponentially.


Understanding Snowflake cost #

Understanding Snowflake’s architecture and core services is extremely important for estimating its cost. Snowflake’s Cost Management Framework helps you explore, optimize, monitor, attribute, and control costs.

Snowflake’s architecture comprises three layers: database storage, query processing, and cloud services. Accordingly, Snowflake costs are split into the following categories:

  1. Compute
  2. Storage
  3. Data transfer

1. Compute #


Compute costs include expenses from virtual warehouses, serverless compute, and cloud services compute (the query sketch after this list shows how to break these down):

  • Virtual warehouses: Compute clusters where you run queries.
  • Serverless compute: Using serverless features and services, such as automatic clustering, hybrid tables, materialized views, query acceleration, and search optimization.
  • Cloud services compute: Coordinates and orchestrates activities across Snowflake, including query compilation, caching, and optimization. You are only charged for the portion of daily cloud services consumption that exceeds 10% of your daily virtual warehouse usage.
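
To see where compute credits go, you can query the WAREHOUSE_METERING_HISTORY view in the ACCOUNT_USAGE schema. The sketch below is a minimal example that breaks down the last 30 days of credits per warehouse, separating compute from cloud services:

```sql
-- Credits per virtual warehouse over the last 30 days,
-- split into compute credits and cloud services credits.
SELECT
    warehouse_name,
    SUM(credits_used_compute)        AS compute_credits,
    SUM(credits_used_cloud_services) AS cloud_services_credits,
    SUM(credits_used)                AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```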

2. Storage #


Storage costs include fees for Snowflake’s native storage layer, which is billed at a flat per-terabyte rate that depends on the cloud platform and region of your Snowflake deployment.

The cost is primarily for storing staged files and database tables. Additionally, there’s a fee for using advanced features like Time Travel, Fail-safe, and table clones.
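
The TABLE_STORAGE_METRICS view in the ACCOUNT_USAGE schema shows how much of each table’s footprint comes from active data versus Time Travel, Fail-safe, and clone retention. A minimal sketch for finding the largest tables:

```sql
-- Storage footprint of the 20 largest tables that haven't been dropped,
-- split into active data, Time Travel, Fail-safe, and clone-retained bytes (in GB).
SELECT
    table_catalog,
    table_schema,
    table_name,
    active_bytes             / POWER(1024, 3) AS active_gb,
    time_travel_bytes        / POWER(1024, 3) AS time_travel_gb,
    failsafe_bytes           / POWER(1024, 3) AS failsafe_gb,
    retained_for_clone_bytes / POWER(1024, 3) AS clone_retained_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE NOT deleted
ORDER BY active_bytes DESC
LIMIT 20;
```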

3. Data transfer #


Lastly, Snowflake charges you for moving data out of your Snowflake account, for example when you unload data from Snowflake to an Amazon S3 bucket, a Google Cloud Storage bucket, or an Azure Data Lake Storage (ADLS) container.

Replicating data and allowing external network access also incur costs in Snowflake.
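
Egress is recorded in the DATA_TRANSFER_HISTORY view in the ACCOUNT_USAGE schema; a minimal sketch that sums the data moved per source/target pair over the last 30 days looks like this:

```sql
-- Data transferred out of Snowflake over the last 30 days,
-- grouped by source/target cloud and region and by transfer type.
SELECT
    source_cloud,
    source_region,
    target_cloud,
    target_region,
    transfer_type,
    SUM(bytes_transferred) / POWER(1024, 3) AS gb_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY source_cloud, source_region, target_cloud, target_region, transfer_type
ORDER BY gb_transferred DESC;
```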


Understanding these cost components lets you focus on the areas you want to optimize. Now, let’s look at the typical causes of high costs.

Typical causes for high Snowflake cost #

High Snowflake costs usually result from failing to follow Snowflake’s cost optimization guidelines and sloppy use of resources. The most common causes that inflate your Snowflake costs are:

  • Overprovisioning virtual warehouses
  • Ignoring architecture best practices and guidelines
  • Having database objects that are rarely used or unused
  • Using unnecessary serverless compute features and services

Let’s explore each issue further.

Overprovisioning virtual warehouses #


To understand your Snowflake compute needs, you should know your data volume, access patterns, workload types, data currency, and modeling requirements. This is central to knowing how much computing power you need to run queries without delays or interruptions.

Since you can scale your Snowflake warehouses up and down quickly, following Snowflake’s recommendations for initial warehouse sizing and for warehouse suspension and resumption can help reduce overprovisioning.

These guidelines will also help you decide whether to use or avoid multi-cluster warehouses.
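
In practice, that usually means starting small and letting warehouses pause themselves aggressively. The sketch below (the warehouse name is hypothetical) shows the relevant settings:

```sql
-- Start with a modest size, suspend after 60 seconds of inactivity,
-- and resume automatically when the next query arrives.
-- MY_ANALYTICS_WH is a hypothetical warehouse name.
ALTER WAREHOUSE my_analytics_wh SET
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND   = 60
    AUTO_RESUME    = TRUE;
```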

Ignoring architecture best practices and guidelines #


Snowflake will let you query your data at scale whether or not you follow an established data modeling pattern like Data Vault, dimensional modeling, or 3NF. The problem that arises is cost.

Snowflake minimizes the impact of poorly modeled data by optimizing data storage and consumption with its proprietary micro-partitioning and data clustering scheme. However, you must still follow its guidelines for table design, dynamic tables, Iceberg tables, and clustering keys.
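
For very large tables where partition pruning matters, a clustering key on the columns most queries filter by can keep scan costs down. A minimal sketch, using a hypothetical ORDERS table, looks like this (keep in mind that automatic clustering itself consumes serverless credits, so apply it only where the pruning benefit outweighs that cost):

```sql
-- Define a clustering key on the columns most queries filter by.
-- ORDERS, ORDER_DATE, and REGION are hypothetical names.
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Check how well the table is clustered on those columns.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
```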

Having database objects that are rarely used or unused #


Unused database objects can silently increase costs. To monitor these assets, you must have an ownership and cost attribution tagging policy for objects.

Snowflake provides granular metadata on object usage in the ACCESS_HISTORY view of the ACCOUNT_USAGE schema, which lists the objects accessed by each query. Analyzing this data, as sketched below, can help you identify rarely used and unused database objects.
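
The sketch assumes access to the ACCOUNT_USAGE schema and a 90-day window, both of which you may want to adjust. It flattens the objects touched by each query and reports when each table was last read:

```sql
-- Last time each table was read in the past 90 days, based on ACCESS_HISTORY.
-- Tables that never show up here, or show up only with old timestamps,
-- are candidates for archiving or deletion.
SELECT
    obj.value:"objectName"::STRING AS object_name,
    MAX(ah.query_start_time)       AS last_read
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) AS obj
WHERE ah.query_start_time >= DATEADD(day, -90, CURRENT_TIMESTAMP())
  AND obj.value:"objectDomain"::STRING = 'Table'
GROUP BY object_name
ORDER BY last_read;
```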

Rarely used or unused database objects typically take the form of tables, materialized views, short-lived permanent tables, temporary tables, and transient tables. Sometimes, entire schemas lie unused.

Removing these objects directly impacts the cost — it reduces the storage cost and the associated serverless compute cost for their maintenance and upkeep.

It also has an indirect impact on cost: a cleaner Snowflake environment lets analysts and business users consume data with less confusion and fewer errors.

Using unnecessary serverless compute features and services #


As mentioned earlier, Snowflake’s serverless computing is great but incurs additional costs. Understanding these features is important to optimizing Snowflake costs.

For instance, choosing the right table type for your use case is essential. If you create a permanent table for a one-time analysis job and then forget to delete it, it will keep incurring Fail-safe storage costs. For one-time analysis scenarios, consider using transient tables instead, as sketched below.
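
A minimal sketch, using hypothetical table and column names; the transient table skips Fail-safe entirely and can drop Time Travel retention to zero:

```sql
-- Transient table for throwaway analysis: no Fail-safe storage,
-- and Time Travel retention reduced to zero days.
-- ONE_OFF_ANALYSIS and its columns are hypothetical names.
CREATE TRANSIENT TABLE one_off_analysis (
    event_id   NUMBER,
    event_date DATE,
    payload    VARIANT
)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- Drop it when the analysis is done.
DROP TABLE one_off_analysis;
```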


Next, let’s explore how to optimize Snowflake costs using native options.

Snowflake-native cost optimization options #

Snowflake offers several native tools for monitoring and controlling costs, including the cost management views in Snowsight, the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas, resource monitors, and budgets.
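
For example, a resource monitor (the quota and names below are hypothetical) can notify you and suspend a warehouse before it burns through a monthly credit budget:

```sql
-- Monthly credit budget with a notification at 80% and a suspend at 100%.
-- COST_GUARD and MY_ANALYTICS_WH are hypothetical names.
CREATE RESOURCE MONITOR cost_guard
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
       TRIGGERS ON 80  PERCENT DO NOTIFY
                ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to a specific warehouse.
ALTER WAREHOUSE my_analytics_wh SET RESOURCE_MONITOR = cost_guard;
```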

Although these are great features, you must create custom roles and permission models for your analysts and data engineers to use these tools effectively.

The elevated access and the additional overhead of running all these queries accumulate over time, and they run counter to the goal of a self-service data platform. That’s where a data catalog like Atlan can help.


How Atlan can help with Snowflake cost optimization #

Atlan offers a solution for cost optimization at three levels, utilizing an end-to-end lineage graph built from Snowflake’s cost and usage metadata:

  • Level 1: An end-to-end lineage graph identifies unused and duplicate data assets, which you can then certify as deprecated, ready to be deleted or archived from your Snowflake account.
  • Level 2: Asset popularity metrics help identify low-usage assets, which can then be certified as deprecated. Atlan uses four metrics to identify candidates for deprecation:
    • Number of queries using an asset
    • Number of users using an asset
    • Date and time of the last query that used the asset
    • Date and time of the last update made to the asset
  • Level 3: Atlan targets the most popular, most queried, and most expensive assets, using popularity metrics such as the assets with the highest number of queries, the most users, and the most Snowflake credits consumed.

Atlan’s customers have found these methods very helpful in reducing Snowflake costs.

For example, a global virtual-workforce solutions provider cut £60,000 from their annual Snowflake compute costs, thanks to asset popularity metrics.

Atlan helped them identify that a single table queried from a Quicksight account was responsible for approximately 350K queries in 30 days. The source table is refreshed only once a day in Snowflake, yet Quicksight sent around 11K queries to the table every day.

The solution? Cache the table in Quicksight and refresh it once a day, instead of sending thousands of queries to Snowflake.


Summary #

This article covered the numerous aspects of Snowflake costs, typical causes of high expenses, and methodologies for cost optimization. You also learned about using Snowflake’s native tools and external tools like Atlan to optimize Snowflake costs.

Atlan’s multi-layered cost optimization strategy uses end-to-end lineage and popularity metrics to identify and deprecate less frequently used and unused data assets, resulting in considerable cost savings.

Want to know more? Chat with our data strategy experts who’ve guided data teams at Nasdaq, Autodesk, and HelloFresh in their modern data governance journey.



