Snowflake X GCP: Practical Guide For Deployment
Snowflake is a data cloud platform that enables you to deploy different data architecture patterns, such as data warehouses and data lakes, with the help of various storage, processing, and consumption features. You can deploy Snowflake in your preferred cloud platform.
Google Cloud Platform offers over 100 cloud services that help you build and scale applications in the cloud. Google Cloud Platform is part of a broader offering called Google Cloud, which also includes SaaS products like Google Workspace (formerly G Suite and Google Apps), Android, Chrome OS, and APIs for various Google applications.
Snowflake has been generally available on the Google Cloud Platform since early 2020. This article will take you through how Snowflake can be deployed on the Google Cloud Platform and how various Snowflake and Google Cloud Platform services work together to help you achieve your data engineering, data science, and AI/ML goals.
Throughout the article, we’ll go through the following themes:
- Deploying Snowflake on Google Cloud Platform
- Staging data in Google Cloud Storage
- Loading and unloading data in Google Cloud Storage
- Using Snowpipe with Google Cloud Storage events
- Using external Cloud Functions in Google Cloud Platform
- Using the Snowflake Business Critical Edition on Google Cloud Platform
Let’s get into it!
Table of contents #
- Deploying Snowflake on Google Cloud Platform
- Staging data in Google Cloud Storage
- Loading and unloading data in Google Cloud Storage
- Using Snowpipe with Google Cloud Storage events
- Using external Cloud Functions in Google Cloud Platform
- Using the Snowflake Business Critical Edition on Google Cloud Platform
- Summary
- Related reads
Deploying Snowflake on Google Cloud Platform #
Snowflake is a SaaS platform that handles infrastructure provisioning, database maintenance, data storage, etc. As of October 2023, Snowflake on the Google Cloud Platform is only available in four regions - Iowa, Northern Virginia, London, and the Netherlands. Here’s what the Snowflake pricing page looks like for the Google Cloud London region:
Not all cloud platforms and regions support every security, compliance, and usage requirement. The same is true for newly released features: some regions get them later than others. Another thing to consider is that the cost of storage, compute, and data movement differs significantly across regions. Based on your requirements, you must weigh all of these factors before deciding which region to deploy Snowflake in.
Staging data in Google Cloud Storage #
Snowflake uses the concept of stages to load and unload data from and to other data systems. You can either use a Snowflake-managed internal stage to load data into a Snowflake table from a local file system, or you can use an external stage to load data from object-based storage too. The unloading process also involves the same steps but in reverse.
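For example, a minimal sketch of loading a local file through a Snowflake-managed internal stage might look like this (the stage, file path, and table names are hypothetical, and the PUT command must be run from a client like SnowSQL rather than the web interface):
CREATE STAGE my_internal_stage;
-- Upload the local file into the internal stage
PUT file:///tmp/address.csv @my_internal_stage;
-- Load the staged file into the target table
COPY INTO address FROM @my_internal_stage;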
If you’re using the Google Cloud Platform, it is fair to assume that you might be storing data either temporarily or permanently in Google Cloud Storage buckets. You might even have an existing data lake in place from which you would want to move data into Snowflake for more structured consumption.
If your data is in Google Cloud Storage buckets, you can create an external stage for your bucket to load data into Snowflake. First, create a storage integration with your Google Cloud Platform credentials. You can create a storage integration using the following SQL statement in Snowflake:
CREATE STORAGE INTEGRATION <integration_name>
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'GCS'
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
[ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]
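Behind the scenes, Snowflake creates a GCP service account for the integration, and you must grant that service account access to your bucket before the stage can use it. You can look up the service account as follows:
DESC STORAGE INTEGRATION <integration_name>;
Find the STORAGE_GCP_SERVICE_ACCOUNT property in the output and grant that account the appropriate read (and, for unloading, write) permissions on the bucket in Google Cloud IAM.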
Using the storage integration you just created, you can create a stage by specifying the Google Cloud Storage bucket path and the format of the files you want to stage into Snowflake using the following SQL statement:
CREATE STAGE my_gcs_stage
URL = 'gcs://mybucket1/path1'
STORAGE_INTEGRATION = gcs_int
FILE_FORMAT = my_csv_format;
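To confirm that the integration and stage are set up correctly, you can list the files visible to the stage:
LIST @my_gcs_stage;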
Once that’s done, you can load data from and unload data into Google Cloud Storage buckets. Let’s look at that in a bit more detail.
Loading and unloading data in Google Cloud Storage #
Once the Google Cloud Storage external stage is created, you can use the COPY INTO command for bulk-loading the data into a table, as shown in the SQL statement below:
COPY INTO <table_name>
FROM @<stage_name>
PATTERN='.*address.*.csv';
The pattern would match all the address.*.csv files, such as address.part001.csv, address.part002.csv, and so on, and load them in parallel into the specified Snowflake table.
You don’t necessarily have to create a separate stage for unloading data. Still, it’s considered best practice to unload into a dedicated unload directory within the stage, as shown in the SQL statement below:
COPY INTO @<stage_name>/unload/ FROM address;
After this, if you unloaded to a Snowflake-managed internal stage, you can run the GET command to download the files to a local directory. Note that GET does not support external stages; for a Google Cloud Storage stage like the one above, download the files directly from the bucket instead, for example with gsutil:
GET @mystage/unload/ file:///tmp/data/;
The Snowflake + Google Cloud Platform storage integration also allows you to run queries on an external stage without getting the data into Snowflake. To do this, you’ll need to create external tables in Snowflake, which collect table-level metadata and store that in Snowflake’s data catalog.
When you create external tables that refer to a Google Cloud Storage external stage, you can set that external table to refresh automatically using Google Cloud Pub/Sub events from your Google Cloud Storage bucket.
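A minimal sketch of such an auto-refreshing external table might look like the following (the table name is hypothetical, the stage is the one created earlier, and the integration refers to a Pub/Sub notification integration like the one described in the next section):
CREATE EXTERNAL TABLE ext_address
WITH LOCATION = @my_gcs_stage/address/
FILE_FORMAT = (TYPE = CSV)
AUTO_REFRESH = TRUE
INTEGRATION = '<notification_integration_name>';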
Using Snowpipe with Google Cloud Storage events #
Snowpipe is a managed micro-batching data loading service offered by Snowflake. Like bulk data loading from Google Cloud Storage, you can also load data in micro-batches using Snowpipe. The Snowpipe + Google Cloud Platform integration also rests upon the storage integration and external stage construct in Snowflake.
To trigger Snowpipe data loads, you can use Google Cloud Pub/Sub messages for Google Cloud Storage events. One additional step here is to create a notification integration using the following SQL statement:
CREATE NOTIFICATION INTEGRATION <notification_integration_name>
TYPE = QUEUE
NOTIFICATION_PROVIDER = gcp_pubsub
ENABLED = TRUE
GCP_PUBSUB_SUBSCRIPTION_NAME = '<gcp_subscription_id>';
Using the same notification integration, you can then create a pipe with the following SQL statement:
CREATE PIPE <pipe_name>
AUTO_INGEST = TRUE
INTEGRATION = '<notification_integration_name>'
AS
<copy_statement>;
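Putting the pieces together, a concrete pipe might look like this sketch (the pipe and table names are hypothetical, and the stage is the one created earlier):
CREATE PIPE address_pipe
AUTO_INGEST = TRUE
INTEGRATION = '<notification_integration_name>'
AS
COPY INTO address FROM @my_gcs_stage/address/;
Once the pipe is in place, new files landing under that path trigger Pub/Sub notifications, and Snowpipe loads them into the table using Snowflake-managed compute.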
Using external Cloud Functions in Google Cloud Platform #
In addition to external tables, where you don’t have to move your data into Snowflake, you can also use external functions, where you don’t have to move your business logic into Snowflake. An external function (or remote service) lets you offload parts of your data platform logic to services in Google Cloud Platform, the most prominent of which is Cloud Functions, Google Cloud Platform’s serverless compute option.
To integrate Snowflake with Cloud Functions, you have to create an API integration, as shown in the SQL statement below:
CREATE OR REPLACE API INTEGRATION <api_integration_name>
API_PROVIDER = google_api_gateway
GOOGLE_AUDIENCE = '<google_audience_claim>'
API_ALLOWED_PREFIXES = ('https://')
ENABLED = TRUE;
Snowflake uses Google Cloud API Gateway as a proxy service to send requests for Cloud Function invocation, which is why you have the API_PROVIDER listed as google_api_gateway.
Several use cases exist for external functions to enrich, clean, protect, and mask data. You can call other third-party APIs from these external functions or write your own code within the external function. One typical use of external functions in Snowflake is external tokenization for data masking, where you can use Cloud Functions as the tokenizer.
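As an illustrative sketch, creating and calling such a tokenization function might look like this (the function name, endpoint, and table/column names are hypothetical):
CREATE EXTERNAL FUNCTION tokenize_pii(input VARCHAR)
RETURNS VARIANT
API_INTEGRATION = <api_integration_name>
AS 'https://<gateway_hostname>/tokenize';
-- Call it like any other scalar function
SELECT tokenize_pii(email) FROM customers;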
Using the Snowflake Business Critical Edition on Google Cloud Platform #
Snowflake’s Business Critical Edition has the highest data security and protection levels, especially concerning PII and PHI data. This is very helpful when your business needs to comply with regulations like HITRUST CSF and HIPAA.
You can deploy Business Critical Edition on the Google Cloud Platform and have Google Private Service Connect in place, but make sure that you are aware of the limitations.
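Once Private Service Connect is set up for your account, you can retrieve the private connectivity details that Snowflake exposes with a system function (callable by the ACCOUNTADMIN role):
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();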
Summary #
This article took you through deploying Snowflake on the Google Cloud Platform and using services such as Google Cloud Storage, Cloud Functions, API Gateway, Private Service Connect, and Pub/Sub to enhance some of Snowflake’s native features. It also walked you through different ways of offloading parts of your workload to the Google Cloud Platform for integration, privacy, and security.
Running Snowflake and the rest of your data stack on Google Cloud also means you can gather metadata using Snowflake’s data governance features and Google Cloud Data Catalog. That metadata can then flow into Atlan, an active metadata platform, enabling your business to realize the true potential of its metadata.
You can learn more about the Snowflake + Google Cloud partnership from Snowflake’s official website and the Snowflake + Atlan partnership here.
Snowflake on GCP: Related reads #
- Snowflake Data Mesh: Step-by-Step Setup Guide
- Snowflake Data Catalog: Importance, Benefits, Native Capabilities & Evaluation Guide
- Snowflake Data Governance: Features, Frameworks & Best practices
- Snowflake Metadata Management: Importance, Challenges, and Identifying The Right Platform
- Snowflake Data Lineage: A Step-by-Step How to Guide
- Snowflake Data Dictionary: Documentation for Your Database