What is Snowflake Cortex Training? Fine-Tuning LLMs in Snowflake

Emily Winks

Data Governance Expert

Updated:06/03/2026

Published:06/03/2026

15 min read

Key takeaways

Cortex Training (preview, June 2026) enables full LLM fine-tuning on Snowflake data using managed GPU pools.
Supported models: Qwen and Mistral families. GPU pools deliver up to 2x more training runs per dollar (Snowflake claim).
Fine-tuning alone does not close the inference accuracy gap. Governed context at inference is still required.
Atlan's Enterprise Data Graph certifies training data quality. Context Lakehouse and MCP deliver governed inference context.

What is Snowflake Cortex Training?

Snowflake Cortex Training is a managed service announced at Snowflake Summit 2026 (June 2, 2026), currently in public preview, that lets enterprises fine-tune and train open-weight models on proprietary data entirely inside Snowflake. Using fully managed GPU compute pools and the open-source ArcticTraining framework, it delivers up to 2x more training runs for the same GPU budget and supports reinforcement learning on proprietary data, a capability not available in Cortex Fine-tuning.

Key facts

Status: Public preview (announced June 2, 2026; not GA as of June 3, 2026)
Supported models: Qwen family (quickstart: Qwen3-1.7B; production: up to 14B), Mistral family
Framework: ArcticTraining, Snowflake open-source declarative YAML-based LLM post-training framework
Infrastructure: Snowflake-managed GPU compute pools via ML Jobs; multi-tenant utilization
RL support: Yes, reinforcement learning on proprietary data (not available in Cortex Fine-tuning)

Snowflake Cortex Training is a managed service, announced at Snowflake Summit 2026 and currently in public preview, that lets enterprises fine-tune and train open-weight models on proprietary data entirely inside Snowflake. Using fully managed GPU compute pools, it delivers up to 2x more training runs for the same GPU budget (Snowflake claim). Supported models include the Qwen and Mistral families.

Fact	Detail
What it is	Managed LLM fine-tuning and training service inside Snowflake
Models supported	Qwen family (quickstart: Qwen3-1.7B; production: up to 14B), Mistral family
Status	Public preview (announced June 2, 2026; not GA as of June 3, 2026)
Underlying framework	ArcticTraining, Snowflake open-source declarative YAML post-training framework
How it differs from Cortex Fine-tuning	Cortex Fine-tuning = PEFT/LoRA adapters only. Cortex Training = full fine-tuning + reinforcement learning on managed GPU compute pools
GPU infrastructure	Snowflake-managed GPU compute pools via ML Jobs; multi-tenant utilization

How Snowflake Cortex Training works

Snowflake Cortex Training is a managed service that extends Cortex AI to full model fine-tuning and reinforcement learning on proprietary data. Unlike Cortex Fine-tuning (which produces PEFT adapters), Cortex Training uses Snowflake-managed GPU compute pools and the ArcticTraining framework to let teams build domain-specific models, keeping training data inside Snowflake’s security perimeter. Fine-tuning is one approach; for the broader comparison see RAG alternatives.

Cortex Training was announced at Snowflake Summit 2026 (June 2, 2026) and is currently in public preview. Dwarak Rajagopal, VP AI Engineering at Snowflake, described it: “We are extending [Cortex Training] to actually train custom models in a safe, governed environment.” Training data reads directly from Snowflake tables via native connectors, with no data movement outside Snowflake’s security boundary.

ArcticTraining framework: ArcticTraining is Snowflake’s open-source, declarative YAML-based LLM post-training framework. It runs as the training engine inside ML Jobs on Snowflake-managed GPU compute pools. The quickstart uses Qwen3-1.7B; production workloads scale to models in the 8B to 14B parameter range. ArcticTraining is available on GitHub and documented on the Snowflake engineering blog.

Supported models: Open-weight Qwen and Mistral families. The open-weight framing is deliberate: teams train on proprietary data and, unlike with closed API models, the resulting model stays within Snowflake. Resolve AI, the named launch customer, made a multi-million-dollar, two-year commitment to build domain-specific RL models on proprietary data via Cortex Training.

For teams still deciding whether fine-tuning is the right approach, see Fine-Tuning vs. RAG: Which Approach Fits Your Use Case.

For context on what Cortex Sense, the runtime enrichment layer announced at the same Summit, adds at inference time, see the dedicated guide.

Preview notice: Cortex Training is in public preview as of June 2026. Capabilities described here reflect the Snowflake press release and official ArcticTraining engineering blog content. Check Snowflake documentation for the latest availability and feature details.

Cortex Training vs. Cortex Fine-tuning: what is different?

Cortex Fine-tuning (GA since February 2025) produces lightweight PEFT/LoRA adapters on a small set of pre-approved models via a single SQL call. Cortex Training (public preview, June 2026) is an entirely different infrastructure layer: full model training and reinforcement learning on Snowflake-managed GPU compute pools, with hyperparameters set through an ArcticTraining YAML config rather than a single `max_epochs` option.

These are not versions of the same product. They solve different problems at different layers: Cortex Fine-tuning handles task adaptation and few-shot specialization on a small supported model set; Cortex Training handles domain-specific model building and RL training, requiring GPU infrastructure and full training runs.

Capability	Cortex Fine-tuning (GA, Feb 2025)	Cortex Training (Preview, June 2026)
Technique	PEFT / LoRA adapter only	Full fine-tuning + reinforcement learning
Models supported	~6 pre-approved Llama + Mistral models	Qwen family, Mistral family (open-weight)
Infrastructure	Serverless SQL function (FINETUNE call)	Managed GPU compute pools via ML Jobs
Hyperparameter control	Only `max_epochs` (1 to 10) configurable	Full control via ArcticTraining YAML config
Model ownership	PEFT adapter only; base model Snowflake-owned	Full model trained within Snowflake boundary
RL support	No	Yes, reinforcement learning on proprietary data
Training data format	Strict `prompt`/`completion` schema required	Defined in ArcticTraining YAML config (data format to be confirmed in docs)
Regions	4 regions only (AWS US West 2, US East 1, EU Frankfurt, Azure East US 2)	Not yet announced
GA status	Generally available	Public preview
Ideal use case	Task specialization, few-shot tuning of an existing model	Building a domain-specific model through full fine-tuning of an open-weight base; RL on proprietary data

Practitioner frustrations with Cortex Fine-tuning, documented on Medium in April 2026, include the following: the 4-region restriction causes errors for users in Singapore, Mumbai, and Ireland with no workaround; models can be removed without notice; there is no learning rate or batch size control; and base model deprecation can break fine-tuned adapters. Cortex Training is positioned to address the infrastructure limitations, but the details of its hyperparameter flexibility remain to be confirmed in official documentation.
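For reference, the single-SQL-call workflow of Cortex Fine-tuning can be sketched as follows. This minimal Python helper assembles a `SNOWFLAKE.CORTEX.FINETUNE('CREATE', ...)` statement and enforces the `max_epochs` range listed in the table above. The model name, job name, and table names are hypothetical, and the way the options object is passed is an assumption to verify against the Cortex Fine-tuning documentation.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_finetune_sql(name, base_model, train_query, validation_query, max_epochs=None):
    """Assemble a Cortex Fine-tuning CREATE statement as a SQL string."""
    # Per the comparison table, max_epochs is the only tunable option (1 to 10).
    if max_epochs is not None and not 1 <= max_epochs <= 10:
        raise ValueError("max_epochs must be between 1 and 10")
    args = [
        "'CREATE'",
        sql_literal(name),
        sql_literal(base_model),
        sql_literal(train_query),       # must return prompt and completion columns
        sql_literal(validation_query),
    ]
    if max_epochs is not None:
        # ASSUMPTION: options are passed as a trailing object constant.
        args.append("{'max_epochs': %d}" % max_epochs)
    return "SELECT SNOWFLAKE.CORTEX.FINETUNE(" + ", ".join(args) + ")"


# Hypothetical job name and tables, for illustration only.
statement = build_finetune_sql(
    "my_tuned_model",
    "mistral-7b",
    "SELECT prompt, completion FROM train",
    "SELECT prompt, completion FROM validation",
    max_epochs=3,
)
print(statement)
```

The helper only builds the string; executing it requires a Snowflake session in one of the supported regions.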

For the broader decision between fine-tuning and retrieval-based approaches, see Fine-Tuning vs. RAG: Which Approach Fits Your Use Case.

How does Snowflake Cortex Training work?

Cortex Training uses Snowflake’s ArcticTraining framework running on ML Jobs with Snowflake-managed GPU compute pools. Teams define training runs via a declarative YAML config specifying model, dataset (read directly from Snowflake tables), and training parameters. According to Snowflake, multi-tenant GPU utilization delivers up to 2x more training runs for the same GPU budget compared to dedicated GPU provisioning.

ArcticTraining YAML config

ArcticTraining is a declarative framework: teams specify model choice, data source (Snowflake table), training objective, and hyperparameters in a YAML file. No bespoke infrastructure management is required; Snowflake handles GPU provisioning. The quickstart model is Qwen3-1.7B for fast iteration; production models run in the 8B to 14B parameter range.
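To make the declarative shape concrete, here is a minimal Python sketch that builds and validates a config as a plain dictionary before it would be written out as YAML. Every key name, the table identifier, and the hyperparameter values are hypothetical placeholders chosen for illustration; they are not the confirmed ArcticTraining or Cortex Training schema. Only the model name Qwen3-1.7B comes from the quickstart mentioned above.

```python
# Hypothetical section names mirroring the four things the text says a config specifies.
REQUIRED_SECTIONS = ("model", "data", "objective", "hyperparameters")


def make_training_config(model_name, source_table, objective="sft", **hyperparameters):
    """Return a config dict with model, data source, objective, and hyperparameters."""
    return {
        "model": {"name": model_name},
        "data": {"source_table": source_table},
        "objective": objective,
        "hyperparameters": dict(hyperparameters),
    }


def validate_config(config):
    """Return a list of problems; an empty list means the config looks complete."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in config]
    hp = config.get("hyperparameters", {})
    learning_rate = hp.get("learning_rate")
    if learning_rate is not None and learning_rate <= 0:
        problems.append("learning_rate must be positive")
    epochs = hp.get("epochs")
    if epochs is not None and epochs < 1:
        problems.append("epochs must be at least 1")
    return problems


# Illustrative values only; the table name is made up.
config = make_training_config(
    "Qwen3-1.7B",
    "ML_DB.TRAINING.SUPPORT_TICKETS",
    learning_rate=1e-5,
    epochs=2,
)
print(validate_config(config))
```

Validating the config locally before submitting a job is a design choice in this sketch, meant to catch typos before GPU time is spent.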

ML Jobs and GPU compute pools

ML Jobs is the execution environment inside Snowflake that runs ArcticTraining. GPU compute pools are fully managed: Snowflake allocates, monitors, and deallocates GPU capacity, with no external cloud GPU management required. The multi-tenant utilization model pools GPU capacity across training runs, which Snowflake states delivers up to 2x more training runs per GPU dollar versus single-tenant dedicated GPU provisioning. This is Snowflake’s claimed efficiency figure, not an independently verified benchmark.
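As a quick worked example of what the claim would mean in practice, the sketch below compares how many runs fit into a fixed budget. The budget and per-run cost are made-up numbers, and the 2x factor is Snowflake's stated upper bound ("up to"), so the pooled figure is a best case rather than an expected value.

```python
def runs_within_budget(budget_usd, cost_per_run_usd, efficiency_factor=1.0):
    """Whole training runs affordable when each dollar goes efficiency_factor times as far."""
    if cost_per_run_usd <= 0:
        raise ValueError("cost_per_run_usd must be positive")
    return int((budget_usd * efficiency_factor) // cost_per_run_usd)


# Hypothetical figures: a 10,000 USD budget and 400 USD per run on dedicated GPUs.
dedicated_runs = runs_within_budget(10_000, 400)
pooled_best_case = runs_within_budget(10_000, 400, efficiency_factor=2.0)

print(dedicated_runs)     # 25
print(pooled_best_case)   # 50
```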

Training run process

Training data resides in Snowflake tables; ArcticTraining reads via native connectors with no data movement outside the Snowflake security perimeter. The process: define a YAML config (model, dataset, training objective, hyperparameters); execute an ML Job that triggers ArcticTraining on the GPU compute pool; output is a trained model stored within Snowflake’s security boundary. Both fine-tuning (domain adaptation) and reinforcement learning on proprietary data are supported.

The models trained via Cortex Training are designed to power agentic products like Snowflake CoWork, Snowflake’s personal AI agent for knowledge workers. After training, those models still need governed, up-to-date context at inference time, which fine-tuning alone does not provide.

Snowflake Cortex Training vs. Databricks and other options

Cortex Training’s primary advantage is data residency: training runs inside Snowflake, so proprietary data never leaves the warehouse. Databricks offers more model choice and ML engineer control; SageMaker is better for AWS-native MLOps workflows; Azure AI Foundry suits Microsoft-aligned enterprises. The right choice depends on where your data lives and how much ML engineering flexibility your team needs.

Dimension	Snowflake Cortex Training	Databricks Mosaic AI	AWS SageMaker	Azure AI Foundry / Azure ML
Data residency	Training data stays inside Snowflake, no data movement	Data moves to Databricks environment	Data moves to AWS environment	Data moves to Azure environment
GPU infrastructure	Snowflake-managed, multi-tenant GPU pools	H100 GPU clusters, configurable	AWS Trainium / Inferentia chips + GPU	Azure GPU VMs, distributed training support
Model choice	Qwen and Mistral families (open-weight)	Full HuggingFace ecosystem (any model)	100+ models via JumpStart (Llama, Mistral, Falcon, etc.)	Hundreds of models (OpenAI, Meta, Mistral, Cohere, NVIDIA, HuggingFace)
Hyperparameter control	Full control via ArcticTraining YAML (preview, confirm in docs)	Full control; all hyperparameters exposed	Full control	Full control including distributed training config
Governance model	Snowflake-native; training data governed within Snowflake security boundary	Databricks Unity Catalog; separate governance stack	AWS IAM + SageMaker Model Registry	Microsoft Purview + Azure ML Model Registry
MLOps tooling	Snowflake ML Jobs; limited observability vs. ML-first platforms	MLflow automatic experiment logging, full weight ownership	SageMaker Pipelines, deep CI/CD, production MLOps	Azure ML Pipelines, AutoML, strong CI/CD integration
Best for	Snowflake-first organizations minimizing data movement and MLOps complexity	ML-engineering-first teams needing maximum model and tooling flexibility	AWS-native organizations with dedicated ML engineering and production workloads	Microsoft-aligned enterprises integrating with Azure OpenAI or Microsoft AI services

Cortex Training is not trying to beat Databricks at ML engineering flexibility. It is trying to make fine-tuning accessible for data-platform-first organizations that do not want to manage GPU infrastructure or move data outside Snowflake. For teams combining fine-tuning with advanced RAG techniques, an AI governance framework helps keep the full pipeline auditable.

After training, fine-tuned models still depend on runtime context. See what is context engineering for the inference-time layer that domain-specific models need. The context layer as AI memory is what bridges training-time knowledge and inference-time accuracy.

What Atlan’s context layer adds to Cortex Training

Fine-tuning a model on proprietary Snowflake data creates two gaps Cortex Training cannot close on its own: (1) the training data needs to be certified and governed before the job runs; (2) the trained model still needs governed, up-to-date context at inference time. Atlan’s context layer for AI addresses both: Enterprise Data Graph before training, Context Lakehouse via MCP after.

Stage 1: Enterprise Data Graph governs training data quality before the job

Cortex Training reads training data from Snowflake tables. The quality and certification of that data is what determines model accuracy. What the enterprise data graph delivers before training: certification workflows (is this dataset approved for model training, reviewed for PII and domain accuracy?), column-level lineage (where did this training data originate, is it free of contamination from low-quality upstream sources?), data quality for AI signals (is the training set complete, deduplicated, representative, and fresh?), and access governance (which datasets are permitted in the training corpus, scoped per user and per policy).
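The completeness, deduplication, and freshness questions above can be approximated with simple checks before a job runs. The following is a minimal sketch over in-memory rows, not Atlan's or Snowflake's API; the field names (`prompt`, `completion`, `updated_at`) and the 90-day freshness window are assumptions for illustration.

```python
from datetime import datetime, timedelta


def audit_training_rows(rows, now, max_age_days=90):
    """Count incomplete, duplicate, and stale rows in a list of dict records."""
    report = {"total": len(rows), "incomplete": 0, "duplicates": 0, "stale": 0}
    cutoff = now - timedelta(days=max_age_days)
    seen = set()
    for row in rows:
        prompt = (row.get("prompt") or "").strip()
        completion = (row.get("completion") or "").strip()
        if not prompt or not completion:
            report["incomplete"] += 1
            continue  # incomplete rows are not checked further
        key = (prompt.lower(), completion.lower())  # case-insensitive exact match
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        updated = row.get("updated_at")
        if updated is None or updated < cutoff:
            report["stale"] += 1
    return report


# Made-up sample rows for illustration.
sample_rows = [
    {"prompt": "What is churn?", "completion": "Customers lost per period.",
     "updated_at": datetime(2026, 5, 1)},
    {"prompt": "what is churn?", "completion": "customers lost per period.",
     "updated_at": datetime(2026, 5, 2)},
    {"prompt": "Define ARR", "completion": "", "updated_at": datetime(2026, 5, 1)},
    {"prompt": "Define MRR", "completion": "Monthly recurring revenue.",
     "updated_at": datetime(2025, 1, 1)},
]
print(audit_training_rows(sample_rows, now=datetime(2026, 6, 3)))
```

Representativeness is deliberately left out: it depends on the domain and cannot be reduced to a row-level check like these.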

The EU AI Act, whose obligations for high-risk systems apply from August 2026, requires that training data for those systems be “relevant, sufficiently representative, and to the best extent possible, free of errors and complete.” Atlan provides the audit trail that makes compliance demonstrable. Gartner reports that 60% of AI projects risk abandonment due to data quality. Training a model on ungoverned data produces a faster-running, cheaper-to-serve wrong answer.

The Context Engineering Studio validates the AI context store and semantic layer before they reach the training job: bootstrap, test, and ship context as code.

Stage 2: Context Lakehouse and MCP deliver governed inference context after training

A domain-specific model trained on historical proprietary data does not know what changed last week in the data, which business definitions are canonical, which data products are certified vs. deprecated, or what access policies govern the query. This is the inference-context gap that fine-tuning alone cannot close.

Snowflake’s own internal evaluations show CoWork agents reaching 83% accuracy with inference context vs. 47% without it, a gap of 36 percentage points. Atlan’s enterprise AI memory layer (Context Lakehouse plus MCP Server) delivers the Enterprise Data Graph at inference time: certified definitions, column-level lineage, freshness signals, and access policies for any Snowflake agent (CoWork, Snowflake CoCo, Cortex) querying the fine-tuned model.
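One way to picture governed inference context is as a filtering step that runs before the prompt is assembled. The sketch below is illustrative only: the record fields (`term`, `definition`, `certified`) and the prompt layout are hypothetical and do not represent Atlan's MCP server interface or any Snowflake agent API.

```python
def build_grounded_prompt(question, context_items):
    """Prepend only certified definitions to the user's question."""
    certified = [item for item in context_items if item.get("certified")]
    if certified:
        lines = [f"- {item['term']}: {item['definition']}" for item in certified]
        header = "Certified definitions:\n" + "\n".join(lines)
    else:
        header = "No certified context available."
    return f"{header}\n\nQuestion: {question}"


# Made-up glossary entries; the second is deprecated and should be excluded.
glossary = [
    {"term": "net revenue", "definition": "Gross revenue minus refunds.", "certified": True},
    {"term": "net revenue (legacy)", "definition": "Gross revenue.", "certified": False},
]
print(build_grounded_prompt("What was net revenue last week?", glossary))
```

Filtering on certification before the model sees anything means a deprecated definition never competes with the canonical one inside the prompt.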

The compound value: Cortex Training + Atlan = governed training data quality (Enterprise Data Graph) + governed inference context (Context Lakehouse + MCP). See context layer implementation for AI for the architecture walkthrough.

Snowflake named Atlan the 2025 Data Governance Partner of the Year. Native bidirectional tag synchronization between Atlan and Snowflake Horizon is live in production.

Real stories: enterprise AI with governed training data

“We technically enriched 6,700 assets within a few minutes. What would take normally months literally, of course, within 20 minutes, that was actually really cool.” (Data governance lead at a Fortune 500 retailer, on Atlan’s Context Agents generating training-ready descriptions at scale)

“We’re excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan’s MCP server.” (Joe DosSantos, VP of Enterprise Data and Analytics, Workday)

Cortex Training in the broader Snowflake AI stack

Snowflake Cortex Training is a managed service for full LLM fine-tuning and reinforcement learning on proprietary Snowflake data, using ArcticTraining on GPU compute pools. Currently in public preview (announced June 2, 2026).
It is distinct from Cortex Fine-tuning (GA, Feb 2025): Cortex Fine-tuning produces PEFT/LoRA adapters; Cortex Training runs full training jobs on Snowflake-managed GPUs.
Supported models: Qwen and Mistral families. According to Snowflake, multi-tenant GPU utilization delivers up to 2x more training runs per GPU dollar.
The trained model still needs governed, up-to-date context at inference time. Fine-tuning alone does not close the inference accuracy gap; Snowflake’s own evals show 47% accuracy without inference context vs. 83% with it.
Atlan’s context layer for AI agents closes both stages: Enterprise Data Graph certifies training data before the job; Context Lakehouse + MCP server for Snowflake delivers governed inference context after training.

FAQs about Snowflake Cortex Training

1. What is the difference between Snowflake Cortex Training and Cortex Fine-tuning?

Cortex Fine-tuning (GA since February 2025) applies PEFT/LoRA adapters to a small set of pre-approved models via a single SQL function call, producing a lightweight task-specific adapter rather than a new model. Cortex Training (public preview, June 2026) is a separate, heavier infrastructure layer: full fine-tuning and reinforcement learning on Snowflake-managed GPU compute pools using the ArcticTraining framework. The two products target different use cases: task adaptation vs. domain-specific model building.

2. What models does Snowflake Cortex Training support?

As of the June 2026 public preview announcement, Cortex Training supports the Qwen and Mistral open-weight model families. The ArcticTraining quickstart uses Qwen3-1.7B; production workloads scale to models in the 8B to 14B parameter range. The full supported model list will be confirmed when official Cortex Training documentation is published.

3. Is Snowflake Cortex Training generally available?

No. Cortex Training was announced at Snowflake Summit on June 2, 2026, and is currently in public preview. It is not generally available. Snowflake Summit announcements have historically preceded GA by six to twelve months. Check Snowflake’s release notes for current availability status.

4. Can I use reinforcement learning with Snowflake Cortex Training?

Yes. Reinforcement learning on proprietary data is a stated capability of Cortex Training, distinguishing it from Cortex Fine-tuning, which supports PEFT/LoRA adaptation only. Resolve AI, the named launch customer, is building domain-specific RL models on proprietary data using Cortex Training under a multi-million-dollar, two-year commitment. Technical implementation details for RL workflows will be confirmed in Cortex Training’s official documentation.

5. What is ArcticTraining in Snowflake?

ArcticTraining is Snowflake’s open-source, declarative YAML-based LLM post-training framework. It serves as the training engine inside Cortex Training, running on Snowflake ML Jobs and GPU compute pools. Teams use ArcticTraining YAML config files to specify model, training dataset, objective, and hyperparameters; Snowflake handles all infrastructure provisioning. ArcticTraining is available on GitHub and documented in Snowflake’s engineering blog.

6. How does Snowflake Cortex Training handle training data?

Training data is read directly from Snowflake tables via native connectors, with no data movement outside Snowflake’s security perimeter. This is the core architectural difference from platforms like Databricks or SageMaker, where data held in Snowflake must first be moved to the training environment. Training data format requirements for Cortex Training will be confirmed in official documentation.

7. How does Snowflake Cortex Training compare to Databricks for LLM fine-tuning?

Cortex Training’s primary advantage over Databricks Mosaic AI is data residency: training runs entirely inside Snowflake, so proprietary data never leaves the warehouse. Databricks offers the full HuggingFace ecosystem, H100 GPU clusters, complete hyperparameter control, MLflow experiment logging, and full model weight ownership, all of which favor ML-engineering-first teams. Cortex Training is the better fit for organizations where Snowflake is the primary data platform and minimizing data movement is a priority.

Sources

Snowflake CoWork: Powers the Agentic Enterprise as the Personal Agent for Knowledge Workers. Snowflake Press Release. June 2, 2026.
Fine-Tune LLMs on Snowflake with ArcticTraining and ML Jobs. Snowflake Engineering Blog. 2026.
Snowflake Cortex Fine-tuning Documentation. Snowflake Docs. 2025.
ArcticTraining on GitHub. Snowflake. 2026.
How to Ensure LLM Training Data Quality. Atlan. 2026.
Fine-Tuning vs RAG: Which Approach Fits Your Use Case. Atlan. 2026.
SiliconANGLE: Why Custom Model Training Matters for Enterprise AI. SiliconANGLE. June 2, 2026.
