Personalized Data Discovery for Snowflake Data Assets

September 14th, 2022


In its 2020 article titled "Reducing data costs without jeopardizing growth," McKinsey found that data users can spend 30 to 40 percent of their time searching for data when no clear inventory of available data exists.

The same challenge applies to discovering Snowflake data assets. Snowflake has more than 6,800 customers, including 510 of the Forbes Global 2000, and continues to grow rapidly. Its architecture lets these customers scale to virtually any capacity without disruption.

The diverse data sitting in Snowflake data warehouses holds immense value, and data discovery is one of the foundational steps toward realizing it.

What is data discovery?

Data discovery is the process of identifying interesting or relevant datasets that enable informed data analysis. This definition comes from this paper, which studies the problem of discovering joinable datasets at scale.
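To show what "joinable" can mean in practice, here is a brute-force sketch. It scores candidate columns by set containment, which is the share of a query column's distinct values that also appear in a candidate column. This is one common joinability measure and is used here as an assumption. It is not necessarily the paper's exact method, and approaches that work at scale rely on indexing rather than comparing every pair. All table and column names are hypothetical.

```python
def containment(query_column: set, candidate_column: set) -> float:
    """Fraction of the query column's distinct values found in the candidate."""
    if not query_column:
        return 0.0
    return len(query_column & candidate_column) / len(query_column)


def rank_joinable(query_column: set, candidates: dict, threshold: float = 0.5):
    """Return (name, score) pairs at or above `threshold`, highest score first.

    Brute force: every candidate column is compared against the query column.
    """
    scored = [(name, containment(query_column, values)) for name, values in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical column values, for illustration only.
    customers_country = {"US", "DE", "IN", "BR"}
    candidates = {
        "orders.ship_country": {"US", "DE", "IN", "FR"},
        "geo.country_code": {"US", "DE", "IN", "BR", "JP"},
        "products.category": {"books", "toys"},
    }
    print(rank_joinable(customers_country, candidates))
    # [('geo.country_code', 1.0), ('orders.ship_country', 0.75)]
```

Containment is asymmetric by design. A small column fully covered by a large reference table still scores 1.0, which is the property you want when you are looking for something to join against.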

The paper also notes two trends that make manual data discovery infeasible: the proliferation of large repositories of heterogeneous data (such as Snowflake) and the unprecedented, web-scale volume of diverse data sources. What exactly are the challenges? Let's discuss.

Data Discovery in Snowflake: Challenges

Common challenges that data teams face when discovering Snowflake data assets include:

  • The deluge of requests to data teams
  • Not enough context to trust the data
  • Scaling & surfacing tribal knowledge around data
  • Visibility into data usage
  • Looking for verified documentation

The deluge of requests to data teams

Engineers, analysts, scientists, marketers, and executives: all types of data users want to dip their toes in data. But not everyone can write a SQL query. The result is a deluge of requests to the data team, asking what data is available, where it comes from, how it flows (its lineage), and so on. This becomes a blocker as teams scale.

Not enough context to trust the data

Even when a person finds a data asset that looks relevant, there are several gaps in understanding the data before they can start using it. Questions are common during discovery, such as:

  • When was this data last used?
  • Has a data engineer verified it?
  • Does it power a downstream dashboard?

These questions again lead to more requests to data teams.

Scaling & surfacing tribal knowledge around data

Important context about data is often locked away with individual data users working across different teams and domains in an organization. As more people start using data, it becomes a challenge to consolidate that knowledge, keep it up to date, and surface it for others to use.

Visibility into data usage

How is a particular data asset being used? Is it being used at all? What kinds of queries are run against it? Without visibility into data usage, the data discovery process stalls.
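As a rough illustration of usage visibility, the sketch below turns raw access records into per-asset signals: query count, distinct users, last-used time, and assets that never appear. In Snowflake, records like these could be derived from views such as SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY or QUERY_HISTORY. However, the flat (asset, user, timestamp) record shape here is a hypothetical simplification, not either view's actual schema, and extracting the records is not shown.

```python
from collections import defaultdict
from datetime import datetime


def summarize_usage(access_log):
    """Summarize usage from hypothetical (asset, user, timestamp) records.

    Returns {asset: {"queries": int, "distinct_users": int, "last_used": datetime}}.
    """
    queries = defaultdict(int)
    users = defaultdict(set)
    last_used = {}
    for asset, user, ts in access_log:
        queries[asset] += 1
        users[asset].add(user)
        # Keep the most recent timestamp seen for each asset.
        if asset not in last_used or ts > last_used[asset]:
            last_used[asset] = ts
    return {
        asset: {
            "queries": queries[asset],
            "distinct_users": len(users[asset]),
            "last_used": last_used[asset],
        }
        for asset in queries
    }


def unused_assets(all_assets, access_log):
    """Assets that never appear in the access log, sorted by name."""
    seen = {asset for asset, _, _ in access_log}
    return sorted(set(all_assets) - seen)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    log = [
        ("SALES.ORDERS", "ana", datetime(2022, 9, 1)),
        ("SALES.ORDERS", "raj", datetime(2022, 9, 3)),
        ("SALES.ORDERS", "ana", datetime(2022, 9, 2)),
    ]
    print(summarize_usage(log))
    print(unused_assets(["SALES.ORDERS", "SALES.RETURNS"], log))
```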

Looking for verified documentation

Data assets contain metrics and terms tied to business logic and processes. These need to be defined, shared, and kept up to date in one place, so that a data user doesn't have to switch between tools to find this information.

Data Discovery in Snowflake using Snowsight

Snowsight is Snowflake’s web interface that enables many critical Snowflake operations. It includes data discovery capabilities like surfacing contextual metadata for columns, contextual filtering of data assets, the ability to browse through schemas, etc.

However, only users proficient in writing SQL queries can take advantage of these capabilities, which excludes the typical business user from the data discovery process.

Take a quick look at Snowsight and how it can be used to your advantage here.


Snowsight is the web interface for interacting with and working on Snowflake data. Source: Snowflake Essentials: Getting Started with Big Data in the Cloud, Apress

Atlan + Snowflake: Personalized and curated data discovery experience

Atlan is an active metadata platform that serves as a collaborative workspace for diverse data users to discover, understand, trust and use data. Atlan allows users to discover relevant data assets along with tribal knowledge and business context.


Atlan automatically crawls Snowflake metadata, which enables you to search and discover any data across your Snowflake warehouse. Source: Atlan

Here are some ways in which Atlan supercharges the data discovery of Snowflake data assets:

Google-like search experience

Atlan’s search encompasses diverse data assets such as columns, databases, SQL queries, BI dashboards, and much more. Here are some power features of Atlan search:

  • Intelligent keyword recognition: Atlan search is capable of recognizing typos, singular/plural mix-ups, and other human errors. It returns results that are most relevant to the intent of the query.
  • Search from anywhere: You can start a data search from multiple places in the product, as described in detail in this documentation. You can also press Cmd/Ctrl + K to open the search bar from anywhere in the product.
  • Search using context: You can narrow your search using facets of your data, such as connector, owner, or classification.
  • Sorting by relevance, popularity, or asset name: You can sort results by relevance, by popularity, or alphabetically by name, depending on what serves you best.
  • Amazon-like browsing experience: You can create filters in Atlan on any metadata property: general, business, or technical. Depending on your role on the data team, you are served personalized results, much like when shopping online.
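The sketch below makes these features concrete on a toy, in-memory catalog:

  • typo tolerance
  • plural handling
  • facet filters
  • alternative sort orders

It is not how Atlan implements search. The Asset fields, the naive plural stripping, the difflib-based similarity, and the 0.75 cutoff are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Asset:
    name: str
    connector: str
    owner: str
    popularity: int = 0  # hypothetical popularity signal, e.g. a query count


def normalize(token: str) -> str:
    """Lowercase and strip a trailing plural 's' (a deliberately naive stemmer)."""
    token = token.lower()
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def relevance(query: str, name: str) -> float:
    """Best fuzzy similarity between the query and any underscore-separated name token."""
    q = normalize(query)
    tokens = [normalize(t) for t in name.split("_") if t]
    return max((SequenceMatcher(None, q, t).ratio() for t in tokens), default=0.0)


def search(assets, query, facets=None, sort_by="relevance", min_score=0.75):
    """Filter by exact facet values, keep fuzzy matches, then sort."""
    facets = facets or {}
    hits = []
    for asset in assets:
        if any(getattr(asset, key) != value for key, value in facets.items()):
            continue
        score = relevance(query, asset.name)
        if score >= min_score:
            hits.append((score, asset))
    if sort_by == "popularity":
        hits.sort(key=lambda h: h[1].popularity, reverse=True)
    elif sort_by == "name":
        hits.sort(key=lambda h: h[1].name)
    else:
        hits.sort(key=lambda h: h[0], reverse=True)
    return [asset.name for _, asset in hits]


if __name__ == "__main__":
    catalog = [
        Asset("customer_orders", "snowflake", "ana", 120),
        Asset("orders_archive", "snowflake", "raj", 15),
        Asset("order_dashboard", "tableau", "ana", 300),
        Asset("web_sessions", "snowflake", "li", 80),
    ]
    # "ordres" is both misspelled and plural, yet still matches "order"/"orders".
    print(search(catalog, "ordres", facets={"connector": "snowflake"}, sort_by="popularity"))
```

Facet filtering runs before scoring in this sketch, so narrowing by connector or owner also reduces how many names need fuzzy comparison.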


With Atlan's Google-like search experience, discovering the right data in Snowflake is now faster and easier. Source: Atlan

Trust signals for data assets

The companion sidebar next to every search result surfaces trust signals for every asset, from tables to terms. Trust signals include (but are not restricted to):

  • Overview: A space that gives you the most relevant context about your assets
  • Column preview: A way to preview the columns within your table
  • Attached resources: Historical Slack conversations, Jira tickets, and docs related to your assets

Atlan gives you a 360-degree profile of a data asset for maximum comprehension and easier decision-making. The companion sidebar also shows column-level lineage for any data asset, generated automatically by parsing SQL.
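Once a SQL parser has extracted which source columns feed each target column, column-level lineage becomes a directed graph. The sketch below assumes those edges already exist and does not reproduce Atlan's parser. The dictionary format and the column names are hypothetical. It shows two breadth-first traversals:

  • upstream lineage, following the edges as given
  • downstream impact, following the reversed edges

```python
from collections import deque


def upstream_columns(edges, column):
    """Every column that feeds `column`, directly or transitively.

    `edges` maps a target column to the set of source columns it is derived
    from (hypothetically, the output of a SQL parser, which is not shown here).
    """
    seen = set()
    queue = deque(edges.get(column, ()))
    while queue:
        source = queue.popleft()
        if source in seen:  # guards against cycles and repeated paths
            continue
        seen.add(source)
        queue.extend(edges.get(source, ()))
    return seen


def downstream_columns(edges, column):
    """Every column derived from `column` (impact analysis), via reversed edges."""
    reverse = {}
    for target, sources in edges.items():
        for source in sources:
            reverse.setdefault(source, set()).add(target)
    return upstream_columns(reverse, column)


if __name__ == "__main__":
    # Hypothetical lineage edges: target column -> source columns.
    edges = {
        "analytics.revenue.total": {"staging.orders.amount", "staging.orders.tax"},
        "staging.orders.amount": {"raw.orders.amount_cents"},
        "staging.orders.tax": {"raw.orders.amount_cents", "raw.tax_rates.rate"},
    }
    print(sorted(upstream_columns(edges, "analytics.revenue.total")))
    print(sorted(downstream_columns(edges, "raw.orders.amount_cents")))
```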

The platform also includes components such as a built-in business glossary and in-product collaboration nudges, which make the data discovery experience smooth and effortless for data users of all levels and types.


Atlan gives you a 360° view for every data asset on Snowflake. Source: Atlan


Atlan + Snowflake: Getting started

If you are looking for a data discovery tool to get more out of your Snowflake assets, take Atlan for a spin.

In the words of Tarik Dwiek, Head of Technology Alliances at Snowflake:

Atlan's open API-based approach, pay-as-you-go model, & delightful user experience aligns well with Snowflake’s own ethos, and what customers are demanding from their tools.

Here are a few resources to quickly get you started with using Atlan with Snowflake:

