In its 2020 article Reducing data costs without jeopardizing growth, McKinsey found that data users can spend between 30 and 40 percent of their time searching for data when no clear inventory of available data exists.
The challenge persists when it comes to discovering Snowflake data assets. Snowflake has more than 6,800 customers, including 510 of the Forbes Global 2000, and continues to grow rapidly. Snowflake’s architecture gives these customers non-disruptive scaling to virtually any capacity.
There’s immeasurable value in this massive wealth of diverse data sitting in Snowflake data warehouses. Data discovery is one of the foundational actions toward realizing that value.
What is data discovery?
Data discovery means identifying interesting or relevant datasets that enable informed data analysis, as defined in this paper, which studies the problem of discovering joinable datasets at scale.
The paper also notes that the proliferation of large repositories of heterogeneous data (e.g. Snowflake) and the unprecedented web-scale volume of diverse data sources make manual data discovery infeasible. What are the exact challenges? Let’s discuss.
Data Discovery in Snowflake: Challenges
Data teams commonly face these challenges when discovering Snowflake data assets:
- The deluge of requests to data teams
- Not enough context to trust the data
- Scaling & surfacing tribal knowledge around data
- Visibility into data usage
- Looking for verified documentation
The deluge of requests to data teams
Engineers, analysts, scientists, marketers, executives: all types of data users want to dip their toes in data. But not everyone can write a SQL query. The result is a deluge of requests to the data team, with questions about data availability, lineage, and more. This becomes a blocker as teams scale.
Not enough context to trust the data
Even when a person finds a data asset that looks relevant, several gaps in understanding remain before they can start using it. Questions such as "When was the data last used?", "Is it verified by a data engineer?", and "Does it power a dashboard downstream?" are common during discovery. These again lead to more requests to data teams.
Scaling & surfacing tribal knowledge around data
Important context around data is often locked away with individual data users working across different teams or domains in an organization. As more people start using data, gathering all that knowledge, keeping it updated by everyone, and finding a way to surface it for consumption becomes a challenge.
Visibility into data usage
How is a particular data asset being used? Is it being used at all? What kinds of queries are being run on it? A lack of visibility into data usage can hinder the data discovery process.
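The usage questions above can be pictured as a small aggregation over query history. The following is a minimal sketch, assuming query-log records have already been exported from the warehouse into plain Python dictionaries. The record fields (`table`, `user`, `ran_at`), the sample data, and the `summarize_usage` helper are hypothetical names chosen for illustration, not a Snowflake or Atlan API.

```python
from datetime import datetime

# Hypothetical query-log records. In practice these would come from the
# warehouse's query history, exported by whatever means your team uses.
QUERY_LOG = [
    {"table": "sales.orders", "user": "ana", "ran_at": datetime(2022, 6, 1, 9, 30)},
    {"table": "sales.orders", "user": "raj", "ran_at": datetime(2022, 6, 3, 14, 0)},
    {"table": "sales.orders", "user": "ana", "ran_at": datetime(2022, 6, 7, 11, 15)},
    {"table": "hr.payroll", "user": "li", "ran_at": datetime(2022, 5, 20, 8, 0)},
]


def empty_stats():
    """Usage stats for a table that has not been queried yet."""
    return {"queries": 0, "users": set(), "last_used": None}


def summarize_usage(log, known_tables):
    """Return per-table query count, distinct users, and last query time.

    Tables listed in known_tables but absent from the log keep zero counts,
    which answers the "is it being used at all?" question.
    """
    summary = {table: empty_stats() for table in known_tables}
    for record in log:
        stats = summary.setdefault(record["table"], empty_stats())
        stats["queries"] += 1
        stats["users"].add(record["user"])
        if stats["last_used"] is None or record["ran_at"] > stats["last_used"]:
            stats["last_used"] = record["ran_at"]
    return summary


usage = summarize_usage(QUERY_LOG, ["sales.orders", "hr.payroll", "tmp.scratch"])
for table, stats in usage.items():
    print(table, stats["queries"], sorted(stats["users"]), stats["last_used"])
```

Seeding the summary with every known table, not just the queried ones, is a deliberate choice: unused assets show up with a count of zero instead of silently disappearing from the report.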
Looking for verified documentation
Data assets contain metrics and terms tied to business logic and processes. These need a single place where they are defined, shared, and kept up to date, so that a data user doesn’t have to toggle between tools looking for this information.
Data Discovery in Snowflake using Snowsight
Snowsight is Snowflake’s web interface that enables many critical Snowflake operations. It includes data discovery capabilities like surfacing contextual metadata for columns, contextual filtering of data assets, the ability to browse through schemas, etc.
However, only users who are proficient in writing SQL queries can take advantage of these capabilities, which excludes the typical business user from the data discovery process.
Take a quick look at Snowsight and how it can be used to your advantage here.
Atlan + Snowflake: Personalized and curated data discovery experience
Atlan is an active metadata platform that serves as a collaborative workspace for diverse data users to discover, understand, trust and use data. Atlan allows users to discover relevant data assets along with tribal knowledge and business context.
Here are some ways in which Atlan supercharges the data discovery of Snowflake data assets:
Google-like search experience
Atlan’s search encompasses diverse data assets such as columns, databases, SQL queries, BI dashboards, and much more. Here are some power features of Atlan search:
- Intelligent keyword recognition: Atlan search is capable of recognizing typos, singular/plural mix-ups, and other human errors. It returns results that are most relevant to the intent of the query.
- Search from anywhere: You can start a data search from multiple places, as described in detail in this documentation. You can also use Cmd/Ctrl + K to open the search bar anywhere in the product.
- Search using context: You can narrow your search using various facets of your data, such as connectors, owners, and classifications.
- Sorting by relevance, popularity, or asset name: Atlan also lets you sort your results by relevance, by popularity, or alphabetically, depending on what serves you best.
- Amazon-like browsing experience: You can create filters in Atlan with any metadata property, whether general, business, or technical. Depending on your role in a data team, you are served personalized results, much like shopping online.
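To make the search features above concrete, here is a minimal sketch of typo-tolerant search with facet filters and sort options, written with Python's standard `difflib`. It is not Atlan's implementation: the asset records, the `normalize`, `score`, and `search` helpers, the crude trailing-"s" plural handling, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical catalog entries with a few facets attached.
ASSETS = [
    {"name": "customer_orders", "connector": "snowflake", "owner": "ana", "popularity": 42},
    {"name": "customers", "connector": "snowflake", "owner": "raj", "popularity": 87},
    {"name": "order_items", "connector": "snowflake", "owner": "ana", "popularity": 15},
    {"name": "customer_dashboard", "connector": "tableau", "owner": "li", "popularity": 60},
]


def normalize(word):
    """Lowercase and drop a trailing 's' as a crude singular/plural fold."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def score(query, name):
    """Best similarity between the query and any underscore-separated token."""
    folded_query = normalize(query)
    tokens = [normalize(token) for token in name.split("_")]
    return max(SequenceMatcher(None, folded_query, token).ratio() for token in tokens)


def search(query, assets, facets=None, sort_by="relevance", threshold=0.8):
    """Return matching asset names, filtered by facets and sorted as requested."""
    facets = facets or {}
    hits = []
    for asset in assets:
        # Facet filter: every requested facet must match exactly.
        if any(asset.get(key) != value for key, value in facets.items()):
            continue
        similarity = score(query, asset["name"])
        if similarity >= threshold:
            hits.append((similarity, asset))
    if sort_by == "popularity":
        hits.sort(key=lambda hit: -hit[1]["popularity"])
    elif sort_by == "name":
        hits.sort(key=lambda hit: hit[1]["name"])
    else:  # relevance first, asset name as the tie-breaker
        hits.sort(key=lambda hit: (-hit[0], hit[1]["name"]))
    return [asset["name"] for _, asset in hits]


# A misspelled query still finds the customer assets.
print(search("custmers", ASSETS))
# The same query restricted to one connector and ranked by popularity.
print(search("custmers", ASSETS, facets={"connector": "snowflake"}, sort_by="popularity"))
```

A production search engine would rely on a proper index, stemming, and ranking model. The point here is only that typo tolerance, facets, and sort order are three independent steps that compose cleanly.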
Trust signals for data assets
The companion sidebar to every search result uncovers trust signals for every asset from table to term. Trust signals include (but are not restricted to):
- Overview: A space that gives you the most relevant context about your assets
- Column preview: A preview of the columns within your table
- Attached resources: Historical Slack conversations, Jira tickets, and docs related to your assets
Atlan gives you a 360-degree profile of a data asset for maximum comprehension and ease of decision-making. The companion sidebar also helps you view the column-level lineage of any data asset generated via automated SQL parsing.
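As a rough illustration of how lineage can be derived by parsing SQL, the sketch below extracts column-level lineage from a single `CREATE TABLE ... AS SELECT` statement using a regular expression. This is a toy under stated assumptions, not Atlan's parser: the `column_lineage` helper and the sample statement are hypothetical, and only plain `table.column [AS alias]` references are handled. Expressions, functions, subqueries, and `SELECT *` would need a real SQL parser.

```python
import re

# Matches: CREATE [OR REPLACE] TABLE <target> AS SELECT <cols> FROM ...
CTAS_PATTERN = re.compile(
    r"create\s+(?:or\s+replace\s+)?table\s+(?P<target>[\w.]+)\s+as\s+"
    r"select\s+(?P<cols>.+?)\s+from\s",
    re.IGNORECASE | re.DOTALL,
)


def column_lineage(sql):
    """Map each target column to the source column reference it came from.

    Returns an empty dict when the statement is not a supported CTAS.
    """
    match = CTAS_PATTERN.search(sql)
    if match is None:
        return {}
    target = match.group("target")
    lineage = {}
    for item in match.group("cols").split(","):
        parts = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)
        source = parts[0].strip()
        # Without an alias, the target column keeps the source column's name.
        alias = parts[1].strip() if len(parts) > 1 else source.split(".")[-1]
        lineage[f"{target}.{alias}"] = source
    return lineage


SQL = """
CREATE TABLE analytics.order_summary AS
SELECT orders.order_id, orders.amount AS order_amount, customers.region AS customer_region
FROM sales.orders JOIN sales.customers ON orders.customer_id = customers.id
"""

for target_column, source_column in column_lineage(SQL).items():
    print(f"{source_column} -> {target_column}")
```

Running this kind of extraction over every statement in a query history and chaining the resulting mappings is one way to build an end-to-end lineage graph.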
Atlan as a platform also includes components like an in-built business glossary, collaboration nudges within the platform, and more, all of which make the data discovery experience smooth and effortless for all levels and types of data users.
Atlan + Snowflake: Getting started
If you are looking for a data discovery tool to leverage your Snowflake assets better, take Atlan for a spin.
Quoting Tarik Dwiek, Head of Technology Alliances at Snowflake:
“Atlan's open API-based approach, pay-as-you-go model, & delightful user experience aligns well with Snowflake’s own ethos, and what customers are demanding from their tools.”
Here are a few resources to quickly get you started with using Atlan with Snowflake:
Snowflake data discovery: Related reads
- Data catalog for Snowflake data warehouse
- How to manage data governance for Snowflake data warehouse
- How to manage metadata for Snowflake data assets
- Visualize data lineage for Snowflake data objects
- Automated data dictionary for Snowflake data warehouse
- Snowflake data access control made easy and scalable