Databricks Data Discovery: What to Expect & How to Maximize Its Value

Updated December 04th, 2024

Databricks enhances its data cataloging and governance capabilities with Unity Catalog, which also serves as the backbone for its data discovery features. Building on an internal data dictionary and other tools, Databricks enables connections with various data sources and engineering tools, providing essential metadata and context for assets. These become the foundation for search and discovery within the Databricks ecosystem.

This article explores Databricks’ data discovery features, interfaces, and integrations, while introducing the concept of a control plane for data. A control plane broadens and simplifies the discovery experience, especially when your organization relies on a diverse set of tools beyond Databricks.


Table of contents

  1. Databricks data discovery: Native features
  2. How to extend seamless data discovery to your entire data ecosystem
  3. Enhancing Databricks data discovery with Atlan
  4. Summing up
  5. Databricks data discovery: Related reads


Databricks data discovery: Native features

Databricks offers multiple interfaces for searching and discovering data assets. You can interact with metadata using SQL queries, REST APIs, or the Catalog Explorer UI, the latter being particularly popular among business users for its accessibility and ease of use.
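
As a rough illustration of the REST route, the sketch below builds (but does not send) a request that lists the tables in one schema. It assumes the Unity Catalog tables endpoint at `/api/2.1/unity-catalog/tables` and bearer-token authentication; the workspace URL, token, catalog, and schema names are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_tables_request(host: str, token: str, catalog: str, schema: str) -> Request:
    """Build (but do not send) a request that lists the tables in one schema."""
    # Assumption: the Unity Catalog "list tables" endpoint and its two query parameters.
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    url = f"{host.rstrip('/')}/api/2.1/unity-catalog/tables?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical workspace URL, token, catalog, and schema, for illustration only.
request = build_list_tables_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    catalog="main",
    schema="sales",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` against a real workspace should return a JSON document describing the tables you are allowed to see. Nothing is sent here, so the sketch runs without credentials.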

Here’s a closer look at each of the interfaces of Databricks data discovery:

  • AI-based insights, summary, and search: Databricks lets you add context to your data and AI assets through comments. This includes AI-generated summaries based on metadata such as table structures, column names, and data types. These insights help users better understand their data assets and improve discoverability.
  • Keyword search: This is an advanced search interface that leverages DatabricksIQ’s capabilities for navigational, semantic, and text-based searches. It also lets you search data assets based on object types, popularity, tags, and knowledge cards.
  • Catalog exploration using the UI: Unity Catalog’s Catalog Explorer interface allows you to search data assets, while also enabling cataloging and data sharing. From the catalog, you can visualize relationships using lineage graphs or Entity Relationship Diagrams (ERDs). This makes it easier to navigate and understand how data assets connect.
  • Programmatic metadata exploration: Databricks supports direct interaction with data asset storage locations via SQL commands, the %fs magic command (for working with volumes and DBFS objects), and the dbutils library. These programmatic tools allow you to search workspace-level objects and automate metadata exploration tasks.
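
One way to picture the SQL route is a keyword search over Unity Catalog's information schema. The sketch below is a minimal example that assumes the `system.information_schema.tables` view and named parameter markers (`:keyword`). The `table_search` helper and the sample keyword are hypothetical.

```python
# Query text with a named parameter marker, so the keyword is never
# spliced into the SQL string by hand.
TABLE_SEARCH_SQL = """
SELECT table_catalog, table_schema, table_name, comment
FROM system.information_schema.tables
WHERE lower(table_name) LIKE concat('%', lower(:keyword), '%')
   OR lower(comment) LIKE concat('%', lower(:keyword), '%')
ORDER BY table_catalog, table_schema, table_name
LIMIT 50
""".strip()


def table_search(keyword: str) -> tuple[str, dict[str, str]]:
    """Return the query text and its named parameters for a keyword search."""
    cleaned = keyword.strip()
    if not cleaned:
        raise ValueError("keyword must not be empty")
    return TABLE_SEARCH_SQL, {"keyword": cleaned}


query, params = table_search("orders")
# In a Databricks notebook, where `spark` is predefined, this could be run as:
#     spark.sql(query, args=params).show()
print(query)
print(params)
```

The information schema only returns objects the current user has privileges on, so the same query can yield different results for different users.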

Databricks data discovery with an updated Catalog Explorer


A few months ago, Databricks shared major updates to the Catalog Explorer, aiming to simplify search, discovery, and governance processes.

“The Catalog Explorer serves as a single pane of glass for your Unity Catalog discovery and governance journey, where you can find and manage all your data and AI assets.” - Databricks Platform Blog

Here are a few highlights from that update:

  • Quick access experience: You can now mark your most used data assets as favorites so they appear under the Favorites tab, while recently used assets show up automatically under the Recents tab.
  • Streamlined navigation: The interface has been revamped, with the Delta Sharing, Clean Rooms, and External Data tabs relocated from the sidebar to the top panel for a more intuitive layout.
  • Asset overview: Databricks has added a brand new page to provide a crisp summary of each data asset in the catalog. All the essential metadata and AI-generated descriptions and comments can now be found on this page. It also contains information about ownership, data format, popularity, and tags, among other things.
  • Lineage retention: Lineage metadata retention has been extended from 90 days to one year. As such, you can trace data history over a much longer period for better auditability and insights.
  • Entity relationship diagrams (ERDs): Visualizing relationships between data assets is now easier with ERDs, helping you understand dependencies and connections. While primary and foreign key constraints aren’t strictly enforced in Databricks, defining them enhances discovery and relationship mapping activities.
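
To make the point about key constraints concrete, the sketch below generates the kind of DDL that declares informational primary and foreign keys. It assumes Databricks SQL's `ALTER TABLE ... ADD CONSTRAINT` syntax and that primary key columns must be `NOT NULL`; the table, column, and constraint names are hypothetical.

```python
def primary_key_ddl(table: str, column: str, name: str) -> list[str]:
    """Statements that declare an informational primary key on one column."""
    return [
        # Assumption: the key column has to be NOT NULL before the constraint is added.
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL",
        f"ALTER TABLE {table} ADD CONSTRAINT {name} PRIMARY KEY ({column})",
    ]


def foreign_key_ddl(table: str, column: str, name: str, parent: str, parent_column: str) -> str:
    """Statement that declares an informational foreign key to a parent table."""
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {name} "
        f"FOREIGN KEY ({column}) REFERENCES {parent} ({parent_column})"
    )


# Hypothetical tables: main.sales.customers (parent) and main.sales.orders (child).
statements = primary_key_ddl("main.sales.customers", "customer_id", "customers_pk")
statements.append(
    foreign_key_ddl(
        "main.sales.orders", "customer_id", "orders_customers_fk",
        "main.sales.customers", "customer_id",
    )
)
for statement in statements:
    print(statement + ";")
```

Because the constraints are not enforced, declaring them does not validate existing rows. Their value here is as metadata for ERDs and relationship discovery.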

How to extend seamless data discovery to your entire data ecosystem

While the above updates improve search and discovery within Databricks, most organizations use multiple tools for data analysis, visualization, and governance. Managing data assets effectively across such diverse ecosystems requires a unified control plane that goes beyond the capabilities of Unity Catalog.

A single, unified control plane for data would let you:

  • Search and discover data across your organization, irrespective of the tools in use
  • Leverage all of the native capabilities of the connected platforms and tools, including Databricks
  • Tap into value-add features, such as advanced lineage, policy enforcement, and metadata synchronization across tools, to improve overall governance and usability

Atlan acts as this unified control plane, integrating deeply with Databricks and other tools in your data ecosystem. It ensures seamless management of data assets across platforms, enhancing discovery, governance, and collaboration for modern data teams. Let’s explore how.


Enhancing Databricks data discovery with Atlan

Atlan’s integration with Databricks Unity Catalog extends its native capabilities, offering a more comprehensive approach to data cataloging, governance, access management, lineage, quality, and discovery. Here’s how Atlan enhances the Databricks data discovery experience:

  • Advanced natural language search: Atlan’s full-text search is built on Elasticsearch and its Query DSL, which gives you greater flexibility in searching, sorting, and filtering data assets.
  • Higher degree filtering: To narrow the search surface area for more accurate results, Atlan lets you combine several filters, such as asset type, domain, owner, tags, and properties. It also enables you to create filters on custom metadata.
  • Discover based on trust signals: Atlan helps you build trust in data assets by letting you certify them as fit for use. There are four certification statuses: Verified, Draft, Deprecated, and No certificate.
  • Personalization & curation: Personas and Purposes, together with saved searches and filters, let you tailor the search and discovery experience to your own needs.
  • Business glossary: As part of the Asset Profile, Atlan allows you to create a README file for every data asset in the catalog. This file holds all the important context about the data asset, including text, diagrams, and embeds, among other things.
  • Custom metadata creation: Atlan also lets you create custom metadata fields that fit your organization’s domain, processes, and workflows better. Custom metadata is especially useful when dealing with thousands of data assets across hundreds of data sources.
  • 360° visibility for every asset: With all the metadata and context that Atlan holds and the information it gathers from a tool like Databricks, it provides full visibility into the data asset’s purpose, journey, constraints, and usage directions.
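
To give a feel for how full-text search, filters, and trust signals can combine in a single Query DSL request, here is a generic Elasticsearch-style sketch. The field names (`name`, `asset_type`, `certificate_status`, `popularity_score`) and the `build_asset_search` helper are hypothetical illustrations, not Atlan's documented schema.

```python
import json


def build_asset_search(text: str, asset_type: str, certificate: str, size: int = 10) -> dict:
    """Build a Query DSL body: a full-text match narrowed by exact-value filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the asset name (hypothetical field).
                "must": [{"match": {"name": text}}],
                # Unscored exact filters (hypothetical fields).
                "filter": [
                    {"term": {"asset_type": asset_type}},
                    {"term": {"certificate_status": certificate}},
                ],
            }
        },
        # Rank the most-used assets first (hypothetical field).
        "sort": [{"popularity_score": {"order": "desc"}}],
    }


body = build_asset_search("orders", asset_type="Table", certificate="VERIFIED")
print(json.dumps(body, indent=2))
```

Keeping exact-value conditions under `filter` rather than `must` means they narrow the result set without affecting relevance scoring, which is the usual pattern for facet-style filters.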

These features, combined with the native features of Databricks, make for a richer data discovery experience. The integration helps organizations spend less time hunting for data and more on higher-value tasks.

Organizations such as Montreal Analytics, Contentful, carVertical, and Porto have significantly improved their data discovery processes with Atlan.


Summing up

Data and business teams often struggle to find and trust data assets for impactful analysis or reporting. This highlights the importance of a unified control plane for data, one that facilitates consistent, reliable discovery and interaction with data assets.

This article explored how Databricks and Atlan’s data discovery features complement each other to create a holistic, intuitive, and comprehensive discovery experience.

For more on connecting Databricks with Atlan and enhancing your organization’s governance, cataloging, lineage, and quality, refer to Atlan’s official documentation on connecting the two.


