Databricks Data Catalog: Native Capabilities, Benefits of Integration with Atlan, and More

Updated October 15th, 2024


Unity Catalog is Databricks’ data catalog and built-in governance solution for data and AI. It automatically creates a technical data catalog in the INFORMATION_SCHEMA schema of every catalog.

Additionally, a central system catalog spans the entire Databricks environment, housing operational metadata such as audit logs, table and column lineage, and job history. This metadata forms the foundation of a Databricks metadata layer accessible to all Unity Catalog-enabled workspaces.
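
To make this concrete, here is a minimal sketch of querying that metadata from a Databricks notebook. It assumes a Unity Catalog-enabled workspace, an illustrative catalog named `main`, and that system tables such as `system.access.audit` have been enabled for your metastore.

```python
# Minimal sketch: reading Unity Catalog metadata from a Databricks notebook.
# Assumes a Unity Catalog-enabled workspace; `main` is an illustrative catalog name.
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already provided; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Technical metadata: every catalog exposes an INFORMATION_SCHEMA schema.
spark.sql("""
    SELECT table_schema, table_name, table_type
    FROM main.information_schema.tables
    ORDER BY table_schema, table_name
""").show(truncate=False)

# Operational metadata: the central system catalog (once enabled) holds audit logs,
# lineage, job history, and more for all Unity Catalog-enabled workspaces.
spark.sql("""
    SELECT event_time, user_identity.email AS user_email, service_name, action_name
    FROM system.access.audit
    WHERE event_date >= date_sub(current_date(), 7)
    ORDER BY event_time DESC
    LIMIT 20
""").show(truncate=False)
```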

This article provides an overview of the Databricks data catalog and explores how to set up a unified control plane for metadata across your entire data stack—not just Databricks.


Table of contents #

  1. Databricks data catalog: Native cataloging features
  2. Atlan as the unified metadata control plane for Databricks
  3. Bringing cataloging, discovery, and governance together
  4. How organizations make the most of their data using Atlan
  5. Summary
  6. Related reads

Databricks data catalog: Native cataloging features #

Databricks manages technical and operational metadata separately within each workspace. Unity Catalog stitches this metadata together into a single metastore that registers all securable data objects, such as catalogs, schemas, tables, and views.

Securable objects in Unity Catalog - Source: Databricks.

Databricks offers the following native features for search, discovery, and governance (via Unity Catalog):

  • Data access control: Grant and revoke access to any securable data object (a minimal sketch follows this list).
  • Data isolation: Control where and how your data is stored, in terms of cloud storage accounts or buckets.
  • Auditing: Create audit trails for all metastore actions, capturing query history across Unity Catalog-enabled workspaces.
  • Lineage: Query table- and column-level lineage for compute run in notebooks, dashboards, jobs, and queries.
  • Lakehouse federation: Manage query federation when external data sources are involved in data processing and queries, and maintain the data lineage across all sources uniformly.
  • Data sharing: Use the open-source Delta Sharing protocol to securely share data internally and externally, with full control over permissions.
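
To illustrate the access control and lineage features above, here is a minimal sketch of the corresponding SQL run from a Databricks notebook. The table `main.sales.orders` and the `data-analysts` group are illustrative names, and the lineage query assumes system tables are enabled on your metastore.

```python
# Minimal sketch: Unity Catalog access control and lineage from a Databricks notebook.
# `main.sales.orders` and the `data-analysts` group are illustrative names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks

# Data access control: grant, inspect, and revoke privileges on a securable object.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`")
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show(truncate=False)
spark.sql("REVOKE SELECT ON TABLE main.sales.orders FROM `data-analysts`")

# Lineage: table-level lineage captured by Unity Catalog (requires system tables).
spark.sql("""
    SELECT source_table_full_name, target_table_full_name, entity_type, event_time
    FROM system.access.table_lineage
    WHERE target_table_full_name = 'main.sales.orders'
    ORDER BY event_time DESC
""").show(truncate=False)
```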

While these features enable technical cataloging, they don’t provide a unified control plane for metadata that interacts with non-Databricks systems. That’s where Atlan can help, by creating a unified metadata control plane for your data stack. Let’s see how that works.


Atlan as the unified metadata control plane for Databricks #

As data tools and technologies multiply, the data ecosystem grows more complex, and plain-old information_schema metadata evolves into big data in its own right. This metadata is big not because of volume alone, but because of its impact on the security, quality, discoverability, and governance of everything in your data ecosystem.

Atlan’s lakehouse approach to metadata ensures:

  • End-to-end enablement of your data ecosystem
  • Automation workflows for both technical and operational workloads

Atlan builds on the native features of Databricks’ data catalog, i.e., Unity Catalog, offering an intuitive interface for all your data needs, such as:

  • Finding, previewing, and accessing data assets (a brief SDK sketch follows this list)
  • Controlling data permissions
  • Classifying data assets
  • Sharing data assets internally or outside your organization in a secure manner
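
As a rough sketch of what this can look like programmatically, the snippet below uses pyatlan, Atlan’s Python SDK, to fetch a Databricks table asset and classify it. The tenant URL, API key, qualified name, and the PII tag are illustrative placeholders; the exact qualified-name format depends on how the Databricks connection is configured in your Atlan workspace.

```python
# Minimal sketch using pyatlan; all names below are illustrative placeholders.
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.assets import Table

client = AtlanClient(
    base_url="https://your-tenant.atlan.com",  # placeholder tenant URL
    api_key="<API_KEY>",                       # placeholder API key
)

# Fetch a Databricks table asset by its qualified name (format is illustrative).
qualified_name = "default/databricks/1234567890/main/sales/orders"
table = client.asset.get_by_qualified_name(
    qualified_name=qualified_name,
    asset_type=Table,
)
print(table.name, table.description)

# Classify the asset with an existing Atlan tag (assumes a "PII" tag is defined).
client.asset.add_atlan_tags(
    asset_type=Table,
    qualified_name=qualified_name,
    atlan_tag_names=["PII"],
)
```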

A Databricks + Atlan setup can help you lay the foundation for AI-ready data ecosystems. Let’s explore the specifics in the next section.


Bringing cataloging, discovery, and governance together #

Built on top of the metadata crawled from the Databricks data catalog, Atlan layers cataloging, discovery, and governance capabilities onto your existing Databricks setup. These capabilities offer only a glimpse of what is possible; there are other use cases centered around automation, business glossaries, metadata activation, personalization, and more.


How organizations make the most of their data using Atlan #

The Forrester Wave™ report recently ranked Atlan as a Leader in enterprise data catalogs for its ability to:

  • Automatically catalog the entire technology, data, and AI ecosystem
  • Enable an AI- and automation-first data ecosystem
  • Prioritize data democratization and self-service

Atlan has helped numerous enterprises unlock the full potential of their data and leverage it for their AI use cases. Let’s look at one such instance.

Yape, the payment app built by Peru’s largest bank, uses Databricks on Azure as its core data platform. Yape wanted an easy-to-use data cataloging tool that made data accessible to everyone on the team, rather than just a small group within engineering.

They chose Atlan, as it “[had] the best UI in the market right now.”


Summary #

As data ecosystems become more diverse, the need for a unified control plane grows. Such a control plane becomes the single go-to place for all of your organization’s data needs, irrespective of the data consumer’s role.

Atlan provides this control plane, managing governance and compliance across all systems, not just Databricks. With Atlan, your organization is better equipped to succeed in data and AI use cases.



