Understanding Databricks Unity Catalog: How to Unlock Its Full Potential

Updated September 29th, 2024

Unity Catalog is Databricks’ built-in, centralized metadata layer designed to manage data access, security, and lineage. It also serves as the foundation for search and discovery within the platform.

In a recent move to promote open standards, Databricks has open-sourced Unity Catalog, with the open-source version now available on GitHub. Built to support interoperability, openness, and unified governance for data and AI, this project is currently in the sandbox stage and is hosted by the LF AI & Data Foundation.

The open-source Unity Catalog is based on the OpenAPI spec and is compatible with the Apache Hive metastore API and the Apache Iceberg REST catalog API for data cataloging. Recently, support for Apache XTable has been added as well. Companies like AT&T and Rivian are already using Databricks Unity Catalog for search, discovery, and governance on their data platforms.

In this article, you’ll explore the key features of Databricks’ built-in Unity Catalog, along with a summary of how to set it up and tips on maximizing its value. Let’s dive in!


Table of contents #

  1. Unity Catalog: Overview
  2. How to set up Unity Catalog for a Databricks account
  3. Getting the most out of Databricks Unity Catalog with Atlan
  4. Unity Catalog + Atlan: How to integrate
  5. Metadata plane for data and AI-readiness
  6. Databricks Unity catalog: Related reads

Unity Catalog: Overview #

Databricks accounts have Workspaces, which are unified environments for working with Databricks assets for specific sets of users. Workspaces can be for business units, teams, individuals, etc.

Without Unity Catalog, every Databricks workspace has its own user management, metastore, and compute. Compute doesn’t need to be shared, but shared user management and a shared metastore make access management, governance, search, and discovery much easier.

Unity Catalog does exactly that. It centralizes user management and the metastore across Databricks workspaces while letting each workspace keep its own separate compute, as shown in the image below:

Databricks with and without Unity Catalog - Source: Databricks website.

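Unity Catalog arranges objects hierarchically. The metastore is the top-level container, and the objects inside it are addressed through a three-level namespace made up of the catalog, the schema, and the object itself (such as a table or view), written as catalog.schema.table, as shown in the image below: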

Three-level namespace for a hierarchical arrangement of objects in the technical catalog - Source: Databricks website.

This three-level namespace departs from the usual native technical catalog that comes with database and data warehousing systems like MySQL, SQL Server, Oracle, etc. This arrangement of securable objects allows you to use Unity Catalog features like access control, data sharing, discovery, lineage, and logging, among other things.
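
To make the namespace concrete, here is a minimal Python sketch that builds a fully qualified catalog.schema.table reference and a query against it. The helper functions and the names sales, emea, and orders are hypothetical and used only for illustration; the sketch assumes backtick-quoted identifiers as used in Databricks SQL.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, obj: str) -> str:
    """Build a three-level catalog.schema.object reference."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, obj))


def select_all(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Return a SELECT statement against a three-level table name."""
    return f"SELECT * FROM {qualified_name(catalog, schema, table)} LIMIT {int(limit)}"


# Hypothetical names, for illustration only.
print(select_all("sales", "emea", "orders"))
```

In a Databricks notebook, you would typically pass a string like this to spark.sql(...); the sketch only builds the text so that it runs anywhere.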

Let’s now look at how to set up a Unity Catalog for your Databricks account.


How to set up Unity Catalog for a Databricks account #

Follow these steps to set up Unity Catalog for your Databricks account:

  1. Enable Unity Catalog: Enable Unity Catalog for your account if it’s not already enabled by default.
  2. Set up a workspace admin: Create a user in the admins workspace-local group; this user should be able to grant the account admin and metastore admin roles.
  3. Provision Databricks compute: Unity Catalog workloads need to comply with the access and security requirements. These are defined in the four access modes, of which only two (single-user and shared) support Unity Catalog.
  4. Grant permissions to users: Grant your users permission to create and access objects in Unity Catalog catalogs and schemas.
  5. Create a catalog: Before you can use Unity Catalog, you need to create at least one catalog, as some Databricks workspaces won’t have catalogs created by default. Follow Databricks’ best practices when creating a new catalog.
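
The catalog creation and permission steps above can be sketched as a handful of SQL statements. The following Python snippet only assembles those statements as strings; the catalog, schema, and group names are hypothetical, and the exact privileges your users need are an assumption that will vary by team.

```python
def setup_statements(catalog: str, schema: str, group: str) -> list[str]:
    """Return SQL that creates a catalog and schema and grants a group basic access.

    The privilege set (USE CATALOG, USE SCHEMA, CREATE TABLE) is an assumed
    starting point, not a recommendation for every workload.
    """
    cat = f"`{catalog}`"
    sch = f"{cat}.`{schema}`"
    principal = f"`{group}`"
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        f"CREATE SCHEMA IF NOT EXISTS {sch}",
        f"GRANT USE CATALOG ON CATALOG {cat} TO {principal}",
        f"GRANT USE SCHEMA, CREATE TABLE ON SCHEMA {sch} TO {principal}",
    ]


# Hypothetical names, for illustration only.
for statement in setup_statements("analytics", "reporting", "data-engineers"):
    print(statement)
```

Each statement could then be run from a SQL editor or notebook on Unity Catalog-enabled compute by a user who holds the required admin privileges.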

Once Unity Catalog is set up, you can query all securable objects, such as schemas, tables, and views, for purposes like discovery, lineage, access control, and data sharing.
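
As a rough illustration of querying securable objects, the sketch below builds a query against a catalog’s information_schema to list its tables and views. The catalog name main is an assumption, and the function is a hypothetical helper rather than part of any Databricks library.

```python
def list_tables_query(catalog: str) -> str:
    """Build a query listing tables and views in a catalog via information_schema."""
    return (
        "SELECT table_schema, table_name, table_type "
        f"FROM `{catalog}`.information_schema.tables "
        "WHERE table_schema <> 'information_schema' "
        "ORDER BY table_schema, table_name"
    )


# Hypothetical catalog name, for illustration only.
print(list_tables_query("main"))
```

The query returns only the objects the current user is allowed to see, which is what makes it handy for discovery.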

While Unity Catalog excels as a technical data catalog and helps you manage your Databricks environment more effectively, it doesn’t provide a unified control plane to manage metadata across your entire data stack from one place. This is where Atlan steps in, leveraging the information from Unity Catalog to create that missing unified control plane. Let’s explore how Atlan and Unity Catalog work together.


Getting the most out of Databricks Unity Catalog with Atlan #

The data control plane is a centralized place for managing all your data assets across your wider data ecosystem, not just Databricks. While it is a centralized place, the underlying data architecture and ways of working can follow any operating model.

Atlan integrates with Unity Catalog to offer such a control plane for your data with the following features:

  1. Intuitive user experience that gives you easy access to data assets across your data ecosystem
  2. Organization-wide business glossary enabling a common business language, making it easier for everyone to understand the KPIs, metrics, and overall business goals
  3. Domain-driven data product marketplace for self-contained teams, especially in decentralized organizations
  4. Governance and quality automation with automatic data classification, tag and lineage propagation, and data contract enforcement
  5. Embedded collaboration with trust-enabling features like verification, certification, and freshness flags

Many companies using Databricks to power their core data needs have adopted Atlan as the metadata foundation for their entire data stack.

General Motors: Building their Insight Factory using Unity Catalog + Atlan #


Screenshot from keynote at the DataAISummit - Source: Brian Ames - Leading AI/ML from concept to Production, Head of the AI Center and Senior Manager for Transformation and Enablement at General Motors

“We realized that AI & ML needed to be our competitive advantage and we knew that if that’s our vision, we couldn’t function like a traditional automotive company, we needed to become a software company … our data ecosystem is complex, so we needed to build GM’s Insight Factory right … [partnering with Databricks and other leading solutions], Atlan’s governance solution helps us with end-to-end lineage to understand our ecosystem.”

Brian and the team at GM are transforming their industry and already achieving significant business impact with time-to-insight down from 28 days to 3 hours and $330M added to their bottom line.


Unity Catalog + Atlan: How to integrate #

Making Atlan and Unity Catalog work together is quite easy. The following steps, detailed on the Atlan + Databricks Connectivity page, complete the setup:

  • Set up authentication between Atlan and Databricks depending on your cloud platform.
  • Grant the BROWSE privilege to access an object’s metadata, the lineage graph, information_schema, and the REST API, among other things.

Once you’ve set up the connection, Atlan can crawl the metadata from Unity Catalog.
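
For the BROWSE grant mentioned in the steps above, a minimal sketch might look like the following. It assumes the privilege is granted at the catalog level to a service principal used by Atlan; the principal name atlan-crawler and the catalog names are hypothetical, so check the connectivity guide for the exact privileges your setup needs.

```python
def browse_grants(catalogs: list[str], principal: str) -> list[str]:
    """Return one GRANT BROWSE statement per catalog for the given principal."""
    return [
        f"GRANT BROWSE ON CATALOG `{catalog}` TO `{principal}`"
        for catalog in catalogs
    ]


# Hypothetical catalogs and principal, for illustration only.
for statement in browse_grants(["analytics", "finance"], "atlan-crawler"):
    print(statement)
```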


Metadata plane for data and AI-readiness #

Thanks to Atlan’s innovative design, architecture, and cataloging features, it was recognized as a market leader in the Forrester Wave report, surpassing other enterprise data cataloging tools.

This comparison was based on 24 different aspects of data cataloging, which can broadly be categorized under the following three themes:

  1. Automatic cataloging of the entire technology, data, and AI ecosystem
  2. Making the data ecosystem AI- and automation-first
  3. Prioritizing data democratization and self-service

With the rise of generative AI in recent years, organizations now have the ability to do more with both structured and unstructured data—provided that the data is searchable, discoverable, and trustworthy. A metadata plane like Atlan can help you prepare to tackle any data or AI-driven challenges your organization may face.

Curious how? Talk to us!


