Understanding Databricks Unity Catalog: How to Unlock Its Full Potential
Unity Catalog is Databricks’ built-in, centralized metadata layer designed to manage data access, security, and lineage. It also serves as the foundation for search and discovery within the platform.
In a recent move to promote open standards, Databricks has open-sourced Unity Catalog, with the open-source version now available on GitHub. Built to support interoperability, openness, and unified governance for data and AI, this project is currently in the sandbox stage and is hosted by the LF AI & Data Foundation.
The open-source Unity Catalog is based on the OpenAPI spec and is compatible with Apache Hive and Apache Iceberg for data cataloging. Recently, support for Apache XTable has been added as well. Companies like AT&T and Rivian are already using Databricks Unity Catalog for search, discovery, and governance on their data platforms.
In this article, you’ll explore the key features of Databricks’ built-in Unity Catalog, along with a summary of how to set it up and tips on maximizing its value. Let’s dive in!
Table of contents #
- Unity Catalog: Overview
- How to set up Unity Catalog for a Databricks account
- Getting the most out of Databricks Unity Catalog with Atlan
- Unity Catalog + Atlan: How to integrate
- Metadata plane for data and AI-readiness
- Databricks Unity Catalog: Related reads
Unity Catalog: Overview #
Databricks accounts have Workspaces, which are unified environments for working with Databricks assets for specific sets of users. Workspaces can be for business units, teams, individuals, etc.
By default, every Databricks workspace has its own user management, metastore, and compute. Compute doesn't need to be shared, but sharing user management and the metastore across workspaces makes access management, governance, search, and discovery much easier.
Unity Catalog does exactly that. It brings user management and the metastore together across Databricks workspaces, while allowing those workspaces to keep their own separate compute.
Within a metastore, which is the top-level container, Unity Catalog arranges data in a three-level namespace of catalog, schema, and object (such as a table, view, or volume), so a table is addressed as catalog.schema.table.
This three-level namespace departs from the usual native technical catalog that comes with database and data warehousing systems like MySQL, SQL Server, Oracle, etc. This arrangement of securable objects allows you to use Unity Catalog features like access control, data sharing, discovery, lineage, and logging, among other things.
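To make the namespace concrete, here is a minimal Python sketch of how a fully qualified name of the form catalog.schema.object can be parsed and quoted for use in SQL. The `ObjectName` class and the example name `main.sales.orders` are hypothetical and exist only for illustration; the sketch also assumes simple identifiers that do not themselves contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectName:
    """A fully qualified name: catalog.schema.object (illustrative helper)."""

    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, full_name: str) -> "ObjectName":
        # Assumption: identifiers contain no dots, so a plain split is enough.
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
        return cls(*parts)

    def quoted(self) -> str:
        # Backtick-quote each level so the name is safe to embed in SQL text.
        levels = (self.catalog, self.schema, self.name)
        return ".".join("`" + p.replace("`", "``") + "`" for p in levels)


orders = ObjectName.parse("main.sales.orders")  # hypothetical table name
print(orders.catalog, orders.schema, orders.name)
print(orders.quoted())
```

A two-part name such as `sales.orders` is rejected by `parse`, which mirrors the fact that a name is only unambiguous across workspaces once the catalog level is included.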
Let’s now look at how to set up a Unity Catalog for your Databricks account.
How to set up Unity Catalog for a Databricks account #
To set up Unity Catalog for your Databricks account, follow these steps:
- Enable Unity Catalog: Enable Unity Catalog for your account if it isn't already enabled by default.
- Set up a workspace admin: Create a user in the admins workspace-local group; this user should be able to grant the account admin and metastore admin roles.
- Provision Databricks compute: Unity Catalog workloads need to comply with the access and security requirements. These are defined in the four access modes, of which only two (single-user and shared) support Unity Catalog.
- Grant permission to users: Next, you need to grant your users permission to create objects and access them in Unity Catalog catalogs and schemas.
- Create a catalog: Before you can use Unity Catalog, you need at least one catalog, and some Databricks workspaces won't have one created by default. Follow Databricks' best practices when creating a new catalog.
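The catalog and permission steps above can be sketched as a handful of SQL statements. The Python helper below only builds the statement text; it does not connect to Databricks. The catalog, schema, and group names are hypothetical, and the set of privileges granted is an assumption about what a typical team might need, not a recommendation from Databricks.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def bootstrap_statements(catalog: str, schema: str, group: str) -> list[str]:
    """Build SQL that creates a catalog and schema and grants a group access.

    All three names are supplied by the caller; nothing here is specific
    to any real workspace.
    """
    c, s, g = quote_ident(catalog), quote_ident(schema), quote_ident(group)
    return [
        f"CREATE CATALOG IF NOT EXISTS {c}",
        f"CREATE SCHEMA IF NOT EXISTS {c}.{s}",
        # USE CATALOG and USE SCHEMA are required before object-level
        # privileges such as SELECT take effect.
        f"GRANT USE CATALOG ON CATALOG {c} TO {g}",
        f"GRANT USE SCHEMA, CREATE TABLE, SELECT ON SCHEMA {c}.{s} TO {g}",
    ]


# Hypothetical names, for illustration only.
for statement in bootstrap_statements("analytics", "reporting", "data-engineers"):
    print(statement + ";")
```

The generated statements could then be run from a notebook or SQL editor by a user with sufficient rights on the metastore.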
Once Unity Catalog is set up, you can query all securable objects, such as schemas, tables, and views, for purposes like discovery, lineage, access control, and data sharing.
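As a rough illustration of querying securable objects, the sketch below builds a SQL query against the `information_schema` that Unity Catalog exposes, listing the tables and views in one schema. The function name and the example catalog and schema are hypothetical, and the snippet only produces the query text rather than executing it.

```python
def list_tables_query(catalog: str, schema: str) -> str:
    """Build a query listing tables and views in one schema (illustrative)."""

    def literal(value: str) -> str:
        # Quote a SQL string literal, doubling any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT table_catalog, table_schema, table_name, table_type "
        "FROM system.information_schema.tables "
        f"WHERE table_catalog = {literal(catalog)} "
        f"AND table_schema = {literal(schema)} "
        "ORDER BY table_name"
    )


# Hypothetical catalog and schema names.
print(list_tables_query("analytics", "reporting"))
```

What the query returns depends on the privileges of the user running it, so two users may see different rows for the same schema.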
While Unity Catalog excels as a technical data catalog and helps you manage your Databricks environment more effectively, it doesn’t provide a unified control plane to manage metadata across your entire data stack from one place. This is where Atlan steps in, leveraging the information from Unity Catalog to create that missing unified control plane. Let’s explore how Atlan and Unity Catalog work together.
Getting the most out of Databricks Unity Catalog with Atlan #
The data control plane is a centralized place for managing all your data assets across your wider data ecosystem, not just Databricks. While it is a centralized place, the underlying data architecture and ways of working can follow any operating model.
Atlan integrates with Unity Catalog to offer such a control plane for your data with the following features:
- Intuitive user experience that allows you easy access to data assets across your data ecosystem
- Organization-wide business glossary enabling a common business language, making it easy for everyone to understand the KPIs, metrics, and overall business goals better
- Domain-driven data product marketplace for self-contained teams, especially in decentralized organizations
- Governance and quality automation with automatic data classification, tag and lineage propagation, and data contract enforcement
- Embedded collaboration with trust-enabling features like verification, certification, and freshness flags
Many companies using Databricks to power their core data needs have leveraged Atlan as the metadata foundation for all of their data stack.
General Motors: Building their Insight Factory using Unity Catalog + Atlan #
“We realized that AI & ML needed to be our competitive advantage and we knew that if that’s our vision, we couldn’t function like a traditional automotive company, we needed to become a software company … our data ecosystem is complex, so we needed to build GM’s Insight Factory right … [partnering with Databricks and other leading solutions], Atlan’s governance solution helps us with end-to-end lineage to understand our ecosystem.”
Brian and the team at GM are transforming their industry and already achieving significant business impact with time-to-insight down from 28 days to 3 hours and $330M added to their bottom line.
Unity Catalog + Atlan: How to integrate #
Making Atlan and Unity Catalog work together is straightforward. The following steps, detailed on the Atlan + Databricks Connectivity page, complete the setup:
- Set up authentication between Atlan and Databricks depending on your cloud platform.
- Grant the BROWSE privilege to access an object’s metadata, the lineage graph, information_schema, and the REST API, among other things.
- Grant Atlan the permissions to import tags from Databricks, reverse sync tags back to Databricks, and extract lineage and usage metadata from Databricks.
Once you’ve set up the connection, Atlan can crawl the metadata from Unity Catalog.
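The BROWSE step above might look roughly like the following when several catalogs need to be crawled. This is a sketch under stated assumptions: the principal name `atlan-crawler` and the catalog names are hypothetical, the helper only builds SQL text, and the exact set of privileges Atlan needs is defined by the Atlan + Databricks Connectivity page rather than by this example.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def browse_grants(catalogs: list[str], principal: str) -> list[str]:
    """Build one GRANT BROWSE statement per catalog for a single principal."""
    p = quote_ident(principal)
    return [f"GRANT BROWSE ON CATALOG {quote_ident(c)} TO {p}" for c in catalogs]


# Hypothetical catalogs and service principal name.
for statement in browse_grants(["analytics", "finance"], "atlan-crawler"):
    print(statement + ";")
```

Granting at the catalog level keeps the number of statements small, since privileges granted on a catalog are inherited by the schemas and objects inside it.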
Metadata plane for data and AI-readiness #
Thanks to Atlan’s innovative design, architecture, and cataloging features, it was recognized as a market leader in the Forrester Wave report, surpassing other enterprise data cataloging tools.
This comparison was based on 24 different aspects of data cataloging, which can broadly be categorized under the following three themes:
- Automatic cataloging of the entire technology, data, and AI ecosystem
- Making the data ecosystem AI-first and automation-first
- Prioritizing data democratization and self-service
With the rise of generative AI in recent years, organizations now have the ability to do more with both structured and unstructured data—provided that the data is searchable, discoverable, and trustworthy. A metadata plane like Atlan can help you prepare to tackle any data or AI-driven challenges your organization may face.
Databricks Unity Catalog: Related reads #
- Databricks Lineage — Overview, Benefits, How to Set Up?
- Databricks Governance: What To Expect, Setup Guide, Tools
- Databricks Metadata Management — FAQs, Tools, Getting started
- Data Catalog: What It Is & How It Drives Business Value
- What Is a Metadata Catalog? - Basics & Use Cases
- Modern Data Catalog: What They Are, How They’ve Changed, Where They’re Going
- Open Source Data Catalog - List of 6 Popular Tools to Consider in 2024
- 5 Main Benefits of Data Catalog & Why Do You Need It?
- Enterprise Data Catalogs: Attributes, Capabilities, Use Cases & Business Value
- The Top 11 Data Catalog Use Cases with Examples
- 15 Essential Features of Data Catalogs To Look For in 2024
- Data Catalog vs. Data Warehouse: Differences, and How They Work Together?
- Snowflake Data Catalog: Importance, Benefits, Native Capabilities & Evaluation Guide
- Data Catalog vs. Data Lineage: Differences, Use Cases, and Evolution of Available Solutions
- Data Catalogs in 2024: Features, Business Value, Use Cases
- AI Data Catalog: Exploring the Possibilities That Artificial Intelligence Brings to Your Metadata Applications & Data Interactions
- Amundsen Data Catalog: Understanding Architecture, Features, Ways to Install & More
- Machine Learning Data Catalog: Evolution, Benefits, Business Impacts and Use Cases in 2024
- 7 Data Catalog Capabilities That Can Unlock Business Value for Modern Enterprises
- Data Catalog Architecture: Insights into Key Components, Integrations, and Open Source Examples
- Data Catalog Market: Current State and Top Trends in 2024
- Build vs. Buy Data Catalog: What Should Factor Into Your Decision Making?
- How to Set Up a Data Catalog for Snowflake? (2024 Guide)
- Data Catalog Pricing: Understanding What You’re Paying For
- Data Catalog Comparison: 6 Fundamental Factors to Consider
- Alation Data Catalog: Is it Right for Your Modern Business Needs?
- Collibra Data Catalog: Is It a Viable Option for Businesses Navigating the Evolving Data Landscape?
- Informatica Data Catalog Pricing: Estimate the Total Cost of Ownership
- Informatica Data Catalog Alternatives? 6 Reasons Why Top Data Teams Prefer Atlan
- Data Catalog Implementation Plan: 10 Steps to Follow, Common Roadblocks & Solutions
- Data Catalog Demo 101: What to Expect, Questions to Ask, and More
- Data Mesh Catalog: Manage Federated Domains, Curate Data Products, and Unlock Your Data Mesh
- Best Data Catalog: How to Find a Tool That Grows With Your Business
- How to Build a Data Catalog: An 8-Step Guide to Get You Started
- The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 | Available Now
- How to Pick the Best Enterprise Data Catalog? Experts Recommend These 11 Key Criteria for Your Evaluation Checklist
- Collibra Pricing: Will It Deliver a Return on Investment?
- Data Lineage Tools: Critical Features, Use Cases & Innovations
- OpenMetadata vs. DataHub: Compare Architecture, Capabilities, Integrations & More
- Automated Data Catalog: What Is It and How Does It Simplify Metadata Management, Data Lineage, Governance, and More
- Data Mesh Setup and Implementation - An Ultimate Guide
- What is Active Metadata? Your 101 Guide