Unity Catalog: An overview
When working with Databricks, Unity Catalog is the central tool for managing access, exploring lineage, and searching and discovering data across your Databricks account. While it is possible to import data from other sources, end-to-end data governance stops at the Databricks boundary. Other features, such as lineage and metadata enrichment, face similar limitations.
Understanding the five primary limitations of Unity Catalog
Let’s take a look at how Unity Catalog’s limitations affect various data workflows in an organization and why they exist.
1. End-to-end governance is limited to the Databricks boundary
Unity Catalog is designed to have authority within a Databricks account and the Databricks runtime. It doesn’t control the data itself. When data flows to other connected systems, such as visualization tools, external query or compute engines, and orchestrators, most of the governance-related features don’t work.
There are exceptions: a handful of deep Unity Catalog integrations can synchronize governance metadata with external tools, but they are few and far between. Unity Catalog’s Open APIs also expose metadata to external systems, but even there, enforcement still rests with the Databricks runtime.
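To make the boundary concrete, here is a minimal sketch of reading table metadata through Unity Catalog’s open REST API from an external client. The workspace host and token are placeholders, and the request is assembled as plain data rather than sent; the point is that an outside tool can *read* catalog metadata this way, but enforcement on the data itself still happens inside the Databricks runtime.

```python
from urllib.parse import urlencode

# Placeholder workspace host; substitute your own.
WORKSPACE_HOST = "https://example.cloud.databricks.com"

def uc_list_tables_request(catalog: str, schema: str, token: str) -> dict:
    """Build (as plain data) a GET request against Unity Catalog's open
    REST API to list tables in a schema. Any client can read metadata
    this way, but access control on the data is enforced only by the
    Databricks runtime, not by the caller."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return {
        "method": "GET",
        "url": f"{WORKSPACE_HOST}/api/2.1/unity-catalog/tables?{query}",
        "headers": {"Authorization": f"Bearer {token}"},
    }

req = uc_list_tables_request("main", "sales", token="<redacted>")
```

A real client would pass this to an HTTP library; the sketch stops short of that to keep the example self-contained.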
For organizations running hybrid or multi-platform data stacks, this boundary creates a persistent governance blind spot.
2. External data lineage capabilities and limitations
The same boundary restriction applies to lineage metadata; however, the lineage limitation is slightly easier to work around because there are more integration paths across the Databricks boundary. Some tools, such as dbt, have deep integrations with Unity Catalog, while others, like Airflow and Fivetran, can push lineage into Unity Catalog via the API.
That said, there are clear limitations in what you can do with the lineage you push to Databricks. The depth and automation don’t compare with native lineage, and some additional features, such as editing lineage metadata and syncing metadata with external systems, aren’t supported.
For tools like Power BI, Tableau, and Salesforce, you can register them as lineage sources via the External Lineage API or Catalog Explorer, but this requires manual handling. Even with the deeper integrations, granular end-to-end lineage across your data stack remains a challenge.
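To illustrate what pushing lineage via an API involves, here is a hypothetical sketch of an OpenLineage-style run event, the kind of payload orchestrators like Airflow emit. The job and dataset names are made up, and the exact schema Unity Catalog’s lineage endpoints accept may differ; this only shows the shape of the metadata being exchanged.

```python
import json
from datetime import datetime, timezone

def lineage_run_event(job_name: str, inputs: list, outputs: list) -> dict:
    """Assemble an OpenLineage-style run event linking input tables to
    output tables for one job run. A real integration would POST this
    (or a Databricks-specific equivalent) to the lineage API."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "job": {"namespace": "orchestrator", "name": job_name},
        "inputs": [{"namespace": "unity_catalog", "name": n} for n in inputs],
        "outputs": [{"namespace": "unity_catalog", "name": n} for n in outputs],
    }

event = lineage_run_event(
    "daily_sales_rollup",
    inputs=["main.sales.orders"],
    outputs=["main.sales.daily_summary"],
)
payload = json.dumps(event)  # body of the hypothetical POST
```

Because the event carries only table-level names, this also shows why manually registered lineage rarely reaches column-level granularity.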
3. Metadata enrichment has improved over time, but is still mostly limited to Databricks
Features like governed tags, Unity Catalog metrics, automated data classification, and AI-driven documentation help with governance and metadata enrichment for data catalogs. That said, many of these features are still in the early stages of release. While they’re great additions, they are limited to the Databricks ecosystem, all within Unity Catalog. The same limitations apply to discovery and lineage, too.
Some of these limitations, especially related to querying data through other systems, are mitigated by Lakehouse Federation, which has its own quirks and limitations.
4. Delta-centric design creates friction for diverse data ecosystems
Early on, Databricks created a table format called Delta Lake for the object storage layer. Databricks’ subsequent features were designed specifically around this format, but outside of a few tools, Delta Lake wasn’t widely adopted. Newer table formats, such as Iceberg, saw much wider adoption.
As many key Databricks features rely on Delta, this dependency doesn’t translate well when working with external systems. The governance features don’t work quite the same way; partition management is also quite different. Iceberg support for managed tables is now in Public Preview, which is narrowing the gap, but it’s still catching up to Delta’s native depth.
5. Key security features need specific Databricks runtimes
The availability of security features depends on many factors, such as the type of Databricks object or workspace you’re working with. Some of these issues relate to high-level governance, some to identity, and others to access control.
Unity Catalog features like row-level security (RLS) and column masking policies, combined with attribute-based access control (ABAC), need Databricks Runtime (DBR) 16.4 or higher on dedicated compute; on serverless compute, the runtime requirement doesn’t apply.
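In Unity Catalog, row filters and column masks are expressed as SQL functions bound to a table, which is why they depend on runtime support. The logic they encode is simple to sketch; here is a hypothetical pure-Python illustration of the same predicates (the group name, region rule, and columns are made up for the example).

```python
# Sketch of the predicate logic behind a row filter and a column mask.
# In Unity Catalog these would be SQL UDFs bound to a table; here we
# emulate them over plain dicts to show what gets enforced at query time.

ADMIN_GROUP = "pii_admins"  # hypothetical account group

def row_filter(row: dict, user_groups: set) -> bool:
    """Admins see every row; everyone else sees only the US region."""
    return ADMIN_GROUP in user_groups or row["region"] == "US"

def mask_email(value: str, user_groups: set) -> str:
    """Admins see the raw email; others see a redacted value."""
    return value if ADMIN_GROUP in user_groups else "***@***"

rows = [
    {"region": "US", "email": "ada@example.com"},
    {"region": "EU", "email": "grace@example.com"},
]

analyst = {"analysts"}  # a user who is not in the admin group
visible = [
    {**r, "email": mask_email(r["email"], analyst)}
    for r in rows
    if row_filter(r, analyst)
]
```

Because this evaluation happens inside the Databricks runtime, an external engine reading the same files would bypass both the filter and the mask, which is the enforcement gap discussed above.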
How can you solve Unity Catalog’s limitations?
Some of the Unity Catalog limitations mentioned in the previous section, such as metadata enrichment and runtime requirements, are already being addressed by Databricks and will be resolved over time.
But there are others that are deeper and more structural, and won’t be fixed in future releases. Broadly speaking, these limitations relate to the scope and the surface area of governance, lineage, and discovery.
- Fragmented governance at the Databricks boundary arises partly because enforcement is tightly coupled to the Databricks compute layer, not to the data itself. This means that when data flows to other tools, such as BI tools and orchestration engines, Unity Catalog-specific enforcement for access control, data masking, and the like doesn’t translate well.
- Siloed and fragmented lineage across regions and other tools in the data stack, as lineage is scoped at the metastore level and doesn’t seamlessly synchronize with other platforms. While there are some deep integrations for lineage, most others require manual registration and don’t match the depth of Databricks’ native lineage.
- Lack of context propagation and synchronization between Unity Catalog and other tools leads to a fractured business vocabulary and mixed signals for data quality, trust, and security when constructs like domains, tags, and certifications don’t translate.
All of these issues stem from the lack of a unified control plane that can, first, consolidate all metadata from every source in your organization and, second, integrate lineage metadata from any data source your organization works with.
Atlan is one such platform that helps bring all metadata, whether it’s structural, governance-related, or lineage metadata, into a single place. Let’s look at how Atlan helps mitigate Unity Catalog’s limitations.
How can Atlan’s metadata control and context plane help?
While Databricks’ native solution lays a solid foundation for governance, lineage, and discovery, it lacks the flexibility to integrate deeply with other tools in the data landscape, which prevents a consistent governance framework across the wider stack.
Atlan solves this as a sovereign context layer: an open, interoperable, and enterprise-governed metadata control plane that the organization owns and controls, independent of any single platform.
Atlan uses a standardized metadata schema and a broad range of connectors to ingest metadata from wherever you need it. Rather than relying on Unity Catalog alone, you bring all metadata, including Unity Catalog’s, into Atlan’s metadata control plane and manage it there.
Built in close partnership with Databricks, Atlan layers on top of Unity Catalog to extend its value across your broader data ecosystem.
Key capabilities include:
- A mature ecosystem of connectors supporting a wide variety of data tools.
- Domain-based organization with persona-based search and discovery.
- End-to-end granular data lineage with the ability to edit and synchronize it with other systems.
- Two-way synchronization and propagation of attributes like tags, policies, and classifications.
- Automated data quality monitoring that surfaces live trust signals.
- A business glossary that works across all the tools in the data stack of your organization.
- Governance powered by AI, as well as governance of the AI assets within your organization.
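The two-way propagation of attributes in the list above is worth a concrete sketch. Below is a hypothetical illustration of the kind of automation a metadata control plane runs when a source table is tagged: the tag is propagated downstream along lineage edges, across tool boundaries. The asset names, the lineage graph, and the tag are all made up for the example.

```python
from collections import deque

# Made-up lineage graph: asset -> set of direct downstream assets,
# spanning Unity Catalog tables and a BI dashboard.
LINEAGE = {
    "uc.main.sales.orders": {"uc.main.sales.daily_summary"},
    "uc.main.sales.daily_summary": {"tableau.sales_dashboard"},
    "tableau.sales_dashboard": set(),
}

def propagate_tag(source: str, tag: str, lineage: dict) -> dict:
    """Breadth-first walk: every asset reachable from `source`
    inherits `tag`, regardless of which tool the asset lives in."""
    tags = {asset: set() for asset in lineage}
    tags[source].add(tag)
    queue = deque([source])
    seen = {source}
    while queue:
        current = queue.popleft()
        for downstream in lineage[current]:
            tags[downstream].add(tag)
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return tags

tags = propagate_tag("uc.main.sales.orders", "PII", LINEAGE)
```

The key design point is that propagation operates on a cross-tool lineage graph, which is exactly what a metastore-scoped catalog cannot traverse on its own.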
The control plane, combined with the wealth of information in Atlan’s metadata lakehouse, is what enables an organization to get the most out of its metadata. This applies not only to lineage, but also to data quality, governance, security, and observability — precisely the areas where Unity Catalog’s design creates friction for organizations using multiple data systems and platforms.
Real stories from real customers using Atlan with Databricks
Activating Databricks metadata with Atlan’s unified context layer
"More than Databricks, we needed a platform for innovation to stay ahead of our competitors. We might know what we need right now, but if the market is moving in a new direction, with AI and ChatGPT, for example, we need to have an answer for that, and the opportunity to try these tools in our data catalog. That's what I really liked about Atlan."
Jorge Plasencia, Data Catalog & Data Observability Platform Lead
Yape
Listen to the podcast: Why Yape chose Atlan to govern Databricks
53% less engineering workload and 20% higher data-user satisfaction
"Kiwi.com has transformed its data governance by consolidating thousands of data assets into 58 discoverable data products using Atlan. 'Atlan reduced our central engineering workload by 53% and improved data user satisfaction by 20%,' Kiwi.com shared. Atlan's intuitive interface streamlines access to essential information like ownership, contracts, and data quality issues, driving efficient governance across teams."
Data Team
Kiwi.com
Listen to the podcast: How Kiwi.com Unified Its Stack with Atlan
Moving forward with a sovereign context layer for your data and AI ecosystem
Unity Catalog is a solid built-in governance, lineage, and cataloging tool for Databricks-heavy environments. While it has some great features, it lacks multi-platform support, which results in siloing and fragmentation — the very thing this type of tool is supposed to solve. Some of the other limitations are due to product maturity and will be mitigated by future releases of Unity Catalog.
A unified metadata control plane that effectively aggregates and activates metadata across your data stack is what you need to mitigate these issues. Such a control plane serves as the foundation for many higher-order features, such as lineage impact analysis, tag and classification propagation, and organization-wide search and discovery. With its current limitations, Unity Catalog can’t address these challenges on its own.
Atlan’s sovereign context layer fills that gap. Atlan aggregates metadata from Unity Catalog and every other tool in your stack into a single, enterprise-governed plane that activates governance, lineage, and discovery at scale.
See how a unified context plane extends your Databricks stack.
Book a personalized demo

FAQs about Unity Catalog limitations
1. What are the key limitations of Unity Catalog?
The key limitations relate to support for and integration with other tools in the data stack. For governance, lineage, and discovery, the biggest limitation is the inability to fully synchronize this metadata with other tools.
2. Does Unity Catalog work with external tools?
Yes, but in a limited capacity. Databricks’ Open APIs and Lakehouse Federation help it connect to various tools, but the governance, lineage, and discovery functionality is quite limited and still bound by the Databricks perimeter.
3. Does Unity Catalog support data lineage across platforms?
Yes, but to a limited extent. You can register external lineage manually via Catalog Explorer or the External Lineage API, but coverage is quite limited and typically doesn’t go down to the granular level.
4. Can Unity Catalog govern data outside of Databricks?
While some synchronization capabilities exist, Unity Catalog currently cannot govern or enforce policies outside Databricks. All policies are enforced at the Databricks compute layer.
5. What Databricks Runtime version is needed for ABAC?
Attribute-based access control (ABAC) with granular row-level security (RLS) and column masking requires Databricks Runtime 16.4 or higher on dedicated compute. On serverless compute, this runtime requirement doesn’t apply. Also, remember that older runtimes can’t access tables secured by ABAC policies.