Databricks Data Security: A Complete Guide for 2024
A data platform like the Databricks Data Intelligence Platform, which handles data storage, ingestion, transformation, governance, and consumption, must ensure data security and integrity at every stage. A security lapse can lead to a data breach, which cost organizations an average of $4.88 million, according to IBM's 2024 Cost of a Data Breach Report.
To support data security, Databricks offers features such as data masking, encryption, role-based access control, and fine-grained access control.
This article explores Databricks’ data security features and how an organization-wide control plane for data can enhance them.
Table of contents #
- Databricks data security features: An overview
- Enhancing your Databricks data security posture with Atlan
- Summary
- Databricks data security: Related reads
Databricks data security features: An overview #
Databricks takes a multi-layered approach to data security with several lines of defense to protect data. These lines of defense cover aspects of network security, infrastructure security, user management, fine-grained access control, encryption, and detailed logging and monitoring, among other things.
Like most cloud and data platforms, Databricks follows a shared responsibility model for securing data, documented separately for each cloud platform (AWS, Azure, GCP).
The following image from Databricks' documentation gives an overview of the security features available for protecting data in both the compute plane and the control plane.
These features map to five core pillars of data security:
- Unified security for data and AI
- Extensive encryption options
- Private networking
- Fully isolated serverless workloads
- Enhanced security and compliance
Let’s look into the specifics of each pillar.
1. Unified security for data and AI #
Unity Catalog is Databricks' built-in technical data catalog that powers data discovery, governance, lineage, and fine-grained access control. It tracks all assets within Databricks, capturing structural and data changes over time.
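Fine-grained access control in Unity Catalog is expressed in SQL. As a minimal sketch, the Python helpers below only build statement strings: a `GRANT SELECT` on a table, and a SQL UDF attached with `ALTER TABLE ... ALTER COLUMN ... SET MASK` that reveals a column only to members of one account group (via `is_account_group_member`). The helper functions are our own, not a Databricks API, and the catalog, table, function, and group names are hypothetical; check the exact syntax against the Unity Catalog documentation before running anything.

```python
import re

# Accept only simple identifiers for each part of a dotted name
# (catalog.schema.table). Deliberately strict for this sketch.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_name(dotted: str) -> str:
    """Backtick-quote each part of a dotted name, rejecting anything unexpected."""
    parts = dotted.split(".")
    for part in parts:
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def _check_group(group: str) -> str:
    """Reject group names that would break out of the quoting used below."""
    if not group or any(ch in group for ch in "`'\\"):
        raise ValueError(f"unsupported group name: {group!r}")
    return group


def grant_select(table: str, group: str) -> str:
    """GRANT giving a group read access to a single table."""
    return f"GRANT SELECT ON TABLE {quote_name(table)} TO `{_check_group(group)}`"


def column_mask_ddl(func: str, table: str, column: str, allowed_group: str) -> list[str]:
    """SQL UDF that reveals a column only to one group, plus the ALTER TABLE that attaches it."""
    group = _check_group(allowed_group)
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {quote_name(func)}(val STRING) "
        f"RETURN CASE WHEN is_account_group_member('{group}') THEN val ELSE '***' END"
    )
    attach = (
        f"ALTER TABLE {quote_name(table)} "
        f"ALTER COLUMN {quote_name(column)} SET MASK {quote_name(func)}"
    )
    return [create_fn, attach]


if __name__ == "__main__":
    # Hypothetical catalog, table, function, and group names.
    statements = [grant_select("main.sales.orders", "analysts")]
    statements += column_mask_ddl(
        "main.sales.mask_email", "main.sales.customers", "email", "pii_readers"
    )
    for stmt in statements:
        # In a Databricks notebook you would pass each string to spark.sql(stmt).
        print(stmt)
```

Generating the statements as strings keeps the sketch runnable anywhere; the strict identifier check is there because the names are interpolated directly into SQL.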
The metadata collected by Unity Catalog also serves as the foundation for AI-assisted enrichment of data and for generating code that consumes, processes, and ships data in Databricks. For security purposes, activity on Unity Catalog objects is logged and can be queried through system tables, with Unity Catalog as the main source of the captured metadata.
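To see that activity, you can query the audit system table. The helper below is a minimal sketch that only builds a query string for `system.access.audit`, filtered to Unity Catalog events from the last few days. The column names reflect our understanding of the audit table schema and should be verified in your workspace. In a notebook you would run the result with `spark.sql(...)`.

```python
def recent_uc_audit_query(days: int = 7, limit: int = 100) -> str:
    """Build a SQL query over the system.access.audit system table.

    Column names (event_time, event_date, user_identity.email, service_name,
    action_name, request_params, response.status_code) are assumptions based
    on the audit system-table schema; confirm them before relying on this.
    """
    # Validate inputs because they are interpolated into the SQL text.
    if not (isinstance(days, int) and days > 0):
        raise ValueError("days must be a positive integer")
    if not (isinstance(limit, int) and limit > 0):
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT event_time, user_identity.email AS user_email, "
        "action_name, request_params, response.status_code AS status_code "
        "FROM system.access.audit "
        "WHERE service_name = 'unityCatalog' "
        f"AND event_date >= date_sub(current_date(), {days}) "
        "ORDER BY event_time DESC "
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(recent_uc_audit_query(days=3, limit=50))
```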
Unity Catalog also acts as the foundation for external metadata tools like data catalogs and governance tools.
Also read → Enterprise data catalogs 101
2. Extensive encryption options #
By default, Databricks encrypts your data at rest using server-side encryption with keys managed by your cloud provider. Databricks also lets you use customer-managed keys (CMKs) for greater control over how your data is encrypted.
Beyond data at rest, Databricks supports encryption for code, models, credentials, and data transferred between nodes within a cluster (intra-cluster encryption).
3. Private networking #
Private networking is critical to avoid data exposure over the public internet. The settings and configurations of private networking might differ slightly from one cloud platform to another, but the end result is the same.
Whether you use AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect, Databricks' Private Link support keeps traffic private end to end and helps protect your company from data exfiltration.
To take this a step further, you can combine Private Link with dedicated network connections such as AWS Direct Connect, Azure ExpressRoute, and Google Cloud Interconnect.
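Once Private Link is configured, one quick sanity check (assuming front-end Private Link with private DNS) is that the workspace hostname resolves to private IP addresses from inside your network. The sketch below uses only Python's standard library. The hostname in the usage comment is hypothetical. Note that `ipaddress`'s `is_private` also treats loopback and link-local addresses as private.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str) -> list[str]:
    """Return the distinct IP addresses a hostname resolves to from this machine."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if there is at least one address and every one is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


if __name__ == "__main__":
    # Hypothetical workspace hostname; replace with your own before running:
    # addrs = resolved_addresses("my-workspace.cloud.databricks.com")
    # print(addrs, all_private(addrs))
    print(all_private(["10.0.1.15", "172.16.3.4"]))
```

A public address in the result suggests traffic would still go over the internet, so it is worth checking DNS configuration before relying on the private path.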
4. Fully isolated serverless workloads #
Serverless compute attracts additional scrutiny because it runs on shared infrastructure, so Databricks enforces resource isolation and network segmentation between workloads. All serverless workloads are also encrypted at rest and in transit.
5. Enhanced security and compliance #
For advanced security and compliance requirements, Databricks offers the Enhanced Security and Compliance add-on, which covers hardened infrastructure, audit logging, and regulatory requirements. For instance, the add-on provides a hardened Ubuntu image with CIS Level 1 hardening.
If your organization needs advanced security logging and monitoring, Databricks can deliver security event logs, audit logs, and enhanced security monitoring logs to the SIEM tool of your choice, which helps speed up root cause analysis.
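Before logs reach a SIEM, teams often filter them down to security-relevant events. The sketch below reads JSON-lines audit records and keeps only failed requests (status code 400 or above), flattening each one into a small record. The field names are assumptions modeled on delivered audit log JSON, the sample events are made up, and the step that actually ships records to a SIEM is left out.

```python
import json
from typing import Iterable, Iterator


def failed_events(lines: Iterable[str], min_status: int = 400) -> Iterator[dict]:
    """Yield a flat record for each JSON-lines audit event whose status code is >= min_status.

    Field names (timestamp, serviceName, actionName, userIdentity.email,
    response.statusCode) are assumed; confirm them against your own log files.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        status = (event.get("response") or {}).get("statusCode")
        if status is None or status < min_status:
            continue  # successful or status-less events are not forwarded
        yield {
            "time": event.get("timestamp"),
            "user": (event.get("userIdentity") or {}).get("email"),
            "service": event.get("serviceName"),
            "action": event.get("actionName"),
            "status": status,
        }


if __name__ == "__main__":
    # Made-up sample events for illustration only.
    sample = [
        '{"timestamp": 1700000000000, "serviceName": "unityCatalog", "actionName": "getTable",'
        ' "userIdentity": {"email": "a@example.com"}, "response": {"statusCode": 403}}',
        '{"timestamp": 1700000000500, "serviceName": "unityCatalog", "actionName": "getTable",'
        ' "userIdentity": {"email": "b@example.com"}, "response": {"statusCode": 200}}',
    ]
    for record in failed_events(sample):
        print(json.dumps(record))  # one JSON object per line, ready to forward
```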
This add-on also lets you enable a compliance security profile, which applies additional controls that help your Databricks environment and workloads meet country- and industry-specific data protection regulations such as PCI-DSS, IRAP, FedRAMP, and HIPAA. For instance, the profile provides FIPS 140 Level 1 validated encryption modules ready to be used in your workloads.
Also, if you've deployed Databricks on AWS, you can use AWS Nitro VM enforcement to encrypt data at rest and in transit.
The Databricks data security features above provide a solid foundation for protecting your Databricks assets. However, a typical data stack includes many tools besides the core data platform (Databricks, in this case) that handle data storage, movement, processing, and delivery.
Unity Catalog's purview largely ends at Databricks (extending, at most, to external storage accounts). To cover everything else, you need a control plane for data: a layer that bridges the gap across tools and provides a consistent framework for data security, access management, and discovery.
Let's look at how such a control plane for data enhances your data security and governance posture while giving you a consistent experience managing it.
Enhancing your Databricks data security posture with Atlan #
Atlan’s well-documented security-first approach complements Databricks with a unified control plane for data cataloging, discovery, and governance.
The Atlan + Unity Catalog integration provides (but isn’t limited to):
- No caching, temporary, or permanent storage of data: Atlan allows you to preview data and run queries from the UI, but it never caches or stores any of the data. Only the metadata, i.e., table structure, roles, groups, etc., gets stored in a secure VPC and various backend databases.
- Advanced authentication & authorization: Atlan allows you to manage identity and authentication using SSO with SAML 2.0 and SCIM. It supports popular identity providers and lets you use the IdP of your choice.
- Data access control: Atlan implements a fine-grained role-based access control model adhering to the principle of least privilege that denies access by default.
- Data access policies: Access to data assets in Atlan is controlled by defining three types of access policies: data, metadata, and glossary.
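To make "least privilege, denied by default" concrete, here is a small conceptual model. It is not Atlan's implementation or API: nothing is allowed until an explicit policy of one of the three types grants a group an action on an asset. All class, method, group, and asset names are hypothetical.

```python
from dataclasses import dataclass, field

# The three policy types named in the text; everything else here is illustrative.
POLICY_TYPES = frozenset({"data", "metadata", "glossary"})


@dataclass(frozen=True)
class Policy:
    policy_type: str         # "data", "metadata", or "glossary"
    group: str               # group the policy is granted to
    asset: str               # hypothetical asset identifier
    actions: frozenset[str]  # actions the policy allows


@dataclass
class AccessModel:
    policies: list[Policy] = field(default_factory=list)

    def allow(self, policy_type: str, group: str, asset: str, *actions: str) -> None:
        """Record an explicit grant; nothing is allowed until a grant exists."""
        if policy_type not in POLICY_TYPES:
            raise ValueError(f"unknown policy type: {policy_type!r}")
        self.policies.append(Policy(policy_type, group, asset, frozenset(actions)))

    def is_allowed(self, user_groups: set[str], policy_type: str, asset: str, action: str) -> bool:
        """Default deny: access is granted only if some explicit policy matches."""
        return any(
            p.policy_type == policy_type
            and p.group in user_groups
            and p.asset == asset
            and action in p.actions
            for p in self.policies
        )


if __name__ == "__main__":
    model = AccessModel()
    model.allow("data", "finance", "db.sales.orders", "read")
    print(model.is_allowed({"finance"}, "data", "db.sales.orders", "read"))
    print(model.is_allowed({"marketing"}, "data", "db.sales.orders", "read"))
```

Keeping grants as explicit records and evaluating them with `any(...)` means an empty policy list denies everything, which is the default-deny behavior the text describes.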
Also read → Data governance policy enforcement 101
- Transparency Center: The Transparency Center is a centralized hub to monitor access, enforce policies, and manage data governance.
- Personas: Atlan enables you to define policies based on the operating model that you have in place at your organization. For instance, you can create team-based personas to curate and control access to your data assets.
- Purposes: Purposes can be used to define domains and enable tag-based data protection, especially for PII and PHI data.
- Infrastructure security provisions: Atlan has advanced networking and security controls with Grafana endpoint integration for real-time insights.
- Enterprise-grade encryption: Atlan encrypts data at rest and in transit. Data in transit is sent over HTTPS using TLS, and any data stored in object storage on your cloud platform is encrypted with the platform's native server-side encryption using industry-standard AES-256.
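Tag-based protection of the kind described under Purposes can be pictured as follows. This is a conceptual sketch, not Atlan's API: a hypothetical `Purpose` lists protected tags and the groups exempt from masking, and any column carrying a protected tag is masked for everyone else.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Purpose:
    name: str
    protected_tags: frozenset[str]  # e.g. {"PII", "PHI"}
    exempt_groups: frozenset[str]   # groups allowed to see protected values


def mask_row(
    row: Mapping[str, object],
    column_tags: Mapping[str, frozenset[str]],
    purpose: Purpose,
    user_groups: set[str],
    mask: str = "****",
) -> dict[str, object]:
    """Return a copy of row with every column carrying a protected tag masked."""
    if purpose.exempt_groups & user_groups:
        return dict(row)  # exempt users see the original values
    return {
        col: (mask if column_tags.get(col, frozenset()) & purpose.protected_tags else val)
        for col, val in row.items()
    }


if __name__ == "__main__":
    pii = Purpose("PII protection", frozenset({"PII"}), frozenset({"privacy_office"}))
    row = {"name": "Ada", "email": "ada@example.com", "country": "UK"}
    tags = {"email": frozenset({"PII"})}
    print(mask_row(row, tags, pii, {"analysts"}))
    print(mask_row(row, tags, pii, {"privacy_office"}))
```

Because the decision depends on tags rather than on individual columns, tagging a new column as PII protects it without touching the policy itself.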
Key takeaway → Atlan integrates with Databricks and all the other tools in your data ecosystem to create a centralized control plane, unifying policy management and enhancing security across systems. Combined with Unity Catalog’s security features, Atlan gives you a more robust security posture than if you were to manage data security for all your tools separately.
Also read → The unified control plane in action
Summary #
Databricks' data security and Unity Catalog features cover the essential security aspects, but combining them with Atlan's control plane significantly strengthens your security posture across a multi-tool ecosystem, ensuring compliance and control across diverse data assets.
For more details on how Atlan integrates with Databricks, explore the official documentation on the Atlan + Databricks integration.
Databricks data security: Related reads #
- Databricks Unity Catalog: A Comprehensive Guide to Features, Capabilities, Architecture
- Data Catalog for Databricks: How To Setup Guide
- Databricks Lineage: Why is it Important & How to Set it Up?
- Databricks Governance: What To Expect, Setup Guide, Tools
- Databricks Metadata Management: FAQs, Tools, Getting Started
- Data Catalog: What It Is & How It Drives Business Value
- Snowflake Cost Optimization: Typical Expenses & Strategies to Handle Them Effectively
- Databricks Cost Optimization: Top Challenges & Strategies