Databricks Data Compliance: Enhancing Native Features With A Unified Control Plane For Data
Databricks has built-in features that help organizations meet industry-standard compliance and security requirements for data handling and processing. Databricks also provides a detailed due diligence package covering compliance and security features for AWS, Azure, and Google Cloud.
Additionally, Databricks holds certifications and attestations, and supports compliance with standards and regulations such as GDPR, CCPA, FedRAMP, HITRUST, HIPAA, and IRAP. This helps your organization meet its obligations under the relevant data protection, security, and privacy laws.
This article will look at key Databricks security and compliance features and explore how a unified control plane for data governance, like Atlan, can further enhance those capabilities.
Table of contents
- Databricks data compliance: Native features
- Enhanced Databricks data compliance with Atlan
- Summary
- Databricks data compliance: Related reads
Databricks data compliance: Native features
Databricks offers an Enhanced Security and Compliance add-on, which includes a compliance security profile for specific standards, such as HIPAA, IRAP, PCI-DSS, FedRAMP (High, Moderate), UK Cyber Essentials Plus, and CCCS Medium (Protected B).
Key features include:
- Enhanced security monitoring and automatic cluster updates to apply all the latest security-related patches to the hardened host OS image
- A CIS Level 1 hardened image for secure baseline configuration
- FIPS 140-validated encryption modules for data at rest and in transit
- Egress communications encryption with TLS 1.2 (or higher)
Databricks also provides extensive activity and query logs as system tables that you can query and investigate. These logs can also be configured to be delivered to a log monitoring tool or an object store in your cloud platform for advanced analysis.
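Once audit logs land in an object store, even a short script can answer a typical auditor question such as "who did what, and how often?". The sketch below parses JSON-lines records and counts actions per user for one service. The field names (serviceName, actionName, userIdentity.email) reflect our understanding of the delivered audit log format, and the sample records are made up, so check the schema against your own log files before relying on it.

```python
import json
from collections import Counter
from typing import Iterable

# Made-up sample records, one JSON object per line, shaped like delivered audit logs
# (field names are an assumption; verify them against your own log files).
SAMPLE_LOG = """\
{"timestamp": 1700000000000, "serviceName": "unityCatalog", "actionName": "getTable", "userIdentity": {"email": "ana@example.com"}}
{"timestamp": 1700000060000, "serviceName": "unityCatalog", "actionName": "getTable", "userIdentity": {"email": "ana@example.com"}}
{"timestamp": 1700000120000, "serviceName": "accounts", "actionName": "login", "userIdentity": {"email": "bo@example.com"}}
"""


def parse_audit_lines(lines: Iterable[str]) -> list:
    """Parse JSON-lines audit records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def actions_per_user(records: list, service: str) -> Counter:
    """Count (user email, action) pairs for a single service."""
    counts = Counter()
    for rec in records:
        if rec.get("serviceName") != service:
            continue  # ignore events from other services
        email = rec.get("userIdentity", {}).get("email", "<unknown>")
        counts[(email, rec.get("actionName"))] += 1
    return counts


records = parse_audit_lines(SAMPLE_LOG.splitlines())
for (email, action), n in actions_per_user(records, "unityCatalog").items():
    print(email, action, n)
```

In practice you would read the delivered files from your bucket instead of a string, or skip the script entirely and query the system tables with SQL.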
Additional features, such as row filters, column masks, and time travel with Delta Lake table history, further support data protection and compliance.
Let’s see how.
GDPR compliance in Databricks
Here are some of the key features in Databricks’ Delta Lake that can support GDPR compliance:
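- Right to erasure (right to be forgotten): Deleting individual records from columnar files can be slow, because every file that contains a matching row has to be rewritten. Z-ordering a Delta table on the identifier column colocates related rows, so data skipping limits the number of files a DELETE must touch. Because Delta Lake keeps earlier table versions for time travel, deleted records remain recoverable for a configurable retention period; running VACUUM after that period removes the underlying files permanently.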
- Transparent information, communication and modalities for the exercise of the rights of the data subject: Databricks supports sending notifications via HTTP webhooks and popular business tools such as Slack, Microsoft Teams, email, and PagerDuty. To flag changes in the data, such as a metric crossing a threshold, you can use Databricks SQL alerts. Together, these notification and alerting features let you inform stakeholders of any GDPR-related activity.
- Right to restriction of processing: You can protect sensitive data in Databricks with pseudonymization techniques such as reversible tokenization, and use built-in and custom data classifiers to identify data whose processing should be restricted. Row filters and column masks then enforce access and processing restrictions on the fly, at query time.
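To make the pseudonymization point concrete, here is a plain-Python sketch of reversible tokenization with a token vault. Each raw identifier is replaced with a random token, the same value always maps to the same token (so tokenized tables can still be joined), and only code with access to the vault can reverse the mapping. The TokenVault class is hypothetical and is not a Databricks API; in a real deployment the vault would live in a secured store with its own access controls.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault for reversible tokenization."""

    def __init__(self) -> None:
        self._value_to_token: dict = {}
        self._token_to_value: dict = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always yields the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against an (unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only callers holding the vault can recover the original value.
        return self._token_to_value[token]


vault = TokenVault()
rows = [
    {"email": "ana@example.com", "country": "DE"},
    {"email": "bo@example.com", "country": "FR"},
    {"email": "ana@example.com", "country": "DE"},
]
# Replace the direct identifier with a token before the data is shared downstream.
pseudonymized = [{**row, "email": vault.tokenize(row["email"])} for row in rows]
print(pseudonymized)
```

Random tokens, unlike plain hashes, cannot be reversed by brute-forcing likely inputs, which is why the vault, not the token itself, is the sensitive asset in this design.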
For example, the following row filter on the employees table lets members of the SALES_MANAGER account group see every row, while all other users see only rows where department is 'sales':
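-- Members of SALES_MANAGER see all rows; everyone else sees only the 'sales' department.
CREATE FUNCTION dept_filter (department STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('SALES_MANAGER'), true, department='sales');
-- Attach the filter; it is evaluated against each row's department value at query time.
ALTER TABLE employees SET ROW FILTER dept_filter ON (department);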
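Column masks work similarly at the column level: a masking function decides, for each user, whether a column's value is returned as-is or in redacted form.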
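In Databricks, a column mask is a SQL UDF attached to a column with ALTER TABLE ... ALTER COLUMN ... SET MASK, and the function's return value replaces the column value at query time. The plain-Python function below only models that decision logic so it can be read and tested in isolation; the HR_ADMIN group, the SSN column, and the masking format are hypothetical choices for illustration.

```python
def mask_ssn(ssn: str, user_groups: set) -> str:
    """Model of a column mask: privileged users see the value, everyone else a redacted form."""
    if "HR_ADMIN" in user_groups:  # hypothetical privileged group
        return ssn
    # Keep only the last four digits so the value stays useful for support lookups.
    return "***-**-" + ssn[-4:]


row = {"name": "Ana", "ssn": "123-45-6789"}
for groups in ({"HR_ADMIN"}, {"SALES_MANAGER"}):
    print(sorted(groups), mask_ssn(row["ssn"], groups))
```

Keeping the mask a pure function of the value and the caller's groups mirrors how the SQL version behaves: the stored data never changes, only what each user is shown.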
These features support GDPR compliance and extend to other regulations, which you can review on the Databricks Compliance Security Profile page.
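Back to the right to erasure: why is a DELETE alone not enough? In Delta Lake, a DELETE writes a new table version, but earlier versions stay readable through time travel until VACUUM removes the underlying files once they are older than the retention threshold (7 days by default). The toy model below mimics that behavior with in-memory snapshots. ToyDeltaTable and its methods are hypothetical and far simpler than Delta Lake's file-based transaction log; for instance, real VACUUM removes individual files, not whole versions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyDeltaTable:
    """Hypothetical, heavily simplified model of a versioned table with time travel and VACUUM."""

    # Each entry is (commit_time_in_hours, rows); the list index is the version number.
    versions: list = field(default_factory=list)
    # Versions whose data has been vacuumed can no longer be read via time travel.
    vacuumed: set = field(default_factory=set)

    def commit(self, rows: list, at: float) -> int:
        self.versions.append((at, rows))
        return len(self.versions) - 1

    def delete(self, predicate: Callable[[dict], bool], at: float) -> int:
        """Write a new version without the matching rows; older versions are left untouched."""
        _, current = self.versions[-1]
        return self.commit([r for r in current if not predicate(r)], at)

    def read(self, version: Optional[int] = None) -> list:
        v = len(self.versions) - 1 if version is None else version
        if v in self.vacuumed:
            raise LookupError(f"version {v} was vacuumed")
        return self.versions[v][1]

    def vacuum(self, now: float, retention_hours: float = 168.0) -> None:
        """Make a version unreadable once it was superseded more than retention_hours ago."""
        for v in range(len(self.versions) - 1):
            superseded_at = self.versions[v + 1][0]
            if now - superseded_at > retention_hours:
                self.vacuumed.add(v)


table = ToyDeltaTable()
table.commit([{"id": 1, "email": "ana@example.com"}, {"id": 2, "email": "bo@example.com"}], at=0)
table.delete(lambda r: r["id"] == 1, at=1)  # erasure request for data subject 1
print(table.read())           # current version no longer contains id 1
print(table.read(version=0))  # but time travel still returns the deleted row
table.vacuum(now=1 + 169)     # VACUUM runs more than 168 hours (7 days) later
try:
    table.read(version=0)
except LookupError as err:
    print(err)
```

The takeaway for GDPR workflows is that an erasure request is only complete after the retention window has passed and VACUUM has run.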
Also read → Implementing the GDPR ‘Right to be Forgotten’ in Delta Lake
Auditing and monitoring for data compliance and security
As mentioned earlier, Databricks records extensive logs in system tables, and audit logs can optionally be delivered to a location such as Amazon S3 for further consumption and analytics. This is especially useful when external auditors ask for raw system access and query history data.
You can also put other measures in place to protect your data at the system, infrastructure, and identity levels:
- Private connectivity, IP access lists, and network firewall rules: You can define IP access lists for the Databricks account console and for individual workspaces separately. You can also implement domain-based traffic restrictions along with header validation. Lastly, you can configure private connectivity on all three cloud platforms.
- Granular details of who logged in and accessed what data: Databricks system tables like system.access.audit record the logged-in users and the objects they accessed. Meanwhile, tables like system.query.history offer a detailed view of the queries executed on SQL warehouses. Similar system tables are available for jobs, clusters, node types, and more.
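The IP access list bullet above boils down to a simple evaluation rule. As we understand the Databricks documentation, block lists are checked first, and when at least one allow list exists, a request must also match an allow list; with no allow lists, every non-blocked address is accepted. The function below sketches that rule with Python's standard ipaddress module. The CIDR ranges are made up, and this is a model for reasoning about list design, not how the Databricks service implements it.

```python
import ipaddress


def is_allowed(ip: str, allow_lists: list, block_lists: list) -> bool:
    """Evaluate an IP against block lists first, then allow lists (if any exist)."""
    addr = ipaddress.ip_address(ip)
    # A match on any block list rejects the request outright.
    if any(addr in ipaddress.ip_network(cidr) for cidr in block_lists):
        return False
    # With no allow lists configured, every non-blocked address is accepted.
    if not allow_lists:
        return True
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_lists)


ALLOW = ["203.0.113.0/24", "198.51.100.7/32"]  # hypothetical corporate ranges
BLOCK = ["203.0.113.66/32"]                     # hypothetical compromised host
for ip in ("203.0.113.10", "203.0.113.66", "192.0.2.1"):
    print(ip, is_allowed(ip, ALLOW, BLOCK))
```

Because blocking wins, you can carve a single compromised host out of an otherwise trusted range without rewriting the allow list.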
These Databricks features set up a solid foundation for meeting your compliance requirements. However, modern data ecosystems often span multiple tools, which creates a need for more centralized compliance management: essentially, a control plane for data. This is where Atlan comes in.
Let’s see how an Atlan + Databricks setup can help you meet your organization’s compliance requirements.
Enhanced Databricks data compliance with Atlan
Much like Databricks, Atlan’s architecture is also based on a security-first approach – security and compliance are at the product’s core.
Atlan integrates closely with Databricks to provide a unified view of all the data assets in the system, along with a consistent way to use and manage them. Atlan’s features cover several major compliance requirements of widely adopted regulations, such as GDPR, CCPA, and HIPAA.
Atlan’s compliance features include (but aren’t limited to):
- Automatic PII tag propagation and tag-based access policies
- Column-level access controls that work with Databricks’ data access control model
- Data access policies informed by column-level data lineage, captured natively by Atlan and also ingested from Databricks
- Extensive logs for any activity with systems connected to Atlan
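To illustrate what automatic PII tag propagation and tag-based access policies mean in practice, the sketch below walks a column-level lineage graph breadth-first, copies a PII tag from source columns to every downstream column, and then applies a simple tag-based read rule. The lineage graph, column names, the PRIVACY_OFFICER group, and both functions are hypothetical; this shows the general idea, not Atlan's implementation or API.

```python
from collections import deque


def propagate_tags(lineage: dict, tags: dict, tag: str) -> dict:
    """Copy `tag` from every tagged column to all columns downstream of it (breadth-first)."""
    result = {col: set(t) for col, t in tags.items()}
    queue = deque(col for col, t in tags.items() if tag in t)
    while queue:
        col = queue.popleft()
        for child in lineage.get(col, []):
            child_tags = result.setdefault(child, set())
            if tag not in child_tags:  # enqueue each column at most once, so cycles terminate
                child_tags.add(tag)
                queue.append(child)
    return result


def can_read(column: str, tags: dict, user_groups: set) -> bool:
    """Hypothetical tag-based policy: PII columns are readable only by PRIVACY_OFFICER."""
    return "PII" not in tags.get(column, set()) or "PRIVACY_OFFICER" in user_groups


# Column-level lineage: source column -> columns derived from it.
LINEAGE = {
    "raw.customers.email": ["silver.customers.email"],
    "silver.customers.email": ["gold.marketing.contact_email"],
    "raw.customers.country": ["gold.marketing.country"],
}
tagged = propagate_tags(LINEAGE, {"raw.customers.email": {"PII"}}, "PII")
print(sorted(c for c, t in tagged.items() if "PII" in t))
print(can_read("gold.marketing.contact_email", tagged, {"ANALYST"}))
print(can_read("gold.marketing.country", tagged, {"ANALYST"}))
```

Tagging once at the source and letting lineage carry the tag is what keeps derived tables, like the marketing contact column here, from silently escaping the policy.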
Atlan also offers industry-specific solutions, helping organizations proactively manage compliance requirements across sectors, from healthcare to hospitality.
It also supports security and compliance reporting, so you can track your compliance posture continuously and avoid last-minute surprises.
For transparency, Atlan publishes regularly updated statuses for its compliance standards and assessments.
Atlan’s compliance FAQs are also useful for anyone connecting Databricks and Atlan, especially for keeping operations running smoothly while staying compliant. For a more detailed look at Atlan’s compliance and security reports, including certifications, pen tests, and security profiles, check out the Atlan Trust Center.
Summary
Databricks’ native features and certifications provide a solid foundation for data compliance. We’ve explored several of its native compliance tools, along with the Enhanced Security and Compliance add-on.
Atlan enhances Databricks’ compliance capabilities by providing a unified control plane to manage governance across all data assets, strengthening compliance and data governance across the board.
For more details on how Atlan integrates with Databricks, explore the official documentation on the Atlan + Databricks integration.
Databricks data compliance: Related reads
- Databricks Data Security: A Complete Guide for 2024
- Data Governance and Compliance: An Act of Checks and Balances
- Data Compliance Management: Concept, Components, Steps (2024)
- Databricks Unity Catalog: A Comprehensive Guide to Features, Capabilities, Architecture
- Data Catalog for Databricks: How To Setup Guide
- Databricks Lineage: Why is it Important & How to Set it Up?
- Databricks Governance: What To Expect, Setup Guide, Tools
- Databricks Metadata Management: FAQs, Tools, Getting Started
- Databricks Cost Optimization: Top Challenges & Strategies
- Databricks Data Mesh: Native Capabilities, Benefits of Integration with Atlan, and More
- Data Catalog: What It Is & How It Drives Business Value