Benefits of Data Governance on AWS: What’s Available and How Can You Build On It?

Updated December 22nd, 2023

Data governance in AWS covers processes, policies, roles, metrics, and standards to manage and use AWS data assets.

In this article, we’ll look at the governance capabilities available in AWS, the benefits of data governance on AWS, and ways to enhance them further.

Table of contents

  1. Data governance in AWS
  2. Benefits of data governance on AWS with DataZone
  3. AWS + Atlan: Active data governance for your AWS workflows
  4. Wrapping up
  5. Related Reads

Data governance in AWS

In the AWS ecosystem, different services are dedicated to specific aspects of data governance. For instance, AWS offers the following to manage data governance for your environments:

  • AWS Organizations to manage all AWS accounts from a single place
  • AWS Config to monitor and record AWS resource configurations
  • AWS Identity and Access Management (IAM) to create and manage users, groups, roles, and policy documents across all AWS resources and services
  • AWS Data Exchange to find, subscribe to, and use third-party data in the cloud
  • AWS Clean Rooms to create a space to collaborate with your partners without sharing raw data

AWS also offers the following services to manage your AWS assets:

  • AWS Lake Formation (a data lake governance service) to centrally govern, secure, and share your data lake assets for analytics and machine learning
  • AWS Glue (a serverless data integration and ETL service) to discover, prepare, and integrate your data, including data registered with Lake Formation
  • Amazon DataZone to catalog and govern data assets (from AWS Lake Formation and the AWS Glue Data Catalog)

Data governance in AWS with Amazon DataZone

Amazon DataZone is “a data management service to catalog, discover, share, and govern data across AWS, third-party sources, and on-premises.”

Data discovery and sharing at scale with Amazon DataZone - Source: AWS.

Amazon DataZone components - Source: AWS Blog.

AWS Lake Formation and the AWS Glue Data Catalog are integral to DataZone. Both services help catalog AWS data assets and make them discoverable and accessible.

For example, in Amazon DataZone, when a producer makes data available for a subscription, it has to be in the Data Catalog. When a subscription is granted, Amazon DataZone orchestrates the creation of Lake Formation grants.
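To make the idea of a Lake Formation grant more concrete, here is a minimal sketch of the kind of request involved. It only builds the arguments for Lake Formation’s GrantPermissions API as a plain dictionary. The account ID, role ARN, database, and table names are hypothetical placeholders, and this is not a description of what Amazon DataZone runs internally.

```python
import json


def build_table_grant(account_id, principal_arn, database, table,
                      permissions=("SELECT", "DESCRIBE")):
    """Build keyword arguments for Lake Formation's GrantPermissions API."""
    return {
        "CatalogId": account_id,
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": list(permissions),
    }


# Every identifier below is a hypothetical placeholder.
grant = build_table_grant(
    account_id="111122223333",
    principal_arn="arn:aws:iam::111122223333:role/analytics-consumer",
    database="sales",
    table="orders",
)
print(json.dumps(grant, indent=2))

# With boto3 installed and AWS credentials configured, the same arguments
# could be passed to the real API:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
```

When access is granted through a DataZone subscription, you would not normally issue this call yourself. The sketch only illustrates what a table-level grant to a consumer role looks like.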

Amazon DataZone for data discovery and access - Source: AWS re:Invent 2023.

The role of AWS Lake Formation in Amazon DataZone

Specifically, AWS Lake Formation is a service for building and governing data lakes. It helps manage permissions and share data products by centralizing data lake security management.

“Lake Formation gives you a central console where you can discover data sources, set up transformation jobs to move data to an S3 data lake, remove duplicates and match records, catalog data for access by analytic tools, configure data access and security policies, and audit and control access from AWS analytic and machine learning services.”

Simplify data permissions and sharing with AWS Lake Formation - Source: AWS.

The role of the AWS Glue Data Catalog in Amazon DataZone

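Meanwhile, AWS Glue is a serverless data integration service that extracts data from AWS services and other sources and loads it into data warehouses and data lakes. The AWS Glue Data Catalog is the central metadata store (metastore) where AWS Glue keeps table definitions and schemas.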

Simplify data integration with AWS Glue - Source: AWS.

Amazon DataZone supports data assets published directly from the AWS Glue Data Catalog and Amazon Redshift.
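As a rough illustration of what “being in the Data Catalog” means, the sketch below lists the tables and columns registered in one Glue database. The function expects a boto3 Glue client and uses its `get_tables` paginator. The `FakeGlueClient` class and the `sales` database are hypothetical stand-ins so that the example runs without AWS credentials.

```python
def list_catalog_tables(glue_client, database_name):
    """Return (table name, column names) pairs for one Glue database."""
    paginator = glue_client.get_paginator("get_tables")
    tables = []
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables.append((table["Name"], [col["Name"] for col in columns]))
    return tables


class _FakePaginator:
    """Yields canned pages shaped like Glue GetTables responses."""

    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        return iter(self._pages)


class FakeGlueClient:
    """Hypothetical offline stand-in for boto3.client("glue")."""

    def get_paginator(self, operation_name):
        assert operation_name == "get_tables"
        return _FakePaginator([
            {"TableList": [
                {"Name": "orders",
                 "StorageDescriptor": {"Columns": [{"Name": "order_id"},
                                                   {"Name": "amount"}]}},
            ]},
            {"TableList": [
                {"Name": "customers",
                 "StorageDescriptor": {"Columns": [{"Name": "customer_id"}]}},
            ]},
        ])


catalog = list_catalog_tables(FakeGlueClient(), "sales")
for name, columns in catalog:
    print(name, columns)

# Against a real account you would pass boto3.client("glue") instead.
```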

Benefits of data governance on AWS with DataZone

Amazon DataZone (generally available since October 2023) offers several data governance benefits for your AWS data assets, such as:

  • A centralized data management platform that integrates with AWS Glue, Amazon Redshift, Athena, QuickSight, and Lake Formation for consistent data governance across platforms
  • Built-in workflows to request access, as well as review and approve those requests
  • A federated, domain-based organization of AWS data assets so that domain owners govern their data and manage access
  • The ability to create business use case-based groupings of people, data, and analytics tools
  • Automated recommendations for business descriptions powered by LLMs
  • Insights into data cataloged from data sources using 70+ AWS Glue crawlers and 100+ Amazon AppFlow connectors

AWS + Atlan: Active data governance for your AWS workflows

AWS offers several ways to govern your AWS assets. However, handling non-AWS data sources and assets can be challenging.

For instance, AWS Glue has limitations when supporting data lake frameworks, such as Delta Lake, Apache Iceberg, and Hudi. Other AWS products might present similar challenges.

That’s where an active data governance platform like Atlan can help. Atlan, accessible via AWS Marketplace, offers active data governance for all data assets — embedding governance within your daily workflows.

For example, you can discuss a data asset on Slack or raise a Jira ticket without switching apps. You can also integrate Atlan with Amazon EventBridge and get Slack notifications when the ownership of an AWS asset changes.
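The notification flow can be pictured with a small sketch. It assumes an event that follows the standard EventBridge envelope (`source`, `detail-type`, `detail`) and turns it into a Slack incoming-webhook payload. The `detail-type` value, the `source` value, and the fields inside `detail` are hypothetical; the actual event shape Atlan emits is defined in its documentation and may differ.

```python
import json

# Hypothetical detail-type; the real value depends on the integration.
OWNERSHIP_CHANGED = "Asset Ownership Changed"


def ownership_change_to_slack(event):
    """Map an EventBridge-style event to a Slack webhook payload.

    Returns None for events that are not ownership changes.
    """
    if event.get("detail-type") != OWNERSHIP_CHANGED:
        return None
    detail = event.get("detail", {})
    text = (
        f"Owner of {detail['asset_name']} changed "
        f"from {detail['old_owner']} to {detail['new_owner']}"
    )
    return {"text": text}


# Hypothetical sample event for illustration only.
sample_event = {
    "source": "example.governance",
    "detail-type": OWNERSHIP_CHANGED,
    "detail": {
        "asset_name": "sales.orders",
        "old_owner": "alice",
        "new_owner": "bob",
    },
}

payload = ownership_change_to_slack(sample_event)
print(json.dumps(payload))

# To deliver the message, POST the JSON payload to your Slack
# incoming-webhook URL (for example with urllib.request).
```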

AWS + Atlan: Active data governance for your AWS workflows - Source: Atlan on YouTube.

Atlan complements the AWS ecosystem’s existing data governance support with capabilities such as:

  • Effective data governance, metadata management, and data cataloging for all data assets (AWS and non-AWS assets)
  • Column-level data lineage with in-line actions to alert the right users, create support tickets, start discussions around data assets, etc.
  • Propagation of policies, column descriptions, PII tags, etc. via the lineage map
  • Automatic updates (e.g., classifications, certificates, alerts) to your data with Playbooks, Atlan’s rule-based automation
  • Custom masking and access policies (i.e., persona-based access control) based on a user’s access level in terms of their role, team function, and project objectives
  • AI-assisted documentation and classification

Read more → How to collaborate across your AWS data stack with Atlan

Deploying Atlan for governing your AWS workflows: AWS Glue

If you’re connecting AWS Glue to Atlan, you must:

  1. Create an IAM policy.
  2. Choose an authentication method: user-based, role-based, or role delegation-based.
  3. Add AWS Glue as a source in Atlan and create a new workflow.
  4. Provide the credentials you configured so that Atlan can fetch metadata, and choose which metadata to include in or exclude from crawling.
  5. Run the crawler, and schedule it to run hourly, daily, weekly, or monthly.

Atlan will crawl and map databases, schemas, tables, views, and columns from AWS Glue.
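The first step above, creating an IAM policy, can be sketched as follows. This is an illustrative read-only policy for Glue metadata, not the exact policy Atlan requires: the action list, the `Sid`, the region, and the account ID are assumptions, so check Atlan’s setup documentation for the authoritative version.

```python
import json


def build_glue_readonly_policy(region, account_id):
    """Build an illustrative IAM policy allowing read-only Glue metadata access."""
    arn_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueMetadataReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                # Assumed minimal set of read actions; adjust as required.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                ],
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/*",
                    f"{arn_prefix}:table/*",
                ],
            }
        ],
    }


# Region and account ID are hypothetical placeholders.
policy = build_glue_readonly_policy("us-east-1", "111122223333")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console’s policy editor. Scoping the `Resource` list to specific databases instead of `*` narrows what the crawler can read.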

Wrapping up

AWS offers data governance solutions to manage AWS assets effectively. Benefits include centralized access management, domain and role-based classification of data assets, auto-recommendations for context, and more.

With a partner like Atlan, you can set up a flexible, user-friendly, and secure environment for data governance of all data assets (including the non-AWS assets).

Atlan’s integration with the AWS ecosystem further simplifies policy management, streamlines compliance, and automates several aspects of data governance.
