Benefits of Data Governance on AWS: What’s Available and How Can You Build On It?
Data governance in AWS covers processes, policies, roles, metrics, and standards to manage and use AWS data assets.
In this article, we’ll look at the governance capabilities available in AWS, the benefits of data governance on AWS, and ways to enhance them further.
Table of contents #
- Data governance in AWS
- Benefits of data governance on AWS with DataZone
- AWS + Atlan: Active data governance for your AWS workflows
- Wrapping up
- Related Reads
Data governance in AWS #
In the AWS ecosystem, different services are dedicated to specific aspects of data governance. For instance, AWS offers the following to manage data governance for your environments:
- AWS Organizations to manage all AWS accounts from a single place
- AWS Config to monitor and record AWS resource configurations
- AWS Identity and Access Management (IAM) to create and manage users, groups, roles, and policy documents across all AWS resources and services
- AWS Data Exchange to find, subscribe to, and use third-party data in the cloud
- AWS Clean Rooms to create a space to collaborate with your partners without sharing raw data
AWS also offers the following services to manage your AWS assets:
- AWS Lake Formation (a data lake management and permissions service) to centrally govern, secure, and share your data lake assets for analytics and machine learning
- AWS Glue (a serverless data integration and ETL service) to discover, prepare, and integrate data, including the data in your data lake
- Amazon DataZone to catalog and govern data assets (for example, those in AWS Lake Formation and the AWS Glue Data Catalog)
Data governance in AWS with Amazon DataZone #
Amazon DataZone is “a data management service to catalog, discover, share, and govern data across AWS, third-party sources, and on-premises.”
AWS Lake Formation and the AWS Glue Data Catalog are integral to DataZone. Together, they catalog AWS data assets and make them discoverable and accessible.
“For example, in Amazon DataZone, when a producer makes data available for a subscription, it has to be in the Data Catalog. When a subscription is granted, Amazon DataZone orchestrates the creation of Lake Formation grants.”
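To make the quoted behavior more concrete, here is a minimal sketch of what a single Lake Formation grant looks like as a request. It only builds the keyword arguments for Lake Formation's `GrantPermissions` API (exposed in boto3 as `grant_permissions`) and does not call AWS. The role ARN, database name, and table name are made-up placeholders, and this illustrates the request shape rather than what DataZone runs internally.

```python
from typing import Any, Dict, List, Optional


def build_table_grant(principal_arn: str, database: str, table: str,
                      permissions: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return kwargs shaped for Lake Formation's GrantPermissions API."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # Default to read-only access when no permissions are given.
        "Permissions": permissions or ["SELECT"],
    }


# Hypothetical subscriber role and asset names, for illustration only.
grant = build_table_grant(
    "arn:aws:iam::111122223333:role/example-subscriber",
    "sales_db",
    "orders",
)
# With AWS credentials configured, this could be passed on as:
#   boto3.client("lakeformation").grant_permissions(**grant)
```

Keeping the request as a plain dictionary makes it easy to review or log a grant before anything is sent to AWS.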
The role of AWS Lake Formation in Amazon DataZone
Specifically, AWS Lake Formation is a service for building, securing, and managing data lakes (the data itself typically lives in Amazon S3). It manages permissions, supports data sharing, and centralizes data lake security management.
“Lake Formation gives you a central console where you can discover data sources, set up transformation jobs to move data to an S3 data lake, remove duplicates and match records, catalog data for access by analytic tools, configure data access and security policies, and audit and control access from AWS analytic and machine learning services.”
One of the key benefits of AWS Lake Formation is its ability to automate the process of setting up and configuring a data lake. This can save your organization valuable time and resources that would otherwise be spent on managing infrastructure.
— AcadianSchool | Python | Cloud | SQL (@AcadianSchool) June 21, 2023
The role of the AWS Glue Data Catalog in Amazon DataZone
Meanwhile, AWS Glue is a data integration service that extracts data from various sources and loads it into data warehouses and data lakes. The AWS Glue Data Catalog is its central metadata store, holding definitions of databases, tables, and columns.
Amazon DataZone supports data assets published directly from the AWS Glue Data Catalog and Amazon Redshift.
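As a rough illustration of the metadata the Data Catalog holds, the sketch below flattens a response shaped like the output of Glue's `GetTables` API into one record per column. The sample response is hand-written and the database, table, and column names are hypothetical. Nothing here calls AWS.

```python
from typing import Any, Dict, List


def flatten_tables(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn a GetTables-style response into one record per column."""
    records = []
    for table in response.get("TableList", []):
        columns = table.get("StorageDescriptor", {}).get("Columns", [])
        for column in columns:
            records.append({
                "database": table.get("DatabaseName", ""),
                "table": table["Name"],
                "column": column["Name"],
                "type": column.get("Type", ""),
            })
    return records


# Hypothetical sample shaped like a Glue GetTables response.
sample_response = {
    "TableList": [
        {
            "Name": "orders",
            "DatabaseName": "sales_db",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "order_id", "Type": "bigint"},
                    {"Name": "customer_email", "Type": "string"},
                ]
            },
        }
    ]
}

column_records = flatten_tables(sample_response)
```

A flat list like `column_records` is a convenient starting point for publishing assets to a catalog or checking which columns still lack descriptions.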
Benefits of data governance on AWS with DataZone #
Amazon DataZone (generally available since October 2023) offers several data governance benefits for your AWS data assets, such as:
- A centralized data management platform that integrates with AWS Glue, Amazon Redshift, Athena, QuickSight, and Lake Formation for consistent data governance across platforms
- Built-in workflows to request access, as well as review and approve those requests
- A federated, domain-based organization of AWS data assets so that domain owners govern their data and manage access
- The ability to create business use case-based groupings of people, data, and analytics tools
- Automated recommendations for business descriptions powered by LLMs
- Insights into data cataloged from data sources using 70+ AWS Glue crawlers and 100+ Amazon AppFlow connectors
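The request, review, and approve workflow combined with domain-based ownership can be sketched as a small state machine. This is a simplified model for illustration, not DataZone's implementation. The class names, fields, and email addresses are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    requester: str
    asset: str
    domain_owner: str
    status: Status = Status.PENDING

    def review(self, reviewer: str, approve: bool) -> None:
        """Only the owning domain's owner may decide a pending request."""
        if reviewer != self.domain_owner:
            raise PermissionError(f"{reviewer} does not own this domain")
        if self.status is not Status.PENDING:
            raise ValueError("request has already been reviewed")
        self.status = Status.APPROVED if approve else Status.REJECTED


# Hypothetical requester, asset, and owner.
request = AccessRequest(
    requester="analyst@example.com",
    asset="sales_db.orders",
    domain_owner="sales-owner@example.com",
)
request.review("sales-owner@example.com", approve=True)
```

The point of the model is that the approval decision sits with the domain owner rather than a central team, which is what the federated setup described above implies.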
#AWS #DataZone sounds like a genuinely useful tool and as with many AWS services over the years, seems to be aiming to make things as easy as possible for users. So far so good! #reInvent pic.twitter.com/zSSRXJbTnr
— Alex Galbraith ☁️ (@alexgalbraith) November 29, 2022
AWS + Atlan: Active data governance for your AWS workflows #
AWS offers several ways to govern your AWS assets. However, handling non-AWS data sources and assets can be challenging.
For instance, AWS Glue has limitations when supporting data lake frameworks such as Delta Lake, Apache Iceberg, and Apache Hudi. Other AWS products might present similar challenges.
That’s where an active data governance platform like Atlan can help. Atlan, accessible via AWS Marketplace, offers active data governance for all data assets — embedding governance within your daily workflows.
For example, you can discuss a data asset on Slack or raise a Jira ticket without switching apps. You can also integrate Atlan with Amazon EventBridge and get Slack notifications when the ownership of an AWS asset changes.
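A notification flow like this could be sketched as a function that turns an EventBridge event into a Slack incoming-webhook payload. Only the envelope keys (`source`, `detail-type`, `detail`) follow EventBridge's standard event structure. The contents of `detail` are assumptions made for this sketch, not a documented Atlan event schema, and the code does not post anything to Slack.

```python
import json
from typing import Any, Dict


def ownership_change_to_slack(event: Dict[str, Any]) -> Dict[str, str]:
    """Format an EventBridge event as a Slack incoming-webhook payload."""
    detail = event.get("detail", {})
    # The three detail keys below are assumed for this sketch.
    asset = detail.get("asset_name", "unknown asset")
    old_owner = detail.get("previous_owner", "unassigned")
    new_owner = detail.get("new_owner", "unassigned")
    return {"text": f"Owner of {asset} changed from {old_owner} to {new_owner}"}


# Hypothetical event for illustration.
sample_event = {
    "source": "example.governance",
    "detail-type": "Asset Ownership Changed",
    "detail": {
        "asset_name": "sales_db.orders",
        "previous_owner": "alice",
        "new_owner": "bob",
    },
}

payload = ownership_change_to_slack(sample_event)
body = json.dumps(payload)  # the JSON body that would be POSTed to the webhook URL
```

In practice, a function like this would typically run in an AWS Lambda target attached to an EventBridge rule, with the webhook URL stored as a secret.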
Atlan complements the AWS ecosystem’s existing data governance support with capabilities such as:
- Effective data governance, metadata management, and data cataloging for all data assets (AWS and non-AWS assets)
- Column-level data lineage with in-line actions to alert the right users, create support tickets, start discussions around data assets, etc.
- Propagation of policies, column descriptions, PII tags, etc. via the lineage map
- Automatic updates (e.g., classifications, certificates, alerts) to your data with Playbooks, i.e., rule-based automation
- Custom masking and access policies (i.e., persona-based access control) based on a user’s access level in terms of their role, team function, and project objectives
- AI-assisted documentation and classification
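Propagating a tag through a lineage map can be pictured as a graph traversal. The sketch below uses a breadth-first search over a hand-written column-level lineage graph to apply a PII tag to everything downstream of a source column. It is a generic illustration of the idea, not Atlan's implementation, and the column names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def propagate_tag(lineage: Dict[str, List[str]], source: str, tag: str,
                  tags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Apply `tag` to `source` and every column downstream of it."""
    queue = deque([source])
    visited = {source}
    while queue:
        column = queue.popleft()
        tags.setdefault(column, set()).add(tag)
        for downstream in lineage.get(column, []):
            if downstream not in visited:  # guards against cycles and repeats
                visited.add(downstream)
                queue.append(downstream)
    return tags


# Hypothetical lineage: each key feeds the columns listed in its value.
lineage = {
    "raw.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.customers.email", "marts.campaigns.recipient"],
    "raw.users.signup_date": ["staging.users.signup_date"],
}

column_tags = propagate_tag(lineage, "raw.users.email", "PII", {})
```

Columns that do not descend from the tagged source, such as `raw.users.signup_date` here, are left untouched.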
@AtlanHQ Debuts as a Leader in 3 Categories—Data Governance, Machine Learning Data Catalogs, and Data Quality—in the G2 Spring 2023 Grid Reports https://t.co/eF2pFSJw5X #martech #marketing #Technology #Atlan
— MarTech Series (@MarTechSeries) April 5, 2023
Deploying Atlan for governing your AWS workflows: AWS Glue #
To connect AWS Glue to Atlan:
- Create an IAM policy.
- Choose an authentication method: user-based, role-based, or role delegation-based.
- Add AWS Glue as a source and create a new workflow.
- Provide the credentials you configured so that Atlan can fetch metadata. You can choose which metadata to include in or exclude from crawling.
- Run the crawler, and schedule it to run hourly, daily, weekly, or monthly.
Atlan will crawl and map databases, schemas, tables, views, and columns from AWS Glue.
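For the IAM policy in the first step, a read-only policy document for the Glue Data Catalog might look roughly like the sketch below. The Glue action names are real IAM actions, but which actions Atlan actually requires is an assumption here, so check Atlan's setup documentation for the exact list. The wildcard resource is also a simplification that you may want to narrow to specific catalogs, databases, or tables.

```python
import json

# Assumed minimal read-only access to Glue Data Catalog metadata.
glue_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
            ],
            # Simplification: scope this down in a real deployment.
            "Resource": "*",
        }
    ],
}

# JSON text suitable for pasting into the IAM console's policy editor.
policy_json = json.dumps(glue_read_policy, indent=2)
```

Granting only `Get*` actions keeps the crawler limited to reading metadata, which matches the databases, tables, views, and columns that are crawled.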
Wrapping up #
AWS offers data governance services to manage AWS assets effectively. Benefits include centralized access management, domain- and role-based organization of data assets, automated recommendations for business context, and more.
With a partner like Atlan, you can set up a flexible, user-friendly, and secure environment for data governance of all data assets, including non-AWS assets.
Atlan’s integration with the AWS ecosystem further simplifies policy management, streamlines compliance, and automates several aspects of data governance.
Data governance benefits in AWS: Related Reads #
- What is Data Governance? Its Importance, Principles & How to Get Started?
- Key Objectives of Data Governance : How Should You Think About Them?
- Data Governance Framework — Examples, Templates, Standards, Best Practices & How to Create One?
- Data Governance and Compliance: Act of Checks & Balances
- How to implement data governance? Steps, Prerequisites, Essential Factors & Business Case
- How to Improve Data Governance? Steps, Tips & Template
- 7 Steps to Simplify Data Governance for Your Entire Organization
- Snowflake Data Governance — Features, Frameworks & Best Practices
- Automated Data Governance : How Does It Help You Manage Access, Security & More at Scale?
- Enterprise Data Governance — Basics, Strategy, Key Challenges, Benefits & Best Practices