COST OPTIMIZATION
Clean up data landscape by removing duplicate assets
Prevent duplicate assets to reduce costs.
🔄 Tasks
- Detect duplicates:
  - Analyze lineage: Use lineage to detect when the same downstream asset is being created again.
  - Calculate differences: Compare the new asset to existing data assets using a data diff tool.
- Recommend alternatives: Present the user with existing data assets that are similar.
The user can then decide whether to reuse an existing (similar) data asset or continue with creating their own asset.
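As a rough sketch of the lineage check, assume lineage is available as a mapping from each asset to the set of its upstream sources. The mapping, the asset names, and the function `find_lineage_duplicates` are hypothetical and not tied to any particular catalog or lineage tool. Existing assets built from exactly the same upstream sources as a proposed asset can then be flagged as candidate duplicates:

```python
from typing import Dict, FrozenSet, List


def find_lineage_duplicates(
    proposed_upstreams: FrozenSet[str],
    lineage: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return existing assets whose upstream sources match the proposed asset's.

    Assumption: identical upstream sets are only a hint of duplication;
    the data itself still has to be compared before recommending reuse.
    """
    return sorted(
        asset
        for asset, upstreams in lineage.items()
        if upstreams == proposed_upstreams
    )


# Hypothetical lineage graph: asset name -> set of upstream source names.
lineage = {
    "analytics.daily_orders": frozenset({"raw.orders", "raw.customers"}),
    "analytics.customer_ltv": frozenset({"raw.orders", "raw.customers", "raw.refunds"}),
    "finance.orders_by_day": frozenset({"raw.orders", "raw.customers"}),
}

# The user's new query reads from raw.orders and raw.customers.
candidates = find_lineage_duplicates(
    frozenset({"raw.orders", "raw.customers"}), lineage
)
print(candidates)
```

Exact set equality is a deliberately strict choice here. A looser match, such as a subset test, would surface more candidates at the cost of more false positives.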
For example, imagine a user is writing a query that creates an output table. The workflow would use a data diff tool to compare that output table to existing tables. If the overlap is significant, the user could reuse the existing table instead of creating a new one. Reuse reinforces the existing table as a data product rather than adding a duplicate asset.
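The comparison in this example could look roughly like the following. It is a simplified stand-in for a data diff tool, not the behavior of any specific product: tables are modeled as in-memory lists of row tuples, overlap is the share of the new table's distinct rows already present in an existing table, and the `0.8` threshold for "significant" is an arbitrary assumption. The names `row_overlap` and `recommend_alternatives` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[object, ...]


def row_overlap(new_rows: Iterable[Row], existing_rows: Iterable[Row]) -> float:
    """Fraction of the new table's distinct rows that also appear in the existing table."""
    new_set: Set[Row] = set(new_rows)
    if not new_set:
        return 0.0
    return len(new_set & set(existing_rows)) / len(new_set)


def recommend_alternatives(
    new_rows: List[Row],
    existing_tables: Dict[str, List[Row]],
    threshold: float = 0.8,  # assumed cutoff for "significant" overlap
) -> List[Tuple[str, float]]:
    """Return (table name, overlap) pairs at or above the threshold, best match first."""
    scored = [
        (name, row_overlap(new_rows, rows))
        for name, rows in existing_tables.items()
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical output of the user's query and two existing tables.
new_table = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
existing = {
    "analytics.daily_orders": [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")],
    "finance.orders_by_day": [(1, "a"), (2, "b"), (9, "z")],
}

alternatives = recommend_alternatives(new_table, existing)
print(alternatives)
```

In this sketch only the table that already contains every row of the new output is recommended. The user still makes the final call on whether to reuse it or continue with their own asset.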
🎉 Outcome
Prevent duplicate assets to reduce costs.
Related
COST OPTIMIZATION
Purge stale or unused assets
Maintain a clean data landscape.
COST OPTIMIZATION
Dynamic data pipeline optimization
Reduce unnecessary data processing and improve resource utilization.
COST OPTIMIZATION
Allocate compute resources dynamically
Improve resource utilization and reduce processing delays.