COST OPTIMIZATION

Clean up data landscape by removing duplicate assets

Prevent the creation of duplicate assets to reduce costs.

🔄 Tasks

Detect duplicates:
1. Analyze lineage: Use lineage to detect when a new asset would duplicate an existing downstream asset.
2. Calculate differences: Compare the asset to existing data assets, using a data diff tool.
Recommend alternatives: Present the user with existing data assets that are similar to the one they are creating.

The user can then decide whether to reuse an existing (similar) data asset or continue with creating their own asset.
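The lineage check and the recommendation step could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes lineage is available as a mapping from each asset to its set of upstream sources, and it uses Jaccard similarity as one possible way to score "same downstream asset". The names `find_candidates`, `catalog`, and `min_similarity`, as well as the table names, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical lineage catalog: asset name -> set of upstream source tables.
Lineage = Dict[str, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of upstream sources that two assets have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_candidates(
    new_upstreams: FrozenSet[str],
    catalog: Lineage,
    min_similarity: float = 0.5,  # assumed cutoff, tune per environment
) -> List[Tuple[str, float]]:
    """Return existing assets whose lineage resembles the new asset's lineage.

    Results are sorted by similarity (highest first), then by name.
    """
    scored = [(name, jaccard(new_upstreams, ups)) for name, ups in catalog.items()]
    hits = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only.
catalog: Lineage = {
    "sales_daily": frozenset({"orders", "customers"}),
    "sales_by_region": frozenset({"orders", "customers", "regions"}),
    "web_sessions": frozenset({"clickstream"}),
}

# The new asset reads from the same sources as "sales_daily".
candidates = find_candidates(frozenset({"orders", "customers"}), catalog)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

The ranked candidates are what would be shown to the user as alternatives. Lineage similarity only suggests that two assets may be duplicates, so a data diff is still needed to confirm it.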

For example, suppose a user writes a query that creates an output table. The workflow uses a data diff tool to compare that output table to existing tables. If the overlap is significant, the user can reuse the existing table instead, which reinforces it as a data product and avoids creating a duplicate asset.
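One simple way to quantify "overlap" in the example above is the fraction of rows in the new output table that already exist in a candidate table. The sketch below is an assumption-laden illustration rather than the behavior of any particular data diff tool: it compares whole rows held in memory, and the `overlap_ratio` and `should_suggest_reuse` helpers and the 0.8 threshold are hypothetical.

```python
from typing import Iterable, Tuple

Row = Tuple[object, ...]


def overlap_ratio(new_rows: Iterable[Row], existing_rows: Iterable[Row]) -> float:
    """Fraction of distinct new rows that already exist in the other table."""
    new_set = set(new_rows)
    if not new_set:
        return 0.0
    return len(new_set & set(existing_rows)) / len(new_set)


def should_suggest_reuse(ratio: float, threshold: float = 0.8) -> bool:
    """Suggest reuse when overlap meets the (assumed) threshold."""
    return ratio >= threshold


# Illustrative data only: four of the five new rows already exist.
new_table = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
existing_table = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (9, "z")]

ratio = overlap_ratio(new_table, existing_table)
print(f"overlap: {ratio:.0%}, suggest reuse: {should_suggest_reuse(ratio)}")
```

For large tables, a real data diff tool would typically compare keys, checksums, or samples instead of loading every row. The decision rule stays the same: compute an overlap measure and compare it to a threshold.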