- Detect duplicates:
  - Analyze lineage: Use lineage to detect when a new asset would recreate an existing downstream asset.
  - Calculate differences: Compare the new asset to existing data assets using a data diff tool.
- Recommend alternatives: Present the user with similar existing data assets as alternatives.
The user can then decide whether to reuse an existing (similar) data asset or continue with creating their own asset.
For example, imagine a user is writing a query that creates an output table. This workflow would use a data diff tool to compare that output table to existing tables. If most of the output already exists in another table, the user could reuse that table instead. Reuse reinforces the existing table as a data product and avoids adding a duplicate asset.
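The comparison step in this example can be sketched in a few lines. This is a minimal illustration, not a real data diff tool: tables are modeled as in-memory lists of row tuples, overlap is measured as the fraction of the new table's rows that already appear in an existing table, and the names `row_overlap`, `recommend_alternatives`, `Recommendation`, and the `threshold` parameter are all hypothetical. A production tool would typically compare by key or by hash over much larger tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    table_name: str
    overlap: float  # fraction of the new table's rows found in the existing table


def row_overlap(candidate: set, existing: set) -> float:
    """Return the fraction of candidate rows that also appear in `existing`."""
    if not candidate:
        return 0.0
    return len(candidate & existing) / len(candidate)


def recommend_alternatives(candidate_rows, catalog, threshold=0.8):
    """Return existing tables whose overlap meets the threshold, best match first.

    `catalog` maps a table name to its rows (an assumed, simplified stand-in
    for a real data catalog).
    """
    candidate = set(candidate_rows)
    recs = [
        Recommendation(name, row_overlap(candidate, set(rows)))
        for name, rows in catalog.items()
    ]
    matches = [r for r in recs if r.overlap >= threshold]
    return sorted(matches, key=lambda r: r.overlap, reverse=True)


# Hypothetical catalog and query output, for illustration only.
catalog = {
    "orders_daily": [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    "orders_archive": [(9, "z")],
}
new_output = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]

recommendations = recommend_alternatives(new_output, catalog, threshold=0.7)
for rec in recommendations:
    print(f"{rec.table_name}: {rec.overlap:.0%} of the new rows already exist")
```

The threshold is a policy choice: a lower value surfaces more candidates for reuse, while a higher value only flags near-duplicates. The user still makes the final decision to reuse or continue.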
Prevent duplicate assets to reduce costs.
- Purge stale or unused assets: Maintain a clean data landscape.
- Dynamic data pipeline optimization: Reduce unnecessary data processing and improve resource utilization.
- Allocate compute resources dynamically: Improve resource utilization and reduce processing delays.