Dynamic data pipeline optimization
Reduce unnecessary data processing and improve resource utilization.

🔄 Tasks
- Collect metrics:
  - Collect runtime metrics from processing engines, including when, how often, and for how long data processing occurs.
  - Collect metrics from each data store (per data asset), including how often the data changes and is accessed.
- Make recommendations:
  - Avoid processing data that changes or is accessed less often than it is processed. For example, avoid daily reprofiling of a data source whose data changes by less than 1-2% per week.
  - Stagger the processing schedule for better resource utilization.
- Auto-schedule: Apply the recommended scheduling changes to the data processing environment.
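The recommendation step above can be sketched as a simple heuristic. This is an illustrative sketch only: the metric fields, thresholds, and function names are assumptions, not the API of any particular processing engine or data store.

```python
from dataclasses import dataclass

@dataclass
class AssetMetrics:
    """Per-asset metrics collected from the data store (illustrative fields)."""
    name: str
    weekly_change_rate: float    # fraction of the data changed per week
    weekly_access_count: int     # how often the asset is read per week
    weekly_processing_runs: int  # how often a pipeline reprocesses it per week

def recommend_schedule(asset: AssetMetrics,
                       change_threshold: float = 0.02) -> str:
    """Suggest a processing cadence from observed change and access rates."""
    # Rule 1: data that barely changes does not need frequent reprofiling
    # (e.g. <1-2% change per week does not justify daily runs).
    if asset.weekly_change_rate < change_threshold:
        return f"{asset.name}: reduce processing to weekly or on-change triggers"
    # Rule 2: avoid processing more often than the data is actually accessed.
    if asset.weekly_processing_runs > max(asset.weekly_access_count, 1):
        return f"{asset.name}: cap processing at {asset.weekly_access_count} runs/week"
    return f"{asset.name}: keep current schedule"

# Example: a source whose data changes by ~1% per week but is reprofiled daily.
print(recommend_schedule(AssetMetrics("sales_db", 0.01, 3, 7)))
# → sales_db: reduce processing to weekly or on-change triggers
```

An auto-scheduler could then translate each recommendation into a concrete schedule change, staggering the resulting runs across the day to smooth resource usage.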
🎉 Outcome
Reduced unnecessary data processing and improved resource utilization.
Related
- Purge stale or unused assets: Maintain a clean data landscape.
- Allocate compute resources dynamically: Improve resource utilization and reduce processing delays.
- Clean up data landscape by removing duplicate assets: Prevent duplicate assets to reduce costs.