Automated quality control of data pipelines
Avoid propagation and use of poor-quality data.
🔄 Tasks
On any change to data:
- Stage data: Stage the changes to the data in the lowest-quality tier (e.g. bronze).
- Test data: Test the data in the lowest tier against defined checks and thresholds.
- Promote data:
  - If the data passes the checks, promote it to the next tier (e.g. silver).
  - If the data does not pass the checks, prevent further promotion without a human override.
- Repeat: Repeat the test and promote steps at each tier until the data reaches the top tier (e.g. gold) or fails a check and is held pending a human override.
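The tasks above can be sketched as a small promotion loop. This is a minimal illustration under stated assumptions, not a reference implementation: the tier names, the `Threshold` class, the `completeness` check, and the `promote` function are all hypothetical, and a real pipeline would run its checks against tables in a warehouse rather than in-memory rows.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed tier names, ordered from lowest to highest quality.
TIERS = ["bronze", "silver", "gold"]

Row = dict
Check = Callable[[list], float]  # takes rows, returns a score between 0 and 1


@dataclass
class Threshold:
    name: str       # label reported when the check fails
    check: Check    # function that scores the data
    minimum: float  # lowest acceptable score


def completeness(column: str) -> Check:
    """Hypothetical check: fraction of rows where `column` is not None."""
    def check(rows: list) -> float:
        if not rows:
            return 0.0
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return check


def failed_checks(rows: list, thresholds: list) -> list:
    """Return the names of all thresholds the data does not meet."""
    return [t.name for t in thresholds if t.check(rows) < t.minimum]


def promote(rows: list, thresholds: dict, overrides: frozenset = frozenset()):
    """Stage data in the lowest tier, then test and promote tier by tier.

    `thresholds` maps a tier name to the checks that must pass before data
    leaves that tier. `overrides` holds tiers where a human has approved
    promotion despite failures. Returns (tier reached, failed check names).
    """
    index = 0  # stage: changes always land in the lowest tier first
    while index < len(TIERS) - 1:
        tier = TIERS[index]
        failures = failed_checks(rows, thresholds.get(tier, []))
        if failures and tier not in overrides:
            return tier, failures  # stopped: hold here pending an override
        index += 1  # promote to the next tier
    return TIERS[index], []


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
thresholds = {
    "bronze": [Threshold("id completeness", completeness("id"), 1.0)],
    "silver": [Threshold("amount completeness", completeness("amount"), 0.9)],
}

print(promote(rows, thresholds))
# ('silver', ['amount completeness'])
print(promote(rows, thresholds, overrides=frozenset({"silver"})))
# ('gold', [])
```

In this sketch the data passes the bronze check, is promoted to silver, and is held there because only half of the `amount` values are present. Passing `overrides` models the human override: promotion continues past the failing tier only when someone has explicitly approved it.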
🎉 Outcome
Avoid propagation and use of poor-quality data.
Related
Continuous validation of metric calculations
Ensure that calculations are accurate and identify problems.
Build an exception queue for certification
Maintain a minimum acceptable standard for certified metadata.
Automatically assign a freshness status to assets
Avoid accidental reuse of stale assets.