QUALITY
Continuous validation of metric calculations
Ensure that calculations are accurate and identify problems.
🔄 Tasks
Do the following each time a data pipeline or definition of a metric changes.
- Calculate lineage: Recalculate the lineage affected by the changed data pipeline.
- Identify calculations: From the lineage, identify fields that are calculated from two or more other data elements. For example, the fields COST and REVENUE may combine to create a column GROSS_PROFIT.
- Find relationships: Check for relationships between these data elements and their business concepts (terms). Continuing the example, COST may be related to a term Cost, REVENUE to a term Revenue, and GROSS_PROFIT to a term Gross profit.
- Validate relationships: Confirm that these conceptual relationships align with the actual calculation in the lineage. Continuing the example, if the term Gross profit is defined as a calculation from Cost and Revenue, the definition matches what the lineage shows. If the definition of Gross profit includes inputs that are not present in the lineage, or the lineage shows inputs that are not in the definition, there is a discrepancy.
- Notify stakeholders: If there are any discrepancies between the terms and columns, notify the appropriate stakeholders, such as the owners or experts of the data assets and terms involved. The stakeholders can then agree on whether to correct the definition or the calculation itself.
This active metadata use case should run continuously. That way, whether someone changes the data pipeline or the definition of the metric, you can validate that the two remain aligned.
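The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a product API: lineage is simplified to a mapping from each derived column to the set of columns it is computed from, each metric term's definition is expressed as the set of terms it is calculated from, and all data and names (LINEAGE, COLUMN_TO_TERM, TERM_DEFINITIONS, TERM_OWNERS, identify_calculations, validate, notify) are hypothetical. A real deployment would read this metadata from its catalog and run the check whenever a pipeline or metric definition changes.

```python
from dataclasses import dataclass

# Hypothetical example metadata. In practice this would come from the catalog:
# lineage recalculated after the pipeline change, column-to-term links, and
# each metric term's definition expressed as the terms it is calculated from.
LINEAGE = {
    "GROSS_PROFIT": {"COST", "REVENUE"},
    "NET_PROFIT": {"GROSS_PROFIT", "TAX"},
    "REVENUE_USD": {"REVENUE"},  # only one input, so not a calculation
}
COLUMN_TO_TERM = {
    "COST": "Cost",
    "REVENUE": "Revenue",
    "GROSS_PROFIT": "Gross profit",
    "NET_PROFIT": "Net profit",
    "TAX": "Tax",
}
TERM_DEFINITIONS = {
    "Gross profit": {"Cost", "Revenue"},
    "Net profit": {"Gross profit", "Tax", "Interest"},  # Interest is not in lineage
}
TERM_OWNERS = {"Gross profit": "finance-team", "Net profit": "finance-team"}


@dataclass
class Discrepancy:
    column: str
    term: str
    only_in_definition: set[str]  # inputs the definition expects but lineage lacks
    only_in_lineage: set[str]     # inputs the lineage uses but the definition omits


def identify_calculations(lineage: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only columns derived from two or more upstream columns."""
    return {col: inputs for col, inputs in lineage.items() if len(inputs) >= 2}


def validate(lineage, column_to_term, term_definitions) -> list[Discrepancy]:
    """Compare each calculated column's lineage inputs with its term's definition."""
    found = []
    for column, inputs in identify_calculations(lineage).items():
        term = column_to_term.get(column)
        if term is None or term not in term_definitions:
            continue  # no linked, defined term to validate against
        # Translate input columns to terms; unlinked columns keep their column name.
        lineage_terms = {column_to_term.get(c, c) for c in inputs}
        defined = term_definitions[term]
        if lineage_terms != defined:
            found.append(Discrepancy(column, term,
                                     defined - lineage_terms,
                                     lineage_terms - defined))
    return found


def notify(discrepancies, owners) -> list[str]:
    """Build one message per discrepancy, addressed to the term's owner."""
    messages = []
    for d in discrepancies:
        owner = owners.get(d.term, "data-governance")  # hypothetical fallback owner
        messages.append(
            f"@{owner}: {d.column} does not match term '{d.term}'. "
            f"Only in definition: {sorted(d.only_in_definition)}; "
            f"only in lineage: {sorted(d.only_in_lineage)}"
        )
    return messages


if __name__ == "__main__":
    for message in notify(validate(LINEAGE, COLUMN_TO_TERM, TERM_DEFINITIONS), TERM_OWNERS):
        print(message)
```

Comparing the two sets in both directions mirrors the two discrepancy cases in Validate relationships: inputs in the definition but not in the lineage, and inputs in the lineage but not in the definition. Input columns with no linked term are compared by column name, so they normally surface as lineage-only inputs.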
🎉 Outcome
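Metric calculations stay accurate, and discrepancies between a metric's definition and its actual calculation are identified and routed to the stakeholders who can correct them.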