Do the following each time a data pipeline or definition of a metric changes.
- Calculate lineage: Recalculate the lineage that results from the changed data pipeline.
- Identify calculations: From the lineage, identify fields that are the combination of two or more elements.
For example, the fields
REVENUEmay combine to create a column
- Find relationships: Check for relationships between these data elements and their business concepts (terms).
Continuing the example,
COSTmay be related to a term Cost,
REVENUEto a term Revenue, and
GROSS_PROFITto a term Gross profit.
- Validate relationships: Confirm alignment between these conceptual relationships and the actual calculation in the lineage. Continuing the example, if the term Gross profit is defined as a calculation from Cost and Revenue, then this definition matches what we found in lineage. If Gross profit is defined to include other inputs not present in lineage, or lineage indicates there were other inputs not in the definition, there is a discrepancy.
- Notify stakeholders: If there are any discrepancies between the terms and columns, notify the appropriate stakeholders. For example, notify the owners or experts of the data assets and terms involved. The stakeholders can then align on correcting either the definition or the actual calculation itself. This active metadata use case should run continuously. That way, regardless of whether someone has changed the data pipeline or the definition of the metric, you can validate that the two are aligned.
Ensure that calculations are accurate and identify problems.
Build an exception queue for certification
Maintain a minimum acceptable standard for certified metadata.
Automatically assign a freshness status to assets
Avoid accidental reuse of stale assets.
Automated quality control of data pipelines
Avoid propagation and use of poor quality data.