Balance equalities are a useful addition to the data quality toolbox for checking arithmetic consistency between data points, often to ensure financial observations are in balance.
In FMR they are defined as structural metadata using a non SDMX standard artefact called Validation Schemes.
When designing a Validation Scheme, one option is to define explicit arithmetic expressions on specific variables in a dataset. However hierarchical codelists can also be used for checking aggregate consistency within datasets.
Aggregate balance equalities can be defined using custom expressions on each dimension of a Dataflow for the purpose of checking the consistency of reported aggregates with lower-level observations. They’re typically of the form:
Dimension REF_AREA
(reference area)
[EUR] = [DE]+[FR]+[ES]+[IT]
The expression states that observations for REF_AREA=EUR
must be the sum of observations for DE
, FR
, ES
and IT
.
Expressions can use the four basic arithmetic operators: + - * /, plus parentheses (). And in addition to the form above, the following is also valid:
0 = [EUR] – ([DE]+[FR]+[ES]+[IT])
… where 0 can be any constant.
Custom expressions are a good solution where the number of consistency checks required is small. For more complex scenarios involving aggregation consistency, there is another other option.
In SDMX, Hierarchical Codelists can be used to define how aggregates should be calculated.
It’s worth noting here that the information model for Hierarchical Codelists changed in SDMX 3.0:
We can achieve the same result as our example custom expression by creating a Hierarchy of the relevant codes in the CL_REF_AREA
Codelist which defines the enumerated representation for the REF_AREA
dimension:
EUR
|- DE
|- FR
|- ES
|- IT
The approach starts to deliver greater benefits when aggregation hierarchies are deep and complex - maintenance of a single Hierarchy being simpler than multiple individual custom expressions.
The limitation with this approach is that the aggregation function is always assumed to be sum. Use custom expressions if more complex calculations are required, for example:
[EUR] = (([UK]+[DE]+[FR]) - (([ES]+[IT]+[LUX]*10) / 2) )
Remember also that Hierarchies are distinct from any simple parent-child relationships that may be defined as part of the Codelist. Indeed there can be any number of different Hierarchies should that need arise and a code can appear multiple times in a Hierarchy. Hierarchies can also reference codes from different Codelists but, for this use case, stick to a single Codelist.
In the FMR web user interface:
Wizard Step 4 optionally allows the hierarchy levels to be explicitly named but this is not necessary for the validation use case.
In the FMR web user interface:
The new balance equality validation rule will be automatically applied whenever data is loaded for the Dataflow using the Data > Convert Data UI option, or through the REST API data validation and transformation web service .
In the UI, validation results are reported under the Valid Calculations category.
FMR’s Validation Schemes allow the definition of data quality rules to check consistency between observation values.
Custom Expressions provide the flexibility of targeted rules and arithmetic expressions including addition, subtraction, multiplication and division.
However, for checking consistency between aggregation levels where the sum of lower levels should equal the values at higher levels, Hierarchies (Hierarchical Codelists in SDMX 2.1) could be the answer particularly where there are many levels. In these cases, a single Hierarchy replaces many discrete Custom Expressions reducing the creation and maintenance burden.
Fusion Metadata Registry:
download FMR 11
Fusion Metadata Registry:
FMR 11 Docker image
BISIT
Learn how to setup an external database for FMR's operational storage in preference to MySQL.
July 5, 2022