Unit 1: Introduction to Recoding of Data

As indicated in the module introduction, there are many use cases for recoding data. This unit introduces what is possible using SDMX and what is needed to implement it.

What is recoding of data?

Recoding data goes by several names, such as mapping data or transforming data, but they all refer to the same objective:

Changing how data are described using a structural model.

It is critical to note that recoding data in SDMX does not modify data or create new data, and should not be thought of as a mechanism to aggregate or disaggregate data. Recoding reorganises the data so that each observation is described according to a different structural model.

Differences in the structural models

The differences in the structural model could be any or all of the following:

- A codelist could be recoded on a code-by-code basis, or by using regular expressions. Examples:
  - Mapping between ISO-2 country codes and ISO-3 country codes
  - Mapping between SDMX cross-domain codes and internal bespoke codes
  - Mapping between national statistical system codes and international codes (e.g. SDGs, ISCED, ISIC, NACE)
- A many-to-one mapping of concepts could be defined to collapse a structural model from many dimensions to fewer dimensions, typically for data reporting or dissemination.
- A one-to-many mapping of concepts could be defined to expand a structural model from one concept to many concepts. A common use case is the recoding of a series or indicator from one dimension to many dimensions.
- A many-to-many mapping of concepts could be defined to map multiple source concepts to multiple target concepts.
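As a rough illustration of the one-to-many and many-to-one cases, the sketch below splits a single series code into three dimensions and joins them back again. The series code, the dimension names (SERIES, INDICATOR, SEX, AGE) and both helper functions are hypothetical and chosen only for this example. In SDMX the same intent would be expressed with structure maps and representation maps rather than Python code.

```python
# Hypothetical series code and dimension names, for illustration only.
def expand_series(series_code: str) -> dict[str, str]:
    """One-to-many: split one series code into three target dimensions."""
    indicator, sex, age = series_code.split("_", 2)
    return {"INDICATOR": indicator, "SEX": sex, "AGE": age}


def collapse_series(key: dict[str, str]) -> str:
    """Many-to-one: join three source dimensions into one series code."""
    return "_".join([key["INDICATOR"], key["SEX"], key["AGE"]])


observation = {"SERIES": "POP_F_Y15T24", "OBS_VALUE": 1250}

# The observed value is carried over untouched; only its description changes.
expanded = {
    **expand_series(observation["SERIES"]),
    "OBS_VALUE": observation["OBS_VALUE"],
}
collapsed = collapse_series(expanded)

print(expanded)
print(collapsed)
```

Note that the observation value is the same before and after: nothing is aggregated or disaggregated, only the structural description differs.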

About concept mapping

The mapping of concepts can range from the simple approach of a lookup table whereby concept A maps to concept B to complex pattern matching with regular expressions (regex).

The SDMX v3.0 artefacts used to support this recoding of data are:

Structure maps
Representation maps

If the goal is to automate these mappings as part of a production process, a web service such as the Fusion Metadata Registry (FMR) data transformation web service can be used.

The SDMX artefacts which can be recoded are:

Data structure definitions (DSDs)
Dataflows

Structure mapping does not create new data. It should not be thought of as a mechanism to aggregate data, only to reorganise and recode them.

Two SDMX maintainable structures are required for defining how two datasets relate to each other:

The structure map which is used to define how components from the source DSD relate to components in the target DSD.
The representation map which is used to describe how values reported for source components should be converted to conform to the desired output DSD.

A simple example is a relationship between the source COUNTRY component and the target REF_AREA component.

The structure map has a source COUNTRY and target REF_AREA.
The rules that define how the values are mapped are maintained in the corresponding representation map. Example rules would be: GB maps to GBR, FR maps to FRA, US maps to USA, and UY maps to URY.
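The division of labour between the two maps can be sketched in a few lines of Python. The dictionaries below are plain stand-ins for the two artefacts, not SDMX-ML and not an FMR API, and the names `structure_map`, `representation_maps` and `recode` are assumptions made for this illustration.

```python
# Structure map stand-in: which source component feeds which target component.
structure_map = {"COUNTRY": "REF_AREA"}

# Representation map stand-in: how values of a source component are converted.
representation_maps = {
    "COUNTRY": {"GB": "GBR", "FR": "FRA", "US": "USA", "UY": "URY"},
}


def recode(source_key: dict[str, str]) -> dict[str, str]:
    """Describe the same observation key using the target components and codes."""
    target_key = {}
    for source_component, source_value in source_key.items():
        target_component = structure_map[source_component]
        value_rules = representation_maps[source_component]
        target_key[target_component] = value_rules[source_value]
    return target_key


print(recode({"COUNTRY": "UY"}))  # {'REF_AREA': 'URY'}
```

Keeping the component relationship separate from the value rules means the same set of country code rules could, in principle, be reused wherever the same conversion is needed.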

Structure and representation maps

Structure maps define the source and target DSD or the source and target dataflow, while representation maps define the mapping rules between source and target values.

Structure map

The structure map defines:

the source and target DSD, or
the source and target dataflow if the mapping is dataflow specific.

The source DSD/dataflow is the one from which the data will be input into the mapping.

Mapping rules are in principle bi-directional: data mapped one way can be mapped back again. For more complex mappings, however, which include regular expression matches or substring matches on the source, it is not always possible to map back again. The source DSD or dataflow should therefore be the one the data are coming from, and the target the one the data are going to as a result of the mapping.
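Why some mappings cannot be reversed can be seen with a small sketch. The `invert` function and the second rule set are hypothetical. A regular expression or substring rule that matches several source values behaves like the many-to-one case shown here, because the original source value can no longer be recovered from the target.

```python
def invert(rules: dict[str, str]) -> dict[str, str]:
    """Return the reverse lookup, or raise if two sources share one target."""
    reverse: dict[str, str] = {}
    for source, target in rules.items():
        if target in reverse:
            raise ValueError(
                f"{target!r} is produced by both {reverse[target]!r} and {source!r}"
            )
        reverse[target] = source
    return reverse


# One-to-one rules can be mapped back again.
one_to_one = {"GB": "GBR", "FR": "FRA"}
print(invert(one_to_one))

# Hypothetical rule set in which two source codes share a target.
many_to_one = {"UK": "GBR", "GB": "GBR"}
try:
    invert(many_to_one)
except ValueError as error:
    print("not reversible:", error)
```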

Representation map

An SDMX v3.0 representation map defines:

The mapping rules between a source value (or combination of values) and a target value (or combination of targets).

When a source value is matched, the target value is output.

Representation maps can be used:

To map from one classification to another (e.g. from ISO2 to ISO3 character country codes).
To map from non-coded to coded values (e.g. $ maps to USD).
To define more complex mappings that require a combination of source values to generate target values.

A representation map is more than a simple lookup table. More complex rules can be introduced, such as regular expression matches (which can include capture groups to transfer patterns from source to target), substring matches, and rules that are only applicable for certain periods of time.
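The kinds of rule described above can be imitated with Python's `re` module. This is a minimal sketch, not how an SDMX tool implements representation maps: the `Rule` class, the `map_value` function, the codes AGE_15 and A1, and the validity dates are all assumptions made for the illustration. In particular, how a real tool decides which date a validity period is checked against is not shown here.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Rule:
    pattern: str                       # regular expression for the whole source value
    target: str                        # template; \1 inserts the first capture group
    valid_from: Optional[date] = None  # rule applies from this date (inclusive)
    valid_to: Optional[date] = None    # rule applies until this date (inclusive)

    def applies(self, on: date) -> bool:
        after_start = self.valid_from is None or on >= self.valid_from
        before_end = self.valid_to is None or on <= self.valid_to
        return after_start and before_end


def map_value(value: str, rules: list[Rule], on: date) -> Optional[str]:
    """Return the target of the first applicable rule that matches, else None."""
    for rule in rules:
        if not rule.applies(on):
            continue
        match = re.fullmatch(rule.pattern, value)
        if match:
            return match.expand(rule.target)
    return None


rules = [
    Rule(r"\$", "USD"),          # non-coded value to coded value
    Rule(r"AGE_(\d+)", r"Y\1"),  # capture group carried into the target
    Rule(r"A1", "REGION_NORTH", valid_to=date(2009, 12, 31)),
    Rule(r"A1", "REGION_NORTHEAST", valid_from=date(2010, 1, 1)),
]

print(map_value("$", rules, date(2020, 1, 1)))
print(map_value("AGE_15", rules, date(2020, 1, 1)))
print(map_value("A1", rules, date(2005, 6, 1)))
print(map_value("A1", rules, date(2015, 6, 1)))
```

The first rule is a plain lookup, the second transfers part of the source value into the target, and the last two show the same source value mapping to different targets depending on the period.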

Dataset/dataflow transformation

The transformation from one dataset/dataflow to another using structure maps may be actioned:

Manually using the FMR user interface.
Automatically, using the FMR data transformation web service. The service converts a dataset submitted with an HTTP POST to the data transmission format specified by the HTTP Accept header, and optionally transforms it to a different DSD if FMR has a structure map defining the mapping.
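The shape of such a request can be sketched with Python's standard library. The URL below is a placeholder, not the real FMR endpoint, and the media types are SDMX data media types given as an assumption about what a particular installation accepts, so both should be checked against the FMR documentation. The request is only built here, not sent.

```python
import urllib.request

# Placeholder URL: the real host and path depend on the FMR installation.
TRANSFORM_URL = "https://fmr.example.org/placeholder/transform"


def build_transform_request(
    dataset: bytes, content_type: str, accept: str
) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a dataset to be converted."""
    return urllib.request.Request(
        TRANSFORM_URL,
        data=dataset,
        method="POST",
        headers={
            "Content-Type": content_type,  # format of the submitted dataset
            "Accept": accept,              # format requested for the response
        },
    )


request = build_transform_request(
    dataset=b"<dataset bytes read from a file>",
    content_type="application/vnd.sdmx.data+csv;version=2.0.0",  # assumed media type
    accept="application/vnd.sdmx.data+json;version=2.0.0",       # assumed media type
)

print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once the placeholder URL has been replaced with the address of an actual service.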

What do you know?

Now that you’ve completed our introduction, try this.

In SDMX, data are recoded automatically according to structural model mappings by using structure maps and representation maps together with an SDMX transformation application.

Which of the following best describes the role of a structure map?

Select your answer and then select Submit.

Define the mapping rules between source and target values.

Define the source and target DSD, or the source and target dataflow.

A mechanism to aggregate or disaggregate data.

Map each component from the source DSD to the target DSD.

Describe how the source values map to the target values.

That's right.

The structure map defines the source and target DSD, or source and target dataflow if the mapping is dataflow specific.

The representation map defines the mapping rules between a source value (or combination of values) and a target value (or combination of targets).

Recoding data in SDMX does not modify data or create new data and should not be thought of as a mechanism to aggregate or disaggregate data.

That's not right.

The correct answer is option 2.

The structure map defines the source and target DSD, or source and target dataflow if the mapping is dataflow specific.

The representation map defines the mapping rules between a source value (or combination of values) and a target value (or combination of targets).

Recoding data in SDMX does not modify data or create new data and should not be thought of as a mechanism to aggregate or disaggregate data.