Unit 2: Recoding with SDMX

This unit explains the SDMX components, or building blocks, for recoding data.

Structure maps and recoding with SDMX

Structure maps link components together in a source/target relationship where there is a semantic equivalence between the source and the target components.

A typical use of structure maps is to map between an SDMX data structure used in an internal system and the SDMX structure of an external dataset, so that data can be recoded when it is imported to or exported from the internal system.

SDMX v3.0 structure mapping provides the ability to define a relationship between datasets that conform to a source data structure definition (DSD) and datasets that conform to a target DSD.

This relationship allows for the automatic recoding of data from one structure to another. For example, a source dataset may contain eight dimensions and use certain coding schemes, which may map to a dataset with only five dimensions using different coding schemes.

Component and representation maps

The structure map defines one or more component maps which can then link to a representation map.

Some output components may have a fixed value and there may be instances when values do not require mapping.

Component map

The structure map defines one or more component maps. Each component map has one or more components from the source DSD that map to one or more components in the target DSD.

Each component map can link to a representation map, which is used to describe how the source values map to the target values.

Representation map

The linked representation map links to source and target codelists, valuelists, or free text.

Like the component map, the representation map may contain multiple sources and multiple targets. The number and order of the sources and targets in a representation map must exactly match those of the component map that links to it. For example,

if a component map has two sources REF_AREA and CURRENCY, then
the linked representation map must also have two sources, one for the REF_AREA codelist and the other for the CURRENCY codelist.
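
The matching rule above can be sketched in a few lines of Python. This is an illustrative model only, not an SDMX library API: the class names, the `shapes_match` helper, the codelist IDs, and the single REF_AREA target are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComponentMap:
    sources: Tuple[str, ...]  # component IDs from the source DSD, in order
    targets: Tuple[str, ...]  # component IDs from the target DSD, in order


@dataclass(frozen=True)
class RepresentationMap:
    source_lists: Tuple[str, ...]  # one codelist (or valuelist) per source, same order
    target_lists: Tuple[str, ...]  # one codelist (or valuelist) per target, same order


def shapes_match(cmap: ComponentMap, rmap: RepresentationMap) -> bool:
    """Return True when both maps have the same number of sources and of targets."""
    return (
        len(cmap.sources) == len(rmap.source_lists)
        and len(cmap.targets) == len(rmap.target_lists)
    )


# Hypothetical IDs, for illustration only.
cmap = ComponentMap(sources=("REF_AREA", "CURRENCY"), targets=("REF_AREA",))
two_sources = RepresentationMap(("CL_AREA", "CL_CURRENCY"), ("CL_AREA_OUT",))
one_source = RepresentationMap(("CL_AREA",), ("CL_AREA_OUT",))

print(shapes_match(cmap, two_sources))  # True
print(shapes_match(cmap, one_source))   # False
```

The check compares counts only. Order is carried by position: the first entry in `source_lists` is taken to describe the first entry in `sources`, and so on.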

Representation maps can include complex rules, such as regular expressions on source values, and can even define the periods of time for which a mapping relationship is valid. For example,

if a relationship between the source country and target currency is defined, then
one could map France to the French Franc up until 2002,
and then map France to the Euro from 2002 onwards.
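
A time-bounded rule like this one can be sketched as follows. The `TimedRule` class and `map_currency` function are hypothetical names, the codes FR, FRF, and EUR are used for illustration, and the sketch assumes year-level granularity with the switch taking effect at the start of 2002.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedRule:
    source: str
    target: str
    valid_from: Optional[int] = None  # first year the rule applies (inclusive)
    valid_to: Optional[int] = None    # last year the rule applies (inclusive)

    def applies(self, source: str, year: int) -> bool:
        if source != self.source:
            return False
        if self.valid_from is not None and year < self.valid_from:
            return False
        if self.valid_to is not None and year > self.valid_to:
            return False
        return True


# Assumption: the Franc rule ends in 2001 and the Euro rule starts in 2002.
RULES: List[TimedRule] = [
    TimedRule("FR", "FRF", valid_to=2001),
    TimedRule("FR", "EUR", valid_from=2002),
]


def map_currency(country: str, year: int) -> Optional[str]:
    """Return the target of the first rule valid for this country and year."""
    for rule in RULES:
        if rule.applies(country, year):
            return rule.target
    return None


print(map_currency("FR", 1998))  # FRF
print(map_currency("FR", 2010))  # EUR
```

Returning `None` for an unmatched country is a design choice of this sketch; a real system might reject the observation instead.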

Note: In the SDMX model, cell-level footnotes and dataset-level footnotes are defined as attributes. The same is true for observation status codes (confidential, provisional, estimated, and so on): the status is represented as an attribute that is attached to observations.

Non-mapping and fixed values

If values do not require mapping, for example if the source FREQ maps to target FREQ and the values are the same in both the source and target DSD, then the component map should not link to a representation map. The lack of a link will inform the system that the value should be copied across verbatim.

Some components on the output may have a fixed value, for example frequency is always ‘M’ regardless of the input data. This is defined at the level of the structure map. As mappings can be bi-directional the input can also have a fixed value, so when mapping the other way (from target to source) the input becomes the output.
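
The two behaviours described above, copying a value verbatim when no representation map is linked and forcing a fixed output value, can be sketched as follows. The function names and the sample values are hypothetical and stand in for whatever a mapping engine does internally.

```python
from typing import Dict, Optional


def recode_value(value: str, representation_map: Optional[Dict[str, str]] = None) -> str:
    """Copy the value verbatim when no representation map is linked."""
    if representation_map is None:
        return value
    return representation_map[value]


def apply_fixed_values(key: Dict[str, str], fixed: Dict[str, str]) -> Dict[str, str]:
    """Overwrite (or add) the components whose output value is fixed."""
    return {**key, **fixed}


print(recode_value("A"))                  # A
print(recode_value("FR", {"FR": "FRA"}))  # FRA
print(apply_fixed_values({"FREQ": "A", "REF_AREA": "FR"}, {"FREQ": "M"}))
# {'FREQ': 'M', 'REF_AREA': 'FR'}
```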

Mapping: Example use cases

Let’s now take a moment to consider three example use cases that illustrate the relationship between the component map and the representation map.

Use case 1: One-to-one mapping

An example use case is a dimension with ID REF_AREA whose values are ISO2 country codes. The mapped DSD has a REF_AREA whereby countries are represented by ISO3 country codes.

One solution is to create a single component map which maps the source REF_AREA based on the ISO2 coding scheme to the output REF_AREA based on the ISO3 coding scheme.

Component map: Source=REF_AREA Target=REF_AREA

A single representation map is required to map each ISO2 code to the matching ISO3 code.

Representation Map = (source) REF_AREA -> (output) REF_AREA
Values:
- AF=AFG
- AL=ALB
- AU=AUS
- CA=CAN
- CN=CHN
- JP=JPN
- …
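
As a rough illustration, the one-to-one representation map behaves like a lookup table. The sketch below uses a plain Python dictionary containing only the six pairs listed above; the function name and the sample series key are hypothetical.

```python
from typing import Dict

# Only the pairs listed in the example; a real map would cover the full codelist.
ISO2_TO_ISO3: Dict[str, str] = {
    "AF": "AFG",
    "AL": "ALB",
    "AU": "AUS",
    "CA": "CAN",
    "CN": "CHN",
    "JP": "JPN",
}


def recode_ref_area(series_key: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the key with REF_AREA recoded; other components are untouched."""
    recoded = dict(series_key)
    recoded["REF_AREA"] = ISO2_TO_ISO3[series_key["REF_AREA"]]
    return recoded


print(recode_ref_area({"FREQ": "A", "REF_AREA": "JP"}))
# {'FREQ': 'A', 'REF_AREA': 'JPN'}
```

A code that is missing from the dictionary raises `KeyError` here. How unmapped values should be handled is a choice this sketch does not settle.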

Use case 2: One-to-many mapping

An example use case is a dimension with ID UNIQUE_KEY whose values uniquely identify a series, for example SERIES1, SERIES2, SERIES3. The mapped DSD splits this into multiple dimensions: FREQ, REF_AREA, and INDICATOR.

The mapping rules split the unique key:
SERIES1 maps to FREQ:M, REF_AREA:UK and INDICATOR:EMPLOYED.
SERIES2 maps to FREQ:M, REF_AREA:FR and INDICATOR:EMPLOYED.
SERIES3 maps to FREQ:A, REF_AREA:UK and INDICATOR:EMPLOYED.

Solution 1
This type of use case can be solved by creating three component maps:

Component map 1: Source=UNIQUE_KEY Target=FREQ
Component map 2: Source=UNIQUE_KEY Target=REF_AREA
Component map 3: Source=UNIQUE_KEY Target=INDICATOR

Each component map is backed by a representation map, which maps the value of the unique key to the output.

Representation map 1 = UNIQUE_KEY -> FREQ
Values:
- SERIES1=M
- SERIES2=M
- SERIES3=A

Representation map 2 = UNIQUE_KEY -> REF_AREA
Values:
- SERIES1=UK
- SERIES2=FR
- SERIES3=UK

Representation map 3 = UNIQUE_KEY -> INDICATOR
Values:
- SERIES1=EMPLOYED
- SERIES2=EMPLOYED
- SERIES3=EMPLOYED

Solution 2
An alternative solution is to create a single component map which maps the source UNIQUE_KEY to three outputs FREQ, REF_AREA, INDICATOR.

A single representation map is required to map each UNIQUE_KEY value to the three outputs.

Representation map = UNIQUE_KEY -> FREQ:REF_AREA:INDICATOR
Values:
- SERIES1=M:UK:EMPLOYED
- SERIES2=M:FR:EMPLOYED
- SERIES3=A:UK:EMPLOYED

The choice of whether to split the mapping up into separate components vs a single rule should be based on what will be more maintainable, understandable, and whether individual mapping rules will be reused by other structure maps.
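
To make the comparison concrete, the sketch below models both solutions with plain Python dictionaries and confirms that they produce the same output for every series. The function names are hypothetical; the values are the ones listed in the two solutions.

```python
from typing import Dict, Tuple

# Solution 1: three separate maps, one per target dimension.
FREQ_MAP = {"SERIES1": "M", "SERIES2": "M", "SERIES3": "A"}
REF_AREA_MAP = {"SERIES1": "UK", "SERIES2": "FR", "SERIES3": "UK"}
INDICATOR_MAP = {"SERIES1": "EMPLOYED", "SERIES2": "EMPLOYED", "SERIES3": "EMPLOYED"}

# Solution 2: one map from the unique key to all three targets at once.
COMBINED_MAP: Dict[str, Tuple[str, str, str]] = {
    "SERIES1": ("M", "UK", "EMPLOYED"),
    "SERIES2": ("M", "FR", "EMPLOYED"),
    "SERIES3": ("A", "UK", "EMPLOYED"),
}


def split_with_three_maps(unique_key: str) -> Dict[str, str]:
    return {
        "FREQ": FREQ_MAP[unique_key],
        "REF_AREA": REF_AREA_MAP[unique_key],
        "INDICATOR": INDICATOR_MAP[unique_key],
    }


def split_with_one_map(unique_key: str) -> Dict[str, str]:
    freq, ref_area, indicator = COMBINED_MAP[unique_key]
    return {"FREQ": freq, "REF_AREA": ref_area, "INDICATOR": indicator}


for key in COMBINED_MAP:
    assert split_with_three_maps(key) == split_with_one_map(key)

print(split_with_one_map("SERIES2"))
# {'FREQ': 'M', 'REF_AREA': 'FR', 'INDICATOR': 'EMPLOYED'}
```

In this model the three small maps could each be reused by another structure map, while the single combined map keeps every rule for a series on one line.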

Use case 3: Many-to-one mapping

As indicated in the module overview, these recoding capabilities may be bidirectional whereby the source to target mapping for scenario A may be reversed and become the target to source mapping in scenario B.

The many-to-one use case therefore is simply the reverse of the one-to-many use case detailed above and has the same two solutions:

split the rules into individual maps, or
describe the relationship in a single map.

The example use case is dimensions FREQ, REF_AREA, INDICATOR whose values are to be combined in a mapped DSD as a single dimension with ID UNIQUE_KEY and unique series SERIES1, SERIES2, SERIES3.

The mapping rules combine the dimensions:

FREQ:M, REF_AREA:UK and INDICATOR:EMPLOYED maps to SERIES1.
FREQ:M, REF_AREA:FR and INDICATOR:EMPLOYED maps to SERIES2.
FREQ:A, REF_AREA:UK and INDICATOR:EMPLOYED maps to SERIES3.

Solution 2
The simplest solution to understand and maintain is to create a single component map which maps the source FREQ, REF_AREA, INDICATOR combinations to the output UNIQUE_KEY.

A single representation map is required to map the three inputs to the UNIQUE_KEY.

Representation map = FREQ:REF_AREA:INDICATOR -> UNIQUE_KEY
Values:
- M:UK:EMPLOYED=SERIES1
- M:FR:EMPLOYED=SERIES2
- A:UK:EMPLOYED=SERIES3
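
Because the many-to-one case is the reverse of the one-to-many case, the combining map can be modelled as the inverse of the splitting map. The sketch below uses plain Python dictionaries with hypothetical names; it assumes that every combination of values identifies exactly one series, otherwise the inversion would silently drop entries.

```python
from typing import Dict, Tuple

Combination = Tuple[str, str, str]  # (FREQ, REF_AREA, INDICATOR)

# One-to-many direction: unique key -> combination of dimension values.
SPLIT_MAP: Dict[str, Combination] = {
    "SERIES1": ("M", "UK", "EMPLOYED"),
    "SERIES2": ("M", "FR", "EMPLOYED"),
    "SERIES3": ("A", "UK", "EMPLOYED"),
}

# Many-to-one direction: swap each source/target pair.
COMBINE_MAP: Dict[Combination, str] = {
    combination: unique_key for unique_key, combination in SPLIT_MAP.items()
}


def combine(freq: str, ref_area: str, indicator: str) -> str:
    """Look up the unique key for one combination of dimension values."""
    return COMBINE_MAP[(freq, ref_area, indicator)]


print(combine("M", "UK", "EMPLOYED"))  # SERIES1
print(combine("A", "UK", "EMPLOYED"))  # SERIES3
```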

Map maintenance

In FMR, structure maps and representation maps are created and maintained from the main menu.

What do you know?

In this unit you learned about the SDMX components, or building blocks, for recoding data.

Which of the following best describes the role of a representation map?

Describe how components in the source and target DSDs map.

Convert datasets submitted using an HTTP POST method to the data transmission format specified by the HTTP accept header.

Validate and automate structure and component mappings.

Describe how the source values map to the target values.

Identify components on the output with fixed values.

That's right.

The structure map defines one or more component maps. Each component map has one or more components from the source DSD that map to one or more components in the target DSD.

Each component map can link to a representation map, which is used to describe how the source values map to the target values.

The linked representation map links to source and target codelists, valuelists, or free text.

That's not right.

The correct answer is option 4.
