Unit 1: Essential SDMX Concepts

This unit reviews the essential SDMX concepts used to identify and describe statistical data.

Review of essential SDMX terminology

The SDMX Information Model, which is used to describe statistics, consists of many concepts and artefacts. The majority of SDMX structural models can be described using a small subset of the Information Model. This subset will be the focus of this introductory module.

More advanced courses and other learning resources will be made available to address specialised and more complex structural modelling requirements and use cases.

About SDMX and the SDMX Information Model

So, what is SDMX and the SDMX Information Model?

Here’s a quick refresher.

Statistical Data and Metadata eXchange (SDMX)

  • Does not introduce any new concepts for statisticians – it simply provides a framework for what statisticians already do.
  • Can be used to describe any multi-dimensional dataset regardless of statistical domain.
  • Provides a way to describe the structure of data.

The SDMX Information Model

  • Forms the core of SDMX.
  • Describes statistics in a standard way.
  • Identifies objects and their relationships.
  • Allows central management and standard access.

The Information Model can be used to model statistical data, metadata, and data exchange processes.

SDMX concepts for data

A statistical observation is the actual data point or number being measured and can be described using three concepts: dimension, attribute, and measure.

A (statistical) series, the measurement of some phenomenon over time, is specified by a grouping of dimensions.

Dimension

  • A statistical concept used to both identify and describe the statistical observation.
  • Is necessary to understand the meaning of the data.
  • Taken together, the dimensions uniquely identify a statistical observation.
  • Is always mandatory.
  • Is always categorical.
  • The value set is always a codelist except in the case of the time dimension, which is a special case.
  • The value set is either ordered or unordered.
  • Is attached to a group/series, except for the time dimension and the measure dimension, whose attachment depends on how the data are organised:
      • Time-series data: the time dimension is attached to the observation; the measure dimension is attached to the group/series.
      • Cross-sectional data: the time dimension is attached to the group/series; the measure dimension is attached to the observation.
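
The time-series case can be illustrated with a minimal Python sketch. The dimension IDs, codes, and values below are assumptions chosen for illustration, not taken from any particular SDMX structure. The non-time dimensions form the series key, and the time dimension is attached to each observation within the series.

```python
from collections import defaultdict

# Hypothetical dimension IDs, for illustration only.
SERIES_DIMENSIONS = ("FREQ", "REF_AREA", "SEX")
TIME_DIMENSION = "TIME_PERIOD"

# Made-up observations: every dimension has a value (dimensions are mandatory).
observations = [
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "F", "TIME_PERIOD": "2020", "OBS_VALUE": 10.5},
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "F", "TIME_PERIOD": "2021", "OBS_VALUE": 10.9},
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "M", "TIME_PERIOD": "2020", "OBS_VALUE": 9.8},
]


def series_key(obs):
    """Return the values of the series-level dimensions as a tuple."""
    return tuple(obs[dim] for dim in SERIES_DIMENSIONS)


def group_into_series(obs_list):
    """Group observations by series key; time is attached to each observation."""
    series = defaultdict(dict)
    for obs in obs_list:
        series[series_key(obs)][obs[TIME_DIMENSION]] = obs["OBS_VALUE"]
    return dict(series)


series = group_into_series(observations)
for key, points in series.items():
    print(key, points)
```

In this sketch the three observations fall into two series, because two of them share the same values for every series-level dimension and differ only in the time period.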

Attribute

  • A statistical concept used to describe a statistical observation. An attribute provides additional qualitative information about a statistical observation.
  • Is either coded or uncoded.
  • If coded, the value set for an attribute is either ordered or unordered.
  • Is either mandatory or optional.
  • Is attached to an observation, dimension, group/series, or dataset.

Note: Cell-level footnotes and dataset-level footnotes are defined in SDMX models as attributes. The same is true for observation status codes (confidentiality, provisional data, estimated data, etc.): this concept is represented as an attribute attached to observations.
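
The properties listed above (coded or uncoded, mandatory or optional, attachment level) can be sketched in Python as follows. This is an illustrative sketch only: the `AttributeDef` class, the attribute IDs, and the code values are assumptions, and for brevity it covers only three of the four attachment levels (observation, series, dataset).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical description of one attribute (not an SDMX library class)."""
    id: str
    attachment: str                     # "observation", "series", or "dataset"
    mandatory: bool
    codes: Optional[frozenset] = None   # None means uncoded (free text)


# Illustrative definitions; IDs and codes are assumptions.
ATTRIBUTES = [
    AttributeDef("OBS_STATUS", "observation", True, frozenset({"A", "E", "P"})),
    AttributeDef("OBS_NOTE", "observation", False),    # cell-level footnote
    AttributeDef("UNIT_MEASURE", "series", True, frozenset({"PT", "USD"})),
    AttributeDef("DATASET_NOTE", "dataset", False),    # dataset-level footnote
]


def validate(values, level):
    """Check the attribute values supplied at one attachment level."""
    errors = []
    for attr in ATTRIBUTES:
        if attr.attachment != level:
            continue
        value = values.get(attr.id)
        if value is None:
            if attr.mandatory:
                errors.append(f"{attr.id}: mandatory attribute is missing")
        elif attr.codes is not None and value not in attr.codes:
            errors.append(f"{attr.id}: {value!r} is not in the codelist")
    return errors


print(validate({"OBS_STATUS": "P", "OBS_NOTE": "Revised figure"}, "observation"))
print(validate({"OBS_STATUS": "X"}, "observation"))
```

The uncoded footnote attributes accept any text, while the coded observation status is checked against its value set.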

Measure

  • Describes (quantifies) the actual values.
  • Is numeric.
  • Is either discrete or continuous.
  • The primary measure is usually referred to as “observation value”.

SDMX terminology and datasets

Dataset

  • A dataset is an organised collection of statistical observations.
  • Statistical observations are described using dimensions, attributes, and measure.
  • A statistical observation in a dataset is unambiguously identified by dimensions.
  • The dimensions, attributes and measure used to describe statistical observations in one dataset may be reused to also describe statistical observations in other datasets.
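
One practical consequence of identifying observations by their dimensions is that no two observations in a dataset may share the same combination of dimension values. The following Python sketch checks this; the dimension IDs and the sample data are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical dimensions of one dataset, including the time dimension.
DIMENSIONS = ("FREQ", "REF_AREA", "SEX", "TIME_PERIOD")


def observation_key(obs):
    """Return the full key: one value per dimension, in a fixed order."""
    return tuple(obs[dim] for dim in DIMENSIONS)


def duplicate_keys(dataset):
    """Return the keys that identify more than one observation."""
    counts = Counter(observation_key(obs) for obs in dataset)
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up dataset in which the last row repeats the key of the first row.
dataset = [
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "F", "TIME_PERIOD": "2020", "OBS_VALUE": 10.5},
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "M", "TIME_PERIOD": "2020", "OBS_VALUE": 9.8},
    {"FREQ": "A", "REF_AREA": "FR", "SEX": "F", "TIME_PERIOD": "2020", "OBS_VALUE": 11.0},
]

print(duplicate_keys(dataset))
```

An empty result means every observation is unambiguously identified; any key returned points to observations that the dimensions alone cannot tell apart.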

In a statistical system:

  • There will be many datasets.
  • Datasets will be received from data providers, changed by internal processes, published in online portals, and reported to other organisations.
  • There will be many dimensions, attributes, and measures.
  • Some dimensions, attributes, and measures will be reused in multiple datasets.
  • There will be many codelists.
  • Some codelists will be reused in multiple dimensions and coded attributes, for example sex, age, reference area, frequency, and unit of measure. It is recommended to use the SDMX cross-domain codelists, which address many common use cases.
  • Managing these structural metadata to maximise coherence and consistency, both within an individual organisation and throughout the entire data lifecycle, including with external data partners, is an important aspect of assuring quality in the statistical system.
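
Codelist reuse across datasets can be pictured with a small Python sketch. The dataset names and the mapping of components to codelists below are hypothetical, chosen only to show how a shared codelist appears in several places.

```python
from collections import defaultdict

# Hypothetical structures: component ID -> codelist ID, per dataset.
STRUCTURES = {
    "POPULATION": {"REF_AREA": "CL_AREA", "SEX": "CL_SEX", "AGE": "CL_AGE"},
    "LABOUR_FORCE": {"REF_AREA": "CL_AREA", "SEX": "CL_SEX", "FREQ": "CL_FREQ"},
    "TRADE": {"REF_AREA": "CL_AREA", "COUNTERPART_AREA": "CL_AREA", "FREQ": "CL_FREQ"},
}


def codelist_usage(structures):
    """Map each codelist to the (dataset, component) pairs that use it."""
    usage = defaultdict(set)
    for dataset_id, components in structures.items():
        for component_id, codelist_id in components.items():
            usage[codelist_id].add((dataset_id, component_id))
    return usage


def reused_codelists(structures):
    """Return the codelists referenced by more than one component."""
    usage = codelist_usage(structures)
    return sorted(cl for cl, uses in usage.items() if len(uses) > 1)


print(reused_codelists(STRUCTURES))
```

Keeping one shared definition per codelist, rather than a private copy in each dataset, is what allows codes to stay consistent when data move between datasets and organisations.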

What do you know?

Now that you’ve completed our review of Essential SDMX Concepts, try the three questions that follow.

A statistical observation can be described using three concepts: dimension, attribute, and measure.

Which of the following best describe a dimension?

Now see if you can identify which of the following best describe an attribute.

Finally, a measure is numeric and describes actual values, but how do we usually refer to the value of the primary measure?

Coming next …

The SDMX concepts you covered in this opening unit provide a common vocabulary for statisticians to identify and describe statistical data.

These concepts will be used extensively throughout the remainder of this module, along with those for structural modelling covered in the next unit.