Unit 2: Introduction to Structural Modelling

In this unit, you’ll be presented with the three types of data models, their definitions, and practical applications.

The Statistical Data and Metadata eXchange (SDMX) standard will be introduced and aligned with the three data models.

Introduction to data modelling

To have a complete picture of a set of statistical data and to be able to create IT tools and databases to support their use, there are three types of data models which need to be produced. The models are known as Conceptual, Logical, and Physical Data Models.

The models are created in sequence and the complexity of the model increases from conceptual to logical to physical. The Conceptual Data Model is concerned with the real-world view and understanding of data; the Logical Data Model is a more detailed definition of the variables, concepts, and relationships identified in the Conceptual Data Model.

Select each type of data model to find out more, then follow this link for a takeaway summary of the three models.

Step 1: Conceptual Data Model

The Conceptual Data Model identifies the structural statistical concepts (statistical units, populations, variables, measures, time) in the data and how they relate to one another. This model represents how statisticians and methodologists view the world. Non-critical details are suppressed to emphasize the essential statistical concepts and any relationships to concepts defined elsewhere in the statistical system or in international classifications and standards.
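One way to picture a Conceptual Data Model is as a list of named concepts plus the relationships between them. The following is a minimal Python sketch, assuming a hypothetical table of resident population by sex and year. The class names (Concept, ConceptualModel), the concept names, and the wording of the relationships are illustrative assumptions and are not taken from any standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A structural statistical concept and the role it plays."""
    name: str
    role: str  # e.g. "statistical unit", "population", "variable", "measure", "time"


@dataclass
class ConceptualModel:
    """Concepts plus their relationships; no codes or storage details."""
    concepts: dict[str, Concept] = field(default_factory=dict)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, name: str, role: str) -> None:
        self.concepts[name] = Concept(name, role)

    def relate(self, source: str, verb: str, target: str) -> None:
        # Both ends of a relationship must already be known concepts.
        for name in (source, target):
            if name not in self.concepts:
                raise KeyError(f"unknown concept: {name}")
        self.relationships.append((source, verb, target))


# Hypothetical example: resident population by sex and reference year.
model = ConceptualModel()
model.add("Person", "statistical unit")
model.add("Resident population", "population")
model.add("Sex", "variable")
model.add("Number of persons", "measure")
model.add("Reference year", "time")

model.relate("Resident population", "is made up of", "Person")
model.relate("Sex", "describes", "Person")
model.relate("Number of persons", "counts", "Resident population")
model.relate("Number of persons", "is observed for", "Reference year")

for source, verb, target in model.relationships:
    print(f"{source} {verb} {target}")
```

The sketch deliberately records no value sets or codes: at the conceptual level, only the concepts and how they relate to one another are of interest.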

Step 2: Logical Data Model

The Logical Data Model expands upon the Conceptual Data Model by identifying the structural details of the data. It provides a complete description of the statistical concepts, variables, and value sets. For example, if the Conceptual Data Model identified the variable Sex, the Logical Data Model would identify the value set for Sex (Male, Female). Because Sex is a categorical variable, the Logical Data Model would also specify its codification (for example, M for Male and F for Female).
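The Sex example can be sketched in a few lines of Python. This is only an illustration of a variable with a coded value set; the class name CategoricalVariable and its methods are hypothetical, while the codes M and F come from the example above.

```python
from dataclasses import dataclass


@dataclass
class CategoricalVariable:
    """A variable together with its coded value set."""
    name: str
    codes: dict[str, str]  # code -> category label

    def is_valid(self, code: str) -> bool:
        return code in self.codes

    def label(self, code: str) -> str:
        if code not in self.codes:
            raise ValueError(f"{code!r} is not a valid code for {self.name}")
        return self.codes[code]


# Value set and codification for Sex, as in the example above.
sex = CategoricalVariable(name="Sex", codes={"M": "Male", "F": "Female"})

print(sex.label("F"))     # Female
print(sex.is_valid("X"))  # False
```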

Step 3: Physical Data Model

The Physical Data Model identifies how the Logical Data Model will be implemented in IT tools and databases. The Physical Data Model is of primary interest to IT experts and will not be addressed in this module.
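Although the Physical Data Model is outside the scope of this module, a brief sketch may help show what "implemented in IT tools and databases" can mean. The example below is one possible implementation among many, using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and observation values are hypothetical placeholders, not real data.

```python
import sqlite3


def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding a code table and a data table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE sex_codes (code TEXT PRIMARY KEY, label TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE population (
            ref_year INTEGER NOT NULL,
            sex      TEXT    NOT NULL REFERENCES sex_codes(code),
            persons  INTEGER NOT NULL,
            PRIMARY KEY (ref_year, sex)
        )
        """
    )
    conn.executemany(
        "INSERT INTO sex_codes (code, label) VALUES (?, ?)",
        [("M", "Male"), ("F", "Female")],
    )
    return conn


conn = build_database()

# Placeholder values for illustration only.
conn.execute("INSERT INTO population VALUES (2020, 'M', 10)")
conn.execute("INSERT INTO population VALUES (2020, 'F', 12)")

try:
    # 'X' is not in the code table, so the database rejects the row.
    conn.execute("INSERT INTO population VALUES (2020, 'X', 5)")
except sqlite3.IntegrityError as error:
    print("rejected:", error)
```

Here the value set defined at the logical level becomes a code table, and the database itself rejects observations that use an undefined code.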

Statistical Data and Metadata eXchange (SDMX)

This module is a foundation module for implementing SDMX, so let’s first take a moment to focus on the connection between SDMX and the three data models introduced earlier.

Select to find out more about the SDMX standard, model and terminology.

SDMX Standard

The SDMX Standard has two components:

A robust Information Model.
A detailed Technical Standard which defines how to create IT tools that fully support the SDMX Information Model throughout the data lifecycle.

SDMX Information Model	The Information Model can be used to describe any multi-dimensional dataset regardless of domain and is the area of primary focus for statisticians.
SDMX Technical Standard	The Technical Standard is the focus of IT experts, who transform the Information Model into tools and databases that meet the needs of the entire statistical data lifecycle.

The SDMX Standard is further supported by the creation of international good practices and shared standards, such as domain-specific data models and cross-domain codelists.

SDMX Information Model

Like the statistical terminology presented earlier in this module, the SDMX Information Model provides a set of closely related concepts for describing the structure of data. In fact, the SDMX Information Model can be used to describe any multi-dimensional dataset, regardless of statistical domain.
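To illustrate what describing a multi-dimensional dataset involves, the following minimal Python sketch treats a dataset as a set of dimensions, each with its own codes, plus a measure. It is a simplification written in plain Python. The names Dimension and DatasetStructure are hypothetical, they do not reproduce the SDMX Information Model or any SDMX software library, and the observation values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One axis of the dataset and the codes allowed along it."""
    name: str
    codes: dict[str, str]  # code -> label


@dataclass
class DatasetStructure:
    """Dimensions identify an observation; the measure is what is observed."""
    dimensions: list[Dimension]
    measure: str
    observations: dict[tuple[str, ...], float] = field(default_factory=dict)

    def add_observation(self, key: tuple[str, ...], value: float) -> None:
        # An observation is identified by exactly one code per dimension.
        if len(key) != len(self.dimensions):
            raise ValueError("key must supply exactly one code per dimension")
        for dimension, code in zip(self.dimensions, key):
            if code not in dimension.codes:
                raise ValueError(
                    f"{code!r} is not a valid code for {dimension.name}"
                )
        self.observations[key] = value


structure = DatasetStructure(
    dimensions=[
        Dimension("Sex", {"M": "Male", "F": "Female"}),
        Dimension("Reference year", {"2020": "2020", "2021": "2021"}),
    ],
    measure="Number of persons",
)

# Placeholder values for illustration only.
structure.add_observation(("F", "2020"), 12)
structure.add_observation(("M", "2020"), 10)
```

Nothing in the sketch is specific to population statistics: replacing the dimensions and the measure would describe a dataset from a different domain with the same two classes.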

The SDMX Information Model is closely aligned with the Conceptual Data Model and the Logical Data Model. To see how, follow this link for a takeaway that sets out the sequence of steps for producing a complete Data Model mapped to the SDMX Standard, together with a tool that supports model maintenance.

SDMX Terminology

SDMX terminology differs from traditional statistical terminology, and learning structural modelling while also learning SDMX terminology has proven challenging for many statisticians.

This module is therefore part of a series that focusses on producing high-quality, conceptually sound structural models of statistical data, mapping those models to the SDMX Information Model, and then demonstrating how to manage them using a modern, free, open-source metadata management tool.

Once statistical data are described using the statistical terminology you covered in Unit 1, the process of mapping these definitions to the SDMX Information Model becomes straightforward. This transformation and mapping activity is the subject of the next module in this series, Essential SDMX Structural Modelling.

What do you know?

To have a complete picture of a set of statistical data and to be able to create IT tools and databases to support their use, there are three types of data models which need to be produced.

Which of the following best describes the Conceptual Data Model?

Select your answer and then select Submit.

Classifies how to implement data in systems and databases.

Provides a complete description of the statistical concepts, variables, and value sets.

Identifies the structural statistical concepts in the data and how they relate to one another.

That's right.

The models are created in sequence and the complexity of the model increases from conceptual, to logical, to physical. The Conceptual Data Model is concerned with the real-world view and understanding of data; the Logical Data Model is a more detailed definition of the variables, concepts, and relationships identified in the Conceptual Data Model.

That's incorrect. The correct answer is option 3.

Coming next…

Now that you know the three types of data models and how they relate to the SDMX standard, let’s use what you’ve learned to evaluate a statistical table and produce a Conceptual Data Model.