Doc. Eurostat/ITDG/October 2009/5.1.1

IT Directors Group

21 and 22 October 2009

BECH Building, 5, rue Alphonse Weicker, Luxembourg-Kirchberg

Room QUETELET

9.30 a.m. - 5.30 p.m.
9.00 a.m. – 1.00 p.m.


SDMX

Progress and implementation

Item 5.1.1 of the agenda

SDMX – progress and implementation

1.  Purpose of the document

The aim of this document is to continue reporting on the progress of work on SDMX (Statistical Data and Metadata eXchange), with regard both to the SDMX standards and guidelines and to the use and implementation of SDMX within the European Statistical System and beyond.

Member States are asked

–  to take note of the further progress achieved in SDMX and to comment on any issue of special interest;

–  to provide ideas for accelerating the implementation of SDMX within the European Statistical System;

–  to discuss the future developments and perspectives of SDMX, also with regard to the new Eurostat strategy on the production method of EU statistics.

2.  Background

SDMX consists of technical and statistical standards and guidelines, together with an IT architecture and IT tools, to be used for the efficient exchange and sharing of statistical data and metadata. Full information on the SDMX standards and organisation is available at http://www.sdmx.org.

At its meeting in October 2005, the IT Directors Group (ITDG) endorsed Eurostat's proposal for an implementation strategy of SDMX in the ESS. This strategy involves a gradual implementation of SDMX where there is a clear business case and where the use of SDMX can help Eurostat and Member States to rationalise data or metadata production and dissemination. This strategy was endorsed by the Statistical Programme Committee (SPC) at its meeting in February 2007.

In the subsequent meetings of the STNE[1] Working Group and the Metadata Working Group in 2008 and in 2009 and also at the meeting of the IT Directors Group in October 2008, the ongoing activities and future detailed plans related to SDMX were presented and endorsed.

In April 2009, the Eurostat senior management reiterated that SDMX should be broadly used within the European Statistical System (ESS) in order to improve data collection, production and dissemination processes. This is fully in line with the new Eurostat strategy on the production methods of EU statistics.

This document continues the communication on SDMX to the ITDG by emphasising the work done and the results achieved since the last ITDG meeting in 2008.

3.  Progress in the SDMX work at international level

3.1.  The SDMX Global Conference 2009

One major SDMX event in 2009 was the SDMX Global Conference, held in Paris in January 2009 and hosted by the OECD. The conference dealt with the following issues: the SDMX standards and guidelines, the SDMX implementation projects and an outlook on further plans. Separate sessions on hands-on capacity building (training) were also organised.

A summary report on this SDMX Global Conference is attached to this document as annex 1. The key message coming out of this conference was that the SDMX technical and statistical standards and guidelines have now reached a certain maturity, and that it is therefore time for statistical organisations to use and implement these standards.

The following findings of a survey amongst statistical organisations, organised as part of the SDMX Global Survey 2009, underline the intention of many statistical organisations around the world to embark on SDMX soon:

Figure 1: Some findings from the 2009 Survey about SDMX

3.2.  The SDMX Content-oriented Guidelines 2009 (SDMX COG)

After a period of public consultation, the SDMX Content-oriented Guidelines (version 2009) were released in January 2009. The SDMX COG package consists of the following components: Statistical Cross-Domain Concepts, Cross-Domain code lists, Statistical Subject-Matter domains, and a Metadata Common Vocabulary.

Figure 2: Screenshot of the SDMX Content-oriented Guidelines web page

Compared to previous versions, the SDMX Content-oriented Guidelines (version 2009) are now much more comprehensive and of much higher quality.

On the other hand, annex 2 of the SDMX COG (on the Cross-domain code lists) still contains only a few code lists and needs to be extended in the future.

3.3.  Additional activities of the SDMX Sponsors and Secretariat

The SDMX Sponsors and Secretariat, in their regular meetings and telephone conferences, have discussed a broader range of activities for 2008, 2009 and 2010. In particular, these are:

3.3.1. SDMX capacity building

Capacity building mainly refers to training and knowledge building activities on SDMX at various levels. The following main activities were undertaken or are planned:

  • The release of the User Guide (version 2009.1) on the SDMX website in January 2009; this user guide contains technical and statistical information which makes SDMX easier to understand for statisticians and IT experts;
  • The release on the SDMX website of a standard structure and standard contents for SDMX workshops, which can be used by all statistical organisations that wish to organise such workshops;
  • The preparation of an SDMX e-learning package, which should be available on the SDMX website in 2010.

3.3.2. SDMX website

The SDMX secretariat and sponsors aim to enrich the contents of the SDMX website, in particular by improving the dissemination of existing data structure definitions and IT tools that are already in use. The launch of a user forum on SDMX (hosted externally by the Open Data Foundation) is also under way. This forum will be co-ordinated by the SDMX sponsors and moderated (on a voluntary basis) by a number of colleagues from national statistical organisations.

3.3.3. Better involvement of domain specific working groups

A landscape of domain specific working groups was drawn up by the SDMX secretariat, which should lead to a more structured involvement of those groups at the level of the sponsoring organisations. Additional concrete actions in this respect still need to be defined.

3.3.4. Liaison to international standard groups

Efforts have been started to obtain approval of the SDMX version 2 standards by the International Organisation for Standardisation (ISO). These efforts need to be followed up by the secretariat.

3.3.5. SDMX chair and SDMX secretariat

A change in the SDMX chair can be expected for the next two-year period 2010/2011. A reorganisation of the SDMX secretariat is also becoming necessary. For both organisational changes, internal discussions are ongoing. These discussions should lead to conclusions in the course of autumn 2009.

4.  SDMX within the European Statistical System

4.1.  Implementation of SDMX within the ESS

4.1.1. Portable IT tools

In the framework of the X-DIS project (XML for Data Interoperability in Statistics) financed by the Commission programme IDABC, Eurostat has developed a number of IT applications supporting the implementation of SDMX:

·  SDMX Registry – a metadata registry which implements the SDMX specifications. This application provides a web-based user interface as well as web services for interacting with the structural metadata objects in use within Eurostat and with statistical partners. It will enable NSIs and other external organisations to obtain metadata such as Data Structure Definitions (DSD), Metadata Structure Definitions (MSD) and Code Lists. The Eurostat SDMX Registry currently installed is populated with all the SDMX artefacts currently available within Eurostat.

·  Data Structure Wizard (DSW) – a desktop application designed to work with SDMX-compliant registries for editing and viewing structural metadata. The Data Structure Wizard can be used both off-line and on-line, depending on user choices and access rights. The off-line mode is intended to be used for the creation and maintenance of SDMX objects. In the on-line mode, users can interact with any standard-compliant SDMX registry.

·  SDMX Converter – an application offering the ability to convert between all the existing formats of the SDMX version 2.0 standard (generic, compact, utility and cross-sectional) as well as GESMES/TS, GESMES/2.1, GESMES/DSIS (SDMX-EDI 2.0) and CSV formats. It can be used as a service in a “service-oriented architecture” or called from a command line script, or the relevant Java classes can be linked into other programs.

·  Business Cycle Clock version 2.0 – an interactive application for dynamic visualisation of short-term economic indicators, which can be fed by data in SDMX-ML format. It is now integrated into Eurostat's web site and a community version is available as OSS in the OSOR repository.

·  X-DIS Visualization Tool version 1.0 – a tool transforming SDMX-ML data (generic and compact SDMX-ML messages) into readable tables in HTML format, using XSLT (Extensible Stylesheet Language Transformations).

These tools are available to Member States as Open Source Software under the EUPL Open Source Software licence, as packages containing the application, documentation and source code files; the download links are available from CIRCA and will also be added to the Open Source Observatory and Repository, OSOR (www.osor.eu).
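As an illustration of the kind of format conversion that the SDMX Converter performs, the following minimal Python sketch flattens a simplified, compact-style XML data message into CSV. It is not the Converter's own code: the sample message is namespace-free, the element names, attribute names and observation values are invented for the example, and the function names compact_to_rows and rows_to_csv are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a compact-style data message (assumption: real
# SDMX-ML messages carry namespaces and a header, omitted here for brevity).
# All dimension names and values below are invented for illustration.
SAMPLE_MESSAGE = """\
<DataSet>
  <Series FREQ="A" REF_AREA="LU" INDICATOR="DEMO">
    <Obs TIME_PERIOD="2007" OBS_VALUE="1.0"/>
    <Obs TIME_PERIOD="2008" OBS_VALUE="2.0"/>
  </Series>
  <Series FREQ="A" REF_AREA="BE" INDICATOR="DEMO">
    <Obs TIME_PERIOD="2008" OBS_VALUE="3.0"/>
  </Series>
</DataSet>
"""


def compact_to_rows(xml_text):
    """Return one dict per observation: series attributes plus observation attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter("Series"):
        for obs in series.iter("Obs"):
            row = dict(series.attrib)   # dimensions shared by the whole series
            row.update(obs.attrib)      # time period and observed value
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the flattened rows as CSV text with a header line."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(compact_to_rows(SAMPLE_MESSAGE)))
```

The sketch repeats the series-level dimensions on every observation row, which is the usual price of moving from a hierarchical message to a flat file. A real conversion would additionally need the data structure definition to interpret and validate the dimensions.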

4.1.2. Tools for reference metadata

Other tools are being developed to facilitate the delivery by Member States of SDMX reference metadata. These allow Member States to prepare reference metadata files using an online editor linked to the SODI infrastructure at Eurostat, or offline using a standard template. The reference metadata tools will soon be available but are still at the prototype and test stage:

·  The National Reference Metadata Editor deals with national metadata: through this tool, national producers who are not yet technically able to produce SDMX-ML files directly can produce domain-specific metadata and send them to Eurostat via the "single entry point" in SDMX format. The Reference Metadata Editor uses a standard web questionnaire based on metadata concepts and report structures, such as the Euro-SDMX Metadata Structure. It is installed as part of the Eurostat IT environment and will be accessible to Member States as a web application.

·  The SDMX Metadata Template, built into the Metadata Editor, will also be usable in off-line mode for the compilation of national reference metadata.

4.1.3. Structural and reference metadata

In SDMX, "structural metadata" are those metadata acting as identifiers and descriptors of the data, such as names of variables, dimensions of statistical cubes or titles of tables. Structural metadata must be associated with the data, otherwise it becomes impossible to identify, retrieve and browse the data.

"Reference metadata" are metadata that describe the contents and the quality of the statistical data (conceptual metadata, describing the concepts used and their practical implementation; methodological metadata, describing methods used for the generation of the data; and quality metadata, describing the different quality dimensions of the resulting statistics, e.g. timeliness, accuracy). While these reference metadata exist and may be exchanged independently of the data and its structural metadata, they are often linked (“referenced”) to the data.
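The distinction between the two kinds of metadata can be made concrete with a small Python sketch. It is illustrative only: the class names (CodeList, DataStructure, ReferenceMetadata), the identifiers and the codes are assumptions chosen for the example and do not reproduce the SDMX information model or any Eurostat artefact. Structural metadata are what allow a series key to be interpreted and checked, whereas reference metadata form a separate report that merely points at the data it describes.

```python
from dataclasses import dataclass


@dataclass
class CodeList:
    """Structural metadata: the permitted codes for one dimension."""
    id: str
    codes: dict  # code -> human-readable label


@dataclass
class DataStructure:
    """Structural metadata: the dimensions that identify each series."""
    id: str
    dimensions: dict  # dimension name -> CodeList

    def check_key(self, key):
        """Return a list of problems in a series key (empty list if valid)."""
        problems = []
        for name, code_list in self.dimensions.items():
            if name not in key:
                problems.append(f"missing dimension {name}")
            elif key[name] not in code_list.codes:
                problems.append(f"code {key[name]!r} not in {code_list.id}")
        for name in key:
            if name not in self.dimensions:
                problems.append(f"unknown dimension {name}")
        return problems


@dataclass
class ReferenceMetadata:
    """Reference metadata: a report linked to, but stored apart from, the data."""
    target: str     # identifier of the data set the report refers to
    concepts: dict  # concept name -> free-text description


# Hypothetical code lists and structure (identifiers and codes are illustrative).
freq = CodeList("CL_FREQ_DEMO", {"A": "Annual", "Q": "Quarterly", "M": "Monthly"})
sex = CodeList("CL_SEX_DEMO", {"F": "Female", "M": "Male", "T": "Total"})
structure = DataStructure("DEMO_DSD", {"FREQ": freq, "SEX": sex})

# The report refers to a data set by identifier only; it holds no data itself.
report = ReferenceMetadata(
    target="DEMO_DATASET",
    concepts={"timeliness": "free text", "accuracy": "free text"},
)

print(structure.check_key({"FREQ": "A", "SEX": "T"}))
print(structure.check_key({"FREQ": "X"}))
```

Without the structure, the key ("A", "T") is an uninterpretable pair of letters, which is the sense in which data cannot be identified or retrieved without their structural metadata. The reference metadata report, by contrast, can be exchanged on its own.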

Structural metadata

As mentioned above, a new version of the SDMX Content-Oriented Guidelines was released in January 2009. Annex 2 of this document contains the 9 SDMX recommended cross-domain code lists (e.g. 'Sex' and 'Frequency').

In parallel, and based on these SDMX standards and guidelines, Eurostat proceeded with the harmonisation of structural metadata to be used within the ESS. A series of more than 20 Standard Code Lists (SCL) has been released on the Eurostat webpage (under Ramon, the Eurostat metadata server). More harmonised ESS code lists will be released in the months to come in order to further increase the stock of those code lists.

Reference metadata based on the Euro SDMX Metadata Structure (ESMS)

As communicated at the last ITDG meeting, the Euro SDMX Metadata Structure (ESMS) was developed for the European Statistical System. This ESMS contains statistical SDMX concepts related to the production and dissemination of data as well as to their quality. The work on the implementation of the ESMS has progressed, with the aim of releasing all Eurostat reference metadata files (around 350 files) in ESMS by the end of 2009.

Released in June 2009, the Commission Recommendation (2009/498/EC) on reference metadata for the ESS recommends the use of the ESMS for the production and dissemination of national reference metadata within the European Statistical System.

Eurostat has started to implement the ESMS for national reference metadata collections by converting other existing reference metadata structures into the ESMS structures when the appropriate opportunities arise. Owing in part to the lower frequency with which this information is collected from countries, this conversion and adaptation process will, however, take more time.

4.2.  The implementation of SDMX in the Eurostat CVD

The Eurostat CVD (Cycle de Vie des Données, or Data Life Cycle) aims at a fundamental revision of the way Eurostat treats its statistical data, by providing a consistent set of metadata structures and IT tools to be applied in all statistical domains. SDMX standards and guidelines therefore play a key role in the whole CVD, enabling the exchange and sharing of data and metadata not only in the transmission of data to Eurostat but also in the exchange between CVD components.

In the CVD, metadata are a basic integrating concept: metadata's role in the statistical production chain is ubiquitous and overwhelming. The achievement of a coherent, integrated and comprehensive set of actions for the statistical production phases of collection, creation and treatment of metadata is therefore the guiding principle which determines the functions and architecture of single CVD components, such as:

§  The Single Entry Point (SEP). The SEP is fully operational and supports the transmission of statistical data from Member States to Eurostat, including in SDMX formats.

§  The Metadata Handler (MH). The CVD-MH will provide a single environment to store and manage harmonized structural and reference metadata to be shared by all CVD applications. While the core component of the MH (the Eurostat SDMX registry) is based on version 2 of the SDMX technical standards, other metadata applications (the EMIS database for reference metadata, RAMON and CODED,…) will be made fully SDMX-compliant in 2010.

§  The Reference Environment. The new reference database "Eurobase" is loaded with data from production databases ready to be disseminated and will produce output files in SDMX-ML format for all the data; this is foreseen before the end of 2009.

§  The Eurostat Internet portal, which is the main dissemination tool for Eurostat statistics, integrating all information to be disseminated into a single Internet site. The Data Explorer and the Table Graph Maps tool, which provide common easy-to-use display tools with a new user interface, have since April 2009 replaced the former multitude of access tools. They also provide access to the ESMS reference metadata. In the coming months, Eurostat will also provide two options for retrieval of data in SDMX-ML: via the "bulk download" facility (which allows registered users to download all Eurostat datasets) and via web services ("e-services" project).