Doc. Eurostat/ITDG/October 2010/3.3.d
IT Directors' Group
19 and 20 October 2010
BECH Building, 5, rue Alphonse Weicker, Luxembourg-Kirchberg
Room AMPÈRE
Census Hub
Progress report
Item 3.3.d of the agenda
Census Hub progress report
1. Purpose of this document
The aim of this document is to report on the progress of the ongoing Census Hub pilot project and to summarise the issues encountered or identified so far, as well as the threats and opportunities.
The ITDG is asked to:
- recognise the progress made;
- comment on issues of special interest;
- discuss the achievements and the problems encountered in the Census Hub project;
- discuss actions that would stimulate the implementation of the Census Hub and similar ESS-wide projects in national statistical organisations.
2. Background
For the first time, the European Union has legislation aimed at ensuring the availability of harmonised, high-quality data from the population and housing censuses conducted in the EU Member States in 2011. It is a common ambition of the ESS to disseminate the results in a way that gives users easy access to detailed data that are structured in the same way and are methodologically comparable across the EU Member States.
Major steps have been taken:
- Regulation (EC) No 763/2008 of the European Parliament and of the Council on population and housing censuses was adopted in August 2008.
- Commission Regulation (EC) No 1201/2009 on the technical specifications of the topics and breakdowns was adopted in December 2009.
- Commission Regulation (EU) No 519/2010 on the EU programme of data and metadata for the 2011 censuses was adopted in June 2010, after the ESSC issued a positive opinion in February.
- The draft regulation on quality assessment has been discussed in the Census Working Group and will be presented to the ESSC in October 2010.
- Eurostat is co-operating with the NSIs to complete the development and installation of the Census Hub dissemination platform for census results. The EESC stressed that the Census Hub is a test case for Eurostat's vision.
The EU legislation on censuses is strictly output-oriented: Member States are free to use the data sources and methodologies they consider best suited to producing harmonised census data. They are encouraged to investigate innovative ways of linking information from different data sources, including administrative registers, to provide the required statistics. In this way, Member States can reduce the burden on respondents and administrations, and improve the efficiency of census data production.
Before any technical tools could be built, harmonised concepts for structural metadata had to be defined: 25 harmonised code lists have been defined in the "Standard Code Lists" (SCL) project. These lists will be used to define the multi-dimensional datasets for the next census.
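The practical value of shared code lists is that a series key can be checked mechanically before any data are exchanged. The Python sketch below is illustrative only: the two dimensions (`SEX`, `AGE`) and their codes are assumptions, not the actual SCL content, and `invalid_codes` is a hypothetical helper.

```python
# Illustrative excerpt of shared code lists (dimension -> allowed codes).
# Assumed values: the real SCL lists and codes are defined by the Census programme.
CODE_LISTS: dict[str, set[str]] = {
    "SEX": {"M", "F", "T"},
    "AGE": {"Y_LT15", "Y15-64", "Y_GE65", "TOTAL"},
}


def invalid_codes(key: dict[str, str]) -> list[str]:
    """Return the 'DIM=code' pairs of a series key not covered by the code lists."""
    problems = []
    for dim, code in key.items():
        allowed = CODE_LISTS.get(dim)
        # Unknown dimensions and unknown codes are both reported.
        if allowed is None or code not in allowed:
            problems.append(f"{dim}={code}")
    return problems


print(invalid_codes({"SEX": "F", "AGE": "Y15-64"}))  # []
print(invalid_codes({"SEX": "W", "GEO": "IT"}))      # ['SEX=W', 'GEO=IT']
```

Because every country codes its dimensions from the same lists, the same check works unchanged for any data provider.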
3. Innovative transmission and dissemination of Census data
To take on this challenge, Eurostat launched the EU Census Hub project. On the basis of the EU census legislation, a modern and innovative technical solution for the transmission and dissemination of census data is being developed, offering a number of advantages:
- high geographical resolution, with users able to cross-tabulate the data to the maximum possible extent;
- a user-friendly dissemination tool;
- fast access to the census data;
- NSIs keep control over their own data;
- NSIs reuse the IT platform that they already employ for national purposes, with the added advantage of using SDMX standards (harmonised concepts, definitions and specifications);
- in the case of revisions or updates, NSIs only have to upload the new data to their own systems;
- the SDMX infrastructure built for the Census Hub project can be reused in other statistical domains with few or no changes.
The hub approach offers an efficient solution meeting the requirements for dissemination of the 2011 Census data at EU level:
- Data providers can:
  - notify the hub of new sets of data and the corresponding structural metadata (measures, dimensions, code lists, etc.);
  - make data available directly from their systems through a querying system.
- Data users can:
  - browse the hub to define a dataset of interest;
  - retrieve the dataset from the NSIs.
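This division of roles can be illustrated with a minimal Python sketch, assuming an in-memory registry: providers notify the hub of a dataset and its dimensions, while users browse the structures and retrieve data that are fetched from each provider only at query time. The class, its methods and the dataset identifier `HC01` are hypothetical and do not reflect the Census Hub's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

# A provider callback: takes a selection (dimension -> codes) and returns data.
Fetcher = Callable[[dict], list]


@dataclass
class HubRegistry:
    """Records who offers which dataset; never stores the data itself."""
    structures: dict = field(default_factory=dict)  # dataset -> dimensions
    providers: dict = field(default_factory=dict)   # dataset -> {country: fetcher}

    def notify(self, country: str, dataset: str, dimensions: list, fetch: Fetcher) -> None:
        # Provider side: announce a dataset and its structural metadata.
        # All providers are assumed to use the same agreed structure.
        self.structures[dataset] = list(dimensions)
        self.providers.setdefault(dataset, {})[country] = fetch

    def browse(self) -> dict:
        # User side: list the available datasets and their dimensions.
        return dict(self.structures)

    def retrieve(self, dataset: str, selection: dict) -> dict:
        # User side: data are pulled from each provider at query time.
        return {country: fetch(selection)
                for country, fetch in self.providers.get(dataset, {}).items()}


hub = HubRegistry()
hub.notify("IT", "HC01", ["GEO", "SEX"], lambda sel: [("IT", sel["SEX"])])
print(hub.browse())                        # {'HC01': ['GEO', 'SEX']}
print(hub.retrieve("HC01", {"SEX": "F"}))  # {'IT': [('IT', 'F')]}
```

The registry holds only structures and callbacks, which mirrors the point that NSIs keep control over their own data.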
4. The Census Hub SDMX Infrastructure
The European Census Hub is a conceptually new system proposed for disseminating the 2011 Census data via the Eurostat website. It is based on the concept of data sharing, where a group of partners agree to provide access to their data according to standard processes, formats and technology.
The European Census Hub can be divided into two parts:
- the central Hub, on the Eurostat side;
- the SDMX NSI Infrastructure, on the NSI side.
From the data management point of view, the hub is based on the hypercubes[1] agreed upon in the EU programme of 2011 Census data. The hypercubes are not sent to the central system. Instead, the data are stored in the national data warehouses and are fetched when a data user requests them.
SDMX standards will be used for this purpose: the SDMX infrastructure supports both the hub and the pull approach through SDMX Query[2] messages and Web Services[3] technologies.
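To illustrate the pull approach, the Python sketch below serialises a user selection as an XML query built only from structural metadata (dataflow, dimension identifiers and codes). It is a simplified stand-in: the element names `DataQuery`, `Dataflow`, `Dimension` and `Value` and the dataflow `HC01` are assumptions, whereas a real SDMX Query message follows the elements and namespaces defined by the SDMX-ML schemas.

```python
import xml.etree.ElementTree as ET


def build_query(dataflow: str, selection: dict[str, list[str]]) -> str:
    """Serialise a user selection as a simplified XML data query.

    Element names are illustrative; real SDMX Query messages use the
    elements and namespaces defined by the SDMX-ML schemas.
    """
    root = ET.Element("DataQuery")
    ET.SubElement(root, "Dataflow").text = dataflow
    for dim, codes in selection.items():
        # One element per dimension; several codes mean "any of these".
        dim_el = ET.SubElement(root, "Dimension", id=dim)
        for code in codes:
            ET.SubElement(dim_el, "Value").text = code
    return ET.tostring(root, encoding="unicode")


print(build_query("HC01", {"GEO": ["IT"], "SEX": ["F", "M"]}))
```

Because the query refers only to agreed concepts and codes, it does not depend on the database product behind each NSI's web service.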
The scope of this project is to build the Hub infrastructure hosted at Eurostat and to create a favourable environment for setting up SDMX IT infrastructures in the Member States. Besides defining standard data and metadata structures, the SDMX standards allow the definition of a particular service infrastructure for data exchange. Each organisation can develop its own infrastructure or components, or alternatively use components taken from other solutions. To that end, Eurostat provides the SDMX NSI infrastructure, which interested NSIs can use.
The project has three main objectives:
- to develop the Hub web application, which will act as a client of the Member States' web services;
- to provide technical advice supporting the implementation of the NSIs' IT infrastructure;
- to facilitate the sharing of software among countries involved in the exercise.
SDMX provides guidelines and tools to support the "pull" mode of data sharing, in which the collecting organisation retrieves the data from the providers' websites. The data may be made available for download in an SDMX-conforming file, or they may be retrieved from a database in response to an SDMX-conforming query. In both cases, the data are made available to any organisation (with access rights) requiring them, in formats that ensure the data are consistently described by appropriate metadata whose meaning is common to all parties in the exchange.
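On the receiving side, each observation is meaningful because it is identified by concept/code pairs taken from the shared metadata. The Python sketch below parses a simplified data message loosely modelled on the SDMX-ML generic data format; it assumes no namespaces and no time dimension, uses dummy values, and is not a conformant SDMX-ML parser.

```python
import xml.etree.ElementTree as ET


def parse_data_message(xml_text: str) -> list[tuple[dict[str, str], float]]:
    """Return (series key, value) pairs from a simplified data message.

    Loosely modelled on the SDMX-ML generic data format, but without
    namespaces or a time dimension; not a conformant SDMX-ML parser.
    """
    root = ET.fromstring(xml_text)
    observations = []
    for series in root.iter("Series"):
        # The key is expressed with concepts and codes from the shared structure.
        key = {v.get("concept"): v.get("value")
               for v in series.find("SeriesKey").findall("Value")}
        for obs in series.findall("Obs"):
            observations.append((key, float(obs.find("ObsValue").get("value"))))
    return observations


# Dummy values, for illustration only.
MESSAGE = """<DataSet>
  <Series>
    <SeriesKey><Value concept="GEO" value="IT"/><Value concept="SEX" value="F"/></SeriesKey>
    <Obs><ObsValue value="10"/></Obs>
  </Series>
</DataSet>"""
print(parse_data_message(MESSAGE))  # [({'GEO': 'IT', 'SEX': 'F'}, 10.0)]
```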
The infrastructure planned for the European Census Hub works as follows. To satisfy user query requirements, the following process takes place:
- To generate tables, a user defines a dataset (a subset of a hypercube) through the web graphical user interface by browsing the dimensions and selecting the data of interest. The user then chooses the output layout, specifying which dimensions are placed on the X-axis and Y-axis and which dimension varies across tables.
- Taking into account the user's choices, the central Hub constructs one or more SDMX queries that are sent to the relevant NSIs' SDMX infrastructures;
- The NSIs' SDMX infrastructures process the query, extract the requested data from the NSIs' warehouses and send them to the central Hub as an SDMX-ML data[4] message;
- The central Hub assembles all the SDMX-ML data messages received from the NSIs and presents the result to the user in the web browser in a readable format.
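The Hub side of the steps above can be sketched in Python under two assumptions the text does not state: the geographical dimension is called `GEO` and its codes begin with a two-letter country prefix (as NUTS codes do), and each NSI's answer has already been parsed into (key, value) pairs. `split_by_country` and `assemble_tables` are hypothetical helpers, and the values are dummies.

```python
from collections import defaultdict


def split_by_country(selection: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Build one query per NSI, assuming GEO codes start with a country prefix."""
    others = {dim: codes for dim, codes in selection.items() if dim != "GEO"}
    queries: dict[str, dict[str, list[str]]] = {}
    for geo in selection["GEO"]:
        country = geo[:2]  # e.g. "ITC1" -> "IT" (NUTS-style codes)
        queries.setdefault(country, {**others, "GEO": []})["GEO"].append(geo)
    return queries


def assemble_tables(observations, x_dim: str, y_dim: str, page_dim: str) -> dict:
    """Lay out (key, value) pairs from all NSIs as {page: {row: {column: value}}}."""
    tables = defaultdict(lambda: defaultdict(dict))
    for key, value in observations:
        tables[key[page_dim]][key[y_dim]][key[x_dim]] = value
    # Plain dicts are easier to render or serialise than nested defaultdicts.
    return {page: {row: dict(cells) for row, cells in rows.items()}
            for page, rows in tables.items()}


print(split_by_country({"GEO": ["ITC1", "ITC2", "PT11"], "SEX": ["F", "M"]}))
it_answer = [({"GEO": "ITC1", "SEX": "F", "AGE": "TOTAL"}, 1.0),
             ({"GEO": "ITC1", "SEX": "M", "AGE": "TOTAL"}, 2.0)]
pt_answer = [({"GEO": "PT11", "SEX": "F", "AGE": "TOTAL"}, 3.0)]
print(assemble_tables(it_answer + pt_answer, x_dim="SEX", y_dim="GEO", page_dim="AGE"))
```

Grouping by country prefix yields one query per NSI however many regions the user selects, which matches the "one or more SDMX queries" step; the page dimension then produces one table per code.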
5. Reusability of the Solution, Quality and Cost benefits
The Census users will benefit from a number of quality improvements:
- A single point of access to Census data coming from all European countries.
- The cross-country geographical dimension can be crossed against the other dimensions of the data hypercubes.
The costs of implementing the SDMX infrastructure needed for the Census Hub project are also very limited. The use of an XML-based data format helps to reduce implementation costs because:
- many NSIs are already using or planning to use XML as the basis for their data management and dissemination systems;
- a wide selection of commercial IT applications and tools is available for working with XML-based data;
- expertise for working with XML is readily available and will often be available in-house.
The pilot phase has clearly demonstrated that sharing experiences between Eurostat and the NSIs, and reusing software developed in other SDMX projects or available in the “SDMX community”, can dramatically reduce development costs.
To this end, Eurostat has been actively contributing as follows:
- Developing the SDMX NSI Infrastructure and making it available as an open-source download. The source code can be used in its entirety, or its components can easily be integrated into the statistical organisations' own IT systems;
- Financing the ESSnet on SDMX, in which some Member States are working to produce software and best practices to be shared with other countries;
- Providing online tutorials and training sessions on SDMX for statisticians and IT staff.
The following benefits are expected:
- The SDMX infrastructure of the Census Hub project can be reused in other statistical domains with few or no changes.
- The current Census Hub infrastructure could be used to disseminate other NSI data, with the added advantage of using internationally recognised SDMX standards.
- Participants are part of a project that allows the different actors, both statisticians and IT personnel, to share experiences at different project stages (planning, production, etc.) at international level.
6. The Project Status
6.1. Phase 1 (2008)
The first phase of the pilot project consisted of the development of a prototype of the central hub and the installation of NSI modules, with the main objectives of testing the “data flow” with the peripheral web services and an appropriate graphical user interface. NSIs developed web services allowing external applications to access their data warehouses.
Each participating Member State (Germany, Ireland, Italy and Portugal) produced a document describing its experience during the pilot, the support costs and the benefits gained. These documents are available on CIRCA and can be used as case studies.
Eurostat produced the Census Hub Web Service Implementing Guidelines, version 1.0. This document explains how to build web services as part of an overall SDMX infrastructure, covering topics such as the approach to follow when different IT technologies (Java and .NET) are used and how to handle errors.
6.2. Phase 2 (2009)
The main milestones for phase 2 of the pilot were the following:
– Involving more Member States in the project. To facilitate this, Eurostat launched an action to support SDMX implementation in the Member States, focusing on technical advice for implementing the SDMX infrastructure and on contributing open-source components to a generic infrastructure for NSIs.
– Developing and testing additional functionalities for the central hub: a new graphical user interface and a cache system.
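The document does not describe how the cache system works. One plausible design, sketched below in Python purely as an assumption, keeps answers keyed by the query sent to an NSI, so repeated queries are served by the hub, and clears them when data are revised, since NSIs upload revisions only to their own systems. `QueryCache` and the tuple used as a query key are hypothetical.

```python
from collections.abc import Callable, Hashable


class QueryCache:
    """Hub-side cache: repeated queries are answered without calling the NSI."""

    def __init__(self, fetch: Callable[[Hashable], object]) -> None:
        self._fetch = fetch          # sends the query to an NSI web service
        self._store: dict = {}
        self.misses = 0              # number of queries that reached an NSI

    def get(self, query: Hashable) -> object:
        if query not in self._store:
            self.misses += 1
            self._store[query] = self._fetch(query)
        return self._store[query]

    def invalidate(self) -> None:
        # Drop cached answers, e.g. after an NSI revises its data.
        self._store.clear()


calls = []
cache = QueryCache(lambda q: calls.append(q) or len(calls))
key = ("IT", "HC01", (("SEX", "F"),))  # hashable form of a query
cache.get(key)
cache.get(key)
print(cache.misses)  # 1
```

Python's `functools.lru_cache` could replace the dictionary; the explicit store simply makes the miss count and the invalidation step visible.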
6.3. Phase 3 (2010)
The main milestone for the pilot phase 3 is the consolidation of the SDMX Hub software, together with the specification of the Data Structure Definitions (DSD) needed for the 60 hypercubes defined by the EU Regulation.
Another important step was the setting up of a Census Hub IT Working Group, which met for the first time on 2-3 June 2010. This new technical working group will allow Eurostat to present the next steps for improving the pilot exercise and moving towards a “production phase”. The Census IT WG will tentatively meet twice a year.
6.4. Phase 4 (2011)
The main objective of phase 4 is to carry out a preparatory phase for the 2011 Census Hub through:
– Consolidating and extending the architecture developed in the pilot so that it correctly handles the SDMX Data Structure Definitions describing all the hypercubes defined in the Census regulation, using the latest SDMX 2.1 technical specifications, which are currently being finalised.
– Creating suitable technical documentation to facilitate the hand-over to a new contractor in 2011.
– Preparatory actions needed to support the remaining development tasks, such as setting up the necessary development and test environments before the Census Hub finally becomes operational in 2012-2013.
– Conducting a feasibility study on how to generalise the current implementation of the Census Hub so that it can be used for national purposes and in other domains.
6.5. Census Hub implementation status by country
The following is a short account of the work progress in each national statistical office, as of August 2010. Eurostat is dedicating more resources to assisting national advancement on the Census Hub project with training actions and through direct support to the installation of the national infrastructure, when this is requested by countries.
Austria – interested in participating, but Statistics Austria cannot directly use the Census Hub reference infrastructure because the RESTful API is not currently supported by the Hub. Eurostat is providing more technical details to Statistics Austria's technical consultants.
Belgium – expressed its interest; ongoing communication with Eurostat to gain knowledge of the SDMX NSI Reference infrastructure and its modules.
Bulgaria – expressed its interest to participate.
Cyprus – expressed its interest to participate. A first bilateral meeting was organized during the Census IT Working Group (1-2 June).
Czech Republic – fully integrated into the Census Hub.
Denmark – no declared interest in participating.
Estonia – expressed its interest to participate.
Finland – considering the use of the infrastructure service developed by Eurostat for the Census Hub implementation, also using the results of the ESSnet project on the integration between PC-AXIS and SDMX.
France – expressed its interest to participate. French dissemination cannot start before 2014.
Germany – fully integrated into the Census Hub.
Greece – expressed its interest in getting more information with a view to participating.
Hungary – expressed its interest to participate and requested the starting package in order to install the software.
Ireland – fully integrated into the Census Hub.
Italy – fully integrated into the Census Hub.
Latvia – investigating the issue and all possible options for transmitting census data to Eurostat.
Liechtenstein – considering the use of the infrastructure service developed by Eurostat for the Census Hub implementation, also using the results of the ESSnet project on the integration between PC-AXIS and SDMX.
Lithuania – expressed its interest to participate.
Luxembourg – after a bilateral meeting with Eurostat, it is currently testing the Hub SDMX infrastructure before full integration.
Malta – expressed its interest to participate.
Netherlands – expressed its interest, but a decision has been postponed to 2011. In the meantime, there is ongoing communication with Eurostat to gain better knowledge of the SDMX NSI Reference infrastructure and its modules.
Norway – expressed its interest to participate; a bilateral meeting is foreseen. It is considering the use of the infrastructure service developed by Eurostat for the Census Hub implementation, also using the results of the ESSnet project on the integration between PC-AXIS and SDMX.
Poland – fully integrated into the Census Hub.
Portugal – fully integrated into the Census Hub.
Romania – expressed its interest to participate.
Spain – expressed its interest to participate.
Slovakia – expressed its interest to participate.
Slovenia – expressed its interest to participate.
Sweden – a bilateral meeting is foreseen. It is considering the use of the infrastructure service developed by Eurostat for the Census Hub implementation, also using the results of the ESSnet project on the integration between PC-AXIS and SDMX.
United Kingdom – expressed its interest, but a decision has been postponed to 2011.
7. CONCLUSIONS – current status
- General acceptance: as of August 2010, fifteen countries are participating in or working on the pilot project. Among them, six NSIs (Czech Republic, Germany, Ireland, Italy, Poland and Portugal) have put the whole SDMX infrastructure in place. All countries but one (Denmark) have expressed interest in participating.
- Need for working together: participants will be part of a project that shares experiences among different actors, both statisticians and IT personnel, at different levels (planning, production, etc.);
- Participants will build an IT infrastructure for their 2011 census data warehouse using SDMX standards. SDMX-based data-sharing infrastructure is among the most advanced available; through the Census Hub infrastructure, NSIs can therefore acquire advanced knowledge of SDMX standards and valuable experience in managing complex IT projects.
- The Census Hub project will enable countries to save resources by implementing a common SDMX infrastructure, which can be reused in other statistical domains with few or no changes.
[1] Hypercube: a generalisation of a cube to more than three dimensions; here, the representation of a multi-dimensional statistical data table.
[2] SDMX Query: one of the messages specified in the SDMX standards, which allows a statistical database to be queried using structural metadata. It therefore does not depend on the technical solution or product used.
[3] Web Services: technology enabling applications to communicate with each other via the Internet, independently of the platforms and programming languages on which they are built.
[4] SDMX-ML data message: an XML representation of a dataset.