DIME/ITDG Plenary February 2018

EUROPEAN COMMISSION
EUROSTAT
Directorate B: Methodology; Corporate statistical and IT services
Unit B-1: Methodology and corporate architecture

Directors of Methodology/IT Directors

Plenary Meeting

22/23 February 2018

ESS roadmap on LOD (Linked Open Data) – state of play

1. Recommendation for action by the ITDG/DIME

The DIME/ITDG is invited to take note of the state of play of developments in the field of Linked Open Data (LOD) and to provide guidance on next steps in response to the questions proposed in section 4.2.

2. Background

On the basis of exploratory work carried out in 2016 in the context of the DIGICOM project to identify the potential of Linked Open Data for the dissemination of official statistics, the DIME/ITDG, at its meeting in February 2017, supported the ESS framework for action and a roadmap in the field of LOD[1]. This comprised the following activities:

  • On the basis of further analysis, development of a proposal for a reference architecture for LOD and for the relevant governance aspects
  • Elaboration of joint pilots (involving several NSIs), focusing on data and metadata (in particular classifications).
  • Development of a community of practice for all NSIs investing in LOD or wishing to learn, with a particular focus on:
      - Engagement with users and stakeholders
      - Training and knowledge sharing
      - Guidance on how to design and build an LOD portal and to use it for standards
      - Toolkit: evaluation of existing tools, selection of a standard set of tools, recommendations on performance optimisation

With the establishment of the ESSnet on Linked Open Statistics in November 2017, this work can now be planned in more detail, in collaboration with the DIGICOM Work Package 3 team. The ESSnet will also provide an umbrella for coordinating on-going and new work by individual NSIs. Section 3 provides an overview of the steps taken so far and section 4 sets out a roadmap for future activities.

3. State of play

In 2017, a first draft of a reference architecture for Linked Open Data was elaborated (see section 3.1).

An ESTP course on LOD was conducted in September 2017. The 2-day course was targeted at official statisticians with IT skills who will start to work on LOD. The aim of the course was to provide an overview of the possibilities and challenges of Linked Data technologies in general and in relation to statistics. It was delivered by specialists from Statistics Finland and CSO Ireland, as well as from the University of Helsinki. 15 participants from 12 NSIs attended the training, and their evaluation of it was good. The main questions raised by participants concerned techniques and tools as well as privacy and legal issues.

In order to make progress on URI policy, Eurostat has conducted a pilot project on providing the NUTS classification as Linked Open Data, supported by the ISA² programme. An analysis was conducted to define the best approach to assigning persistent identifiers, and a conceptual model was developed. Several NSIs involved in LOD were consulted on the model. Namespaces, vocabularies, versioning and URI patterns were defined. Two different aspects were addressed in processing the data into RDF format and geodata: the LOD itself and its geographic representation (single NUTS geographic features). For the first, a workflow was developed which transformed the data according to the data model. This is documented in a GitHub repository[2]. For the geodata, Eurostat enhanced its existing dissemination chain for geographic data, which distributed full datasets, so that it additionally provides single NUTS records.
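To illustrate what a URI pattern with versioning can look like in practice, the following is a minimal Python sketch, not the workflow documented in the pilot's repository. The base namespace `http://example.org/nuts`, the pattern `{base}/{version}/{code}` and the function names are assumptions made for illustration only. The sketch relies on the hierarchical structure of NUTS codes (dropping the last character of a code gives its parent) and on standard SKOS properties, and it emits N-Triples lines.

```python
from typing import List, Optional

# Hypothetical base namespace: the namespaces actually defined in the pilot
# are not reproduced here.
BASE = "http://example.org/nuts"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def nuts_uri(code: str, version: str) -> str:
    """Build a versioned URI for one NUTS code (the pattern is an assumption)."""
    return f"{BASE}/{version}/{code}"


def parent_code(code: str) -> Optional[str]:
    """Return the parent NUTS code, or None for a country-level code.

    NUTS codes are hierarchical: a two-letter country code followed by
    one character per level, so dropping the last character gives the parent.
    """
    return code[:-1] if len(code) > 2 else None


def to_ntriples(code: str, label: str, version: str) -> List[str]:
    """Describe one NUTS region as N-Triples lines using SKOS properties.

    The label is assumed to contain no characters that need escaping.
    """
    subject = f"<{nuts_uri(code, version)}>"
    lines = [
        f'{subject} <{SKOS}notation> "{code}" .',
        f'{subject} <{SKOS}prefLabel> "{label}" .',
    ]
    parent = parent_code(code)
    if parent is not None:
        # Link the region to its parent region in the same version.
        lines.append(f"{subject} <{SKOS}broader> <{nuts_uri(parent, version)}> .")
    return lines


if __name__ == "__main__":
    # "Example region" is a placeholder label, not an official NUTS name.
    for line in to_ntriples("BE1", "Example region", "2016"):
        print(line)
```

Placing the version in the URI path is only one possible design choice; it keeps identifiers of different NUTS versions distinct, at the cost of requiring explicit links between versions of the same region.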

The ESSnet “Linked Open Statistics”[3] has been established (see section 3.2) and Eurostat launched another pilot project to explore the benefits of providing data in a Linked Open Data format (see section 3.3). In addition, Eurostat proposed a project to be funded under the European Commission ISA² work programme to develop, during 2018-2020, ESS activities around linked data and metadata.

3.1 Reference architecture

The purpose of a Reference Architecture is to provide a template for statistical organisations in the development of their own LOD capabilities and infrastructure. It shows organisations which capabilities to acquire and how to organise and structure systems to disseminate statistical information based on LOD concepts and standards.

The reference LOD architecture can be seen as a subset of the ESS Reference Architecture. Like the latter, it does not have a binding character but helps to spread common views and definitions and to foster collaboration between partners. It can provide a basis for mapping ESS activities/Proofs of Concept, for kick-starting LOD activities in Member States and for identifying shared/common building blocks/solutions.

The ESS Enterprise Architecture team has prepared a draft document based on industry best practices. It comprises:

1) Identification of major use cases
2) Scope
3) Definitions
4) Required capabilities
5) Reference Architecture Model
6) Standards
7) LOD Building blocks
8) LOD principles
9) LOD Governance roles
10) Template to map PoC.

The draft document will be refined on the basis of the experiments / Proofs of Concept currently being carried out at ESS and Eurostat level, taking into account the work of the ESSnet on Linked Open Statistics. Once it has been elaborated further, the ESS Enterprise Architecture Board will be consulted with a view to having it adopted as a reference document for future ESS work, including steps towards an ESS LOD strategy.

3.2 ESSnet Linked Open Statistics

The ESSnet Linked Open Statistics (LOS) was launched in November 2017. The ESSnet is a consortium of four NSIs: the NSI of Bulgaria (Project Coordinator), INSEE (France), the CSO (Ireland) and ISTAT (Italy), representing different levels of experience in LOD. The work is going to be reinforced by academic experts from the research community, and the consortium plans to subcontract some activities for this purpose. The ESSnet project runs for 18 months, with a budget of EUR 620 000.

The overall goals of the ESSnet are to make it easier for National Statistical Institutes (NSIs) to disseminate their output in a more appropriate way for users and to prepare the ESS for the integration of Linked Open Data approaches into the dissemination of official statistics. The key principles of the DIGICOM project – user-centricity and agility – are at the core of the work of the ESSnet towards reaching these objectives.

The pilots, comprising four use cases, will focus on linking data and metadata at local, national and international level as well as on linking statistical data and metadata with non-statistical web-based data, for example wikis and georeferenced data. The pilots will demonstrate the benefits of this approach for users, but also help to assess risks and costs. First developments will be made with census data.

The ESSnet also aims to stimulate cooperation and provide capacity building for the whole ESS. For this purpose, it will set up a platform for collaborative work among ESS experts on LOD topics and will provide training material and webinars.

3.3 Eurostat pilot and ISA² project

In parallel, Eurostat has launched a 12-month pilot project that will run in close cooperation with the ESSnet. The project will be managed with an agile approach, including quarterly releases of the developments.

The project aims to evaluate the benefits for users of accessing European statistics as Linked Open Data and to shape and develop baseline capabilities in Eurostat. Key aspects to be explored are (1) URI management (naming, dereferencing, …) and reuse of overarching reference ontology models, (2) linking data to external sources and to the Web of data and (3) testing tools for visualisation and analytics based on linked data. The project will also explore the opportunities of linked metadata to enhance data integration queries, enabling statisticians and data analysts to discover related data across different physical stores. The pilot will serve to analyse the components/technologies that are already available in the context of LOD, focusing on the European level. As a starting point, statistical datasets from the LFS and SILC domains have been chosen.
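The idea of discovering related data across different physical stores through shared identifiers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of the pilot's tooling: every URI below is a hypothetical placeholder, the two "stores" are plain in-memory sets of (subject, predicate, object) triples, and the function names are invented for this example. The only real identifier used is the standard `owl:sameAs` property, which expresses a link to an external resource.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# All URIs in this example are hypothetical placeholders.
REGION = "http://example.org/nuts/2016/BE1"
REF_AREA = "http://example.org/def/refArea"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # standard OWL property

# Two separate stores that happen to reuse the same region URI.
lfs_store: Set[Triple] = {
    ("http://example.org/data/lfs/obs1", REF_AREA, REGION),
}
silc_store: Set[Triple] = {
    ("http://example.org/data/silc/obs7", REF_AREA, REGION),
    ("http://example.org/data/silc/obs8", REF_AREA, "http://example.org/nuts/2016/FR1"),
}

# A link from the region to a resource published elsewhere on the Web of data.
link_store: Set[Triple] = {
    (REGION, SAME_AS, "http://example.org/external/region-be1"),
}


def related_subjects(
    stores: Dict[str, Set[Triple]], predicate: str, uri: str
) -> List[Tuple[str, str]]:
    """Return (store name, subject) pairs whose `predicate` points at `uri`."""
    found = [
        (name, s)
        for name, triples in stores.items()
        for s, p, o in triples
        if p == predicate and o == uri
    ]
    return sorted(found)


def external_links(triples: Set[Triple], uri: str) -> List[str]:
    """Return the external resources declared equivalent to `uri`."""
    return sorted(o for s, p, o in triples if s == uri and p == SAME_AS)


if __name__ == "__main__":
    stores = {"lfs": lfs_store, "silc": silc_store}
    print(related_subjects(stores, REF_AREA, REGION))
    print(external_links(link_store, REGION))
```

Because both stores refer to the region by the same URI, no mapping table between local codes is needed to find observations about the same area; in a real deployment this lookup would typically be a query against RDF stores rather than a scan of Python sets.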

To prepare for scaling up LOD activities and to support deployment, Eurostat has submitted a proposal for a project to be financed under the European Commission ISA² programme. Activities in scope include, among others: the definition of a reference ontology for statistical data and metadata and its related governance; the setting up of a common platform to expose ESS metadata and ontologies using linked technologies; the development of services and building blocks for managing these metadata to improve the discoverability of EU-related statistics; querying EU policy-related data across ESS organisations and beyond (the Web of data); and the development of smart data analytics services using linked technologies. To qualify for the ISA² programme, the developments need to be directed towards reuse in the EU and to foster the interoperability of EU public administrations. A decision on the Eurostat proposal will be taken in March 2018.

4. Roadmap and next steps

4.1 Proposed roadmap

In order to fully benefit from the work of the ESSnet and to build upon its deliverables, it is proposed to develop the strategy in early 2019 instead of 2018 as initially foreseen. The strategy will be oriented around the five key areas indicated in the figure below:

Status / Period / Activity
Completed / March 2017 / PWC final report including overview of the use cases and Proofs of Concept
Completed / March 2017 - December 2017 / Further exploration of governance aspects (use of existing governance frameworks or creation of new ones), in particular for URIs, and relevant projects (OpenGovIntelligence); pilot project on the NUTS classification
Completed / September 2017 / ESTP course on LOD
Completed / December 2017 / Draft reference architecture for LOD
On-going / December 2017 - April 2019 / ESSnet LOS: Pilots and their evaluation (metadata and data) involving several NSIs
On-going / December 2017 - April 2019 / ESSnet LOS: Creation of a community of practice for all NSIs (guidance on how to design an LOD portal and on implementation of standards; toolkit; training; engagement with users and stakeholders)
On-going / January 2018 - December 2018 / Eurostat pilot
Planned / April 2018 / Publication of the 2018 ISA² work programme by the European Commission (end of the evaluation of the Eurostat ISA² project proposal)
Planned / Q1-Q2 2019 / Development of a full ESS LOD strategy integrating early results of pilots and ISA² opportunities

4.2 Topics for discussion

The DIME/ITDG is invited to:

  • Approve the next steps of the roadmap, based on the joint framework for action in the field of LOD agreed in February 2017.
  • Share their most recent experiences in the field of LOD (stage of adoption, alternative use cases, …).
  • Provide guidance on the following points:
      - What is the need for shared development/building blocks, in particular the need for common ontologies for statistical data and metadata (planned to start in 2018 with the ISA² project)?
      - What are the links/synergies with other initiatives (such as those carried out by the High-Level Group)?
      - Are the existing governance arrangements (comprising the Steering Group, the Work Package team and the ESSnet in the framework of DIGICOM, and the ESS Enterprise Architecture Board) sufficient for LOD?

[1]

[2]

[3]