StatDCAT-AP: A common metadata layer for making statistical data more visible, open and linkable
Makx Dekkers [1],
Chris Nelson [2],
Marco Pellegrino [3],
Nikolaos Loutas [4],
Norbert Hohn [5],
Vasilios Peristeras [6]
Keywords: linked open data, data portals, catalogue, statistics, DCAT-AP, RDF, SDMX.
1. Introduction
Open data portals have been established throughout Europe. Because statistical data is of great interest, for example for decision-making and research purposes, the catalogues of these portals include numerous statistical datasets. The StatDCAT Application Profile (StatDCAT-AP) is an extension of the DCAT-AP open data standard and supports the integration of the descriptive metadata of statistical datasets in the catalogues of open data portals, hence improving the visibility and discoverability of statistical datasets.
The StatDCAT-AP has been developed by a working group co-chaired by Eurostat and the EU Publications Office, within the framework of the ISA2 programme of the European Commission. The creation process of the new specification was open, transparent and visible to the public, and involved the main stakeholders to reach consensus in an open collaboration. This collaborative work took place in a wider context, both at the European level with the Directive on the re-use of Public Sector Information, and at the global level with the G8 Open Data Charter. At the same time, it applied and exploited the technical standards developed by W3C towards a globally interoperable environment of Linked Open Data.
Building upon these two pillars, on the one hand subscribing to the organisational goals to open up public data for reuse, and on the other hand applying the emerging technologies that facilitate linking data together, StatDCAT-AP aims to improve the opportunities for discovery and reuse of statistical data by the wide audience using open government data portals. In this context, transformation mechanisms allow organisations that use existing statistical standards for data and metadata exchange, such as SDMX, to align their standard with the StatDCAT-AP much more easily.
After a period of public review in summer and autumn 2016, StatDCAT-AP version 1 was published at the end of 2016 and has been endorsed by the EU member states in the context of the ISA2 Programme.
2. Building on the DCAT-AP
The DCAT-AP is a specification based on W3C's Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe [1]. The development of DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme. The specification was elaborated by a multi-disciplinary Working Group with representatives from 16 European Member States, European Institutions and the US.
The DCAT-AP data model includes the following main entities:
- The Catalogue: this represents a collection of Datasets. It is defined in the DCAT Recommendation as “a curated collection of metadata about datasets”.
- The Catalogue Record: DCAT defines this as “a record in a data catalog, describing a single dataset”. The Catalogue Record enables statements about the description of a Dataset rather than about the Dataset itself.
- The Dataset: this represents the published information. It is defined as “a collection of data, published or curated by a single agent, and available for access or download in one or more formats”.
- The Distribution: this, according to DCAT, “represents a specific available form of a dataset. Each dataset might be available in different forms, and these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed”.
Figure 1 - DCAT main entities
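The relationships between these four entities can be illustrated with a minimal Python sketch. It is not part of the specification: the class layout, the `to_triples` helper and all `example.org` URIs are assumptions made for illustration, while the property names (`dcat:dataset`, `dcat:distribution`, `dcat:record`, `dcat:accessURL`, `foaf:primaryTopic`, `dct:title`, `dct:modified`) are taken from the DCAT vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Distribution:
    """A specific available form of a dataset, e.g. a downloadable CSV file."""
    uri: str
    access_url: str


@dataclass
class Dataset:
    """A collection of data published or curated by a single agent."""
    uri: str
    title: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class CatalogueRecord:
    """A statement about the description of a dataset, not the dataset itself."""
    uri: str
    dataset_uri: str
    modified: str  # date on which the description was last changed


@dataclass
class Catalogue:
    """A curated collection of metadata about datasets."""
    uri: str
    title: str
    datasets: List[Dataset] = field(default_factory=list)
    records: List[CatalogueRecord] = field(default_factory=list)


def to_triples(catalogue: Catalogue) -> List[Triple]:
    """Flatten a catalogue into (subject, predicate, object) triples."""
    triples: List[Triple] = [
        (catalogue.uri, "rdf:type", "dcat:Catalog"),
        (catalogue.uri, "dct:title", catalogue.title),
    ]
    for ds in catalogue.datasets:
        triples.append((catalogue.uri, "dcat:dataset", ds.uri))
        triples.append((ds.uri, "rdf:type", "dcat:Dataset"))
        triples.append((ds.uri, "dct:title", ds.title))
        for dist in ds.distributions:
            triples.append((ds.uri, "dcat:distribution", dist.uri))
            triples.append((dist.uri, "rdf:type", "dcat:Distribution"))
            triples.append((dist.uri, "dcat:accessURL", dist.access_url))
    for rec in catalogue.records:
        triples.append((catalogue.uri, "dcat:record", rec.uri))
        triples.append((rec.uri, "rdf:type", "dcat:CatalogRecord"))
        triples.append((rec.uri, "foaf:primaryTopic", rec.dataset_uri))
        triples.append((rec.uri, "dct:modified", rec.modified))
    return triples


# Hypothetical example data; none of these URIs refer to real resources.
csv_file = Distribution("http://example.org/dist/1",
                        "http://example.org/files/population.csv")
population = Dataset("http://example.org/dataset/1",
                     "Population by region", [csv_file])
record = CatalogueRecord("http://example.org/record/1",
                         population.uri, "2016-12-01")
catalogue = Catalogue("http://example.org/catalogue", "Example portal",
                      [population], [record])

for triple in to_triples(catalogue):
    print(triple)
```

Note that the Catalogue Record points at the Dataset through `foaf:primaryTopic` and carries its own modification date, which is how a portal can say when a description changed without implying that the data changed.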
The basic use case of DCAT-AP is to make public sector data easier to find across borders and sectors, by enabling a search for datasets across data portals. This cross-portal search involves several actors: metadata brokers exchange the descriptions of datasets that data providers have created on one or more data portals. Two conditions enable this metadata flow. First, the data portals maintain a data catalogue containing a collection of datasets and make the description metadata of those datasets freely available. Second, in order to maximise interoperability, these descriptions should adhere to the DCAT-AP specification. When both conditions are met, a metadata broker can harvest catalogues of metadata from data portals and deliver the description metadata to data consumers in a validated and harmonised manner.
This is shown in Figure 2.
Figure 2 - DCAT-AP basic use case: enable a search for datasets across various data portals
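The harvesting flow described above can be sketched in a few lines of Python. This is a simplified illustration rather than the behaviour of any real broker: the `harvest` and `search` functions, the portal names and the sample records are all hypothetical, and validation is reduced to checking that a title and a description (taken here as the required Dataset properties) are present.

```python
# Properties that every dataset description must carry in this sketch.
MANDATORY = ("dct:title", "dct:description")


def missing_properties(record):
    """Return the mandatory properties that are absent or empty."""
    return [prop for prop in MANDATORY if not record.get(prop)]


def harvest(portals):
    """Collect valid descriptions from every portal into a single index.

    `portals` maps a portal name to a list of dataset descriptions (dicts).
    Returns the harmonised index and a list of rejected descriptions.
    """
    index, rejected = [], []
    for portal, records in portals.items():
        for record in records:
            missing = missing_properties(record)
            if missing:
                rejected.append((portal, record.get("uri"), missing))
            else:
                # Remember where the description was harvested from.
                index.append({**record, "source_portal": portal})
    return index, rejected


def search(index, term):
    """Case-insensitive search over titles and descriptions."""
    term = term.lower()
    return [
        r for r in index
        if term in r["dct:title"].lower() or term in r["dct:description"].lower()
    ]


# Hypothetical portals and records, for illustration only.
portals = {
    "portal-a": [
        {"uri": "http://example.org/a/1",
         "dct:title": "Unemployment rate",
         "dct:description": "Annual unemployment rate by region."},
        {"uri": "http://example.org/a/2",
         "dct:title": "Road network"},  # no description: will be rejected
    ],
    "portal-b": [
        {"uri": "http://example.org/b/1",
         "dct:title": "Labour force survey",
         "dct:description": "Quarterly employment and unemployment figures."},
    ],
}

index, rejected = harvest(portals)
for hit in search(index, "unemployment"):
    print(hit["source_portal"], hit["uri"])
```

A single query against the harmonised index returns matching datasets from both portals, which is the cross-portal search that the two enabling conditions make possible.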
The data model of version 1.1 of DCAT-AP and the full version of the application profile are posted on Joinup, the collaborative platform of the European Commission funded by the ISA2 Programme [7].
3. What is the StatDCAT-AP
The StatDCAT Application Profile is an extension of the DCAT-AP, whose purpose is to achieve better integration of the descriptive metadata of statistical datasets in the catalogues of open data portals, hence improving the visibility and discoverability of statistical datasets. It is a common layer for the exchange of statistical metadata for a wide range of dataset types. This creates the opportunity for professional communities to hook onto the emerging landscape of interoperable portals by aligning with the common exchange format.
StatDCAT-AP defines a small number of additions to the DCAT-AP model that are particularly relevant for statistical datasets. Given that many statistical datasets are of interest to general data portals and their users, it is likely that, by recognising and exposing the additions proposed by StatDCAT-AP, general data portals will be able to provide enhanced services for collections of statistical data. The additions to the DCAT-AP address a number of requirements for the description of statistical datasets, as listed below:
- Attributes and Dimensions:
  - stat:attribute: Attributes enable specification of the decimals, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional).
  - stat:dimension: Examples of dimensions include the time to which the observation applies, or a geographic region which the observation covers.
- Quality:
  - dqv:hasQualityAnnotation: A statement related to quality of the Dataset, including rating, quality certificate, feedback that can be associated to datasets or distributions.
- Visualisation:
  - This property is the nature or genre of the resource. The property is to be used to indicate the type of a Distribution, in particular when the Distribution is a visualisation.
- Other extensions such as an expression of the number of data series or unit of measurement.
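To make the additions listed above more concrete, the following Python sketch writes a dataset description in Turtle that uses stat:dimension, stat:attribute and dqv:hasQualityAnnotation. Several things here are assumptions: the `stat:` namespace URI is a placeholder and should be replaced by the one given in the specification, the dimension and attribute values are borrowed from the SDMX-RDF vocabularies purely as an illustration, and the dataset and annotation URIs are hypothetical.

```python
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dqv": "http://www.w3.org/ns/dqv#",
    # Placeholder only: take the official stat: namespace from the specification.
    "stat": "http://example.org/statdcat-ap#",
}

# Illustrative value vocabularies (assumed, not prescribed by this paper).
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_ATTRIBUTE = "http://purl.org/linked-data/sdmx/2009/attribute#"


def literal(text):
    """Quote a string as a Turtle literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def uri_list(uris):
    """Render several URIs as a comma-separated Turtle object list."""
    return " , ".join(f"<{u}>" for u in uris)


def describe_dataset(uri, title, dimensions=(), attributes=(), quality=()):
    """Return a Turtle description of one statistical dataset."""
    statements = ["a dcat:Dataset", f"dct:title {literal(title)}"]
    if dimensions:
        statements.append(f"stat:dimension {uri_list(dimensions)}")
    if attributes:
        statements.append(f"stat:attribute {uri_list(attributes)}")
    if quality:
        statements.append(f"dqv:hasQualityAnnotation {uri_list(quality)}")
    header = "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())
    body = f"<{uri}> " + " ;\n    ".join(statements) + " ."
    return header + "\n\n" + body + "\n"


turtle = describe_dataset(
    "http://example.org/dataset/unemployment",        # hypothetical dataset
    "Unemployment rate by region",
    dimensions=[SDMX_DIMENSION + "refPeriod", SDMX_DIMENSION + "refArea"],
    attributes=[SDMX_ATTRIBUTE + "obsStatus"],
    quality=["http://example.org/quality/note-1"],    # hypothetical annotation
)
print(turtle)
```

A general-purpose portal that does not understand the `stat:` properties can still read the title and ignore the rest, which is why the additions remain compatible with plain DCAT-AP consumers.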
4. The future: How to produce StatDCAT-AP metadata
StatDCAT-AP focuses on metadata elements that contribute to data discovery, encouraging the use of common controlled vocabularies and the re-use of metadata from existing repositories.
In the recent past, seven international organisations that produce and coordinate the dissemination and sharing of statistical data, including Eurostat, defined and adopted the SDMX standard for data and metadata exchange, which is now an ISO standard (ISO 17369). By harmonising the metadata descriptions provided by SDMX (e.g. data structures, standard code lists, quality descriptions and methodology) with open data standards, both worlds become better connected, ultimately improving the discoverability of statistical datasets.
Therefore, StatDCAT-AP also includes a section describing the mapping of StatDCAT-AP to the SDMX Information Model. This is achieved by means of schematic diagrams of the SDMX Information Model and through a worked example where the SDMX-ML content is mapped to the classes and properties of DCAT-AP.
Figure 3 - StatDCAT-AP Model mapped to SDMX Model Classes
The intent of this mapping is twofold:
- It enables those organisations that are using SDMX to know which metadata structures to use in order to create StatDCAT-AP directly from existing SDMX metadata repositories (such as an SDMX Registry).
- It enables organisations that wish to use SDMX structural metadata as the format for a Transformation Mechanism to know which SDMX element or attribute maps to which StatDCAT-AP class or property.
A dissemination chain based on SDMX data descriptions is also able to produce StatDCAT-AP descriptions through a simple transformation.
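The kind of transformation meant here can be sketched as a function that turns a simplified SDMX dataflow description into StatDCAT-AP properties. The correspondences below are illustrative assumptions only; the normative mapping is the one given in the specification. The input dictionary, its keys, the `sdmx_to_statdcat` function and the URI pattern are hypothetical and do not reproduce SDMX-ML.

```python
def sdmx_to_statdcat(dataflow, base_uri):
    """Map a simplified SDMX dataflow description to StatDCAT-AP properties.

    `dataflow` is a plain dict standing in for metadata read from an SDMX
    registry; `base_uri` is a hypothetical prefix used to mint URIs.
    """
    flow_uri = f"{base_uri}/dataflow/{dataflow['agency']}/{dataflow['id']}"
    return {
        "@id": flow_uri,
        "@type": "dcat:Dataset",
        "dct:identifier": dataflow["id"],
        "dct:title": dataflow["name"],
        "dct:description": dataflow.get("description", ""),
        # Dimensions and attributes of the data structure definition.
        "stat:dimension": [f"{base_uri}/dimension/{d}"
                           for d in dataflow["dimensions"]],
        "stat:attribute": [f"{base_uri}/attribute/{a}"
                           for a in dataflow["attributes"]],
    }


# Hypothetical dataflow; identifiers are made up for the example.
example_flow = {
    "agency": "EXAMPLE_AGENCY",
    "id": "EXAMPLE_FLOW",
    "name": "Unemployment rate by region",
    "description": "Annual unemployment rate, by region.",
    "dimensions": ["FREQ", "REF_AREA", "TIME_PERIOD"],
    "attributes": ["OBS_STATUS"],
}

description = sdmx_to_statdcat(example_flow, "http://example.org/sdmx")
for prop, value in description.items():
    print(prop, "=", value)
```

Because the function only reads structural metadata, it could run against an existing registry without touching the data themselves.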
The StatDCAT-AP specification contains more technical documentation about these aspects, which are relevant for organisations using SDMX infrastructures. SDMX is one of the main standards currently in use in the statistics field, which explains the focus on the SDMX mappings. Nevertheless, we expect more transformations to become available in the future, as the architecture of the StatDCAT-AP transformation mechanism could easily be used for DDI or CSV transformations. Some examples and pilot implementations are expected to be produced and documented in the near future.
The work for the development of StatDCAT-AP was conducted in a transparent manner, publicly visible and interactive. The development was facilitated and moved forward as a result of the establishment of the StatDCAT-AP working group and the involvement of the main stakeholders towards reaching consensus in an open collaboration. The same open group remains responsible for the maintenance and future revisions of the specification under the process set up and led by the ISA2 Programme.
5. References
[1] Fadi Maali, Richard Cyganiak, Vassilios Peristeras, Enabling Interoperability of Government Data Catalogues, Lecture Notes in Computer Science, Vol. 6228, pp. 339-350, Springer, 2010
[2] European Commission. ISA – Interoperability Solutions for European Public Administrations.
[3] European Commission. ISA – DCAT Application Profile for data portals in Europe.
[4] StatDCAT-AP:
[5] SDMX:
[6] DIGICOM (European Statistical System's project for Digital communication, User analytics and Innovative products):
[7] EU Open Data Portal:
[8] European Data Portal:
[1] AMI Consult Sàrl, BP 1028, 1010 Luxembourg, Luxembourg
[2] Metadata Technology Ltd, 46 Bridge St, Godalming GU7 1HL, UK
[3] European Commission, Eurostat, Joseph Bech building 5, Rue Alphonse Weicker, Luxembourg 2721, Luxembourg
[4] PwC EU Services, Woluwedal 18, Sint-Stevens-Woluwe, 1932, Belgium
[5] European Union Publications Office, Rue Mercier 2, Luxembourg 2985, Luxembourg
[6] International Hellenic University, School of Science and Technology, 14 km Thessaloniki-Moudanion, Thermi, 57001, Greece
[7]