Open Geography as a concept: building spatial statistics around users

Keywords: geography, geospatial, open, users, data

1.  Introduction

The term ‘open’, when applied to data, usually indicates the licensing conditions under which data can be used or reused. Within this narrow definition, there is a risk that publishing ‘open’ data becomes a box-ticking exercise in transparency rather than giving real consideration to how, why or by whom geospatial data is being used to support statistics.

Digital disruption has changed the ways in which users of statistical and geospatial information want to discover, access and consume data. For most users, just having an Open Government Licence is no longer enough. Increasingly, data needs to be designed around a diverse set of user journeys that are complex and often conflicting.

2.  Methods

The easiest way for organisations to build an understanding of user requirements for the integration of statistics and geospatial data is through the metrics available from their dissemination mechanisms. Before the development of the Open Geography concept, this would usually be a count of site visits, perhaps mapped over time to indicate growth or decline in system usage. More recently, the organisation has drawn on a much wider range of metrics to help us understand how our data, products and dissemination mechanisms are being used and to start to build use cases for the data.

Alongside the number of visits to our dissemination systems, we are now able to extract information on location (at country or city level), organisation and operating system to tell us who our users are, where they are and how they are accessing our data. We can also discover how users are interacting with the site, i.e. the most popular pages, datasets and downloads.
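
As an illustration only, the kind of summary described above could be produced from exported session records along the following lines. This is a minimal sketch: the record fields (`country`, `os`, `page`) and the sample values are assumptions made for the example, not the structure of any particular analytics export.

```python
from collections import Counter

# Hypothetical session records; field names and values are invented for illustration.
sessions = [
    {"country": "United Kingdom", "os": "Windows", "page": "/datasets/boundaries"},
    {"country": "United Kingdom", "os": "macOS", "page": "/datasets/lookups"},
    {"country": "France", "os": "Windows", "page": "/datasets/boundaries"},
    {"country": "United Kingdom", "os": "Android", "page": "/datasets/boundaries"},
]


def summarise(records, field):
    """Count sessions by the given field, most frequent value first."""
    return Counter(record[field] for record in records).most_common()


# Who and where: sessions per country; how: sessions per operating system;
# what: the most visited pages.
by_country = summarise(sessions, "country")
by_os = summarise(sessions, "os")
by_page = summarise(sessions, "page")
```

Counting by one field at a time keeps the sketch simple; a real analysis would probably cross-tabulate fields and track them over time.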

Alongside the metrics we can extract from our systems, it is also important that our data, products and dissemination systems are built around what users say they want, not just what we can infer. To do this, we can use a range of engagement formats from traditional email bulletins to user consultations to ensure that we understand the diverse user requirements that exist for spatial statistics.

These formats also include a Geography Services User Forum, which brings a large number of public sector, academic and commercial stakeholders together in a single room for an open dialogue that isn’t possible with traditional consultations. It also means that any answers provided can be discussed and elaborated on rather than having to be taken at face value.

More recently, this stakeholder engagement has been supplemented with a strong social media presence that allows a different, more informal engagement with users. Unlike channels such as email bulletins, blogs and Twitter have allowed users of geographic data to offer instant feedback and, perhaps more importantly, have allowed ONS geography to ask specific questions of the geographic user community. Not only does a social media approach to user requirements allow a different set of questions to be asked, but the demographic of social media also means that we get a very different set of answers compared with more traditional forms of engagement. We recognise that the more traditional communication approaches remain relevant, but the more diverse the set of stakeholders we engage with, and the more wide-ranging the questions we ask, the better informed we are when designing our data, products and dissemination systems.

All of this stakeholder engagement builds towards the development of use cases. Using all of the existing communication channels together with the metrics of how the data is being accessed and used allows us to build a more complete picture of who our users are, how they are using the data and what for. The more use cases we can build for the integration of geospatial data into the statistical process, the more value we can then add in to the data, products and dissemination systems.

3.  Results

The engagement that we had with users on their requirements for geospatial information to support statistics revealed that users want more than just open licensing: they want to discover, browse, search, view and download freely available open data products. They also want a range of data formats and, increasingly, data services. Data sharing capabilities and visualisation of products were also key. The delivery of integrated statistics and geospatial data is the most noticeable shift from our current systems, and this need for a more integrated approach had to be reflected in our data, products and dissemination systems.

3.1.  Data

Users want a more integrated delivery of statistical and geospatial data. By linking the data and attributes at the point of publication, the linked data approach reduces the burden on the user of statistics to identify and link datasets based on their perceived relationships. Instead, producers of data are able to explicitly define the relationships between data so that distributed datasets across a wide range of organisations can now be analysed as a single dataset, particularly where geographic codes provide a framework for linking this data.
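
The role of geographic codes as a linking framework can be sketched as a simple join. The example below is illustrative only: the codes, area names and population figures are invented placeholders, and `link_by_code` is a hypothetical helper rather than part of any published tooling.

```python
# Hypothetical statistical table keyed by geographic code (values invented).
population = {
    "X00000001": 92000,
    "X00000002": 140000,
    "X00000009": 5000,  # code with no geographic record, to show a failed link
}

# Hypothetical geographic lookup keyed by the same codes.
areas = {
    "X00000001": {"name": "Area A", "region": "North"},
    "X00000002": {"name": "Area B", "region": "South"},
}


def link_by_code(stats, geography):
    """Join statistics to geography on the shared code.

    Returns the linked records plus a sorted list of codes that could not be linked.
    """
    linked = {
        code: {**geography[code], "population": value}
        for code, value in stats.items()
        if code in geography
    }
    unmatched = sorted(set(stats) - set(geography))
    return linked, unmatched


linked, unmatched = link_by_code(population, areas)
```

Reporting the unmatched codes alongside the linked records makes gaps between datasets visible instead of silently dropping them.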

As users begin to get more direct access to data outside of the constraints of software packages, it becomes easier for them to link datasets together, particularly where both the statistical and geographic data are structured as linked data. It has been possible to test some of these links through collaborative, cross-government projects that bring together data from a range of agencies and link it through location to produce an analytical platform that was not previously possible with the ‘product’ approach to data.

Where a user requirement for more traditional products remained, we wanted to make that data more easily accessible through data services such as Web Map Service (WMS) or Web Feature Service (WFS) APIs. This helps to reduce the number of non-authoritative data silos that exist within organisations that use direct downloads, and allows users of statistical data to embed geospatial data within their systems.
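
To show what consuming such a service might look like, the sketch below builds a WFS 2.0.0 GetFeature request URL using the standard key-value parameters. The base URL and the feature type name are placeholders assumed for the example; they do not refer to a real endpoint or layer.

```python
from urllib.parse import urlencode

# Placeholder endpoint and layer name, assumed for illustration only.
BASE_URL = "https://example.org/wfs"
FEATURE_TYPE = "boundaries:local_authorities"


def wfs_getfeature_url(base_url, type_name, count=10):
    """Build a WFS 2.0.0 GetFeature request URL from key-value parameters."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 parameter naming the feature type(s)
        "count": count,          # WFS 2.0.0 parameter limiting the features returned
    }
    return f"{base_url}?{urlencode(params)}"


url = wfs_getfeature_url(BASE_URL, FEATURE_TYPE, count=5)
```

Because the client only holds a URL, the data stays at its authoritative source and is fetched on demand rather than copied into a local silo.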

3.2.  Products

By engaging with users we were able to identify specific use cases for statistical-geospatial data. We then developed a series of web applications built around these use cases. This moved us from simply producing generic products for a wide range of users, with no real understanding of how they were being used, to smaller, more focussed web app products that could be built quickly and tailored to specific users.

It was also clear from engagement with users that there was both an appetite for more spatial analysis of statistical data and a lack of spatial analysis within statistical reporting. To help support the spatial narrative behind statistical datasets, we began to develop ‘story maps’ built on free platforms that, like the web apps, could help us to provide data to users quickly and in a structured fashion to support specific tasks.

3.3.  Dissemination

The biggest change implemented through Open Geography was the development of new dissemination mechanisms based on the requirements that users highlighted through our various engagement methods. This has led to the development of two dissemination portals – both based on systems that had been in place previously – but redesigned to make them more specific to the use cases we highlighted for the integration of statistics and geospatial information.

The revised Open Geography portal provides free and open access to the definitive source of geographic products, web applications, story maps, services and APIs. Whilst the previous solution was well received, it was based on legacy technology and was comparatively expensive. The organisation therefore took a decision to migrate to a more open and generic platform that maintained functionality and service levels and didn't adversely impact users’ access. The intention was to move away from bespoke development of a single application to an off-the-shelf platform that would allow applications to be built quickly and easily from the data available. Having a loosely coupled set of applications and components that could be independently built, deployed and (to a large extent) self-managed allowed us to focus on the user requirements and business priorities without employing large development teams. The emphasis was on configuration rather than development.

The Open Geography portal allows users to find, view, map, style, chart and download or share data. The fresh approach means it is not simply a like-for-like replacement, and the longer-term aim is to extend the new portal's capabilities by adding further services and applications using our products.

This system was then supplemented with a Linked Data Portal to support users in working directly with our data. This system takes all of the geospatial data used to support statistics that had previously been split across a number of products and aggregates it into a single data cube based on graph relationships. This data is then made available through both a SPARQL endpoint and an HTML interface. Providing two entry points serves two groups of users. Technical users who understand linked data can embed the data feeds directly into their systems or run complex queries using the SPARQL query language. Non-technical users, who want to understand spatial statistics at a local level but don’t have the expertise to write SPARQL queries themselves, can work directly with the data through the HTML interface to discover, view and download the geospatial data.
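
For technical users, a query against a SPARQL endpoint might be prepared as in the sketch below. The endpoint URL is a placeholder, and the assumption that areas carry an `rdfs:label` is made for illustration; the actual vocabulary used by the portal is not described here. The request is built as a GET with a `query` parameter, as defined by the SPARQL 1.1 Protocol, but is not sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint, assumed for illustration only.
ENDPOINT = "https://example.org/sparql"


def build_label_query(term, limit=10):
    """Build a SPARQL query finding resources whose rdfs:label contains `term`."""
    # Escape backslashes and double quotes so the term is a valid string literal.
    safe = term.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?area ?label WHERE {\n"
        "  ?area rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(term, limit=10):
    """Build a GET request URL carrying the query in the `query` parameter."""
    params = {"query": build_label_query(term, limit)}
    return f"{ENDPOINT}?{urlencode(params)}"


example_query = build_label_query("Newport", limit=5)
example_url = query_url("Newport", limit=5)
```

An HTML interface over the same data would typically wrap queries of this kind so that non-technical users never need to see them.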

Since the launch of the new dissemination systems, the geographic portals have received an average of 7300 sessions each month. This suggests that users have identified value in the changes made to the systems. As with the initial redesign, we will continue to evaluate a range of metrics to identify new users and use cases so that the systems can continue to be simplified and modernised as the requirements of statistical users change.

The user response to these new systems and to the wider concept of Open Geography has been incredibly positive. Not only have we been able to use this positive feedback to demonstrate some return on the investment made in redesigning the systems, but we have also been able to use the feedback constructively, to prioritise the further development of the systems based on the responses of users.

4.  Conclusions

The design of dissemination systems around user journeys has meant that the product and linked data portals provide a cleaner and more modern way of accessing geospatial data, based around the diverse range of ways in which users of statistics want to discover, view and access geographic data. Both portals deliver a powerful web resource for data interoperability and metadata harvesting that is consumable by various clients and APIs.

Using the performance metrics of these systems to understand not only how many people are using the portals but, more importantly, who is using them, what they are using them for and how this information can support future developments has supported the integration of geospatial data into the Generic Statistical Business Process Model. Users have expressed a preference for accessing geographic data from the same place as the statistics for those geographies, and the aim is to move towards a seamless integration of statistical and geospatial data that will allow our users to consume data in flexible ways.
