Environmental Integrated Data Infrastructure Pilot (e-IDI) – a shared infrastructure

BPS Seed Fund Bid - MAY 2016

SUPPORTING DOCUMENTATION

MAY 2016

Table of Contents

Environmental Integrated Data Infrastructure Pilot (e-IDI) – a shared infrastructure

1. Overview

Summary

High level aims

Technical aims

2. What problems are we fixing and what are the solutions and benefits?

Why do we need a new piece of infrastructure for environmental and natural resources data?

What is it that we are trying to seed and grow?

Why are we starting with a water quantity scope?

Where are the benefits for data providers?

Where are the benefits for data users?

3. What do we plan to do over two years?

What is our overall plan?

What is our detailed plan?

What is out of scope for the e-IDI?

4. How will we deal with the varied legal and information policy settings across agencies?

What law and policy shape the way data is shared in this sector?

What are the information policy drivers for local authorities?

5. What operating principles do we need to work to?

What are the operating principles we need to agree to ensure smooth implementation across many agencies?

6. Architecture principles

What is our approach to designing the architecture?

Environmental Integrated Data Infrastructure Pilot (e-IDI) – a shared infrastructure

1. Overview
Summary
High level aims
The e-IDI Pilot will make environmental data more discoverable, shareable, accessible, traceable and aggregable. We will pilot a data infrastructure that users can query and access online. We will use real-time water quantity data to test the concept.
Regional council data providers report that it takes a long time to prepare, standardise and send data. Central government agencies report that they spend unnecessary time cleaning, aggregating and quality-checking data. The e-IDI will allow data providers to transfer data to users much more efficiently. It will shift the sector from manual data preparation and transfer to direct digital search and access.
We expect the e-IDI to also improve productivity for data providers; they will be able to spend more time on value-add activity, such as improving data quality, consistency and representativeness. By building the infrastructure quickly, we hope to realise early efficiency benefits and minimise the transaction burden of drawn-out collaboration on data sharing.
By lowering the barriers to sharing, we hope the e-IDI will prompt agencies to publish their data openly using the NZ Government Open Access and Licensing (NZGOAL) framework.
The e-IDI will enable data users (researchers, iwi/Māori, businesses and decision-makers) to query New Zealand’s real-time water data directly. They will be able to easily find what they need. They will also be able to spend more time using and interpreting the data rather than finding and cleaning it.
We will design an infrastructure to which much more data can be added and which many more agencies can join. What we learn from the pilot will be incorporated into a Better Business Case for a national roll-out, and for extension across the other environmental domains, e.g. climate, air, marine, land and biodiversity.
The infrastructure should be equally applicable across other information domains including property and buildings, and asset management. We are aware of wider interdependencies, including the existing Integrated Data Infrastructure (IDI) housed at Statistics NZ.
Our literature review shows that many other developed countries are moving in this direction, including Australia, so we will not need to start from scratch.
Technical aims
The e-IDI pilot will test a digital solution for the sector. It will provide a light, distributed and reusable geospatial infrastructure that relies on semantic web technologies. It will provide federated access to a harmonised and collated view of real-time water and soil moisture data. Data will continue to be stored and managed in the current repositories. The pilot will aim to enable:
  1. moving from transferring Excel spreadsheets via email to a digital online environment
  2. searchable and accessible real-time data
  3. infrastructure that can be repurposed for other domains
  4. traceable and repeatable data, datasets and model parameters
There are four technical components to the e-IDI Pilot:
  1. an online data access tool
  2. a discovery and access layer that enables agencies to query and access real-time data
  3. a simple store for the parameters needed to digitally navigate between models and data results, allowing transparency and reproducibility
  4. a simple store for openly available models for use in analysis.
We will consider options and design the detailed system architecture as part of the project. We will concentrate on using well-established standards, software and methodologies to provide the data standardisation, harmonisation, discovery and access.
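The idea of federated access to a harmonised view can be illustrated with a minimal sketch. This is not the e-IDI design, which is to be developed during the project. All names here (the `VOCABULARY` mapping, the two stand-in providers, the parameter and unit labels) are hypothetical and chosen only for illustration. The sketch assumes each provider keeps its data in its own repository and exposes it through a simple fetch function, while a shared vocabulary translates local parameter names and units into common ones at query time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Observation:
    provider: str
    site: str
    parameter: str  # harmonised parameter name
    value: float    # value expressed in the harmonised unit
    unit: str


# Hypothetical shared vocabulary:
# (local name, local unit) -> (harmonised name, harmonised unit, conversion factor)
VOCABULARY = {
    ("Flow", "l/s"): ("river_flow", "m3/s", 0.001),
    ("Discharge", "m3/s"): ("river_flow", "m3/s", 1.0),
}


def harmonise(provider: str, raw: dict) -> Observation:
    """Translate one provider-specific record into the shared vocabulary."""
    name, unit, factor = VOCABULARY[(raw["name"], raw["unit"])]
    return Observation(provider, raw["site"], name, raw["value"] * factor, unit)


def federated_query(
    providers: Dict[str, Callable[[], Iterable[dict]]], parameter: str
) -> List[Observation]:
    """Ask every provider for its records and return one harmonised list.

    Nothing is copied into a central store: each call reads from the
    provider's own source.
    """
    results = []
    for provider, fetch in providers.items():
        for raw in fetch():
            obs = harmonise(provider, raw)
            if obs.parameter == parameter:
                results.append(obs)
    return results


# Stand-in providers; in practice these would call each agency's own service.
def council_a() -> List[dict]:
    return [{"site": "A1", "name": "Flow", "unit": "l/s", "value": 2500.0}]


def institute_b() -> List[dict]:
    return [{"site": "B7", "name": "Discharge", "unit": "m3/s", "value": 1.2}]


PROVIDERS = {"council_a": council_a, "institute_b": institute_b}

if __name__ == "__main__":
    for obs in federated_query(PROVIDERS, "river_flow"):
        print(obs)
```

In this sketch the harmonisation happens at query time, so providers do not need to change how they store data. They only need to agree on the shared vocabulary.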
Policy aims
An e-IDI will need data-sharing agreements. To be sustainable, it will need to be maintained over time. The policy programme will cover:
  1. development of Memoranda of Understanding for data sharing into and out of the e-IDI
  2. set-up of an interim e-IDI maintenance function
  3. advice to Ministers on how to fund (e.g. pooled-funding) e-IDI maintenance sustainably, and which agency should perform that function.

2. What problems are we fixing and what are the solutions and benefits?
Why do we need a new piece of infrastructure for environmental and natural resources data?
There are strong sectoral business drivers for a more technology-capable data infrastructure. New environmental reporting legislation has created a significant, ongoing demand for data sharing between agencies. This has placed a significant compliance burden on up to 100 potential environmental data providers across the country. Cumulatively, the providers are burdened with manually preparing and cleaning data to send to national users. The current approach is to send data via spreadsheets and digital storage devices.
Many of the data collectors are also data users – they need to make environmental management decisions regularly (e.g. regional councils, government agencies). These decisions often require them to use each other's data. However, data is hard to share, and what is shared is often difficult to use because it is not interoperable (i.e. it cannot be ‘joined together’).
There is also strong demand from businesses and private individuals for easy to find, accessible, reliable, and up-to-date environmental data. At present, a small amount of the sector’s data is shared on open portals. Much of the environmental data held by councils and agencies is not commercially sensitive or personal. So, in theory, much of it can be made open. However, we find that:
  • Data is hard to share because it is not structured to be interoperable. This means it cannot be aggregated.
  • We commonly use out-of-date information when real-time, or more current data is available.
  • The data is not as useful as it could be because it is not spatial (i.e. referenced to a location).
  • We do not know what data or which models were used to make previous decisions. Therefore, we do not know how to build on previous decisions to improve our decision-making processes.
  • We do not put the data in places where others can find it and use it.
  • Data is not easily sorted into the groupings that users commonly need.

What is it that we are trying to seed and grow?
Ultimately, the sector needs an integrated data infrastructure to make the change from manual sharing to digital sharing. Seeding and building the infrastructure now will:
  • reduce the transaction costs of collaborating on data infrastructure (i.e. build it quickly - reduce drawn out collaboration costs)
  • take advantage of economies of scale (do the hard work once - use by all)
  • realise early benefits (build it now because we have to build it at some point – bank the benefits early)
  • reduce the friction costs of opening data (make it easy to open – prompt much more to be open that would not be otherwise).

Why are we starting with a water quantity scope?
Water quantity data is already standardised in real-time, making it easier than other domains to test in a shared infrastructure. We hope to also bring in soil moisture data as soon as we can. Further, MfE has a worked-up scope of wider water data needed for managing water in NZ (e.g. including data on water consenting, biological data, and iwi/Māori, cultural and business indicators) and the wider scope is in the process of being standardised.
We anticipate that by 2017, we will be ready to bring in business and iwi/Māori water information users to test their e-IDI water data needs.
On their own, water and soil moisture data are important information domains for New Zealand. The data underpins critical regulatory and business decisions on land productivity, water flow, water takes, flooding and many cultural and recreational decisions. We have described a small portion of potential use cases for water and soil moisture data.
Where are the benefits for data providers?
Key data providers are regional and local councils, government agencies, and Crown Research Institutes. Data requests have increased in volume and frequency. In addition to fresh water, there are several other information domains for which a steady transfer of data is required – land, air, climate and marine, with important cross-cutting domains being biodiversity data and cultural data important to iwi and Māori. Data is also used for reporting under the Resource Management Act and modelling by MPI. If we make data sharing more efficient and digital, we estimate that the savings to regional councils alone are $830,000 per annum.
Where are the benefits for data users?
Councils and Government regularly make important decisions based on water data. The data is needed for decisions on water allocation and quality, land use and management, mineral permitting, hazards (flooding), building and infrastructure design, transport, utilities, and property, etc. Decisions are currently made on the best data at hand. With the e-IDI, a wider range of higher-quality data can be brought to bear on these decisions.
Further, environmental decisions are informed by models. The type of model has significant bearing on decisions. The e-IDI will provide a place to record the models and versions used to underpin decisions, so that over time, we can understand which models give us the best outcomes. By recording the models, we will also be able to converge towards consistency in how we deliver advice to decision-makers.
A model repository will help ‘nudge’ the sector towards aligning with standard frameworks, including the ‘Living Standards Framework’ and ecosystem services models.
We estimate the savings to MfE alone on using data for the purpose of environmental reporting to be $350,000 per annum.
Scientific researchers are significant data users and advisors on environmental matters. The science community is often involved in both providing data, and in designing and implementing analytical models for use by decision-makers. MBIE reports that at least 30% of research time is taken up with finding and cleaning data so that it can be used for research purposes. This is an inefficient use of scarce scientific skills. We have not been able to estimate the direct savings to the science community. However, they are likely to be considerable and cumulative as more data is added.
Data is also needed by water users in the private sector, particularly in the primary productive sector. In competitive international markets, there is increasing demand on NZ businesses to demonstrate that their practices are sustainable. This can only be done efficiently if there is granular, easily accessible and reproducible data.
Finally, there are many other end-users of environmental data - individuals, landowners, businesses, non-government organisations and iwi and Māori. All of these users need water and soil moisture data for a range of private land management decisions. They also need the data to be able to effectively engage in civil society on matters relating to the environment.
3. What do we plan to do over two years?
What is our overall plan?
Remedying the core problems requires, in overview:
  • new data architecture to be developed and agreed
  • new software to be procured as an ongoing service
  • testing the infrastructure so that it handles all data and all agencies and users (including business and iwi/Māori)
  • change management, training, roll-out and uptake by a range of agencies
  • an interim function to be established to maintain the new infrastructure
  • developing policy on where to house an infrastructure maintenance function and how to manage and fund it
  • exploring whether the infrastructure can be exported across other information domains in the local government context
  • building a business case to roll out the pilot to remaining regional councils and other agencies involved in providing data into the environmental information domain (i.e. including territorial authorities’ data).

What is our detailed plan?
Year One
  • Design, refine and test the new infrastructure - with Horizons, Landcare and NIWA, and later, with around 4-5 other regional councils. Our pilot will develop an infrastructure across a small set of agencies:
      • organise the data so it is discoverable and accessible – using agreed data architecture
      • structure the data and datasets so that they can be translated digitally – building ontologies and vocabularies
      • structure the models used for common purposes so that they are digitally traceable over time
      • build the software to find and query the data housed in agency databases – using semantic web technology
      • build the software for data linking for common user purposes
    The approach to building the new architecture will need to be thoroughly tested:
      • against all other options for an infrastructure,
      • using a wide and varied information scope, including soils data and biological data,
      • against real world use, and
      • in a range of agency capabilities.
The approach to building the infrastructure will be iterative, agile and fast. This is the focus in Year 1. We will also test implementation at the end of Year 1.
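As an illustration of what making models digitally traceable over time could involve, the following is a minimal sketch of a record that captures which model, version, parameters and datasets sat behind a result. It is an assumption for illustration only, not the pilot's agreed design. The `ModelRun` class, its fields and the example identifiers are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRun:
    """Hypothetical record of one model run behind a decision."""
    model: str         # name of the model used
    version: str       # version of that model
    parameters: dict   # parameter values used for the run
    datasets: tuple    # identifiers of the input datasets

    def fingerprint(self) -> str:
        """Return a stable identifier for this exact combination of inputs.

        Keys are sorted before hashing, so the same model, version,
        parameters and datasets always give the same fingerprint.
        """
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Example values are invented for illustration.
    run = ModelRun(
        model="example_flow_model",
        version="1.0",
        parameters={"catchment": "example", "timestep_hours": 1},
        datasets=("dataset-001", "dataset-002"),
    )
    print(run.fingerprint())
```

Storing a fingerprint like this alongside each result would let a later reader check whether two results were produced from the same model version, parameters and input data.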
  • Set up an interim maintenance function for the national infrastructure: each agency currently maintains its own data to common standards, but there is no central role to maintain a national infrastructure. The e-IDI pilot will create an interim function for maintaining the infrastructure to keep the e-IDI going for the duration of the pilot and test how much it will cost in the long term.
  • If NZ wishes to have shared access to its environmental data nationally, the e-IDI maintenance function needs to exist somewhere. It is a systemic issue that sits somewhere between local government, Crown Research Institutes and central government. Solving the problem of where a maintenance function would sit, and how to fund it, requires cross-agency policy work.
We will initiate policy work in year one, provide advice early in Year Two, and implement policy towards the end of Year Two. The options for costs associated with the role and function would be considered as part of the business case.
  • Develop operational policy on access, intellectual property and commercial sensitivity: the shared infrastructure is being designed so that data providers are able to secure and control access to their data. It needs to be agnostic as to whether data is free, sold, or otherwise licensed or proprietary because there are a range of business and security needs that need to be managed long term. Further, contributing agencies need to be very clear about ownership, use and sharing of data.
There will need to be an operational policy stream on developing agreed memoranda of understanding on data sharing; this will be developed early in Year One.
  • Implement the infrastructure in several other regional councils: the pilot will roll out the new infrastructure to the priority land productivity regions. This is to ensure that data improvements are realised in the regions that need water and soil moisture data most urgently. The pilot’s implementation activity includes training and upskilling data providers in the use of new software. Implementation and training represent a one-off cost. This pilot will test the costs of making the e-IDI operational so we can accurately cost the business case.
We know that the costs of set-up and training in each agency are likely to be variable depending on capability so would like to test this in late Year One – early Year Two.
Year Two
  • Test the e-IDI using data beyond water quantity - the infrastructure needs to deal with a wide range of natural resources and environmental data. This part of the bid will test the ease with which the full range of water domain data and other environmental data, including biological data, can be incorporated into the e-IDI.
  • Roll out the e-IDI to other councils if funding permits – if the pilot is under budget in Year Two, we would continue to implement the e-IDI in other councils.
  • Develop use cases delivering to businesses and iwi/Māori – the e-IDI will not only benefit environmental regulators, scientists and researchers. Ultimately, it needs to deliver to business, iwi/Māori, and public users for a range of primary sector and civil engagement uses. We hope to work with NZ Data Futures to catalyse new tools for businesses.
  • Provide advice to government on where to house the infrastructure maintenance function with options for how to fund it – the pilot would provide advice to Ministers on where to house the function along with several different options for funding it. In the meantime, the pilot would need funding to continue for the interim maintenance function until the end of Year Two (this assumes a transition to a permanent home).
  • In the event that the business case is unsuccessful, agencies will consider permanently funding the maintenance function via club funding once we are in a position to assess the real long term costs.
  • Develop a business case by November 2017 in time for Budget 2018/19 - to take the shared infrastructure wider to other environmental domains and demonstrate customer benefits in regulatory, science, iwi/Māori and business decision-making. These use cases will test whether new types of data need to be brought under the shared infrastructure and to test the willingness of business and iwi/Māori data providers to contribute sharable data.