Farr-Administrative Data Research Network-Medical Bioinformatics: Joint eInfrastructure Technical Workshop

Fri 16th January 2015

Farr Institute @ London, UCL

Thirty-five attendees representing fourteen different MRC- and ESRC-funded ‘big data’ projects met for an eInfrastructure Technical Workshop held at Farr@London. The purpose of the meeting was to describe the hardware and software resources and services funded by the capital awards, and to identify common themes and challenges. The MRC and ESRC were represented, as was Jisc. This eInfrastructure Workshop and network arose from initial discussions between staff working on Farr, ADRC and MB projects, which led to the funding of the Jisc Safe Share project.

The key messages from the workshop were:

  1. Colocation of resources
  2. Sharing best practice for information governance
  3. Open source technology platforms
  4. Capacity building for key IT skills
  5. Coordination

Recommendations:

Funding for training – capacity building for IT staff, developing the skills required to manage complex eInfrastructure and secure data environments

Support for open source software development – funding to recruit software developers to work on OpenStack and iRODS

Coordination – resource to support meetings, travel and admin for the technology infrastructure group

  1. Colocation and building on existing infrastructure

Most projects have decided to build on existing resources where possible, and to integrate new equipment into existing infrastructure. For example, in Swansea the same eInfrastructure supports Farr@Wales, ADRC-Wales, the MB CLIMB project, the Wales Biobank and UKDP. Colocating with, and extending, existing infrastructure makes the best use of the capital funding, given that the awards do not cover recurrent or operational costs. System design, procurement and implementation have been supported by central IT teams working with researchers, and this collaboration has led to the success of the projects. However, sustaining these resources and providing continued IT support for services and training remain a challenge for institutions.

  2. Sharing best practice for information governance

Information governance was the topic of the most popular break-out group of the meeting. There are a number of models for data delivery, including a ‘walled garden’ approach to identifiable data (UCL), ‘just in time’ data linkage for specific projects (Scotland eDRIS) and large-scale repositories (Wales NDRA). Health and administrative data are subject to different security levels, and the community would benefit from harmonising these standards.

  3. Open source technology platforms

Two further break-out groups featured open source technologies. Some medical bioinformatics projects (eMedLab, CLIMB, KCL, Cambridge) are looking to run OpenStack, open source cloud software for managing ‘virtual machines’, on their hardware, and there is also a lot of interest from physical sciences groups. One of the major vendors, OCF, is helping to set up an interest group on OpenStack-GPFS integration. Considerable software development is needed before OpenStack works effectively. This effort should be resourced across the Farr-ADRN-MB network so that all projects benefit from the developments.

Research data management was the topic of the final break-out group, with Tim Cutts describing the Sanger Institute’s experience of using iRODS. This open source software assists data management and is vendor-agnostic, sitting on top of tiered storage systems. Like OpenStack, iRODS requires further software development to work efficiently. There is an iRODS development consortium that allows members to propose new developments. The Sanger has seen value from its membership, but it is costly (US$35k p.a.). A shared membership across this network was proposed, with contributions from partners, but this would require coordination. A shared approach to negotiating software licensing was also raised.

  4. Capacity building for key IT skills

There is a need to train centrally funded IT staff as well as the researchers using the infrastructure. Commissioning and maintaining major IT resources requires considerable skills in systems networking, hardware configuration, software management and development. These staff are often part of university IT groups rather than research teams. Funding that enables these staff to undertake further training, or frees up their time for short secondments, would be valuable in building skills and retaining staff within the university sector.

  5. Coordination

There was general agreement that this was a very valuable meeting for attendees and that this network should continue. However, it was recognised that there is no resource to support this activity beyond the goodwill of the community, as it falls outside the remit of UK HIRN and ADRN. A number of participants are members of national eInfrastructure groups such as the Project Directors Group (Fergusson, Ainsworth, Calleja, Pallas, Yates), HPC-SIG (Real, Fergusson) or Big Data-SIG (Fergusson, Yates). Their membership helps bring in lessons from large-scale projects in the physical sciences (STFC/EPSRC-funded). A Jisc group, the NHS-HE Forum, is organised by Malcolm Teague and has a major information governance strand. Nevertheless, some coordination effort is needed for the MRC/ESRC-funded activities in the health and administrative data area.

A mailing list has been established by Jisc for the group. Requests to be added to the mailing list should be sent to Malcolm Teague, Jacky Pallas, John Ainsworth or David Fergusson. A follow-up meeting will be held at the end of June; Paul Calleja (Cambridge) has agreed to host the event.

Farr – ADRN – Medical Bioinformatics

eInfrastructure Workshop

Friday 16 January 2015, 10.00 – 4.00

Farr Institute, 222 Euston Road, London, NW1 2DA

AGENDA

10.00 – 10.30 Registration and networking (coffee and pastries)

10.30 – 10.35 Introduction (Jacky Pallas)

10.35 – 10.45 Genomics England and KCL (Tim Hubbard)

10.45 – 11.15 Farr eInfrastructure projects

John Ainsworth, Manchester, Farr HeRC

Anthony Peacock, UCL, Farr London

Simon Thompson, Swansea, Farr Wales

Steve Pavis, NHS Scotland, Farr Scotland

11.15 – 11.45 Administrative Data Research Network and Centres

Steve Pavis, NHS Scotland, ADRC-Scotland

Simon Thompson, Swansea, ADRC Wales

11.45 – 12.30 Medical Bioinformatics projects

Simon Thompson, Birmingham, CLIMB

David Fergusson, Crick, eMedLab

Sarah Butcher, Imperial, Imperial MB

Josh Randall, Sanger, Uganda

Dave Golding and Tom Fleming, Leeds, Leeds MB

12.30 – 1.30 Lunch

1.30 – 2.30 Break-out groups

Information governance/identifiable data

Software/OS/hardware design (incl. OpenStack)

Data management, iRODS

3.00 – 4.00 Reports from break-out groups

Farr Institute, Admin Data Research Network, Medical Bioinformatics

Attendees

Name / Affiliation / Projects
Jacky Pallas / UCL / Farr London, MB eMedLab, ADRC-E, Safe Share
Matt Ismail / Warwick / MB CLIMB
Marius Bakke / Warwick / MB CLIMB
Michael Lay / Oxford / CTSU, Safe Share
Paul Calleja / Cambridge / CRI
Mark Pitman / MRC
Thomas Fleming / Leeds / MB Leeds
Dave Golding / Leeds / MB Leeds
Philip Hobley / Leeds / MB Leeds
Richard Christie / QMUL / Farr London, MB eMedLab
Bruno Silva / Crick / MB eMedLab
Nicki Carter / Sanger
John Ainsworth / Manchester / Farr HeRC, Safe Share
Josh Randall / Sanger / MB Uganda
Tim Hubbard / KCL/Genomics England / GeL, MB eMedLab
Alan Real / Leeds / Farr HeRC, MB Leeds, Safe Share
Francesco Giannoccaro / Public Health England
Simon Thompson / Birmingham / MB CLIMB
Malcolm Teague / Jisc / Safe Share
Ekaterini Blaveri / MRC
Tanvi Desai (apologies) / Essex / ADRN
Ed Conley / UK HIRN / Farr
Anthony Peacock / UCL / Farr London, ADRC-E
Steve Pavis / NHS Edinburgh / ADRC-S, Farr Scotland
Jeremy Yates / STFC/UCL / DiRAC
Sarah Butcher / Imperial / MB Imperial
James Abbott / Imperial / MB Imperial
Jake Pearce / Imperial / MB Imperial
David Fergusson / Crick / MB eMedLab, Safe Share
Simon Thompson / Swansea / Farr Wales, ADRC-W, MB CLIMB, Safe Share
Andy Cafferkey / EBI / MB eMedLab
Rhys Smith / Jisc / Safe Share
Maria Sigala / ESRC
Tim Cutts / Sanger / MB eMedLab
Arne Wolters / Essex/Health Foundation / ADRN
Thomas Stewart / Public Health England
Ally Hume / Edinburgh / ADRC-S, Farr Scotland

ADRC/N Administrative Data Research Centre/Network

MB Medical Bioinformatics

CRI Clinical Research Infrastructure

GeL Genomics England

HIRN Health Informatics Research Network

DiRAC Distributed Research utilising Advanced Computing (particle physics, astronomy and cosmology)