Analysis of existing computational infrastructure for creating a European Health Data Space
"We need to make the most of the potential of e-health to provide high-quality healthcare and reduce inequalities. I want you to work on the creation of a European Health Data Space to promote health-data exchange and support research on new preventive strategies, as well as on treatments, medicines, medical devices and outcomes. As part of this, you should ensure citizens have control over their own personal data."
Ursula von der Leyen, EC President
The European Commission has set a clear vision for a European Health Data Space that uses health-data exchange to improve health research and its translation into health care at all levels, from public health to personalised medicine.
HealthyCloud was proposed as a project that will align all the knowledge and expertise in health data spread across European and international actors to set the foundations for the future European Health Research and Innovation Cloud (HRIC) that will become a fundamental part of the European Health Data Space.
“… However, to align and converge all the data-driven initiatives and to respond to the challenges for distributed computational analysis across Europe, we first need to understand the landscape of computing initiatives designed for the processing of sensitive life science data,” says Gergely Sipos, who is contributing to HealthyCloud on behalf of the EGI.
To get that comprehensive overview, Gergely and Mark Dietrich from EGI, together with other project partners, have recently finalised an “Analysis of existing computational infrastructure models including ELSI Version 0.9”. This document presents critical patterns in the construction and operation of Secure Processing Environments (SPEs), identifying key requirements that will inform the development of reference guidelines for a planned European Health Research and Innovation Cloud.
The report provides readers with detailed assessments of 13 examples of key SPEs. These examples were drawn from a larger inventory of SPEs from across Europe. “Our report does not present an exhaustive survey of such environments, related statistics regarding the prevalence of existing patterns, or the frequency with which specific technical and governance requirements are found. It instead identifies key types and patterns among the SPEs. Our analysis will help to define the reference guidelines for the establishment of an ethically sound and legally compliant health data research ecosystem, which is necessary for the delivery of the Final Strategic Agenda for the creation of a European Health Research and Innovation Cloud”, says Mark.
The authors drew two main conclusions from their analysis:
First, while the European Union’s General Data Protection Regulation (GDPR) has focussed the public’s attention on the challenges of keeping sensitive personal data private and safe from unauthorised disclosure, in the health care and life sciences domain, these requirements predate GDPR. There is already considerable practical experience in safely and prudently conducting research involving personal health information. At the same time, over the years, a complex legal, regulatory and operational environment has developed, formed by the intersection of three layers of requirements:
- Requirements imposed by laws and regulation and by competent data privacy authorities.
- Requirements set by one or more research institutions, including related ethics guidelines.
- Requirements imposed by holders of controlled data.
These layers combine to produce privacy requirements that are not consistent across Europe or even across individual countries. Any “reference guideline” for an overarching HRIC must therefore accommodate these different requirements and guarantee that they can all be fulfilled consistently.
Second, this fragmented policy landscape has translated into a fragmented architectural landscape, where several designs for SPEs have emerged, each with slightly different objectives:
- Secure Access to Controlled Data
- Secure Collaboration
- Distributed Computational approaches
- Data Lakes
Of the 13 infrastructures examined in detail, most focus on enabling secure collaboration, a few enable secure access to controlled data, one provides “compute-to-data” capability and one provides data lake functions.
Against this backdrop, a possible European Health Research and Innovation Cloud (HRIC) could include a range of capabilities:
- Local SPEs suitable for local health data holders and custodians, e.g. hospitals. They host local identified data, support “data visiting”, limited “compute-to-data” analytics (relying on trusted analytics packages) and controlled data export. Where local governance and familiarity permit, they offer a wider range of automated data export capabilities, again relying on trusted packages, enabling more advanced distributed analytics such as federated learning.
- More centralised SPEs (e.g. structured on a national or regional basis, as needed to meet legislative constraints) providing more extensive data analysis services, greater processing capabilities, reference data sets and access to federated data from multi-site projects. In addition to the capabilities of local SPEs, they would offer validated “compute-to-data” analytics and controlled data export, as well as secure data linkage and capabilities to support exploratory analytics similar to current data lake SPEs.
- Trusted repositories of validated and certified data analytics packages.
- Trusted conformance bodies that can assess the compliance of SPEs and validate both common and project-specific data analytics packages.
- Trusted registries of both controlled data and their Data Controllers, as well as any approved projects and related Study Agreements.
- Secure ways for local and centralised SPEs to access cloud or HPC compute resources to support larger analyses and bigger data sets.
In general, the design of a European Health Research and Innovation Cloud requires effort to create distributed resource management and governance tools that are secure enough to satisfy the Custodians of the data that might move within this cloud.
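To make the interplay between local SPEs, trusted analytics packages and controlled data export more concrete, the following is a minimal Python sketch. It is not taken from the report: the class `LocalSPE`, the registry `TRUSTED_PACKAGES`, the function names and the `min_count` export threshold are all hypothetical and chosen only for illustration. It shows a “compute-to-data” pattern in which record-level data stays inside each site and only aggregates leave it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical aggregate returned by a trusted package: a sum and a count.
Aggregate = Dict[str, float]


def sum_and_count(records: List[dict], column: str) -> Aggregate:
    """Hypothetical trusted analytics package: returns aggregates only."""
    values = [r[column] for r in records if column in r]
    return {"sum": float(sum(values)), "count": float(len(values))}


# Stand-in for a trusted repository of certified analytics packages.
TRUSTED_PACKAGES: Dict[str, Callable[[List[dict], str], Aggregate]] = {
    "sum_and_count": sum_and_count,
}


@dataclass
class LocalSPE:
    """Toy local SPE: record-level data is only read inside run()."""
    name: str
    records: List[dict] = field(default_factory=list, repr=False)
    min_count: int = 5  # assumed disclosure-control threshold

    def run(self, package: str, column: str) -> Aggregate:
        # Only packages from the trusted registry may touch the data.
        if package not in TRUSTED_PACKAGES:
            raise PermissionError(f"{package!r} is not a trusted package")
        result = TRUSTED_PACKAGES[package](self.records, column)
        # Controlled export: refuse aggregates built from too few records.
        if result["count"] < self.min_count:
            raise PermissionError("result blocked by export control")
        return result


def federated_mean(spes: List[LocalSPE], column: str) -> float:
    """Combine per-site aggregates into a pooled mean."""
    parts = [spe.run("sum_and_count", column) for spe in spes]
    total = sum(p["sum"] for p in parts)
    count = sum(p["count"] for p in parts)
    return total / count


site_a = LocalSPE("hospital_a", [{"age": a} for a in (30, 40, 50, 60, 70)])
site_b = LocalSPE("hospital_b", [{"age": a} for a in (20, 40, 60, 80, 100)])
print(federated_mean([site_a, site_b], "age"))  # prints 55.0
```

In this sketch the coordinator never sees individual records, and a request for an unregistered package or for a result based on fewer than `min_count` records is rejected. A real SPE would add authentication, audit logging and project-level approval, which are out of scope here.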
Deliverable D5.1 is the first of a series of deliverables by HealthyCloud Work Package 5 that build on each other, culminating in D5.5, the proposed reference guideline for the European Health Research and Innovation Cloud (HRIC). D5.5 will be published in May 2023. WP5’s objective is to “design a decentralised cloud for health data research, exploring existing and foreseen computational solutions, both in terms of infrastructure (hardware) and data management and analysis (software), that will enable ethically sound, technically feasible and legally compliant decentralised computation for the future HRIC ecosystem.”