Summer reflections on the Open Science Cloud

In the conclusions of “open, data-intensive and networked research as a driver for faster and wider innovation” (May 28-29 2015) the Competitiveness Council welcomed “the further development of a European Open Science Cloud that will enable sharing and reuse of research data across disciplines and borders, taking into account relevant legal, security and privacy aspects”.

The Council conclusions offer an opportunity to reflect on the experience gathered by the EGI Cloud and on the new challenges and requirements emerging from the seven Research Infrastructures that EGI cooperates with in the context of the EC-funded project EGI-Engage. What has been achieved so far, and what is missing to realise an “Open Science Cloud” worldwide?

After slightly more than one year of operations, the EGI Cloud today federates 22 cloud providers across 15 European countries. The infrastructure has so far provided 770,000 Virtual Machines to research communities.

What has proven to be a success and what remains to be addressed?

A few thoughts. Firstly, the idea of providing a community platform for the sharing of virtual appliances through the Application Database “Cloud Marketplace” has proven to be a powerful and secure means for sharing tools and applications across user communities and scientific disciplines. The Application Database allows the instantiation of VMs on a network of cloud providers that adopt open cloud standards and support federated AAI. The trust model adopted assures cloud providers that virtual appliances in the library are endorsed by research groups (the so-called Virtual Organizations).

The federated approach to IaaS is also key. Because the federation provides a distributed hosting environment, community-specific web services can be instantiated on a chosen set of cloud providers. VMs can be migrated across the federation to provide resiliency and portability and, most importantly, cloud compute services can be offered where big data resides, without the need to move research data out of the institute premises. Through federation, a large choice of hardware platforms and services can be offered.
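The idea of offering compute where the data resides can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Provider` record, the `place_vm` function, and the provider and dataset names are invented for illustration and do not describe the EGI Cloud's actual scheduling or interfaces. The sketch only shows the principle of picking, among federated providers, one that already hosts the dataset, so that the data never leaves the institute.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provider:
    """Hypothetical description of one cloud provider in a federation."""
    name: str
    free_cores: int
    datasets: frozenset = field(default_factory=frozenset)


def place_vm(providers, dataset, cores):
    """Pick a provider that already hosts `dataset` and has enough free cores.

    Among the eligible providers, the one with the most free cores wins.
    Returns None when no provider qualifies, so the caller can decide
    whether to wait or to replicate the data (not modelled here).
    """
    candidates = [
        p for p in providers
        if dataset in p.datasets and p.free_cores >= cores
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.free_cores)


# Illustrative federation: names and numbers are made up.
federation = [
    Provider("site-a", free_cores=64, datasets=frozenset({"genomes-2015"})),
    Provider("site-b", free_cores=256, datasets=frozenset({"climate-runs"})),
    Provider("site-c", free_cores=128, datasets=frozenset({"genomes-2015"})),
]

chosen = place_vm(federation, dataset="genomes-2015", cores=32)
print(chosen.name if chosen else "no provider hosts this dataset")
```

Note that `site-b` has the most free capacity but is never chosen for `genomes-2015`, because it does not host that dataset: locality of data takes priority over raw capacity in this sketch.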

Portability of applications requires the adoption of open standards. Cloud providers across a federation are heterogeneous. EGI's choice is to support a range of different interfaces, while promoting the adoption of OCCI and CDMI for full interoperability.

The question that arises is: what is needed to scale up capacity and capabilities to enable international collaborations, and how should the EGI Cloud evolve? I will mention a few challenges for the sake of brevity; more could be identified [read more].

CHALLENGE 1. Realising a federated approach to research data
Nowadays, research practice is increasingly, and in many cases exclusively, data-driven. Knowledge of how to use tools to manipulate research data, and the availability of e-infrastructures to support them, are foundational. Along with this, new types of communities are forming around interests in digital tools, computing facilities and data repositories. By making infrastructure services, community engagement and training inseparable, existing communities can be empowered by new ways of doing research, and new communities can be created around tools and data.

The Open Science Cloud should provide a Marketplace that makes open research data, and the related tools and knowledge, discoverable. The Marketplace would federate existing research data sets provided by archiving organisations that can ensure compliance with a set of quality standards defined by the Marketplace.

CHALLENGE 2. Offering scalable access to and analysis of research data for reuse
Making data findable is not sufficient. Researchers should not need to download large datasets locally before executing their workflows: without a shared approach, this makes access to data less efficient and more time-consuming. Research data must be made easy to access and reuse. This means scalable access, especially in the case of big data that cannot be efficiently downloaded locally. The Open Science Cloud could provide distributed data mirroring and caching capabilities based on federated IaaS cloud storage.

This would allow the Open Science Cloud to realise the Data Commons internationally, extending to all disciplines the practice already adopted by a number of research communities, most notably in the area of genomics.
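The caching capability mentioned in this challenge can be sketched as a read-through cache: a site keeps recently used blocks of a remote dataset locally and only contacts the origin on a miss. This is a sketch under stated assumptions, not a description of any existing EGI service. The `ReadThroughCache` class, the block granularity, the least-recently-used eviction policy and the `fetch_remote` callback are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical site-local cache in front of a remote dataset.

    `fetch_remote` is any callable that returns the content of a block
    given its identifier; here it stands in for a transfer from the
    archive that holds the authoritative copy.
    """

    def __init__(self, fetch_remote, capacity):
        self._fetch_remote = fetch_remote
        self._capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first
        self.remote_reads = 0         # number of transfers from the origin

    def read(self, block_id):
        if block_id in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_id)
            return self._blocks[block_id]
        # Cache miss: fetch from the origin and keep a local copy.
        data = self._fetch_remote(block_id)
        self.remote_reads += 1
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data


# Illustrative use: the lambda stands in for a remote archive.
cache = ReadThroughCache(lambda block_id: f"data-{block_id}", capacity=2)
for block in (1, 2, 1, 3):
    cache.read(block)
print(cache.remote_reads)
```

A workflow that repeatedly touches the same part of a large dataset then pays the transfer cost once per block rather than once per access, which is the efficiency argument made above for shared caching over individual local downloads.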

CHALLENGE 3. Integrating (shared) tools and applications
Knowledge cannot be extracted from data without the availability of specialised tools and applications (e.g. text mining). The Open Science Cloud would provide a library of community-specific applications and tools. This community platform should be open to any researcher for publishing. For greater specialisation, the Open Science Cloud should provide community-specific PaaS and SaaS services that can be dynamically deployed, with a focus on the long tail of science. These services could be provided in the form of managed services by the Research Infrastructures.

CHALLENGE 4. Providing services for depositing data for resource-bound users
Through virtual access, the Open Science Cloud will federate infrastructures to provide services for the long tail of science: researchers who cannot benefit from these services at institutional and/or national level, but who support open research data.

CHALLENGE 5. Achieving integrated e-infrastructures
The development of an Open Science Cloud with an inclusive, federated governance model would avoid duplication in the provisioning of ICT services at national and European level. The Open Science Cloud should be developed as a federation of national Cloud Hubs, financially supported by the Member States. The role of the EC would be to ensure the persistence of the services that allow the national Cloud Hubs to operate as a federation, and to ensure coordinated procurement, service provisioning and data brokering according to the requirements of the RIs. This would allow aggregation of demand across Europe, coordinated delivery and the development of economies of scale.

Why a cloud?

Cloud is a service provisioning paradigm that can support hosting for both data and software tools.
Being based on virtualization, clouds facilitate sharing, reuse and the combined offer of data and tools. Cloud federations enable ‘local hosting’ and ‘control sharing’ capabilities that respect ownership and allow accessibility for distributed communities. In addition, the federated approach allows the implementation of hybrid models in which private, community and public clouds can be integrated.

How could the Open Science Cloud be organised?

The Open Science Cloud could be managed as a federation of Cloud Hubs, jointly provided through a hybrid approach involving publicly funded and commercial cloud providers. A federation of hubs provides an organisational structure that meets European policies, regulations, restrictions and business models, which in some cases do not allow the permanent relocation of data into centralised science-domain-specific repositories and/or into generic repositories (that integrate data and tools from multiple domains). Within this federation, data providers should always retain complete control over their data.
The federated approach allows the implementation of a multi-level governance model where different governing bodies of the Commons can coexist and be integrated.

Who do we expect to operate the services offered by the Cloud Hubs?

Cloud Hub services can be provided in a coordinated fashion by multiple stakeholders, including research communities, research infrastructures and e-Infrastructures. A federator role needs to be established to ensure services are provided in an integrated way according to federated service management best practices and standards. A single interface needs to be provided to end-users.

EGI is keen to engage with Research Infrastructures, research communities and other e-Infrastructures to jointly discuss these issues. You can read more about our ideas here.

If you are interested in the Open Science Cloud, do not hesitate to comment and share widely!