Outcomes of the INDIGO-DataCloud project

Davide Salomoni and Giacinto Donvito on how INDIGO lives up to the Better Software for Better Science motto

INDIGO-DataCloud is an EU-funded project that ran with the objective of developing a new cloud software platform for the scientific community. With this in mind, the team developed tools to facilitate the exploitation of distributed cloud and storage resources through public or private infrastructures.

The project’s 30 months were exciting and rich in results. We believe that the foundations laid by INDIGO will continue to be developed and adopted in a wide variety of fields, public and private, at the service of science and for the benefit of the general public. The key achievements are:

1) The involvement of scientific user communities in defining and tracking their requirements: the INDIGO team categorised requests, identified requirements and classified them into three areas (storage, computational and infrastructural).

2) The identification of technology gaps linked to concrete use cases. These gaps helped the team to validate the technical implementations and to define the INDIGO technical architecture: a modular framework, fully based on open standards, covering all areas of the cloud stack (IaaS, PaaS, SaaS).

3) Two major software releases: MidnightBlue (announced in August 2016) and ElectricIndigo (in April 2017). ElectricIndigo now consists of about 40 open modular components, 50 Docker containers and 170 software packages, all supporting up-to-date open operating systems. This result was accomplished by exploiting key European know-how, reusing and extending open source software and contributing code to upstream projects.

4) The release of two service catalogues: a short one, with a high-level description of the INDIGO solutions, and a longer version, with details about components and reports of sample applications.

5) The creation of two large distributed testbeds to support development activities and pre-production applications. The testbeds allowed the communities to integrate INDIGO components into scientific applications now deployed in production over public or private infrastructures. So far, scientific communities that have integrated INDIGO components belong to the domains of life sciences, physics, structural biology, earth sciences and cultural heritage, among others.

6) The establishment of collaborations with IBM, ATOS and T-Systems: the INDIGO team worked with industry leaders to facilitate the adoption and enhancement of INDIGO components.

7) Participation in the EOSC-hub project: the INDIGO team will nominate the project’s Technical Coordinator. INDIGO will also contribute to the EOSC-hub service catalogue with many of its components: identity and access management, token translation, virtual filesystems (Onedata), advanced IaaS services, neutral access to heterogeneous Cloud resources (Infrastructure Manager), web frontend services and user-level containers.

8) The positive evaluation of two spin-off projects: eXtreme-DataCloud and DEEP-HybridDataCloud, due to start in late 2017 / early 2018. These projects will continue to develop many INDIGO components in areas such as data lifecycle management, smart caching, flexible metadata management for big data sets, PaaS-level access to HPC resources and real-time, streaming-based data ingestion and processing. We expect that these developments, once matured to production level, will eventually find a place in the European Open Science Cloud service catalogue, to further enhance and facilitate the work of scientists and resource providers.

Davide Salomoni and Giacinto Donvito led the INDIGO-DataCloud project.