Introducing the EGI DataHub prototype

Matthew Viljoen gives an overview of an upcoming EGI service

EGI is proud to announce the launch of the EGI DataHub, a service prototype designed to make data discoverable and available in an easy way across all EGI federated resources.

The EGI DataHub allows users to make their data available at different levels of access, from completely unrestricted access to open data through to authenticated access to closed data sets. This is made possible by integration with the EGI AAI service.

The data hosted on the EGI DataHub can be readily accessed by cloud Virtual Machines (VMs) or running grid jobs thanks to full integration with EGI Federated Cloud and High-Throughput Compute (HTC) resources. The use of protocols such as POSIX and web services gives cloud and HTC applications easy and scalable access to data. It also ensures maximum compatibility with existing applications and minimum hassle for developers and users alike.
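To illustrate what POSIX access means in practice, here is a minimal Python sketch. It assumes that a DataHub space has already been mounted on the VM as an ordinary directory; the environment variable DATAHUB_MOUNT and the helper names are hypothetical and chosen only for this illustration. When no mount is configured, the demo falls back to a temporary directory as a stand-in, so nothing here depends on a real deployment.

```python
import os
import tempfile


def summarise_directory(root):
    """Walk `root` with ordinary file-system calls.

    Returns a (file_count, total_bytes) tuple. Nothing here is specific
    to the DataHub: the same code works on any POSIX-style directory.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            total_bytes += os.stat(path).st_size
            file_count += 1
    return file_count, total_bytes


def read_head(path, n_bytes=64):
    """Return the first `n_bytes` of a file using a plain open()/read()."""
    with open(path, "rb") as handle:
        return handle.read(n_bytes)


if __name__ == "__main__":
    # DATAHUB_MOUNT is a hypothetical variable naming the mounted space.
    mount_point = os.environ.get("DATAHUB_MOUNT")
    if mount_point:
        print(summarise_directory(mount_point))
    else:
        # Stand-in for a mounted space, so the sketch runs anywhere.
        with tempfile.TemporaryDirectory() as stand_in:
            sample = os.path.join(stand_in, "sample.dat")
            with open(sample, "wb") as handle:
                handle.write(b"example payload")
            print(summarise_directory(stand_in))
            print(read_head(sample, 7))
```

The point of the sketch is that an existing application needs no changes: it reads the mounted data through the same calls it would use for local files.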

The EGI DataHub is built on top of the EGI Open Data Platform using Onedata technology to connect a wide range of existing storage services, regardless of their underlying technology (e.g. Lustre, Amazon S3, Ceph, NFS, or dCache).

Complementing existing EGI HTC (High-Throughput Computing) resources and the EGI Federated Cloud (the existing cloud federation offered by EGI) with the new EGI DataHub opens up exciting new possibilities for data-intensive computing:

  • Cloud computing and HTC resources using fast and scalable datasets
  • Tiered storage: fast local cache and access to large remote data providers
  • Complex orchestration and workflows involving computing and data discovery, ensuring locality of data if needed for efficient processing

A first preview of the EGI DataHub was launched in Krakow, Poland, during the DI4R event. This included a training session on the technology using the EGI Federated Cloud, run by Diego Scardaci from EGI. The session proved popular and will be repeated at future events.

More information about the EGI DataHub and Open Data Platform can be found in this paper published by Elsevier: Towards European Open Science Commons: The EGI Open Data Platform and The EGI DataHub

If you are interested in understanding how the EGI DataHub can help your project, please get in touch with us. We are also keen to talk to you if you have open data or repositories that could benefit from the EGI DataHub making them more discoverable and accessible to EGI computing resources.

The EGI DataHub, the EGI Open Data Platform and the new functionality on the EGI Federated Cloud that enables integration with these technologies are being developed as part of the EGI-ENGAGE project and the INDIGO-DataCloud project.

More information

Towards European Open Science Commons: The EGI Open Data Platform and The EGI DataHub

Matthew Viljoen is senior operations officer at the EGI Foundation.