Serverless computing for data-processing applications in EGI

Germán Moltó introduces the new OSCAR platform

The problem: event-driven scientific computing

Serverless computing, in the form of Functions as a Service (FaaS) platforms, has popularised event-driven computing: short-lived functions are executed in response to events such as HTTP requests or file uploads to a storage back-end.

The main open-source serverless platforms, such as OpenFaaS, OpenWhisk and Knative, focus on processing bursts of short-lived HTTP requests via functions coded in specific programming languages. They provide elasticity in terms of the number of containers started to process the requests simultaneously.

However, scientific applications are typically more resource-intensive and require longer execution times. They can also benefit from auto-scaling of the underlying computational infrastructure to cope better with increased workloads.

The solution: OSCAR serverless computing platform

To this end we developed OSCAR, an open serverless computing platform for data-processing applications. OSCAR consists of several services deployed on an elastic Kubernetes cluster. The cluster itself is deployed via EC3, which uses the Infrastructure Manager to provision it across multiple clouds.
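What "elastic" means for such a cluster can be illustrated with a deliberately simplified scale-out rule. The sketch below is an assumption for illustration only, not the policy used by EC3 or OSCAR. It estimates how many worker nodes would be needed to run every queued and running job at once, bounded by hypothetical minimum and maximum cluster sizes. The names `ClusterState` and `desired_nodes` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Snapshot of a hypothetical elastic cluster (illustrative fields only)."""
    pending_jobs: int      # jobs waiting for resources
    running_jobs: int      # jobs currently executing
    cpus_per_job: float    # CPU request of each job
    cpus_per_node: float   # allocatable CPUs on each worker node
    min_nodes: int = 1
    max_nodes: int = 10


def desired_nodes(state: ClusterState) -> int:
    """Number of worker nodes needed to run all jobs at once,
    clamped to the range [min_nodes, max_nodes]."""
    if state.cpus_per_job > state.cpus_per_node:
        raise ValueError("a single job does not fit on one node")
    # How many jobs can share one node without exceeding its CPUs.
    jobs_per_node = math.floor(state.cpus_per_node / state.cpus_per_job)
    total_jobs = state.pending_jobs + state.running_jobs
    needed = math.ceil(total_jobs / jobs_per_node)
    return max(state.min_nodes, min(state.max_nodes, needed))


# Eight 1-CPU jobs on 2-CPU nodes: the cluster would grow to 4 nodes.
print(desired_nodes(ClusterState(pending_jobs=7, running_jobs=1,
                                 cpus_per_job=1, cpus_per_node=2)))  # 4
```

The same rule also shrinks the cluster when the job count drops, down to `min_nodes`. A production policy would typically add a grace period before powering off idle nodes, so that a short lull between uploads does not cause nodes to be removed and immediately re-provisioned.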

OSCAR uses MinIO as the storage back-end, so that file uploads trigger events that are captured by the OpenFaaS serverless platform, which is responsible for executing the user-defined functions. Requests are transformed into Kubernetes jobs to accommodate long-running executions.
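The step in which an upload event becomes a Kubernetes job can be sketched as follows. This is a hedged illustration, not OSCAR's actual code. It assumes the S3-style event notification format that MinIO emits, in which a `Records` list carries `eventName`, `s3.bucket.name` and `s3.object.key`. It produces plain `batch/v1` Job manifests. The function name, the container image and the `INPUT_BUCKET`/`INPUT_KEY` environment variables are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def jobs_from_event(event: dict, function_name: str, image: str,
                    namespace: str = "default") -> list[dict]:
    """Return one batch/v1 Job manifest per object-created record in an
    S3-style bucket event notification."""
    jobs = []
    for record in event.get("Records", []):
        # Only react to uploads; skip deletions and other event types.
        if not record.get("eventName", "").startswith("s3:ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3-style notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({
            "apiVersion": "batch/v1",
            "kind": "Job",
            # generateName makes Kubernetes append a unique suffix per job.
            "metadata": {"generateName": f"{function_name}-", "namespace": namespace},
            "spec": {
                "backoffLimit": 0,  # do not retry a failed run automatically
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [{
                            "name": function_name,
                            "image": image,
                            # Hypothetical variables telling the user's script
                            # which uploaded file to process.
                            "env": [
                                {"name": "INPUT_BUCKET", "value": bucket},
                                {"name": "INPUT_KEY", "value": key},
                            ],
                        }],
                    }
                },
            },
        })
    return jobs


# Example: a single upload event for a hypothetical "plants" bucket.
sample_event = {
    "Records": [{
        "eventName": "s3:ObjectCreated:Put",
        "s3": {"bucket": {"name": "plants"}, "object": {"key": "input/my+leaf.jpg"}},
    }]
}
print(json.dumps(jobs_from_event(sample_event, "plant-classifier",
                                 "example.org/plant-classifier:latest"), indent=2))
```

Running each invocation as a Job rather than as a long-lived function container is what allows executions to last minutes or hours. Because the manifests use `generateName`, they would be submitted with a create call (for example `kubectl create -f`) rather than `kubectl apply`, which does not accept generated names.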

This allows scientists to easily couple their file-processing applications to a storage area: whenever files are uploaded, a function is invoked to process them simultaneously, and the output data automatically appear in the same storage area.
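The input-to-output contract described above can be mimicked locally with a small in-memory model. This is purely illustrative. It assumes a single storage area with hypothetical `input/` and `output/` prefixes and uses threads in place of cluster resources. OSCAR itself relies on MinIO buckets and runs each invocation as a Kubernetes job. `StorageArea` and its methods are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

Handler = Callable[[bytes], bytes]


class StorageArea:
    """Hypothetical in-memory storage area. Uploading under input/ triggers
    the handler; its result is written back under output/ in the same area."""

    def __init__(self, handler: Handler, max_workers: int = 4) -> None:
        self._objects: dict[str, bytes] = {}
        self._lock = threading.Lock()
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: list[Future] = []

    def upload(self, key: str, data: bytes) -> None:
        with self._lock:
            self._objects[key] = data
        if key.startswith("input/"):
            # Each upload acts as an event that invokes the function;
            # several invocations may run at the same time.
            self._futures.append(self._pool.submit(self._invoke, key, data))

    def _invoke(self, key: str, data: bytes) -> None:
        # Results go under output/, which does not trigger further invocations.
        out_key = "output/" + key.removeprefix("input/")
        self.upload(out_key, self._handler(data))

    def wait(self) -> None:
        """Block until every triggered invocation has finished, then stop the pool."""
        for future in self._futures:
            future.result()
        self._pool.shutdown()

    def get(self, key: str) -> bytes:
        with self._lock:
            return self._objects[key]


# Usage: a toy "function" that upper-cases file contents.
storage = StorageArea(handler=lambda data: data.upper())
for name in ("a.txt", "b.txt"):
    storage.upload(f"input/{name}", name.encode())
storage.wait()
print(storage.get("output/a.txt"))  # b'A.TXT'
```

Writing results to a separate prefix that does not trigger the function is the design choice that prevents an output file from re-invoking the function in an endless loop. The sketch requires Python 3.9 or later for `str.removeprefix`.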

Figure 1. Components of the OSCAR serverless platform.

Benefits for scientists: integration with EGI

OSCAR has been integrated with several EGI services. First, EGI Applications on Demand, in particular the EC3 portal, can be used to provision the elastic Kubernetes cluster on the EGI Federated Cloud. Second, the EGI Data Hub allows Onedata to be used as the source of events that trigger the execution of OSCAR functions in response to file uploads. This means that users with an EGI account can self-deploy an elastic OSCAR cluster and create file-processing applications that run whenever files are uploaded to their EGI Data Hub Onedata space.

The use case: plant classification

A use case from the DEEP Hybrid DataCloud project, which classifies plants using deep learning techniques, has been integrated with the OSCAR platform. The application uses a neural network optimised for identifying plants from images. The use case can be followed step by step in the accompanying video: an OSCAR cluster is deployed and a function is created to process plant images whenever they are uploaded to a Onedata space. The Kubernetes cluster is then scaled out in the EGI Federated Cloud to run multiple plant recognitions simultaneously. Additional use cases are available in the GitHub repository.

More information

This work has been partially funded by the EGI Strategic and Innovation Fund.

Germán Moltó is Associate Professor at the Universitat Politècnica de València (UPV). He is responsible for the serverless computing research line at the GRyCAP research group and leads the OSCAR developer team. If you feel your scientific community could benefit from event-driven computing, contact Germán Moltó at gmolto@dsic.upv.es.
