
AI4EOSC and iMagine projects have developed scalable and flexible solutions to simplify AI model inference across the computing continuum, enabling efficient and user-friendly deployment for real-world scientific applications.
As AI becomes more integrated into research and industry, one key question arises: how do we deploy AI models at scale, efficiently and reliably? While training models is often in the spotlight, it is the inference phase (the step in which a model is used to make predictions on new data) that defines how useful and usable that AI truly is.
The AI Inference Challenge
Deploying AI models into production at scale isn’t easy. It requires powerful, adaptable infrastructure that can respond to varying demand, handle complex workloads, and integrate seamlessly with different computing environments.
Inference needs to be:
- Scalable: Systems must automatically adapt to workload demand, scaling up when usage spikes and scaling down when capacity is no longer needed, to save resources.
- Flexible: The environment should be able to run across different cloud platforms and computing architectures (from CPUs to GPUs to HPC systems), covering the computing continuum.
- Efficient: Optimising both cost and energy consumption is essential, especially for large-scale or real-time applications.
- Composable: Users should be able to create inference pipelines, allowing models to be easily connected, extended, or reused.
The AI4EOSC and iMagine projects, both funded under the Horizon Europe programme, shared their experience addressing this challenge during EGI2025. On the one hand, AI4EOSC provides a user-friendly workbench and a toolbox, called AI4OS, of practical and scalable solutions for developing and running AI models. On the other hand, iMagine exploits the AI4EOSC solutions to empower researchers in aquatic sciences with open access to a diverse set of tools and services, turning AI innovation into operational services for marine and freshwater research.
From Theory to Practice: Use Cases That Shaped the Approach
These capabilities weren't developed in a vacuum: they have been shaped through a range of real-world use cases from the AI4EOSC (see more info about AI4EOSC use cases here) and iMagine (see more info about iMagine use cases here) projects. From marine species detection to thermal image processing, the technical team has worked hand in hand with end users to understand their needs, what works and what doesn't for each individual use case, and how to improve the existing solutions.

As depicted in the picture above, each of these use cases calls for different infrastructure solutions. That's why no single inference method works for everyone, and why the AI4EOSC toolbox supports a wide range of strategies. One principle is clear: users should not adapt to the platform; the platform should adapt to the users.
Serverless Inference with OSCAR
At the heart of AI4EOSC's inference engine is OSCAR, an open-source, serverless computing framework used for scalable AI model execution. It allows AI models, packaged in Docker containers, to respond automatically to events, for example triggering a model when a new file is uploaded to an object-storage solution (what we call asynchronous invocations).
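To make this concrete, the snippet below is a minimal sketch of what triggering such an event-driven service can look like from the client side: a file is uploaded to the MinIO bucket that an OSCAR service is configured to watch. The endpoint, credentials, bucket and path names are placeholders, not values from the projects.

```python
# Minimal sketch: trigger an event-driven OSCAR service by uploading a file
# to the object-storage bucket the service is configured to watch.
# Endpoint, credentials, bucket and service names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example-cluster.eu",   # MinIO endpoint of the cluster (placeholder)
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Uploading into the service's input path raises an event that OSCAR
# turns into an asynchronous invocation of the containerised model.
s3.upload_file(
    Filename="sample_image.jpg",
    Bucket="fish-detector",          # hypothetical service bucket
    Key="input/sample_image.jpg",    # input prefix watched by the service
)
print("File uploaded; results will appear under the service's output path.")
```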
OSCAR runs on top of an elastic Kubernetes cluster, allowing the system to auto-scale based on demand, saving both costs and energy. It is cloud-agnostic and easy to deploy thanks to the Infrastructure Manager (IM). Moreover, it supports ARM-based low-power devices such as Raspberry Pis, enabling serverless computing at the Edge and composing workflows along the computing continuum. Next, you can find the architecture diagram of OSCAR:

In addition to event-driven inference, OSCAR supports programmatic access to trigger the inference of AI models. These are what we call synchronous invocations: scalable HTTP-based endpoints for AI models with user-configurable elasticity boundaries. Some AI models are also better suited to being exposed as REST APIs, or even offer their own UI (for example, LLMs). OSCAR supports this too, making it easy to turn an AI model into a web-accessible service that can be called by apps, websites, or even other AI models.
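As a rough illustration, a synchronous invocation is simply an HTTP request to the service endpoint. The URL pattern, authentication scheme and payload encoding in the sketch below are assumptions that should be checked against the OSCAR documentation for your own deployment.

```python
# Minimal sketch of a synchronous OSCAR invocation: an HTTP POST to the
# service endpoint. URL pattern, auth scheme and payload encoding are
# assumptions -- check them against your OSCAR cluster's documentation.
import base64
import requests

OSCAR_ENDPOINT = "https://oscar.example-cluster.eu"   # placeholder cluster URL
SERVICE_NAME = "fish-detector"                        # hypothetical service
ACCESS_TOKEN = "YOUR_TOKEN"                           # OIDC or service token (placeholder)

with open("sample_image.jpg", "rb") as f:
    payload = base64.b64encode(f.read()).decode()     # send the input as base64 text

response = requests.post(
    f"{OSCAR_ENDPOINT}/run/{SERVICE_NAME}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    data=payload,
    timeout=120,
)
response.raise_for_status()
print(response.text)   # model prediction returned in the HTTP response
```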
However, inference isn't always real-time. Many researchers need to process huge volumes of historical data, such as the EMSO OBSEA use case from iMagine (~900,000 images). That's where OSCAR Batch comes in: an open-source extension of OSCAR that performs batch processing using the serverless approach. It enables parallel, efficient processing of large datasets and optimises CPU and memory usage across OSCAR Kubernetes clusters, making it a perfect fit for one-off bulk inference tasks on large historical archives.
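OSCAR Batch itself takes care of scheduling this kind of work across the cluster; purely as an illustration of the scale involved, the sketch below shows the client-side view of a bulk job, enumerating a large historical archive in object storage and grouping it into batches. The bucket name, prefix and batch size are illustrative only.

```python
# Illustrative only: OSCAR Batch handles the actual scheduling across the
# Kubernetes cluster. This sketch just shows how a large historical archive
# in object storage can be enumerated and split into batches on the client side.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example-cluster.eu",  # placeholder endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

BUCKET = "obsea-archive"      # hypothetical bucket with historical images
BATCH_SIZE = 1000             # illustrative batch size

keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="images/"):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

batches = [keys[i:i + BATCH_SIZE] for i in range(0, len(keys), BATCH_SIZE)]
print(f"{len(keys)} images grouped into {len(batches)} batches for bulk inference")
```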
Graphical AI Pipelines with AI4Compose
Workflows allow AI scientists to automate inference tasks using serverless computing. For non-experts or multidisciplinary teams, visual tools are essential, because they make this process more accessible and efficient. In this direction, AI4EOSC has developed AI4Compose, a solution that simplifies workflow creation through an intuitive, low-code interface, reducing the complexity of managing inference pipelines.
AI4Compose supports two main tools (Node-RED and Elyra) for composing AI inference pipelines graphically, using a simple drag-and-drop interface. This allows researchers to build, connect, and experiment with different models and data flows, with no deep coding required.

Why support two different solutions? Because we have learned during the projects that no single solution fits every use case's needs. While Elyra is designed for data scientists who work within JupyterLab, providing a structured way to integrate Python notebooks and scripts into machine learning workflows, Node-RED is designed for rapid deployment of event-driven workflows, making it ideal for quick prototyping and demonstration purposes.
Moreover, AI4Compose offers several customised nodes and examples that integrate models from the AI4EOSC Catalogue with OSCAR. For Node-RED, they are published in this Node-RED Library collection. For Elyra, they are available in the AI4Compose GitHub repository. Thus, AI4Compose lowers the barrier to deploying multi-stage AI pipelines for a wide scientific community.
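To give a flavour of what an Elyra-style pipeline step can look like, the sketch below is a plain Python script that could act as one node in a pipeline: it reads the artefacts produced by a previous step, calls a model service for each, and writes the predictions to a file for the next step. The endpoint, service name and token are hypothetical placeholders, not AI4Compose defaults.

```python
# Sketch of a single pipeline step (a plain Python script): read inputs
# produced by the previous node, call a model service for each one, and write
# the predictions to a file that the next node in the pipeline can pick up.
# The endpoint, service name and token are placeholders.
import base64
import json
import pathlib

import requests

OSCAR_ENDPOINT = "https://oscar.example-cluster.eu"
SERVICE_NAME = "fish-detector"
ACCESS_TOKEN = "YOUR_TOKEN"

results = {}
for image_path in sorted(pathlib.Path("preprocessed").glob("*.jpg")):
    payload = base64.b64encode(image_path.read_bytes()).decode()
    resp = requests.post(
        f"{OSCAR_ENDPOINT}/run/{SERVICE_NAME}",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        data=payload,
        timeout=120,
    )
    resp.raise_for_status()
    results[image_path.name] = resp.text

# The next pipeline node reads this file as its input artefact.
pathlib.Path("predictions.json").write_text(json.dumps(results, indent=2))
```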
Beyond Serverless: Other Flexible Deployment Options
Due to the wide variety of disciplines developing and using AI models, not every use case fits the serverless model. Some require access to specific computing resources, such as GPUs, or their owners need to control the whole computing environment in which their models are executed. That's why the AI4EOSC platform also offers more traditional deployment options:
- The Federated Nomad Cluster, which mainly supports training, can also be used to deploy models for inference. The cluster is distributed across 5 data centres in 4 countries. Its main advantage is access to GPU-equipped nodes (also supported by OSCAR, although the project deployments did not include this kind of node), and it is also a good option for low-latency inference, although it comes at the cost of constant resource use.
- The Infrastructure Manager is also integrated into the platform and allows full control over the deployment: users can deploy AI models directly onto cloud-based virtual machines (VMs), choosing the VM characteristics, location, and configuration. This option is slower to launch (the VM needs to be created and properly configured) but offers maximum flexibility; a minimal sketch of this approach follows this list.
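The sketch below gives a rough idea of how such a deployment can be requested through the IM REST API. The endpoint, authorisation format, content type and template file are placeholders, and the exact request format should be checked against the IM documentation.

```python
# Very rough sketch of creating a VM deployment through the Infrastructure
# Manager REST API. Endpoint, authorisation lines, content type and template
# are placeholders -- consult the IM documentation for the exact format.
import requests

IM_ENDPOINT = "https://im.example.eu/im"                 # placeholder IM server
AUTH_HEADER = "id = mycloud; type = OpenStack; username = user; password = pass"  # placeholder auth lines

with open("ai-model-vm.yaml") as f:                      # template describing the VM (placeholder)
    template = f.read()

resp = requests.post(
    f"{IM_ENDPOINT}/infrastructures",
    headers={
        "Authorization": AUTH_HEADER,
        "Content-Type": "text/yaml",                     # depends on template format; verify in the IM docs
    },
    data=template,
    timeout=300,
)
resp.raise_for_status()
print("New infrastructure:", resp.text)                  # URI of the created infrastructure
```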
Finally, models in the AI4EOSC Catalogue can also be deployed directly on the EOSC EU Node, making it easier to integrate with broader European Open Science Cloud services and resources. More info here.
Together, these options give researchers a full spectrum of deployment choices depending on their workload and goals.
Final Considerations for Efficient AI Model Execution
As AI continues to move from the lab to real-world applications, flexible and efficient inference solutions will be one of the keys to success. Thanks to AI4EOSC and iMagine, we now have an open toolbox that offers exactly that. Whether you need event-driven predictions, GPU-powered inference, batch processing of legacy datasets, or an easy-to-use visual interface, there's a solution ready to go.
The bottom line? You don't need to be an infrastructure expert to put your AI models to work. The AI4EOSC platform, together with the AI4OS software stack, offers researchers a powerful toolbox for bringing their AI models into production across multiple clouds, platforms, and environments.
The future of AI isn’t just about smarter models. It’s about smarter deployments.