

Now serving over 66,000 users in 174 countries, WeNMR’s success is built on user-friendly tools like the HADDOCK server, strong support and training, and a commitment to scientific excellence.
We interviewed Alexandre M. J. J. Bonvin, professor of Computational Structural Biology at Utrecht University and coordinator of WeNMR, a worldwide infrastructure for NMR and Structural Biology.
For how long has the EGI Federation been supporting WeNMR?
We started using High-Throughput or Grid Computing around 2008, thanks to a European project called e-NMR. So essentially, since 2008, or 2009 to be precise, we've been using EGI services without interruption. Even during funding gaps between European projects, we kept our services running, for some short periods even without external funding.
This makes you one of the longest-standing communities supported by EGI. Given that time span, what would you say are the most significant milestones in WeNMR’s growth?
The most widely used service we offer is HADDOCK, a web server for modelling biomolecular complexes. We launched it in 2008, initially running on local resources. A year later, we moved it to the grid, which allowed us to scale and support a larger user base.
The name “WeNMR” came from a project I coordinated in 2009, following the original e-NMR project: we added the “W” for “Worldwide”, reflecting our community's global reach from the beginning. That branding has stuck with us across different projects over the years.
We've seen exponential growth, especially recently. Why? There hasn’t been a single big event — rather, it's been a steady focus on user friendliness. From the start, we've aimed to make tools accessible even to non-experts. That ease of use has been a major draw. We've also invested heavily in tutorials and documentation, covering everything from basic, no-command-line-needed guides to advanced workflows. User support has been a priority, answering every question, even if it's the same one repeatedly. I’ve received emails from users saying, “Now I see why someone recommended HADDOCK — I got an answer within a day.” Promotion has played a big role too: workshops, courses, and events worldwide — some under EMBO, others via EU projects. I’ve travelled a lot to promote the tools.
And it’s not just about software. We’re a research group at a university, so many of our software developments stem from scientific questions we’re trying to solve. Those developments often become new or improved services, and the resulting papers help bring more users to our tools.
Finally, we've been part of the BioExcel Centre of Excellence for ten years now, currently funded under EuroHPC. That funding supports software development, user support, and service improvement. Sustained funding is absolutely essential to continue growing and serving the community.
You mentioned that your user base has grown significantly outside of Europe.
Yes! We've had close to 66,000 registered users over the years. More than half of them come from Asia, especially India and China — those two alone make up about one-third of our user base.
We’re now involved in a new Europe-India project, funded by EuroHPC, to promote collaboration. As part of that, we’ll further develop HADDOCK and also deploy some of our services on Indian high-performance cloud infrastructure.
So the Indian user base is likely to keep growing.
Definitely. Though, if users run the services on local resources, those stats won’t show up in our web service metrics. Still, it’s clear the impact is growing. We currently have users from 174 countries — that’s pretty amazing, considering the UN recognises 195 countries globally.

What do you think has driven this exponential growth in users?
Several factors. Everything I mentioned earlier contributes: intuitive interfaces, strong user support, tutorials, promotion, and so on.
But recently, we’ve noticed a new trend: more and more use in education. We’re seeing our tools included in university curricula, and I also use them in my teaching for bachelor’s and master’s level courses. Students use our services for assignments, and often that becomes their entry point. If they move into research later, they’re likely to stick with the tools they’re already familiar with.
We can spot these patterns: a flood of student registrations from the same university around the same time, or professors reaching out to say they want to use the server for a course.
Covid also played a big role. During the pandemic, people turned to computational tools, and our usage spiked. Before Covid, we had around 300 new users per month. That jumped to 600, and since the beginning of 2025 it has climbed again: we are now averaging 2,000 new users per month. Part of that growth might also come from our recent publications on modelling immune complexes involving antibodies, for example.

Given this educational use, do you think infrastructure needs to adapt to meet those demands?
Education typically doesn't need massive compute power. When we run courses, we create special accounts with sampling limits to avoid wasting resources. We often provide pre-calculated results for students to analyse — it gives them the experience without burdening the system.
We could consider a dedicated educational interface running separately, but managing these services isn’t simple. There’s a complex backend infrastructure, including a secure frontend, job submission management, and more. That’s why we’ve kept a single entry point.
We use EGI's workload manager, DIRAC, to distribute jobs, which helps. Technically, we could run multiple servers, but managing them all, keeping them secure and maintained, would be a significant effort.
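For readers less familiar with DIRAC, the sketch below illustrates the general pattern of submitting a single job through its Python API. It is only an illustration of that pattern, not the actual WeNMR submission code; the wrapper script and file names (run_haddock.sh, input.tgz, results.tgz) are hypothetical.

    # Minimal sketch of submitting one job via the DIRAC Python API.
    # Script and file names are hypothetical placeholders.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)  # initialise the DIRAC client environment

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job

    job = Job()
    job.setName("haddock-docking-run")
    job.setExecutable("run_haddock.sh")                        # wrapper executed on the worker node
    job.setInputSandbox(["run_haddock.sh", "input.tgz"])       # small files shipped with the job
    job.setOutputSandbox(["StdOut", "StdErr", "results.tgz"])  # files retrieved after completion
    job.setCPUTime(3600)

    result = Dirac().submitJob(job)
    if result["OK"]:
        print("Submitted job with ID:", result["Value"])

Once such a job finishes, its output sandbox can be retrieved with Dirac().getOutputSandbox(job_id) and processed further on the server side.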
We also serve pharmaceutical companies and industry partners, some of whom have asked to install HADDOCK locally to avoid sharing data over the network. We’ve had similar requests from China, where network constraints can be a challenge. But managing remote server deployments is complex, especially without root access.
Can you share any upcoming technical developments or new features?
The HADDOCK server (the HADDOCK 2.4 portal, which runs version 2.5 of the software) accounts for all our grid and high-throughput computing on the EGI infrastructure. Other services, including some in Florence, run on local resources.
We recently released the next generation of our software, HADDOCK 3, and are continuing to develop it. It's a complete rewrite: more modular and flexible, and better suited for cloud environments than grid systems.
Currently, HADDOCK runs start on our server, which sends jobs to DIRAC, and results are returned and further processed locally. With HADDOCK 3, the whole workflow will be sent to a cloud or HPC node, executed entirely there, and the results returned. That model requires cloud resources, which are currently limited in our EGI SLA.
As part of the EOSC Data Commons project, we're building the infrastructure to support HADDOCK 3 in a cloud environment. Automation is key — the system will need to start and stop cloud instances on demand without human intervention.
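As a rough, hypothetical illustration of what such automation could look like, the sketch below starts and later removes a cloud instance programmatically with the openstacksdk Python library (many EGI cloud sites are OpenStack-based). The cloud profile, image, flavour and network names are placeholders, and this is not the actual EOSC Data Commons implementation.

    # Hypothetical sketch: start a worker VM for one workflow run, then tear it down.
    # The cloud profile, image, flavor and network names are placeholders.
    import openstack

    conn = openstack.connect(cloud="egi-fedcloud")   # profile defined in clouds.yaml

    image = conn.compute.find_image("ubuntu-22.04")
    flavor = conn.compute.find_flavor("large")
    network = conn.network.find_network("default")

    server = conn.compute.create_server(
        name="haddock3-worker-001",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)    # block until the VM is active

    # ... contextualise the VM (e.g. with cloud-init), run the workflow, fetch results ...

    conn.compute.delete_server(server)               # release the resources when done

In practice the orchestration layer would also need to handle errors, quotas and data movement, but the start-run-teardown cycle above is the core of the on-demand model.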
This shift will also mean managing much larger data transfers — tens of megabytes sent out, potentially gigabytes coming back. The infrastructure must evolve to support that. Things run smoothly now, but it took years to get here, and it will take time for the new system to reach the same maturity.
How do you see AI and machine learning playing a role in WeNMR’s future?
We’re using AI both to generate hypotheses and as part of the modelling process, then refining those models using physics-based approaches.
There's been a lot of buzz around AI tools like AlphaFold, which even earned its developers a Nobel Prize. I thought it might reduce our usage, but that hasn't happened. One reason is that AI doesn't (yet) perform well on all systems, particularly on antibodies and nanobodies. These evolve very quickly because they need to adapt to new threats, and most AI models rely on evolutionary conservation, which breaks down in those fast-evolving regions.
So, AI doesn’t replace us; in fact, it complements our work. Users might generate a model with AI and then turn to HADDOCK to refine it. That hybrid workflow is becoming more common and may explain the surge in users this year — though it’s also increasing the load on our servers.
We’ve developed AI tools ourselves, like DeepRank for protein-protein model quality prediction. As part of the EOSC Data Commons project, we’re launching a web service for it. The prototype is up and running, but it currently lacks GPU support, which makes it slower. Eventually, we aim to run it on GPU-enabled cloud machines.
We’ve also rewritten our frontend architecture as microservices, meaning services can run anywhere — locally or on EGI infrastructure — and just communicate with the main server. This should be especially useful for AI workloads.
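As a purely hypothetical sketch of that pattern, a microservice in this style only needs to expose a small HTTP API that the main server can call; the framework, route and payload fields below are illustrative and not the actual WeNMR interface.

    # Hypothetical example of a small scoring microservice built with FastAPI.
    # The route and payload fields are illustrative, not the real WeNMR API.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class ScoreRequest(BaseModel):
        structure_url: str   # where the service can fetch the structure to score

    class ScoreResponse(BaseModel):
        score: float

    @app.post("/score", response_model=ScoreResponse)
    def score(request: ScoreRequest) -> ScoreResponse:
        # A real service would download the structure and run the predictor here;
        # this stub just returns a placeholder value.
        return ScoreResponse(score=0.0)

Because the main server only needs the service's URL, such a component can run locally or on EGI resources interchangeably, which is what makes the architecture attractive for GPU-dependent AI workloads.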
We could, in principle, integrate with external databases such as the AlphaFold Database in the future, allowing direct upload of models. But we must be careful: if users without domain expertise misuse these tools, it can waste computing time. That's where education plays a role again, ensuring tools are used effectively and responsibly.
NB. The statistics included above come from this website.