20 years with EGI: an interview with Fabrizio Gagliardi
The DataGrid project was conceived at CERN to address the shortage of computing and data-processing capacity needed to meet the huge requirements of the Large Hadron Collider. An easy solution would have been to expand the computer centre by at least an order of magnitude. However, that would have required approval by the CERN Council, possibly a new building and massive funding. It would also have needed the agreement of the Member States, and computing is always considered a little secondary compared with running the accelerators and supporting the experiments. We faced an estimated huge amount of computing and data processing that the LHC would bring. At the same time, quite a considerable amount of computing resources was already available, distributed across the various Member States in anticipation of the LHC data, which in the meantime had been delayed.
The capacity was, therefore, in the Member States, but it was impossible for CERN to make a comparable upgrade. From this came the idea that if we could aggregate all that available computing capacity worldwide and somehow build a virtual supercomputer centre to support LHC computing, that would be the most affordable solution. Like anything new, though, it met quite a bit of scepticism. We looked around for basic technology we could use, and we came across Globus, a package developed in the United States at that time essentially by two computer scientists: Ian Foster at the University of Chicago and Argonne National Laboratory, and Carl Kesselman at the University of Southern California. With it, we hoped we could aggregate all those resources around the world and create a single administrative domain, so that people could log in to all those systems at once, send jobs around, and rely on a job scheduler that could move the workload from one computer to another. We could also have ways to distribute and share the data. That was the basic idea. We knew the complexity and the overall management effort would increase. We knew it would mean getting many different institutes to agree. We knew we would face political, policy and technical problems, as well as network instability. But at the end of the 1990s and in the early 2000s, more than 20 years ago, wide area networking technology was reaching the point where networks were relatively reliable and offered decent bandwidth. If you put all that together (a mature network, the lack of central funding at CERN, a relatively large amount of capacity available in the Member States, and developments already under way in the United States based on Globus), you have the recipe for DataGrid.
We had identified the essential ingredients for designing this large virtual (let's call it that) computer centre to support the Large Hadron Collider. We eventually decided to call it DataGrid, because it focused more on moving data around than on processing alone. There was still a problem, though, and that problem was funding: CERN needed more money to support the development required. We could not just take Globus and install it as it was; we needed to extend it and adapt it to our needs. From the beginning, we knew we would have to develop a large part of the middleware ourselves, which required skills and manpower we did not have in sufficient numbers on site. The big obstacle, therefore, was finding more funding. We went to the only funding agency available at the time, the European Commission, and tried to sell them the idea that what we were doing at CERN for the physicists could be of more general use to the rest of the European scientific community, for instance the life sciences (especially medical research) and the engineering community. A similar operation would be more difficult today; it was easier at the time because the European Commission had more centralised power and independence. What happened for DataGrid is that, together with a senior American computer scientist, Dr Paul Messina, we presented our ideas and vision to George Metakides, the director in charge of Esprit, the forerunner of the Framework Programmes. He understood and believed in it. Of course, we had to write a proposal and go through all the bureaucracy and the necessary checks to convince the Commission's internal experts and the external reviewers. Asking the Commission to fund novel computing technology has always been challenging. However, we succeeded, and I am proud of that and eternally grateful to Commission officials such as Dr Metakides for their vision and confidence.
The serendipity was that some physicists, especially in France, were involved in helping medical research. As physicists, they saw how much medicine and healthcare research have in common with physics. They approached us, suggesting that what we were building for the Large Hadron Collider computing infrastructure could be used by health research, which suffered from more or less the same problem: it needed more computing resources to support all the research the community wanted to do, and its resources were all scattered. Because the LHC was late and most of the computing resources had already been procured, there was another piece of serendipity: overcapacity. Physics had the idea of the grid, and physics could also offer a fraction of its large spare capacity to scientists with similar problems. Of course, the situation changed dramatically three years later, when we started to collect data from the LHC and all these other scientific communities were pushed somewhat to the margins. But all in all, it was a good operation, because these other communities, essentially health research but also Earth science, such as the work carried out by ESA, could learn the technology and develop their own grid-like solutions.
We needed, for instance, to design a special scheduler and to negotiate with each data centre on how to harmonise the grid with local management tools and make them coexist. The local accounting system needed to be compatible with the grid. All of this always had two aspects, and the technical one was usually the easiest, because we could find solutions. After that, though, there was always a great deal of massaging and discussion with the various levels of administration and leadership, and a huge amount of policy work, because in the end there were almost 100 partners who all had to sign and follow the procedures for a European Union project. At that time, the Director-General of CERN had to sign three copies by hand for each of the 100 partners, and the DG then was the famous Italian physicist Luciano Maiani. To manage such a complex project, we decided to follow the Latin motto "Divide et impera" (divide and rule) and create technical as well as administrative structures on a regional basis. That did the trick on both sides, because it was much easier to support the grid and solve problems at the regional level than to develop a complete global solution every time.
When the LHC grid became the WLCG, it went much further than Europe. I should note that it was fantastic to demonstrate, from a political point of view, how science can do better than politics: in the same project we had Taiwan and China, Russia, India and the US. I think this is probably the most important political achievement of my life.
In addition to what I said before, the only thing that could have worked better, and that somewhat limited the grid, is that it was not designed for robust commercial use. It worked in physics because all the institutes were working on the same undertaking (the LHC), so a precise level of accounting and security was not essential. It took a lot of work to understand who was using what and charge them accordingly. Of course, we had some accounting techniques, but in the end it didn't matter too much. For real commercial use, the grid would have needed to be more robust: any smart hacker could have brought it down. But who would want to hack a scientific network? There was no business and no money changing hands. It was the perfect solution for that particular scientific community at that moment. However, it was not intrinsically secure and was not designed from the start for commercial or industrial use. Nonetheless, the grid prepared the ground for the scientific community to adopt later solutions such as commercial and private cloud computing, and it paved the way for the present extraordinary success of the hyperscalers.
I cannot go into the nitty-gritty details, but at a very high level, it was the natural consequence of contributing to the worldwide LHC grid, which still supports LHC computing today. You needed to keep developing the middleware, as well as the policies and strategies to federate the resources, which, in a way, we had already started to do with DataGrid.
My feeling is that cloud computing as a model, and especially as a business model, is becoming predominant today; however, that doesn't come without important consequences, especially with commercial providers. For example, one consequence is that you move data outside Europe, or it ends up controlled by a non-European legal entity. Secondly, moving away from in-house development risks losing local know-how. If you make everything virtual and move the data to a commercial provider, your computer scientists become redundant and move on to something else. But if, by any chance, you need to develop a new system, because what the cloud offers you is inadequate, unsafe or too expensive, and this is not something you can go out and buy in a couple of months, then you will have lost the local expertise to develop an alternative solution. That's why I'm a strong believer in hybrid solutions, where you keep your critical data locally and have a reasonable system on premises, which can be a single system or a federation of European systems. Then you use the clouds to gain much more elasticity, while maintaining your local expertise, protecting what is most critical locally, and doing everything that is latency-sensitive locally.
Q: If you could invite a historical figure to dinner, who would that be?
I've never thought about a historical figure before, but the first name that comes to mind is Karl Marx. I would invite Karl Marx to dinner and tell him: "Look, you know, you inspired me when I was a young guy, trying, of course, to change the world in the student revolution. Unfortunately, every attempt to put your principles into practice turned into a terrible regime, and capitalism won… But you were right in many areas: we now observe the failure of the capitalist dream, built on the assumption of an ever-expanding economy and a global market without frontiers." Unfortunately, no physical process can continue to expand forever, and we now see frontiers and competing blocs emerging again.
Q: What are you reading right now?
What am I reading right now? Well, I'm reading Dario Ferrari's La ricreazione è finita (literally, "Recess is over"). It's all about the inside view of an academic career in Italy. Unless you're part of the Italian academic world, you may not appreciate it and might find it tedious. Nonetheless, I'm having some fun with it because, although I was never part of Italian academia, I know quite a few people who are.
Q: What is your favourite hobby outside work?
What is my favourite hobby? My wife would say my work, because I tend to work all the time. Other than that, I have practised several sports: tennis, skiing and horseback riding (we still have two horses in the family). Now the only one my health allows me to continue is sailing: I have a nice sailing boat in Barcelona, which I use every weekend. And I watch the onboard computer sail my boat (laughs).
Q: What is your most impressive achievement outside work?
I'm going to say something trivial, but finding a woman who has been able to put up with me for 50 years is probably the best I can think of at this moment.
Thanks so much for your time, Fabrizio! It was both fun and very useful to learn a bit about the history and background, and about the challenges we can expect in the future.