
Damage control

Are we being paranoid?

After spending a very stimulating week at the EGI Community Forum 2012, I returned to work on Friday morning to find over 400 emails from our VOMS server stating that pretty much all our enmr.eu VO members had been suspended, including myself... Since then the number of emails has probably passed 1000, many of them from users enquiring what is happening.

In short, this "behaviour" seems to be a "feature" of the new VOMS software. Without warning or advance notice, all users are simply suspended, because they failed to sign the VO Acceptable User Policy (AUP) in time! It would have helped if users had been notified that they had to do that...

In trying to control the damage, we re-enabled all users, requesting that they sign the AUP. We were told we violated VO policies in doing so... I would like to see policies about services to the users, including proper notification well in advance when something like this is going to happen. Think of your bank doing the same: suspending your account without notice... You would probably switch banks pretty quickly.

We have been doing our best over the last few years to attract a new community to grid computing and promote the use of e-Infrastructure. What just happened clearly damages the image of the grid. I understand we have to worry about security, but let's not be paranoid!

There were lots of talks at the EGI Community Forum about user-centric views, friendly portals and gateways, and discussions about attracting new communities, e.g. the ESFRI projects... Clearly what just happened shows that we are still far from a user-centric and friendly view of grid business.

Now I am doing damage control, trying to answer the many emails from users, explaining to them that this occurred outside our doing and apologizing deeply for it. Many of them encountered problems accessing the VOMS server, a few typical issues being:

  • unable to establish a secure connection (probably because of some specific security settings of the server)
  • being confronted with an untrusted certificate of the server - should I trust this site?
  • being abroad without access to their personal certificate and thus being unable to sign the AUP on time
  • thinking that these are spam or phishing emails
  • ...

I am trying to reassure them that all is safe and helping them as much as possible. We are representing the largest VO in the life sciences area... A few more of these events and I am afraid we will lose many users.

Enough blogging for now, back to damage control.

Cheers

Alexandre



Re: Damage control

Just a quick note to thank the VOMS team for their prompt action on this issue. Considering the many emails users were getting (and we were getting from them), they took care of re-enabling all our suspended users!

Hopefully next time users will get notified on time that they have to renew their membership and this problem will not happen anymore.

Cheers

Alexandre

Re: Damage control

Alexandre, we are sorry to hear about the problems WeNMR has been having related to VOMS. The Director, Steven Newhouse, has asked the people at EGI.eu who coordinate the support teams to look into what has happened and to help your team rectify it ASAP. Our thanks also to the VOMS team for their fast response in working to resolve the issue.

Re: Damage control

It's through feedback like this that we can enhance the existing tools and services, so first thanks for raising the problem and letting us know of the issues for your users.
We are now following up the issue in detail through the GGUS user support tool (the helpdesk is the tool to be used to get specialized support from the operators and the software providers), but I would like to touch upon a broader topic that is relevant to the incident you reported: the current membership renewal policy and its support in VOMS.


Membership renewal and reaffirmed approval of the AUP are two distinct aspects. I'll just focus on the first, which is where the incident started.

VO membership is one of the main pillars on which secure access to resources rests.

Secure access to resources is very important to both our users who need access to resources, and to the managers of those resources who are responsible for resource access authorization, and for providing a secure environment to resource users. We also need to support VO managers, who need policies, procedures and tools that can support VOs regardless of their scale.

The policy for membership renewal is defined in the VO membership management policy. Having been in place for several years, it takes on board the interests of both consumers and providers of resources, so that access is provided to users who are truly authorized by their research community. You are right in saying that while providing a secure environment, VO management procedures should give access to authorised users as simply as possible.

The current VO membership renewal policy says that the renewal process must include:
1. "Confirmation, by the VO Manager, that continued membership of VO is still allowed".
2. "Confirmation or update of all data provided during registration and all special authorisations"
3. "Reaffirmed acceptance by the user of the Grid Acceptable Use Policy and the VO Acceptable Use Policy."
4. "Membership of the VO must be renewed at least every 12 months. Additionally all members of the VO should renew following a major change to the Grid Acceptable Use Policy".

Point (1) implies that a user does not decide by themselves to extend membership: the responsibility for deciding if membership can be extended is owned by the VO manager. For example, users of an expired project/collaboration may no longer be authorized to access resources, and the VO manager needs to confirm which users are still authorised members. How to improve this?

Both the policy and the tools implementing it may be revised.
- The policy could be changed so that duration of membership is no longer mandatory (point 4), but is configurable by the VO manager (the default could still be 12 months). Discussion of such a change would need to involve resource providers, users and security experts.
- Even with the existing policy, the mechanism for membership renewal could be improved. Renewal of membership could be triggered by the user instead of the VO manager. By notifying the user about the expiring membership (for example 2 weeks before the deadline), users interested in extending the membership could notify the VO manager by re-signing the AUP. Then, the VO manager could be notified about users willing to extend the membership, and could accept/reject requests.
- The default time to accept a new AUP in the tool could be extended from 24 hours to something like 2 weeks (but the time is already configurable in VOMS admin system and the VO managers may already extend this time if they wish to give users more time to respond).
- Search of suspended users for revocation of suspension should be supported by the tool.
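
The user-triggered renewal flow suggested in the list above could be sketched roughly as follows. This is only an illustration in Python, not VOMS code: the `Member` record, the `renewal_action` function and the action names are all hypothetical, and the 14-day notice period is simply the example value from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: notify users 2 weeks before the deadline, as in the example above.
NOTICE_PERIOD = timedelta(days=14)


@dataclass
class Member:
    """Hypothetical VO member record (not the VOMS data model)."""
    name: str
    expires: date               # date on which the current membership ends
    aup_resigned: bool = False  # True once the user has re-signed the AUP


def renewal_action(member: Member, today: date) -> str:
    """Return the next step for one member in a user-triggered renewal."""
    if member.aup_resigned:
        # The user asked to continue; the VO manager accepts or rejects.
        return "ask_manager"
    if today > member.expires:
        # No renewal request arrived before the deadline.
        return "suspend"
    if member.expires - today <= NOTICE_PERIOD:
        # Inside the notice window: remind the user to re-sign the AUP.
        return "notify_user"
    return "none"
```

The point of the ordering is that a user who has re-signed is never suspended automatically: the decision stays with the VO manager, as point (1) of the policy requires.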

These are some ideas on how to improve membership renewal. I'm sure other approaches/solutions are also possible. I suggest we discuss this with other Virtual Research Communities, the security policy group and the developers, so that we can reach community consensus on how to improve the user experience while maintaining secure access to the resources.

Tiziana

Re: Damage control

For those interested and to complement my previous blog post, an incident post-mortem is now available (https://documents.egi.eu/document/1090) describing how the incident was contained. The post-mortem also provides useful information on the current processes for managing membership renewal and reaffirmed AUP acceptance.

Re: Damage control

If EGI is to make good on its goal of serving a much larger user community in the future, increasing its user base from the current ~18'000 to 180'000 or even more (wouldn't that be nice!), changes in policy and tools are a must, and Tiziana has been suggesting a few based on the feedback we provided on this particular problem:

- The policy could be changed so that duration of membership is no longer mandatory (point 4), but is configurable by the VO manager (the default could still be 12 months). Discussion of such a change would need to involve resource providers, users and security experts.

Why not by default trust our users? Now the current policies seem to make everyone a suspect. Extending or relaxing the duration would be a good move.

- Even with the existing policy, the mechanism for membership renewal could be improved. Renewal of membership could be triggered by the user instead of the VO manager.

It seems to me this should really be the default. Has any physics VO with a few thousand users encountered the same problem? I can't see how a VO manager could monitor the memberships of so many users...

By notifying the user about the expiring membership (for example 2 weeks before the deadline), users interested in extending the membership could notify the VO manager by re-signing the AUP. Then, the VO manager could be notified about users willing to extend the membership, and could accept/reject requests.

Please implement this as soon as possible.

- The default time to accept a new AUP in the tool could be extended from 24 hours to something like 2 weeks (but the time is already configurable in VOMS admin system and the VO managers may already extend this time if they wish to give users more time to respond).

If it is that simple, then please change the default to two weeks in the next distribution. 24 hours is clearly not reasonable. Most user complaints we received were not about having to renew their membership, but rather about the ridiculously short deadline for this. Remember you need to access the VOMS server from a browser containing your certificate, which you might not always have at hand. And of course, access should be smooth from all devices, including mobile devices these days.


- Search of suspended users for revocation of suspension should be supported by the tool.

Add to that sorting users by their expiration date, to make life easier for VO managers.

I would say, let's start this policy and tools discussion in a broader context!

Alexandre

 

Re: Damage control

It is unfortunate to see that a software bug caused disruption of a service, which in turn affected many end users.

However, I just want to point out that this has nothing to do with the existing security policy and I can't see why we should change the security policy. Of course certain functions of the tool can be enhanced so that membership management will be easier and less time-consuming, but this should not compromise security.

It is clear that everyone (EGI management, security team, software developers etc.) agrees that maintaining continuity of the service is the highest priority. A temporary workaround is being worked out, but it does not mean we need to change the security policy.

What we are trying to do is to find a fine balance between usability and security.

Mingchao

Re: Damage control

Just a note - if, per chance, you decide to change the AUP to another version and you increment the version number, you should disable the ability for users to register with the VOMS BEFORE incrementing the value, thus preventing members from having to sign the AUP.

This can be done by setting 'voms.request.webui.enabled = false' in the voms.service.properties file, and re-deploying the voms-admin instance for that VO.

Then, go through and re-sign the AUP for everyone before setting 'voms.request.webui.enabled = true' and again re-deploying the instance.
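
For anyone who wants to script the property toggle described above, here is a minimal Python sketch. It only rewrites the text of a properties file; the `set_property` helper is hypothetical, how you read and write voms.service.properties on your server is left to you, and the re-deploy step is deliberately not shown because it depends on your installation.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with `key` set to `value`.

    An existing `key = ...` line is replaced; if the key is absent,
    a new line is appended. Comment lines (starting with '#') are kept.
    """
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#") and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                out.append(f"{key} = {value}")
                found = True
                continue
        out.append(line)
    if not found:
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Example: disable the registration web UI before changing the AUP version.
before = "# VO settings\nvoms.request.webui.enabled = true\n"
after = set_property(before, "voms.request.webui.enabled", "false")
```

Calling it again with "true" after everyone's AUP has been re-signed gives the second half of the procedure.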

Cheers!

Dan

yocum@fnal.gov

Re: Damage control

Just a couple of comments:

On "physics VOs with few thousands users" - actually, they use different software to manage VO registrations. That one works pretty well: it bugs me so doggedly about re-signing my VO AUP, I want to filter it as spam ;-)

And this brings me to another point: people, it is just a software bug! There is no perfect software. This bug will be fixed, another will be found. You don't like this buggy software - use another. Thankfully, we are living in the Open Source space, and strive to use common standard interfaces. I know at least two other software packages which offer functionality similar to VOMS Admin, with the same VOMS service behind. Perhaps EGI can consider adding them to the alternatives in the UMD.

But of course the reason for the original post is not the bug itself, but the apparent paranoia of the resource owners. Well, they have very good reasons to be paranoid. Even bank cards have a limited validity period and need to be renewed. National passports have limited validity periods. Your car's technical inspection certificate needs to be renewed. And no bank or government or vehicle registration authority sends you e-mail reminders. This is something you as the document owner must check yourself. So EGI actually offers you a better service (once the bug is fixed). Enjoy :-)

Re: Damage control

 Yes, VOMRS is nice for sending notifications that members need to re-sign the AUP.  Tanya does good work.  And she brings us chocolate (bonus!).  

You mentioned there are 2 other products that fulfill the VOMS-Admin role - what are they?

Thanks,

Dan

yocum@fnal.gov

Re: Damage control

"But of course the reason for the original post is not the bug itself, but the apparent paranoia of the resource owners. Well, they have very good reasons to be paranoid. Even bank cards have limited validity period and need to be renewed. National passports have limited validity periods. Your car's technical inspection certificate needs to be renewed. And no bank or government or a vehicle registration authority sends you e-mail reminders. This is something you as the document owner must check yourself."

...hmmm...

My bank automatically sends me a new card when the old one expires... They don't first close my account. And my passport is valid for 5 years... I even get a notice one month in advance that it will expire... This does not mean there are no problems: for example, internet banking is down from time to time...

So I am not sure I can fully agree with this comment. Of course I am paying for all those services and so far grid usage is pretty much free. There are talks of changing that in the future, which might make the user a paying customer... and the saying is: the customer is king!

There will always be a bug somewhere (I tend to call it an "undocumented feature"). This one was nasty, but proper action followed quickly and the damage was limited. Of course physics people were born with the grid and it feels natural to them. We are trying to attract a new community to it and it takes quite some effort to convince people.

Cheers

Alexandre

PS: And chocolate is of course always good!

