The ten thematic services of EOSC-Synergy: an updated landscape view

Valentino Cavalli briefs us on the project’s thematic services

EOSC-Synergy aims to expand the uptake of the EOSC by building capacity in participating countries, harmonising policies, and federating relevant national research e-Infrastructures, scientific data and thematic services. EGI leads a landscape analysis that will feed into an assessment of policy gaps and overlaps and into recommendations for policy harmonisation across the region. As part of this regional landscape analysis, which is still ongoing, the project surveyed its ten thematic services, which span four areas.

Data management, curation and preservation

All ten thematic services manage rich and diverse data types: numeric data, text, geospatial information, still images and 3D information. Data volumes range from several megabytes up to hundreds of terabytes. Processing such amounts of data will require sufficient cloud computing capacity: a third of the thematic services need about 2,400 jobs and 4,200 CPU hours per week.

Most services perform basic data curation, such as data checking and the addition of metadata. In most cases the metadata relies on standardised or controlled vocabularies. Some services use extended controlled vocabularies with their own terminology. Four thematic services are implementing machine-readable data catalogues. The others are either still in the design phase or have not yet considered doing so. Measures for annotating and tracking the data used and produced include version control and file integrity checks, among others. Persistent identifiers, such as DOIs obtained through Zenodo or Dataverse, handles from B2HANDLE, and other methods, are either already in use or planned.

All thematic services support releasing the resulting data under open research policies and keeping the relevant data available in the long term. Some already require compliance with the FAIR principles, while others will consider it later in the project. Half of the thematic services plan to consider publishing their data in certified repositories. The thematic services identified two key factors for data reuse: standardised data formats and meaningful metadata.

Data sharing and access

Some thematic services expressed concerns about data protection and intellectual property rights (IPR). They also raised a number of data-sharing issues:

- potential competitive disadvantage;
- the effort required to share the data;
- the cost of maintaining the data;
- the loss of control once the data is shared.

Access policies vary across thematic services, from no restriction at all to access approved by funding bodies. Some services restrict access based on Virtual Organisation (VO) membership. This applies especially to the use of computational resources, which may incur charges for users. Others plan to grant access to users or communities only upon approval by the funding body.

Only one service has already implemented a publicly available access policy. The others plan to have one within two years. This may become an issue, because a publicly available access policy is a requirement for registering a service in the EOSC Marketplace.

Authorisation is granted on the basis of individual identity or group membership. A few services are also considering selective sharing of permissions on the resulting data. In most cases, user access control supports, or plans to support, EGI Check-in or other community identity providers (IdPs). A few services are also considering access control based on a national identity federation.

Conclusion

Extending the landscape analysis to the thematic services was useful: it showed that there is room for improvement in the "FAIRness" of the data produced. It also showed how those services should evolve to become fully EOSC-compliant. Finally, it made several thematic services aware that they need to reconsider functionalities they had initially overlooked.

Valentino Cavalli is Strategy and Innovation Officer at the EGI Foundation.