Building an International FAIR Infrastructure for Uniting Research Data

posted on 10.03.2020, 03:59 by Carina Kemp, Kuba Moscicki
Over the past ~6 years, a budding community of NREN and discipline-specific operators of synch&share stores has sprung up. These operators typically run one of ownCloud, Seafile, Nextcloud or PowerFolder. Judging by site surveys presented at consecutive synch&share-focused CS3 conferences, their services have all become runaway successes: it’s not unusual for these stores to be in the PB range and to serve tens of thousands of real researchers and their real research data. The next wave of open science policy, however, tells us that data shouldn’t be locked inside a single vault; instead it needs to be interlinked, citable and free to move. In short, FAIR. The CS3 community has always worked towards interlinking data between stores at the identity and metadata levels. An open protocol was developed to announce, accept and propagate shared volumes from one installation to another. This protocol is called OpenCloudMesh and is by now supported by most synch&share software vendors. So, we have the installed base, the incentive to interlink, and the technology to interlink. We just haven’t taken actual linking beyond proof of concept yet, at least not in an operational, sustainable way.

A proposal: interlink synch&share stores into a new pan-European data eInfrastructure

We were informed in late 2018 that the EC had put out a call for the development of innovative science cloud eInfrastructures, called InfraEOSC-02.

This call matched our intentions surprisingly well. A few guidelines from the call may illustrate this. Imagine you have interlinked sets of synch&share nodes, and that data can be freely requested and mounted between them. Now think how well you’re placed to answer these challenges from the call:

• innovative models of collaboration that genuinely include incentive mechanisms for a user oriented open science approach
• develop innovative services that address relevant aspects of the research data cycle (from inception to publication, curation, preservation and reuse)
• allowing implementation of new scientific data-related developments and intelligent linking and discovering of all research artefacts
• foster interdisciplinary research, serving a wider remit of research needs, as well as new users like industry and the public sector.

A consortium has now been formed to deliver this project. It is made up of ~10 eInfrastructure providers (NRENs, landmark instruments etc.), most of them in Europe, with AARNet also contributing through its CloudStor service. The project will not start from a blank slate; it builds on an existing set of services already operated and in widespread use among end users at the participant sites. This proposal does not focus on developing software for a new infrastructure; rather, it is about integrating existing components to deliver added value to the existing and active participants of the CS3 and GÉANT communities.

The resultant eInfrastructure will be established by interfederating existing stores into a fabric of “federated sites” based on federation mechanisms, operational routines and trust. Federative best practices learned from eduGAIN and eduroam will be adopted and applied.

This presentation will set out the building blocks for the project, the conditions that must be met to make it a success, and the proposed milestones. It will also invite additional international collaborators to join.




