Building an International FAIR Infrastructure for Uniting Research Data
presentation
posted on 2020-03-10, 03:59 authored by Carina Kemp, Kuba Moscicki
Over the past ~6 years, a budding community of NREN and discipline operators of
synch&share stores has sprung up. These operators typically run one of [ownCloud, Seafile,
NextCloud, PowerFolder]. Judging by site surveys presented at consecutive synch&share-focused CS3
conferences, their services have all become runaway successes – it’s not
unusual for these stores to be in the PB range and to serve tens of thousands of real
researchers and their real research data. The next wave of open science policy, however,
tells us that data shouldn’t be locked inside a single vault – instead it needs to be
interlinked, citable, free to move; in short, FAIR. The CS3 community have always been
working towards enabling interlinking of the data between stores at the identity and
metadata levels. An open protocol was developed to announce, accept and propagate
shared volumes from one installation to another. This protocol is called OpenCloudMesh
and is by now supported by most synch&share software vendors. So, we have the installed
base, the incentive to interlink, and the technology to interlink. We just haven’t taken actual
linking beyond proof of concept yet; not in an operational, sustainable way in any case.
A proposal: interlink synch&share stores into a new pan-European data eInfrastructure
We were informed in late 2018 that the EC had put out a call for the development of
innovative science cloud eInfrastructures, called InfraEOSC-02.
This call matched our intentions surprisingly well, as a few guidelines from the call
illustrate. Imagine you have interlinked sets of synch&share nodes, and that data can be freely
requested and mounted between them. Now consider how well placed you are to answer
these challenges from the call:
Highlights
• innovative models of collaboration that genuinely include incentive mechanisms for a
user oriented open science approach
• develop innovative services that address relevant aspects of the research data cycle
(from inception to publication, curation, preservation and reuse),
• allowing implementation of new scientific data-related developments and intelligent
linking and discovering of all research artefacts
• foster interdisciplinary research, serving a wider remit of research needs, as well as
new users like industry and the public sector.
A consortium of ~10 eInfrastructure providers (NRENs, landmark instruments etc.) has now
been formed to deliver this project. Most are in Europe, but AARNet also
contributes through its Cloudstor Services. The project will not start from a
blank slate; it builds on an existing set of services already operated and in
widespread use among end users at the participant sites. This proposal does not focus on
developing software for a new infrastructure; rather, it is about systems integration,
combining existing components to deliver added value to the existing and active participants of the CS3
and GEANT communities.
The resultant eInfrastructure will be established by interfederating existing stores into a fabric
of “federated sites” based on federation mechanisms, operational routines and trust.
Federative best practices learned from eduGAIN and eduroam will be adopted and applied.
This presentation will set out the building blocks for the project, the conditions needed
to make it a success, and the proposed milestones, and will invite additional international
collaborators.