Globus research data management service
Research data can traverse a multitude of compute and storage devices from their
collection, through analysis, dissemination, and archival storage. The scientific data lifecycle
often requires acting on data spanning geographical locations and timescales, from near-real-time quality control, to human-oriented curation, through to long-term cataloguing and
archival. Further, almost any step of this lifecycle can require the use of specialized
hardware or computing resources resident in one or more administrative domains.
Combined with ever-growing data rates and volumes, these challenges necessitate new
technologies that help researchers reliably and simply offload distributed data
management and analysis tasks.
To address these needs, we have developed Globus Automate, a distributed research
automation platform designed to empower scientists to create, deploy, and apply data-oriented pipelines. Globus Automate can reliably automate the entire research data
lifecycle, governing data from its generation at various instruments, through analysis, to
dissemination and archival, while weaving fine-grained access control throughout the
pipeline to securely interoperate with services across administrative domains. Globus
Automate enables users to offload the management of data and abstract away the challenges
associated with distributed analysis and storage pipelines.
Globus Automate fills an important yet previously unmet need in science by enabling the
composition of data management services into distributed data management pipelines.
Using any of the provided Globus services, such as Transfer, Search, and Auth, as well as any
custom service that exposes an Automate API, users can construct rich data pipelines to
perform various tasks. Further, users can leverage funcX, a distributed function-as-a-service
platform, in Automate flows to perform remote computation on almost any resource to
which the user has access.
In this talk we will present Globus Automate and describe use cases from initial pilot
deployments. We will describe how funcX and Globus Automate make it possible to easily
and seamlessly exploit a wide range of computational resources to automate the research
data lifecycle, as depicted in Fig. 1, from performing preprocessing and quality
control tasks locally through to outsourcing large-scale analyses to leadership computing
facilities.
ABOUT THE AUTHORS
Ryan Chard is an Assistant Computer Scientist at Argonne National Laboratory, which he joined in
2016 and where he was awarded a Maria Goeppert Mayer Fellowship. His research focuses on
the development of cyberinfrastructure to enable scientific research. He is particularly
interested in automation platforms and performing on-demand scientific analysis at scale.
He has a Ph.D. in Computer Science from Victoria University of Wellington, New Zealand and
a Master of Science from the same university. His research interests include high
performance computing, scientific computing, cloud computing, cloud economics, and
network inference.
Kyle Chard is a Research Assistant Professor at the University of Chicago and a researcher at
Argonne National Laboratory. He received his Ph.D. in Computer Science from Victoria
University of Wellington, New Zealand. His research interests include data-intensive
computing, cloud computing, and economic resource allocation.
Ian Foster is an Argonne Senior Scientist and Distinguished Fellow and the Arthur Holly
Compton Distinguished Service Professor of Computer Science. Ian received a BSc (Hons I)
degree from the University of Canterbury, New Zealand, and a PhD from Imperial College,
United Kingdom, both in computer science. His research deals with distributed, parallel, and
data-intensive computing technologies, and innovative applications of those technologies to
scientific problems in such domains as climate change and biomedicine. Methods and
software developed under his leadership underpin many large national and international
cyberinfrastructures. Ian is a fellow of the American Association for the Advancement of
Science, the Association for Computing Machinery, and the British Computer Society. His
awards include the Global Information Infrastructure (GII) Next Generation award, the
British Computer Society's Lovelace Medal, R&D Magazine's Innovator of the Year, and an
honorary doctorate from the University of Canterbury, New Zealand. He was a co-founder of
Univa UD, Inc., a company established to deliver grid and cloud computing solutions.