Globus research data management service
Presentation posted on 2020-03-10, 03:57, authored by Ian Foster, Kyle Chard, Ryan Chard
Research data can traverse a multitude of compute and storage devices from their collection, through analysis, dissemination, and archival storage. The scientific data lifecycle often requires acting on data spanning geographical locations and timescales, from near-real-time quality control, to human-oriented curation, through to long-term cataloguing and archival. Further, almost any step of this lifecycle can require the use of specialized hardware or computing resources resident in one or more administrative domains. Combined with ever-growing data rates and volumes, these challenges necessitate new technologies to help researchers reliably and simply offload distributed data management and analysis tasks.
To address these needs, we have developed Globus Automate, a distributed research automation platform designed to empower scientists to create, deploy, and apply data-oriented pipelines. Globus Automate can reliably automate the entire research data lifecycle, governing data from its generation at various instruments, through analysis, to dissemination and archival, while weaving fine-grained access control throughout the pipeline to securely interoperate with services across administrative domains. Globus Automate enables users to offload the management of data and abstracts away the challenges associated with distributed analysis and storage pipelines.
Globus Automate fills an important yet previously unmet need in science by enabling the composition of data management services into distributed data management pipelines. Using any of the provided Globus services, such as Transfer, Search, and Auth, as well as any custom service that exposes an Automate API, users can construct rich data pipelines to perform various tasks. Further, users can leverage funcX, a distributed function-as-a-service platform, in Automate flows to perform remote computation on almost any resource to which the user has access.
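The idea of composing services into a pipeline can be illustrated with a minimal, purely local Python sketch. It does not use the Globus Automate or funcX APIs: the step functions `transfer`, `analyze`, and `publish`, the `run_flow` helper, and the state keys are all hypothetical stand-ins, chosen only to show how independent actions might be chained so that each one receives the state produced by the one before it.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Action = Callable[[State], State]


def transfer(state: State) -> State:
    # Hypothetical stand-in for a data-transfer action:
    # records that the data now lives at the requested destination.
    return {**state, "location": state["destination"]}


def analyze(state: State) -> State:
    # Hypothetical stand-in for a remote computation step
    # (the role the abstract assigns to funcX).
    values = state["values"]
    return {**state, "mean": sum(values) / len(values)}


def publish(state: State) -> State:
    # Hypothetical stand-in for indexing a result record in a catalogue.
    record = {"location": state["location"], "mean": state["mean"]}
    return {**state, "catalogue": [record]}


def run_flow(steps: List[Action], state: State) -> State:
    # Run each action in order, passing the accumulated state along.
    for step in steps:
        state = step(state)
    return state


result = run_flow(
    [transfer, analyze, publish],
    {"destination": "archive", "values": [1.0, 2.0, 3.0]},
)
```

In a real deployment each step would be a call to a remote, access-controlled service rather than a local function, and the orchestration would have to handle authentication, retries, and failures; the sketch only captures the ordering and the hand-off of state between steps.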
In this talk we will present Globus Automate and describe use cases from initial pilot deployments. We will describe how funcX and Globus Automate make it possible to easily and seamlessly exploit a wide range of computational resources to automate the research data lifecycle, as depicted in Fig. 1, from performing preprocessing and quality control tasks locally through to outsourcing large-scale analyses to leadership computing facilities.
ABOUT THE AUTHORS
Ryan Chard is an Assistant Computer Scientist at Argonne National Laboratory. He joined Argonne in 2016, where he was awarded a Maria Goeppert Mayer Fellowship. His research focuses on the development of cyberinfrastructure to enable scientific research. He is particularly interested in automation platforms and performing on-demand scientific analysis at scale. He has a Ph.D. in Computer Science from Victoria University of Wellington, New Zealand, and a Master of Science from the same university. His research interests include high performance computing, scientific computing, cloud computing, cloud economics, and network inference.
Kyle Chard is a Research Assistant Professor at the University of Chicago and a researcher at Argonne National Laboratory. He received his Ph.D. in Computer Science from Victoria University of Wellington, New Zealand. His research interests include data-intensive computing, cloud computing, and economic resource allocation.
Ian Foster is an Argonne Senior Scientist and Distinguished Fellow and the Arthur Holly Compton Distinguished Service Professor of Computer Science. Ian received a BSc (Hons I) degree from the University of Canterbury, New Zealand, and a PhD from Imperial College, United Kingdom, both in computer science. His research deals with distributed, parallel, and data-intensive computing technologies, and innovative applications of those technologies to scientific problems in such domains as climate change and biomedicine. Methods and software developed under his leadership underpin many large national and international cyberinfrastructures. Ian is a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery, and the British Computer Society. His awards include the Global Information Infrastructure (GII) Next Generation award, the British Computer Society's Lovelace Medal, R&D Magazine's Innovator of the Year, and an honorary doctorate from the University of Canterbury, New Zealand. He was a co-founder of Univa UD, Inc., a company established to deliver grid and cloud computing solutions.