The extraordinary volume and velocity of data produced by scientific instruments present new challenges
in organizing, processing, and sharing data efficiently without overburdening researchers. To address these needs,
we are developing Gladier (Globus Architecture for Data-Intensive Experimental Research), a data
architecture that enables the rapid development of customized data capture, storage, and analysis solutions
for experimental facilities. We have deployed Gladier at Argonne’s Advanced Photon Source (APS) and
Leadership Computing Facility (ALCF) to enable various solutions, including: delivery of data produced
during tomographic experiments to remote collaborators; capture, analysis, and cataloging of data from
X-ray Photon Correlation Spectroscopy (XPCS) experiments; and feedback based on analysis of data from
serial synchrotron crystallography (SSX) experiments to guide data acquisition.
The Gladier architecture leverages a data/computing substrate based on data and compute agents deployed
across computer and storage systems at APS, ALCF, and elsewhere, all managed by cloud-hosted Globus
services. All components are supported by the Globus Auth identity and access management platform to
enable single sign-on and secure interactions between components. This substrate makes it easy for
programmers to route data and compute requests to different storage systems and computers. Other
services support the definition and management of flows that coordinate data transfer, analysis, cataloging,
and other tasks associated with an experiment. Each service can be accessed via REST APIs
or from Python via a simple client library that calls the REST APIs. Scientists can then develop
experiment-specific data solutions by coding against these APIs or the client library, or by reusing and adapting solutions developed
by others. Importantly, both the overall architecture and specific solutions can easily be replicated at other
institutions and extended to provide additional capabilities.
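To make the client-library path concrete, the following is a minimal sketch of how a programmer might code against the Globus services from Python using the globus_sdk client library. The client ID, endpoint UUIDs, and paths are hypothetical placeholders, not the deployed APS/ALCF configuration.

    import globus_sdk

    CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"       # hypothetical app registration
    BEAMLINE_ENDPOINT = "BEAMLINE-ENDPOINT-UUID"  # hypothetical APS storage endpoint
    HPC_ENDPOINT = "HPC-ENDPOINT-UUID"            # hypothetical ALCF storage endpoint

    # Authenticate once via Globus Auth (single sign-on); the resulting tokens
    # authorize subsequent calls to the Globus services.
    auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth_client.oauth2_start_flow(
        requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all"
    )
    print("Log in at:", auth_client.oauth2_get_authorize_url())
    tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
    transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

    # Route data from the beamline to HPC storage with the Transfer service.
    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
    )
    task = globus_sdk.TransferData(tc, BEAMLINE_ENDPOINT, HPC_ENDPOINT)
    task.add_item("/data/scan_0001/", "/projects/xpcs/scan_0001/", recursive=True)
    print("Submitted transfer task:", tc.submit_transfer(task)["task_id"])

The same pattern extends to the other services (search, flows, and compute) through their analogous clients in the SDK, so a solution developed for one facility can be retargeted by changing endpoint identifiers and paths.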
We describe three examples to illustrate how Gladier can be used to implement powerful data collection,
analysis, and cataloging capabilities.
1. DMagic: Automated data delivery to experimentalists. The DMagic system uses a combination of Globus
APIs and APS administrative APIs to 1) automatically create and configure shared storage space on the ALCF
Petrel data service before an experiment begins; and 2) automatically copy over experimental data from the
beamline to Petrel storage as they are produced during the experiment (a simplified sketch of this pattern appears after these examples).
2. XPCS data collection, analysis, and cataloging. This example uses Globus Automate to automatically
collect data at an XPCS experiment, transfer the data to an HPC computer for processing, and then load the
processed data into a catalog, from where they can be searched and retrieved by authorized individuals (a
simplified flow definition in this style appears after these examples).
3. Rapid feedback for SSX experiments. This example guides SSX experiments by generating statistics and
images of the sample being processed and providing them to the scientists in near real time. These results
can then be used to determine whether enough data have been collected for a sample, whether a second
sample is needed to produce suitable statistics, or whether the sample is producing enough data to warrant
continued processing.
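The DMagic-style staging step from the first example can be sketched, under the same assumptions as the earlier snippet, as a directory creation plus an access rule on the shared storage. The endpoint UUID, directory, and collaborator identity below are placeholders, and tc is the authenticated TransferClient from the earlier sketch.

    PETREL_ENDPOINT = "PETREL-ENDPOINT-UUID"          # hypothetical shared-storage endpoint
    experiment_dir = "/experiments/2021-06-tomo-01/"  # hypothetical staging directory

    # 1) Create a per-experiment directory on the shared data service.
    tc.operation_mkdir(PETREL_ENDPOINT, experiment_dir)

    # 2) Grant a remote collaborator read access to that directory.
    acl_rule = {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": "COLLABORATOR-IDENTITY-UUID",    # hypothetical Globus identity
        "path": experiment_dir,
        "permissions": "r",
    }
    tc.add_endpoint_acl_rule(PETREL_ENDPOINT, acl_rule)

Data produced at the beamline can then be copied into this directory with transfer tasks like the one shown earlier, repeated as new files appear.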
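The second example can be pictured as a flow definition in the style accepted by Globus Flows (Automate), chaining transfer, analysis, and catalog ingest. The sketch below is illustrative only: the compute action provider URL, function identifier, and input keys are assumptions rather than the deployed XPCS flow, and error handling is omitted.

    xpcs_flow_definition = {
        "Comment": "Transfer raw XPCS data, analyze it, and catalog the results",
        "StartAt": "TransferRawData",
        "States": {
            "TransferRawData": {
                "Type": "Action",
                "ActionUrl": "https://actions.globus.org/transfer/transfer",
                "Parameters": {
                    "source_endpoint_id.$": "$.input.beamline_endpoint",
                    "destination_endpoint_id.$": "$.input.hpc_endpoint",
                    "transfer_items": [
                        {
                            "source_path.$": "$.input.raw_data_path",
                            "destination_path.$": "$.input.staging_path",
                            "recursive": True,
                        }
                    ],
                },
                "ResultPath": "$.TransferResult",
                "Next": "AnalyzeData",
            },
            "AnalyzeData": {
                # Hypothetical compute action provider that runs a registered
                # analysis function on the HPC system.
                "Type": "Action",
                "ActionUrl": "https://compute.actions.globus.org",
                "Parameters": {
                    "endpoint.$": "$.input.compute_endpoint",
                    "function.$": "$.input.analysis_function_id",
                    "kwargs": {"data_path.$": "$.input.staging_path"},
                },
                "ResultPath": "$.AnalysisResult",
                "Next": "CatalogResults",
            },
            "CatalogResults": {
                # Ingest metadata describing the processed data into a search
                # index so that authorized users can find and retrieve it.
                "Type": "Action",
                "ActionUrl": "https://actions.globus.org/search/ingest",
                "Parameters": {
                    "search_index.$": "$.input.search_index",
                    "subject.$": "$.input.staging_path",
                    "visible_to": ["public"],
                    "content.$": "$.AnalysisResult.details.result",
                },
                "ResultPath": "$.IngestResult",
                "End": True,
            },
        },
    }

Running a flow of this shape for each data set keeps the transfer, analysis, and cataloging steps coordinated without manual intervention.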
In this talk we will present the Gladier architecture, highlight the major components used in the
architecture, discuss three example data solutions deployed at APS and ALCF, and describe how the Gladier
architecture can be replicated in other environments.
ABOUT THE AUTHORS
Kyle Chard is a Research Assistant Professor in the Department of Computer Science at the University of
Chicago. He also holds a joint appointment at Argonne National Laboratory. His research focuses on a broad
range of problems in data-intensive computing and research data management. He leads various projects
related to distributed and parallel computing, scientific reproducibility, research automation, and cost-aware use of cloud infrastructure.
Ian Foster is the Director of Argonne’s Data Science and Learning Division, Argonne Senior Scientist and
Distinguished Fellow, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at
the University of Chicago. Foster’s research contributions span high-performance computing, distributed
systems, and data-driven discovery. He has published hundreds of scientific papers and eight books on
these and other topics. Methods and software developed under his leadership underpin many large
national and international cyberinfrastructures.