Optimising Research Data Management
In 2021, CSIRO initiated the Science Data Stocktake project with the aim of optimising research data management processes and aligning with the upcoming Australian Government Data Availability and Transparency legislation. The project focused on discovering, identifying, and documenting administrative metadata for datasets stored within the Information Management and Technology (IMT) business unit's storage systems.
After an intensive 18 months of iterative business requirement refinement, researcher engagement, solution architecture documentation, and stakeholder engagement planning, we reached a pivotal milestone: a pilot engagement within IMT to evaluate the effectiveness of the Data Stocktake Dashboard, aptly named 'Swordfish.'
Swordfish offers a comprehensive overview of data distributed across six distinct physical locations and four storage tiers. It is built on the Starfish file system scanning tool and Microsoft's Power BI, with integrations powered by Hitachi Vantara's Pentaho. The resulting dashboard lets users easily slice the information to gain visibility into data stored across many unstructured file systems that have evolved over the last 30 years.
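As a rough illustration of the kind of slicing described above, the following minimal sketch totals storage by location and by tier. It does not use the Starfish, Pentaho, or Power BI interfaces; the record layout, the function names `summarise` and `slice_by`, and the sample values are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical scan records: (location, storage_tier, size_bytes).
# These values are invented for illustration, not real CSIRO data.
sample_records = [
    ("site-a", "tier-1", 100),
    ("site-a", "tier-2", 50),
    ("site-b", "tier-1", 25),
]


def summarise(records):
    """Total the bytes held for each (location, tier) pair."""
    totals = defaultdict(int)
    for location, tier, size_bytes in records:
        totals[(location, tier)] += size_bytes
    return dict(totals)


def slice_by(totals, index):
    """Collapse the totals along one dimension: 0 = location, 1 = tier."""
    sliced = defaultdict(int)
    for key, size_bytes in totals.items():
        sliced[key[index]] += size_bytes
    return dict(sliced)


totals = summarise(sample_records)
by_location = slice_by(totals, 0)
by_tier = slice_by(totals, 1)
```

A dashboard would typically apply the same grouping over many more dimensions, such as owner or file age, but the principle of aggregating scan output and then slicing it is the same.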
Over the last two years, we have tracked and recorded more than 200 engagement activities with data custodians, including researchers and staff who deliver research services. These engagements allowed us to identify gaps in service delivery as well as areas for procedural uplift. We subsequently drafted the intent statement for the CSIRO Data Archive project, which was approved and is now a project within the Managed Data Ecosystem Program.
Thanks to the ARDC Data Retention Project, the Research Data Culture Conversation, and the National Archives of Australia Petabyte-Plus Data Management Special Interest Group, we’ve learned that we’re not alone in lacking a full picture of what our 40 petabytes of research data holdings contain, and that we’re heading in the right direction to understand them better.
ABOUT THE AUTHORS
Katie Hannan - https://orcid.org/0000-0002-5689-4133 - is a Research Data Specialist at CSIRO in Adelaide, working with Data Management Systems. She is passionate about storytelling, linking people with information, and helping to facilitate learning experiences. Her research interests are in the areas of human-computer interaction, digital legacy, and information society.
Rene Tyhouse - https://orcid.org/0009-0006-6087-0712 - is an Infrastructure Integration Specialist at CSIRO in Canberra, focused on the information lifecycle and the storage systems that underpin it. He strives to optimise the storage offerings through the understanding and aggregation of metrics, so that decisions are as well informed as possible.
For more information about eResearch NZ / eRangahau Aotearoa, visit:
https://eresearchnz.co.nz/