ARDC Data Retention Project: One year on
The ARDC Data Retention project was designed to establish a standardised, minimal metadata specification based on international metadata standards, focused on existing data collections of national significance. Our approach also treats partnerships as co-investments rather than a traditional funding stream: we actively support investment recipients with advice, tools and training to maximise delivery, embed good practice and avoid unnecessary additional burden.
The major effort for investment partners was collecting and structuring metadata sufficient for the project requirements. Retrospectively contacting owners and recording metadata for data collections was resource-intensive and time-consuming. Even with effective engagement, the concepts of ownership, licensing and merit were often poorly understood in the context of research data collections. The ARDC has maintained rich and insightful guidance on these concepts (and more), which proved a valuable and reusable resource: for example, for establishing ownership, determining where licences can be legally applied, and recording merit in line with established conventions of peer review and academic research.
Second was assistance with partners' operational workflows so they could efficiently register metadata into the DataCite metadata store at scale. The ARDC established a Jupyter notebook <10.5281/zenodo.5574652> documenting the necessary code blocks, with a narrative to guide scalable interaction with the DataCite RESTful API. A co-development principle rapidly reinforced the partnership between the ARDC and its stakeholders and ensured the notebook was fit for purpose. The result is an informative, practical notebook that transfers coding skills to users while they maximise the return on the Data Retention project's investment.
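To illustrate the kind of interaction the notebook walks through, the sketch below assembles a minimal DataCite-style JSON:API payload for minting a DOI. This is a simplified illustration, not the notebook's actual code: the prefix, titles and creators here are hypothetical placeholders, and the final POST to the DataCite REST API (with a repository ID and password over HTTP Basic auth) is noted in a comment rather than executed.

```python
import json

# DataCite REST API endpoint for DOIs (production; a test endpoint also exists)
DATACITE_API = "https://api.datacite.org/dois"

def build_doi_payload(prefix, title, creators, publisher, year,
                      resource_type="Dataset"):
    """Assemble a minimal JSON:API payload for registering a DOI.

    Attribute names follow the DataCite Metadata Schema; the values
    passed in by the caller are collection-specific.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,  # the repository's registered DOI prefix
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": resource_type},
            },
        }
    }

# Hypothetical example values for a single collection record
payload = build_doi_payload(
    prefix="10.12345",  # placeholder prefix, not a real allocation
    title="Example nationally significant data collection",
    creators=["Doe, Jane"],
    publisher="Example Institution",
    year=2022,
)

# To register at scale, loop over collection records and POST each
# json.dumps(payload) to DATACITE_API with HTTP Basic credentials.
print(json.dumps(payload, indent=2))
```

Looping a builder function like this over a spreadsheet of collection records is what makes the workflow scale: the structure is fixed once, and only the per-collection values change.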
Finally, we commissioned small proofs of concept applying commercial data mining tools to large, unstructured research data silos. The aim was to derive sufficient metadata to build more structured intelligence about the data, which can then inform decisions about the route a particular data collection might take through the data lifecycle.
ABOUT THE AUTHORS
Dr J Max Wilkinson
Max has a comprehensive background in research data management, research data governance and research infrastructure operations. For the last three years he has worked with the Australian Research Data Commons as a research data infrastructure architect, designing scalable and sustainable investment models for nationally significant research data collections. Prior to this, he worked with New Zealand eScience Infrastructure (NeSI), the Council of New Zealand University Librarians (CONZUL), AgResearch, eResearch2020 and MBIE. He lived and worked in the UK for two decades, most recently as Head of Research Data and Network Services at University College London, Datasets Programme Manager at the British Library and Informatics Coordinator at Cancer Research UK. He received his PhD in Molecular Nephrology from UCL in 2003.
Mr Matthias Liffers
Matthias is a research infrastructure specialist at the Australian Research Data Commons. With a background in data stewardship, he is a technologist who is trying to find a better word to describe himself than "technologist".
Ms Carmel Walsh
Carmel is the Director, eResearch Infrastructure & Services at the Australian Research Data Commons (ARDC). The ARDC provides Australian researchers with a competitive advantage through data. It does this by enabling the Australian research community and industry to access nationally significant, data-intensive digital research infrastructure, platforms, skills and collections of high-quality data. Carmel leads on storage and compute, with a focus on the ARDC Nectar Research Cloud, the national research cloud compute service for Australian researchers, and the national Data Retention project.