What drives bioinformatic tool accuracy?
In bioinformatics, multiple tools often exist for the same task, yet their accuracy can vary significantly. I have led several projects evaluating the accuracy of bioinformatic software. One of these focused on tools that detect protein-coding sequences in nucleotide sequences, which are commonly used in long non-coding RNA pipelines [1]. Benchmarks like this prompted my team to explore which factors are linked to software accuracy [2]. We found that citation-based metrics (H-index, impact factors, citations) were not correlated with accuracy.
Instead, indicators of long-term software support, such as GitHub activity, were strongly associated with better performance. This suggests that sustained support for bioinformatics tools is more beneficial than pursuing citation-based reputation. I will conclude with a grudge-based investigation into the link between academic department affiliation and software accuracy [3].
These studies highlight the crucial role of continuous development and interdisciplinary collaboration in producing reliable bioinformatic software.
1. Champion DJ, Chen TH, Thomson S, Black MA, Gardner PP (2024) Flawed machine-learning confounds coding sequence annotation. bioRxiv. https://doi.org/10.1101/2024.05.16.594598
2. Gardner PP et al. (2022) Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software. Genome Biology. https://doi.org/10.1186/s13059-022-02625-x
3. Gardner PP (2024) A Bioinformatician, Computer Scientist, and Geneticist lead bioinformatic tool development - which one is better? bioRxiv. https://doi.org/10.1101/2024.08.25.609622
-----
For more information about the eResearch NZ / eRangahau Aotearoa conference, visit:
https://eresearchnz.co.nz/