Date of Award
Fall 2021
Project Type
Dissertation
Program or Major
Computer Science
Degree Name
Doctor of Philosophy
First Advisor
Elizabeth Varki
Second Advisor
Radim Bartos
Third Advisor
Daniel Bergeron
Abstract
Reproducibility is of central importance to the scientific process. The difficulty of consistently replicating and verifying experimental results is magnified in the era of big data, in which computational analysis often involves complex multi-application pipelines operating on terabytes of data. These processes result in thousands of possible combinations of data preparation steps, software versions, and command-line arguments. Existing reproducibility frameworks are cumbersome and require redesigning computational methods. To address these issues, we developed two conceptual models and implemented them through RepeatFS, a file system that records, replicates, and verifies computational workflows with no alteration to the original methods. RepeatFS also provides provenance visualization and task automation.
We used RepeatFS to visualize and replicate a variety of bioinformatics tasks comprising over a million operations, again with no alteration to the original methods. RepeatFS correctly identified all software inconsistencies that resulted in replication differences.
Recommended Citation
Westbrook, Anthony Stephen, "RepeatFS: A File System Providing Reproducibility Through Provenance and Automation" (2021). Doctoral Dissertations. 2640.
https://scholars.unh.edu/dissertation/2640