We develop systems that facilitate computational reproducibility for scientific and ML workflows
The scientific community is experiencing a reproducibility crisis, and computational experiments are a significant contributing factor. Large-scale studies, surveys, and case studies of computational experiments all show that reproducing results is difficult in practice. Without proper tooling and training, problems inevitably accumulate: missing dependencies, missing files, under-specified experimental descriptions, and stale files and code. These issues are also well known in software development, but the tools that help software developers re-execute code across machines do not fit the research programming paradigm that scientists use. We observe that reproducibility goals rely on questions of lineage: "What series of operations led to some computational result?"
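To make the lineage question concrete, here is a minimal Python sketch of a provenance record that can answer it. This is an illustration under stated assumptions, not a description of any Systopia tool: the `ProvenanceGraph` class, its `record` and `lineage` methods, and all file names are hypothetical, and the sketch assumes each artifact is produced by at most one operation.

```python
class ProvenanceGraph:
    """Hypothetical toy provenance store: maps each artifact to the
    operation that produced it and the inputs that operation read."""

    def __init__(self):
        # artifact name -> (operation name, tuple of input artifact names)
        self._producers = {}

    def record(self, operation, inputs, output):
        """Record that `operation` read `inputs` and wrote `output`."""
        self._producers[output] = (operation, tuple(inputs))

    def lineage(self, artifact):
        """Return the operations that led to `artifact`, ordered so that
        every operation appears after the operations it depends on."""
        ordered = []
        seen = set()

        def visit(name):
            # Artifacts with no recorded producer (e.g. raw inputs) end the walk.
            if name in seen or name not in self._producers:
                return
            seen.add(name)
            operation, inputs = self._producers[name]
            for upstream in inputs:
                visit(upstream)
            ordered.append(operation)

        visit(artifact)
        return ordered


# Hypothetical experiment: clean the data, train a model, plot a figure.
graph = ProvenanceGraph()
graph.record("clean.py", ["raw.csv"], "clean.csv")
graph.record("train.py", ["clean.csv", "params.yaml"], "model.pkl")
graph.record("plot.py", ["model.pkl", "clean.csv"], "figure.png")

print(graph.lineage("figure.png"))  # ['clean.py', 'train.py', 'plot.py']
```

Real provenance systems capture far more (environment, versions, parameters, timestamps), but even this small record shows how a result can be traced back to the series of operations that produced it.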
Currently, we are researching provenance-based methods to facilitate reproducibility for research programmers who perform computational experiments. We are evaluating users' perceptions of provenance and building tools that handle the labor of reproducibility.
Our previous work includes evaluating the feasibility of retroactive reproducibility, i.e., trying to make computational experiments reproducible again after their authors published them without preserving their original computational environment.
Systopia lab is supported by a number of government and industrial sources, including Cisco Systems, the Communications Security Establishment Canada, Intel Research, the Natural Sciences and Engineering Research Council of Canada (NSERC), Network Appliance, the Office of the Privacy Commissioner of Canada, and the National Science Foundation (NSF).