Abstract
Reproducible data analysis requires reproducible software installation. There are many approaches to reproducible software installation, including DebianMed, Docker, homebrew-science, and software modules, among others. Many work well in cloud and container-enabled environments, where the researcher has full control of a virtual machine or container host and may choose whatever software installation mechanism makes sense. However, these same approaches are less appropriate at high performance computing (HPC) centers, where large centralized resources mean such freedom is unavailable. Conversely, HPC-centric approaches do not provide options such as the ready-to-run software containers that are ideal for the cloud. Furthermore, some approaches are built to work with command-line scripting, while others are built for specific computational platforms or deployment technologies. Here we outline an approach that covers all of these scenarios with a great deal of flexibility, allowing the same binaries to be executed regardless of which technologies are selected. For Galaxy in particular, this approach allows the same packages and binaries to be used inside and outside of containerized environments automatically, without extra annotation in Galaxy tools.
This approach to reproducibility is the combination of Bioconda and BioContainers. We will update the community on progress in Bioconda adoption and demonstrate that it has improved Galaxy dependency management for both developers and deployers. We will then focus in depth on BioContainers, containerized environments built automatically from Bioconda packages, and on how they enable containerized tool execution across all best-practice Galaxy tools without requiring extra work by tool authors or administrators.