Authors
Geir Kjetil Sandve, Department of Informatics, University of Oslo
Boris Simovski, Department of Informatics, University of Oslo
Sveinung Gundersen, Department of Informatics, University of Oslo
Diana Domanska, Department of Informatics, University of Oslo
Christin Lund-Andersen, Department of Informatics, University of Oslo
Abstract
Biomedical investigations increasingly consider patient- and cell-type-specific data. These may be generated for a particular study or gathered from public reference collections like ENCODE and Roadmap Epigenomics, which provide hundreds of cell-specific tracks for a variety of markers. These developments make it increasingly important to offer bioinformatics users the ability to efficiently define and manage large collections of datasets.
The introduction of dataset lists in Galaxy has greatly improved the handling of multiple datasets. Still, to allow users to casually compile and analyze hundreds of datasets in one go, we argue that the present Galaxy lists should be complemented with a representation in which dataset collections are first-class entities in the system. The GSuite format may serve this purpose: it represents a collection of datasets as a tabular file in the history, so the collection can be modified using standard Galaxy tools. Users can also manipulate a collection outside Galaxy by downloading the tabular file, processing it with any custom script or standard spreadsheet software, and uploading the modified file back to Galaxy.
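As an illustration of the custom-script route, the following minimal Python sketch filters a downloaded collection file so that only the datasets matching a keyword remain. It rests on assumptions that are not stated in this abstract: that header lines start with `#`, that a line starting with `###` names the tab-separated columns, and that each remaining line describes one dataset. The function name `filter_collection`, the column names, and the example URLs are hypothetical.

```python
def filter_collection(text, column, keyword):
    """Return the collection text with only the rows whose `column`
    contains `keyword` (case-insensitive). Header lines are kept as-is."""
    kept = []
    columns = None
    for line in text.splitlines():
        if line.startswith("###"):
            # Assumed convention: a '###' line lists the column names.
            columns = line[3:].split("\t")
            kept.append(line)
        elif line.startswith("#") or not line.strip():
            # Other header lines and blank lines pass through unchanged.
            kept.append(line)
        else:
            if columns is None:
                raise ValueError("no column header line found before data rows")
            row = dict(zip(columns, line.split("\t")))
            if keyword.lower() in row.get(column, "").lower():
                kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical two-dataset collection, for illustration only.
EXAMPLE = (
    "##genome: hg19\n"
    "###uri\ttitle\tcell type\n"
    "http://example.org/a.bed\tDNase A\tK562\n"
    "http://example.org/b.bed\tDNase B\tHepG2\n"
)

filtered = filter_collection(EXAMPLE, "cell type", "k562")
print(filtered, end="")
```

The filtered text can be saved and uploaded back to Galaxy in place of the original collection file. Because the script only drops rows and leaves the header lines untouched, the result keeps the same layout as the input.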
We also present the recently published GSuite HyperBrowser (PMID:28459977), a public Galaxy instance that spearheads efficient and user-friendly analysis of patient- and cell-type-specific data (https://hyperbrowser.uio.no). For instance, a user can easily retrieve hundreds of cell-specific DNase datasets from a repository like ENCODE, fine-tune the collection using a variety of customization tools, and use the collection to assess the cell specificity of a separate dataset of disease-associated locations. A regular user can perform a complete analytical scenario like this within minutes.