Authors
Michael Sauria, Department of Biology, Johns Hopkins University
James Taylor, Department of Biology, Johns Hopkins University
Abstract
Chromatin architecture is recognized as an integral component of cellular differentiation, gene regulation, and epigenetic homeostasis. In recent years there has been an acceleration in the production of chromatin interaction data, primarily Hi-C data. Making full use of the data produced by Hi-C experiments has been challenging for many researchers because of computational limitations, limited bioinformatics knowledge, and a lack of user-friendly tools. To address these challenges, we have compiled a comprehensive database of Hi-C data, supported by Galaxy and integrated with analysis and visualization tools, allowing truly open access to more than 1,500 Hi-C datasets.
To best enable use of these data, we have created a uniform processing and analysis pipeline, executed using CWL workflows and run in containerized environments. We have also developed quality metrics for Hi-C samples to help evaluate sample quality and replicate reproducibility. The output of each processing step is made available, not just the endpoint data, along with quality metrics for each phase. All data were processed using HiFive, a Hi-C analysis suite available on Galaxy main.
We have also created a 2-dimensional genome browser connected to Trackster for easy data exploration within Galaxy. Samples can be loaded directly from the data library into Trackster-2D for visual assessment, comparison to one-dimensional genomic annotations, or comparison between Hi-C datasets. To support fast browsing and compact storage of these sparse datasets, we have also developed a multi-resolution 2-dimensional binary tree file format, allowing easy access to any level of resolution and the random access to data necessary for real-time browsing.
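The idea behind multi-resolution storage of sparse contact data can be illustrated with a minimal in-memory sketch. It is not the file format described above, whose on-disk layout is not specified here. The sketch assumes contacts are given as a dictionary mapping bin pairs to counts, and the names `build_pyramid` and `query` are hypothetical. Each coarser level merges 2x2 blocks of bins from the level below, so a browser could choose the level that matches the current zoom and fetch only the bins in view.

```python
from collections import defaultdict


def build_pyramid(contacts, levels):
    """Build successively coarser views of a sparse contact map.

    contacts: dict mapping (bin_i, bin_j) -> count at the finest resolution.
    Returns a list of dicts, finest level first. Only nonzero bins are stored.
    """
    pyramid = [dict(contacts)]
    for _ in range(levels - 1):
        coarser = defaultdict(int)
        # Halving both coordinates merges each 2x2 block into one bin.
        for (i, j), count in pyramid[-1].items():
            coarser[(i // 2, j // 2)] += count
        pyramid.append(dict(coarser))
    return pyramid


def query(pyramid, level, start, stop):
    """Return nonzero bins whose coordinates both fall in [start, stop)."""
    return {
        (i, j): count
        for (i, j), count in pyramid[level].items()
        if start <= i < stop and start <= j < stop
    }


# Toy example (illustrative values, not real Hi-C data).
contacts = {(0, 0): 1, (0, 1): 2, (1, 1): 3, (2, 3): 4, (5, 5): 5}
pyramid = build_pyramid(contacts, levels=3)
```

In this sketch the total count is preserved at every level, and `query(pyramid, 1, 0, 2)` returns only the bins of the second level that fall in the requested window. A real on-disk format would also need an index so that such a window can be read without scanning the whole level.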