Authors
Manabu ISHII, RIKEN ACCC Bioinformatics Research Unit
Matsushima Akihiro, RIKEN ACCC Bioinformatics Research Unit
Mika Yoshimura, RIKEN ACCC Bioinformatics Research Unit
Hiroki Danno, RIKEN ACCC Bioinformatics Research Unit
Itoshi NIKAIDO, RIKEN ACCC Bioinformatics Research Unit
Abstract
As DNA sequencing methods advance, both the quantity and the variety of data produced continue to increase. Analyzing such data requires massive computing resources and the setup of various software and databases. Data-analysis techniques and databases are also under constant development. Accordingly, constructing an analysis environment takes considerable time and effort, including the procurement of computers, the installation of software, and the construction of data-analysis pipelines.
To achieve both reproducibility and flexibility of the environment, we have developed a Docker image containing Galaxy, a job scheduler, and a data-analysis pipeline. We have also constructed a system for deploying the Docker image on a public cloud such as Microsoft Azure. The deployment procedure is implemented as source code using Chef (Infrastructure as Code). The cloud system automatically adds and destroys computing nodes according to the number of jobs in demand.
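The demand-driven scaling described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the `jobs_per_node` capacity, and the `min_nodes`/`max_nodes` limits are all hypothetical values chosen for illustration. It only shows the general idea of deriving a target node count from the job queue and then deciding whether to add or destroy nodes.

```python
import math


def desired_node_count(pending_jobs, running_jobs,
                       jobs_per_node=4, min_nodes=1, max_nodes=20):
    """Return a target node count for the current job load.

    All parameters are hypothetical; a real deployment would take them
    from the job scheduler and the cloud provider's quota.
    """
    total_jobs = pending_jobs + running_jobs
    # Nodes needed if every node runs jobs_per_node jobs at once.
    needed = math.ceil(total_jobs / jobs_per_node)
    # Clamp to the allowed range so the cluster never vanishes or explodes.
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Translate the difference into an (action, count) pair."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("expand", delta)
    if delta < 0:
        return ("destroy", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    target = desired_node_count(pending_jobs=10, running_jobs=2)
    print(target, scaling_action(current_nodes=1, target_nodes=target))
```

Clamping to a minimum and a maximum is one common design choice: the lower bound keeps a node available for incoming jobs, and the upper bound caps cloud cost.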
In this presentation, we will compare on-premise and public cloud systems in terms of environment setup time, cost, pipeline reproducibility, and calculation speed. We will also demonstrate that the system can be conveniently constructed from a web browser. Using this system, we have operated an analysis environment for thousands of single-cell RNA-sequencing datasets in our laboratory. The idempotence of the system, including the data-analysis pipeline, has been tested through continuous integration / continuous delivery.
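As a rough illustration of what an idempotence test checks, the following sketch models a configuration step as a function over a dictionary of state. This is a toy model under stated assumptions, not Chef's API or the authors' test suite: `converge` and `is_idempotent` are hypothetical helpers. The property being tested is that applying the same desired configuration a second time produces no further changes.

```python
def converge(state, desired):
    """Apply a desired configuration to a state dictionary.

    Returns the new state and the set of changes that were made.
    Hypothetical stand-in for a provisioning run.
    """
    changes = {key: value for key, value in desired.items()
               if state.get(key) != value}
    new_state = {**state, **changes}
    return new_state, changes


def is_idempotent(state, desired):
    """Check that a second run after convergence changes nothing."""
    first_state, _ = converge(state, desired)
    second_state, second_changes = converge(first_state, desired)
    return second_state == first_state and not second_changes


if __name__ == "__main__":
    desired = {"galaxy": "installed", "scheduler": "running"}
    print(is_idempotent({}, desired))
```

In a continuous integration setting, a check of this kind would typically run on every change to the provisioning code, so that a step which modifies the system on each run is caught early.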