The 2017 Galaxy Community Conference (GCC2017) is being held in Montpellier, France, 26-30 June.  GCC2017 will include keynotes and accepted talks, poster sessions, demos, birds-of-a-feather meetups, exhibitors, and plenty of networking opportunities. There will also be three days of pre-conference activities, including hackathons and training. If you work in data-intensive biomedical research, there is no better place than GCC2017 to present your work and to learn from others.

Friday, June 30 • 14:40 - 15:00
A reproducible data analysis environment for next-generation sequencing on public cloud computer

Manabu ISHII, RIKEN ACCC Bioinformatics Research Unit
Matsushima Akihiro, RIKEN ACCC Bioinformatics Research Unit
Mika Yoshimura, RIKEN ACCC Bioinformatics Research Unit
Hiroki Danno, RIKEN ACCC Bioinformatics Research Unit
Itoshi NIKAIDO, RIKEN ACCC Bioinformatics Research Unit

As DNA sequencing methods advance, both the quantity and the variety of data produced continue to grow. Analyzing such data requires massive computing resources and the setup of various software and databases. New data-analysis techniques and databases are constantly being developed. Accordingly, constructing an analysis environment takes considerable time and effort, including procuring computers, installing software, and building data-analysis pipelines.

To achieve both reproducibility and flexibility of the environment, we have developed a Docker image containing Galaxy, a job scheduler, and a data-analysis pipeline. We have also built a system for deploying the Docker image on a public cloud such as Microsoft Azure. The deployment procedure is implemented as source code using Chef (Infrastructure as Code). The cloud system automatically adds and destroys computing nodes according to the number of jobs waiting to run.
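The scaling behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `jobs_per_node` capacity, and the `min_nodes`/`max_nodes` limits are all assumptions made for illustration, and no real cloud or scheduler API is called.

```python
import math
from typing import Tuple


def desired_node_count(pending_jobs: int, jobs_per_node: int = 4,
                       min_nodes: int = 0, max_nodes: int = 10) -> int:
    """Return how many compute nodes the queue depth calls for.

    All parameters are hypothetical tuning knobs, not values from the talk.
    """
    if pending_jobs < 0:
        raise ValueError("pending_jobs must be non-negative")
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    # Round up so a partially filled node is still provisioned.
    needed = math.ceil(pending_jobs / jobs_per_node)
    # Clamp to the allowed cluster size.
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes: int, pending_jobs: int,
                   jobs_per_node: int = 4, min_nodes: int = 0,
                   max_nodes: int = 10) -> Tuple[str, int]:
    """Decide whether to add nodes, remove nodes, or leave the cluster as is."""
    target = desired_node_count(pending_jobs, jobs_per_node,
                                min_nodes, max_nodes)
    if target > current_nodes:
        return ("add", target - current_nodes)
    if target < current_nodes:
        return ("remove", current_nodes - target)
    return ("hold", 0)
```

With these assumed defaults, a cluster of two nodes facing nine queued jobs would be asked to add one node, and an idle cluster of five nodes would be asked to remove all five.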

In this presentation, we will compare on-premise and public cloud systems in terms of environment setup time, cost, pipeline reproducibility, and computation speed. We will also demonstrate that the system can be conveniently constructed from a web browser. Using this system, we have operated an analysis environment for thousands of single-cell RNA-sequencing samples in our laboratory. The system, including the data-analysis pipeline, has been tested for idempotence using continuous integration / continuous delivery.
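An idempotence test of the kind mentioned above can be sketched as follows: apply the same provisioning step twice and check that the second run changes nothing. This is only an illustration under stated assumptions. The dictionary-based state model, the `converge` step, and the resource names are hypothetical and do not represent Chef or the authors' test suite.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Step = Callable[[State, State], Tuple[State, List[str]]]


def converge(state: State, desired: State) -> Tuple[State, List[str]]:
    """Bring `state` to `desired`, touching only resources that differ.

    Returns the new state and the list of resources that were changed.
    """
    new_state = dict(state)
    changed = []
    for key, value in sorted(desired.items()):
        if new_state.get(key) != value:
            new_state[key] = value
            changed.append(key)
    return new_state, changed


def is_idempotent(step: Step, initial: State, desired: State) -> bool:
    """Run `step` twice; it is idempotent if the second run is a no-op."""
    first_state, _ = step(initial, desired)
    second_state, second_changes = step(first_state, desired)
    return second_changes == [] and second_state == first_state


# Hypothetical desired configuration, used only for illustration.
DESIRED = {"docker": "installed", "galaxy_image": "pulled"}
```

In a continuous integration job, a check such as `is_idempotent(converge, {}, DESIRED)` could be run on every commit so that a provisioning step that keeps modifying the machine on repeated runs fails the build.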


Manabu Ishii

Technical Staff, RIKEN ACCC Bioinformatics Research Unit

Friday June 30, 2017 14:40 - 15:00
Einstein Auditorium Le Corum, Level 0

