The 2017 Galaxy Community Conference (GCC2017) is being held in Montpellier, France, 26-30 June.  GCC2017 will include keynotes and accepted talks, poster sessions, demos, birds-of-a-feather meetups, exhibitors, and plenty of networking opportunities. There will also be three days of pre-conference activities, including hackathons and training. If you work in data-intensive biomedical research, there is no better place than GCC2017 to present your work and to learn from others.

Monday, June 26
 

08:00

Conference desk open
Check-in for Hack the Galaxy events in the morning, and for Tuesday's training during the rest of the day.

Monday June 26, 2017 08:00 - 22:00
Le Corum Le Corum

09:00

Hack the Galaxy: Data
This two-day event will gather community members who are interested in collaboratively addressing challenging integration and analysis problems, and establishing best practices for analysis in the life sciences.

Monday June 26, 2017 09:00 - 22:00
Sully 2 Level 1, Le Corum

09:00

Hack the Galaxy: Dev
This two day event will gather community members who are interested in contributing to Galaxy's code base, available tool set, and anywhere else that expands the Galaxy ecosystem.

This event aims to expand the contributor community as well as the Galaxy base.
  

Monday June 26, 2017 09:00 - 22:00
Sully 1 Level 1, Le Corum

11:00

Break
Monday June 26, 2017 11:00 - 11:30
Le Corum Le Corum

13:00

Lunch
Monday June 26, 2017 13:00 - 14:00
Le Corum Le Corum
 
Tuesday, June 27
 

08:00

Conference desk open
Check-in starts at 8am. You can also check in the day (and night) before.

Tuesday June 27, 2017 08:00 - 18:00
Le Corum Le Corum

09:00

Galaxy 101 - A gentle introduction to Galaxy

Slides

This workshop will focus on introducing the Galaxy user interface and how it can be used to analyze large datasets. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses Galaxy.

Prerequisites
  • Little or no experience using Galaxy.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Mallory Freeberg

Galaxy Project, Johns Hopkins University

Mo Heydarian

Galaxy Project, Johns Hopkins University


Tuesday June 27, 2017 09:00 - 11:30
Barthez Room Level 2, Le Corum

09:00

Hack the Galaxy: Data
This two-day event will gather community members who are interested in collaboratively addressing challenging integration and analysis problems, and establishing best practices for analysis in the life sciences.

Tuesday June 27, 2017 09:00 - 18:00
Sully 2 Level 1, Le Corum

09:00

Hack the Galaxy: Dev
This two day event will gather community members who are interested in contributing to Galaxy's code base, available tool set, and anywhere else that expands the Galaxy ecosystem.

This event aims to expand the contributor community as well as the Galaxy base.
 

Tuesday June 27, 2017 09:00 - 18:00
Sully 1 Level 1, Le Corum

11:30

Lunch
Tuesday June 27, 2017 11:30 - 12:30
Le Corum Le Corum

12:30

RNAseq analysis in Galaxy

➞ Slides

This workshop will introduce the concepts behind transcriptomics with NGS data and how to analyze this data in Galaxy. Specifically, this workshop will focus on de novo transcriptome reconstruction of RNA-seq data with the following goals:

  • comprehensive identification of all transcripts across an experiment
  • appropriately annotating classes of transcripts
  • generating abundance estimates across a transcriptome
  • significance testing of differentially expressed transcripts
  • visualisation of reads and transcript structures with Trackster
Prerequisites
  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Mallory Freeberg

Galaxy Project, Johns Hopkins University

Mo Heydarian

Galaxy Project, Johns Hopkins University


Tuesday June 27, 2017 12:30 - 15:00
Barthez Room Level 2, Le Corum

15:00

Break
Tuesday June 27, 2017 15:00 - 15:30
Le Corum Le Corum

15:30

Visualisation of BIG DATA in Galaxy

➔ Slides

This workshop will focus on visualisation of large datasets using the built-in tools of Galaxy, focusing on primary next-generation sequencing (NGS) data and the resulting downstream, aggregated data. First, using a multi-omic dataset consisting of exome and transcriptome (RNA-seq) data, participants will visualise alignments, variation, expression levels, and annotations using Galaxy’s built-in genome browser, Trackster. Participants will learn how to create a genome visualisation, add data, configure data, move between a linear genome browser view and a Circos view, and generate complex genome visualisations (figures) with more than 12 NGS datasets. Second, using a processed multi-omic dataset, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualisations, participants will create a heatmap to identify patterns and potential causal factors. All visualisations will be created, saved, and shared using only Galaxy and a web browser; no data or software uploads or downloads will be necessary.


Prerequisites
  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Aysam Guerler

Galaxy Project, Johns Hopkins University

Jeremy Goecks

Oregon Health & Science University

Saskia Hiltemann

Erasmus MC


Tuesday June 27, 2017 15:30 - 18:00
Barthez Room Level 2, Le Corum
 
Wednesday, June 28
 

08:00

Conference desk open
Check-in starts at 8am. You can also check in the day before.

Wednesday June 28, 2017 08:00 - 18:00
Le Corum Le Corum

09:00

RNAseq analysis in Galaxy

➞ Slides

This workshop will introduce the concepts behind transcriptomics with NGS data and how to analyze this data in Galaxy. Specifically, this workshop will focus on de novo transcriptome reconstruction of RNA-seq data with the following goals:

  • comprehensive identification of all transcripts across an experiment
  • appropriately annotating classes of transcripts
  • generating abundance estimates across a transcriptome
  • significance testing of differentially expressed transcripts
  • visualisation of reads and transcript structures with Trackster
Prerequisites
  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Mallory Freeberg

Galaxy Project, Johns Hopkins University

Mo Heydarian

Galaxy Project, Johns Hopkins University


Wednesday June 28, 2017 09:00 - 11:30
Sully 1 Level 1, Le Corum

09:00

Small genome de novo assembly using Galaxy

Tutorial, Slides

This workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds with sufficient average length for further analysis, including genome annotation. This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.

Prerequisites
  • Galaxy 101 or equivalent experience.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Dan Blankenberg

Galaxy Project, Penn State University

Simon Gladman

Bioinformatician, Melbourne Bioinformatics / University of Melbourne



Wednesday June 28, 2017 09:00 - 11:30
Sully 2 Level 1, Le Corum

09:00

Galaxy Architecture

Want to know the big picture about what is going on inside Galaxy? This workshop will give participants a practical introduction to the Galaxy code base with a focus on changing those parts of Galaxy most often modified by local deployers and new contributors.

The workshop will include the following specific content:

  • A description of the various file and top-level directories in the Galaxy code base.
  • An overview of important Python modules - including models, tools, jobs, workflows, visualisations, and API controllers.
  • An overview of important Python objects and concepts in the Galaxy codebase - including the Galaxy transaction object ("trans"), the application object ("app"), and the configuration object ("config").
  • An overview of various plugin extension points.
  • An overview of important JavaScript modules that power the front-end.
  • An overview of important JavaScript concepts used by Galaxy - in particular RequireJS, Backbone MVC, and grunt.
  • An overview of the client build system used to generate compressed JavaScript, cascading stylesheets, and other static web assets.
  • A demonstration of a complete start-to-finish modification of Galaxy - including forking the project on GitHub, modifying files, running the tests, checking style guidelines, committing the change, pushing it back to your GitHub fork, and opening a pull request.
  • A brief description of other projects in the Galaxy ecosystem (CloudMan, the Tool Shed, bioblend, docker-galaxy-stable, Pulsar, and Planemo).

Instructors
John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University


Wednesday June 28, 2017 09:00 - 11:30
Barthez Room Level 2, Le Corum

09:00

Introduction to Galaxy admin: Setting up a Galaxy instance as a service

What is important when you set up a Galaxy server from scratch? What pitfalls might you run into? How should you interact with the potential users of the service you are going to offer, and how can you make sure that the Galaxy instance you have set up is really used in the end? After a general introduction, several Galaxy installations will be presented. The session will include some demonstrations and hands-on exercises. We will finish with a panel discussion, where we intend to discuss questions from the workshop participants.

Prerequisites
  • VirtualBox installed on your laptop: https://www.virtualbox.org/wiki/Downloads
  • The VM for the course, downloaded from http://folk.uio.no/nikolaiv/gcc2017_SetupGalaxyAsService.ova and imported into VirtualBox (File > Import Appliance).
  • Familiarity with the bioinformatics problems (and their solutions) that wet lab scientists run into.
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors
Jochen Bick

ETH Zürich

Nikolay Aleksandrov Vazov

Senior Engineer, University of Oslo

Sabry Razick

Senior Engineer, University of Oslo


Wednesday June 28, 2017 09:00 - 11:30
Sully 3 Level 1, Le Corum

09:00

Writing & Publishing Galaxy Tools

This session will walk developers and bioinformaticians through the process of taking a working script or application and turning it into a Galaxy tool. It will also cover the basics of using Planemo, a command-line utility to assist in building and publishing Galaxy tools. We will investigate wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed, as well as insights from experienced tool developers.

Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what commands like cd, mv, rm, mkdir, chmod, grep can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser. Chrome or Firefox will work best.

Instructors
Dan Blankenberg

Galaxy Project, Penn State University

Dave Bouvier

Galaxy Project, Penn State University

Gildas Le Corguillé

CNRS-UPMC - Station Biologique de Roscoff - ABiMS

Marius van den Beek

Institut Curie, Paris

Martin Čech

Dev, Galaxy Project, Penn State University

Nick Stoler

Penn State University

Nicola Soranzo

Earlham Institute


Wednesday June 28, 2017 09:00 - 11:30
Rondolet Room Level 2, Le Corum

11:30

Lunch
Wednesday June 28, 2017 11:30 - 12:30
Le Corum Le Corum

12:30

Galaxy for Proteomics

The Galaxy bioinformatics platform has emerged as a valuable resource for mass spectrometry (MS) based proteomic informatics. An active community of researchers and users, including the Galaxy for proteomics (Galaxy-P) team, continues to extend Galaxy for these applications.

This hands-on workshop will guide participants through the essential steps for using Galaxy for the analysis of MS-based proteomics data, focusing on protein identification and more advanced multi-omic applications. Workflows from emerging applications integrating genomic and proteomic data (such as proteogenomics and metaproteomics) will also be demonstrated.

In order to extend the reach of these workflows to the greater community, the Galaxy-P team has partnered with both the JetStream cyberinfrastructure resource (http://jetstream-cloud.org/) and Amazon Web Services (https://aws.amazon.com).

The workshop will be constructed to follow the steps based on the structure below:

  1. Instructions on how to access the resource (30 minutes)
    • This workshop will provide participants background on Galaxy-P software and workflows that have been made available on the resource. Attendees will take away knowledge on how to access this resource and possibly make use of it for their own MS-based proteomics informatics needs.
  2. Introduction to Proteomics (1 hour)
    • Attendees will learn about inputs for proteomics search and also available software in Galaxy for sequence database searching, which identifies proteins via matching of MS data to sequence databases. Use of these tools and optimization of parameters will be demonstrated and discussed.
  3. Multi-omics Workflows (1 hour)
    • Attendees will be exposed to a variety of tools and workflows for filtering results in Galaxy. Emphasis will be on these two workflows:
    1. Proteogenomics Workflow: Used for filtering identified peptides from proteogenomic analyses.
    2. Metaproteomics Workflow: Used for identifying genera from identified peptides from metaproteomics analysis.

At the end of the workshop, attendees will have a working knowledge of MS-based proteomics tools and experience in setting up basic workflows for protein identification, as well as more advanced workflows in proteogenomics and metaproteomics.

Participants will be given temporary accounts to a cloud-based Galaxy instance to participate in hands-on workshop activities.

Prerequisites
  • Galaxy 101 or equivalent experience.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Clemens Blank

University of Freiburg

James Johnson

Minnesota Supercomputing Institute, University of Minnesota

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Pratik Jagtap is a Research Assistant Professor at the Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis (USA). In 2000, he received his PhD at the Center for Cellular and Molecular Biology, Hyderabad (India). Later, during his pos...

Timothy Griffin

Center for Mass Spectrometry and Proteomics, University of Minnesota


Wednesday June 28, 2017 12:30 - 15:00
Sully 3 Level 1, Le Corum

12:30

How to analyze metagenomic and amplicon data in Galaxy

After metagenomic data generation, you need to extract useful information such as the taxonomic composition of your samples or the metabolic functions performed in the sampled environment. Several tools have recently been integrated into Galaxy for metagenomic data analysis, including Mothur, QIIME, MetaPhlAn, HUMAnN, and FROGS.

In this training, we will show how to analyze metagenomic and amplicon data inside Galaxy:

  • Extraction of the OTUs using QIIME/Mothur
  • Reconstruction of the taxonomic composition of a sample without OTUs using MetaPhlAn
  • Identification of the metabolic functions performed in an environment using HUMAnN

Prerequisites
  • Galaxy 101 or equivalent experience.
  • Ideally participants will already be familiar with the concepts behind metagenomics (e.g., OTU).
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Bérénice Batut

Post-doc, University of Freiburg

Saskia Hiltemann

Erasmus MC


Wednesday June 28, 2017 12:30 - 15:00
Sully 2 Level 1, Le Corum

12:30

Visualisation of BIG DATA in Galaxy

Slides

This workshop will focus on visualisation of large datasets using the built-in tools of Galaxy, focusing on primary next-generation sequencing (NGS) data and the resulting downstream, aggregated data. First, using a multi-omic dataset consisting of exome and transcriptome (RNA-seq) data, participants will visualise alignments, variation, expression levels, and annotations using Galaxy’s built-in genome browser, Trackster. Participants will learn how to create a genome visualisation, add data, configure data, move between a linear genome browser view and a Circos view, and generate complex genome visualisations (figures) with more than 12 NGS datasets. Second, using a processed multi-omic dataset, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualisations, participants will create a heatmap to identify patterns and potential causal factors. All visualisations will be created, saved, and shared using only Galaxy and a web browser; no data or software uploads or downloads will be necessary.


Prerequisites
  • a general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • a wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Aysam Guerler

Galaxy Project, Johns Hopkins University

Jeremy Goecks

Oregon Health & Science University



Wednesday June 28, 2017 12:30 - 15:00
Sully 1 Level 1, Le Corum

12:30

Advanced customisation of a Galaxy instance

Do you have your lab's Galaxy instance set up and configured but want to give it some more love without diving too deep into the code? This training will show you step by step how to modify some advanced but not complex parts of the installation. We will teach you how to:

  • modify the menu
  • prepare a custom tour
  • adjust the graphical interface
  • translate the UI labels to a different language
  • set up a built-in user/group chat
  • write and activate interface webhooks 
Prerequisites
  • Introduction to Galaxy admin: Setting up a Galaxy instance as a service, or equivalent experience
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Björn Grüning

University of Freiburg

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Martin Čech

Dev, Galaxy Project, Penn State University


Wednesday June 28, 2017 12:30 - 15:00
Barthez Room Level 2, Le Corum

12:30

Conda and Containers for Tool Dependencies - A Developer's Perspective

This workshop is aimed at people with some experience developing tools but may also be of use to deployers who need to manage complex sets of dependencies for tools.

Galaxy tools define the applications and other dependencies they require to run using their requirements section. This training session will cover the elements of the requirements section and how Galaxy can be configured to utilize these.

The current best practice for resolving these dependencies is using Conda and Bioconda, and so a substantial amount of time will be spent on these topics. We will go through the process of creating, testing, and publishing a Bioconda package. We will work through an example of connecting these packages to Galaxy.

We will also discuss how the Biocontainers project constructs Docker containers from Bioconda packages and how to emulate this process for local testing before publication. Finally, we will review approaches to leveraging these containers from Galaxy to run jobs within containers.
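
As a rough illustration of the requirements section mentioned above, the sketch below reads the package requirements out of a tool wrapper and formats them as Conda-style `name=version` specs, which is approximately the information a Conda-based resolver needs. The tool wrapper (`example_sort`), its contents, and the helper `conda_specs` are hypothetical and written only for this example; this is not Galaxy's own resolver code.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool wrapper, trimmed to the parts relevant here.
TOOL_XML = """
<tool id="example_sort" name="Example sort" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.9">samtools</requirement>
        <requirement type="package">coreutils</requirement>
    </requirements>
    <command>samtools sort '$input' -o '$output'</command>
</tool>
"""


def conda_specs(tool_xml):
    """Return Conda-style specs for the package requirements of a tool."""
    root = ET.fromstring(tool_xml)
    specs = []
    for req in root.findall("./requirements/requirement"):
        # Only "package" requirements name installable software.
        if req.get("type") != "package":
            continue
        name = (req.text or "").strip()
        version = req.get("version")
        # A requirement without a version is left unpinned.
        specs.append(f"{name}={version}" if version else name)
    return specs


print(conda_specs(TOOL_XML))
```

Pinning a version in the requirement is what lets the same package be selected again later; the unpinned entry shows what is lost when the version attribute is omitted.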

Prerequisites:
  • Basic knowledge of Galaxy tools, or attendance at a basic session on either building tools or deploying Galaxy.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors
Abdulrahman Azab

Senior Engineer, University of Oslo

John Chilton

Galaxy Project, Penn State University

Marius van den Beek

Institut Curie, Paris

Nicola Soranzo

Earlham Institute


Wednesday June 28, 2017 12:30 - 15:00
Rondolet Room Level 2, Le Corum

15:00

Break
Wednesday June 28, 2017 15:00 - 15:30
Le Corum Le Corum

15:30

ChIPseq analysis using DeepTools and MACS

Did my IP work? Where is my signal? How well do my replicates correlate? What might my peaks even look like? Where are my peaks (or signal) in relationship to transcription start sites (or other features)? These are common questions that biologists first pose when dealing with ChIPseq data. We will use deepTools and MACS within Galaxy to demonstrate effective methods of

  1. performing ChIPseq-specific quality control,
  2. calling peaks and
  3. visualising signal and peak enrichment around genes or other features.

Prerequisites
  • Galaxy 101 or equivalent experience.
  • Ideally participants will already be familiar with generic NGS quality control and read mapping, since those won't be covered.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Devon Ryan

Max Planck Institute of Immunobiology and Epigenetics (MPI-IE)


Wednesday June 28, 2017 15:30 - 18:00
Sully 1 Level 1, Le Corum

15:30

Galaxy for Training and Education

Galaxy is a great platform for teaching diverse scientific topics to a broad user base. The flexibility, reproducibility, and scalability of Galaxy make it an ideal environment for teaching and training. The Galaxy Training Network is a community initiative dedicated to high-quality Galaxy-based training around the world. One of its objectives is to support trainers with complete training material and recommendations about bioinformatics training. Templates and best training practices were defined to help trainers create new material, unify the different material, and make training materials more accessible and easy for users to learn and for teachers to teach.

This workshop will first introduce participants to the infrastructure of the GTN training materials and describe how to generate training materials following best practices. Participants will generate Galaxy Interactive Tours and create Docker Flavours intended for teaching and training sessions. The workshop will also cover best practices for running Galaxy-based workshops, focusing on how to plan a training session based on number of attendees, time constraints, resource availability, and some best practices for leading Galaxy training sessions.


Prerequisites
  • Basic familiarity with using Galaxy (how to import datasets and run tools).
  • Basic familiarity with git and Docker will also be helpful for parts of the workshop.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome or Firefox will work best.

Instructors
Björn Grüning

University of Freiburg

Bérénice Batut

Post-doc, University of Freiburg


Wednesday June 28, 2017 15:30 - 18:00
Rondolet Room Level 2, Le Corum

15:30

SNP analysis and GWAS using SNiPlay

The SNiPlay workflow makes it possible to exploit high-density SNP data from a VCF file. In this training, we will show how to analyze SNP data in different ways:

  • Population structure analysis
  • SNP density and diversity analyses along the chromosomes, and comparison of groups/populations
  • GWAS analysis using associated phenotypic data
  • SNP annotation using a GFF-annotated genome
  • Visualisation of outputs from the different analyses using a specific Galaxy plugin

SNiPlay has been integrated into Galaxy and is available via the main Tool Shed as a complete workflow.

Dereeper A, Homa F, Andres G, Sempere G, Sarah G, Hueber Y, Dufayard JF, Ruiz M. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations. Nucleic Acids Res. 2015 Jul 1;43(W1):W295-300.

Prerequisites
  • Galaxy 101 or equivalent experience.
  • Ideally participants will already be familiar with SNP data manipulation
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best. 

Instructors
Yann Hueber

Bioinformatician, Bioversity International



Wednesday June 28, 2017 15:30 - 18:00
Sully 3 Level 1, Le Corum

15:30

Advanced accelerated Galaxy admin

A compressed, top-level review of the advanced parts of the Galaxy Administrators Course offered in Salt Lake City in November and in Melbourne in February. Given the breadth of this topic, we will explain advanced concepts, point out resources, and provide guidance, tips, and tricks rather than working through the exercises in detail.

Prerequisites

Instructors
Dan Blankenberg

Galaxy Project, Penn State University

John Chilton

Galaxy Project, Penn State University

Martin Čech

Dev, Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University

Simon Gladman

Bioinformatician, Melbourne Bioinformatics / University of Melbourne


Wednesday June 28, 2017 15:30 - 18:00
Barthez Room Level 2, Le Corum

15:30

Visualisation Development in Galaxy

In this age of high-throughput analysis and big data, visualisations have become an invaluable resource for the presentation and exploration of these often high-dimensional, complex, and large datasets.

While many tools in Galaxy produce static visual outputs (graphs, trees, etc), often some more interactivity is desired to aid in the exploration of these datasets. To support this need, Galaxy offers a range of visualisation options, such as Trackster for browsing genomic data and Charts for the interactive visualisation of tabular data and other datatypes.

In this workshop participants will learn how to develop such visualisations in Galaxy. More specifically, they will:

  • Develop a module within the Charts visualisation plugin using JavaScript
  • Develop a simple visualisation plugin from scratch


Prerequisites:
  • Basic understanding of Galaxy from a developer point of view.
  • Some familiarity with JavaScript.
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors
Aysam Guerler

Galaxy Project, Johns Hopkins University

Jeremy Goecks

Oregon Health & Science University

Saskia Hiltemann

Erasmus MC


Wednesday June 28, 2017 15:30 - 18:00
Sully 2 Level 1, Le Corum
 
Thursday, June 29
 

08:00

Conference desk open
Check-in starts at 8am. You can also check in the day before.

Thursday June 29, 2017 08:00 - 19:00
Le Corum Le Corum

09:00

GCC2017 Opening and Welcome
Presenters

Thursday June 29, 2017 09:00 - 09:15
Einstein Auditorium Le Corum, Level 0

09:15

The Evolution of Galaxy: A Rough Timeline
Slides

The history of the Galaxy project, and how it went from a humble Perl script to having its own international meeting in the South of France in the summer of 2017.

Presenters
Anton Nekrutenko

Galaxy Project, Penn State University

James Taylor

Johns Hopkins University



Thursday June 29, 2017 09:15 - 10:00
Einstein Auditorium Le Corum, Level 0

09:15

Session 1: Galaxy Framework
Moderators
Jeremy Goecks

Oregon Health & Science University

Thursday June 29, 2017 09:15 - 10:40
Einstein Auditorium Le Corum, Level 0

10:00

Bioconda and BioContainers - Enabling Universal Reproducibility in Galaxy
Slides

Authors

John Chilton, Department of Biochemistry and Molecular Biology, Penn State University
Marius Van den Beek, Institut Curie
Björn Grüning, Uni-Freiburg (ALU)
Galaxy Team   


Abstract
Reproducible data analysis requires reproducible software installation. There are many approaches to reproducible software installation – DebianMed, Docker, homebrew-science, software modules, and others. Many work well in cloud and container-enabled environments – where the researcher has full control of a virtual machine or container host and may choose whatever software installation mechanism makes sense. However, these same approaches are less appropriate at high performance computing (HPC) centers where large centralized resources mean such freedom is unavailable. On the other hand, the HPC-centric approaches do not provide options such as ready-to-run software containers ideal for the cloud. Furthermore, some approaches are built to work with command-line scripting while others are built for specific computational platforms or deployment technologies. Here we will outline an approach that covers all of these scenarios with a great deal of flexibility – allowing for the execution of the same binaries regardless of which technologies are selected. For Galaxy in particular, this approach allows the same packages and binaries to be used inside and outside of containerized environments automatically without extra annotation in Galaxy tools.

This approach to reproducibility is the combination of Bioconda and BioContainers. We will update the community on progress in Bioconda adoption and demonstrate that it has improved Galaxy dependency management for both developers and deployers. We will then focus in depth on BioContainers, containerized environments built automatically from Bioconda packages, and how they enable containerized tool execution across all best-practice Galaxy tools without requiring extra work by tool authors or administrators.


Presenters

John Chilton

Galaxy Project, Penn State University



Thursday June 29, 2017 10:00 - 10:20
Einstein Auditorium Le Corum, Level 0

10:20

Building an open, collaborative, online infrastructure for bioinformatics training

Slides (web)

Authors
Bérénice Batut, University of Freiburg
Galaxy Training Network
Dave Clements, Johns Hopkins University
Björn Grüning, University of Freiburg (ALU)

Abstract
With the advent of high-throughput platforms, life science data analysis is tightly linked to the use of bioinformatics tools, resources, and high-performance computing. However, the scientists who generate the data often do not have the knowledge required to be fully conversant with such analyses. To involve them in their own data analysis, these scientists must acquire bioinformatics vocabulary and skills through training.

Data analysis training is particularly challenging without a computational background. The Galaxy framework is addressing this problem by offering a web-based, intuitive and accessible user interface to numerous bioinformatics tools.

Recently, the Galaxy Training Network (GTN) set up a new open, collaborative, online model for delivering high-quality bioinformatics training material: http://galaxyproject.github.io/training-material.

Each of the current 12 topics provides tutorials with hands-on, slides and interactive tours. Tours are a new way to go through an entire analysis, step by step inside Galaxy in an interactive and explorative way. All material is openly reviewed, and iteratively developed in one central repository by 40 contributors. Content is written in Markdown and, similarly to Software/Data Carpentry, the model separates presentation from content. In addition, the technological infrastructure needed to teach each topic is described with a list of needed tools. The data (citable via DOI) required for the hands-on, time and resource estimations and flavored Galaxy Docker images are also provided.

This material is automatically propagated to Elixir's TeSS portal. With this community effort, the GTN offers an open, collaborative, FAIR and up-to-date infrastructure for delivering high-quality bioinformatics training for scientists.


Presenters

Bérénice Batut

Post-doc, University of Freiburg



Thursday June 29, 2017 10:20 - 10:40
Einstein Auditorium Le Corum, Level 0

10:40

Break
Thursday June 29, 2017 10:40 - 11:10
Le Corum Le Corum

11:10

An Educational Framework for Galaxy Administration

Slides

Authors

  • Martin Cech, Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania, United States
  • Enis Afghan, Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States   
  • Nate Coraor, Department of Biochemistry and Molecular Biology, Penn State University, University Park,  Pennsylvania, United States
  • Simon Gladman, Melbourne Bioinformatics, University of Melbourne, Melbourne, Victoria, Australia
  • Daniel Blankenberg, Department of Biochemistry and Molecular Biology, Penn State University, University Park,  Pennsylvania, United States 
  • Björn Grüning, Department of Computer Science, Albert-Ludwigs-University, Freiburg, Germany
  • Ross Lazarus, Sydney, Australia
  • Dave Clements, Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States


Abstract

With more than 90 public and hundreds of non-public Galaxy servers, there is a growing demand for managing Galaxy application servers. Even though on the surface this is a straightforward task, in the long run or when an instance serves a large number of users, the administration requirements become significant. Examples of administration tasks include proper selection of hardware, deploying in a production-ready mode, connecting to a compute cluster or external authentication, keeping an instance up to date, etc. Cumulatively, these tasks require detailed knowledge of the Galaxy administration principles. In response, we have designed Galaxy Admin Training materials - a set of tutorials catered to current and future administrators intended to educate them about the principles of Galaxy administration.

The Galaxy Admin Training is envisioned as a multi-day interactive training workshop and/or a set of materials that can be followed in a self-paced setting. The current set of topics ranges from the introductory steps required to set up an instance of Galaxy on one's laptop, to advanced Galaxy server or cluster setup, to managing users and resources, to setting up Interactive Environments. All the materials are developed and available in a public GitHub repository (https://github.com/galaxyproject/dagobah-training), facilitating revisions and expansion of topics.

Conceptually, the materials represent an educational framework for training Galaxy server administration. Any community additions/modifications/updates to the materials can be added to the master repository via a pull request. Community involvement and reuse of the materials is encouraged, with help available from the past instructors.


Presenters

Nate Coraor

Galaxy Project, Penn State University

Simon Gladman

Bioinformatician, Melbourne Bioinformatics / University of Melbourne



Thursday June 29, 2017 11:10 - 11:30
Einstein Auditorium Le Corum, Level 0

11:10

Session 2: Galaxy Framework, continued
Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Thursday June 29, 2017 11:10 - 12:40
Einstein Auditorium Le Corum, Level 0

11:30

Integrative analysis of hundreds of datasets in Galaxy

Slides

Authors
Geir Kjetil Sandve, Department of Informatics, University of Oslo
Boris Simovski,  Department of Informatics, University of Oslo
Sveinung Gundersen, Department of Informatics, University of Oslo
Diana Domanska, Department of Informatics, University of Oslo
Christin Lund-Andersen, Department of Informatics, University of Oslo 


Abstract
Biomedical investigations increasingly consider patient- and cell-type-specific data. These may be generated for a particular study or gathered from public reference collections like ENCODE and Roadmap Epigenomics, which provide hundreds of cell-specific tracks for a variety of markers. These developments make it increasingly important to offer bioinformatics users the ability to efficiently define and manage large collections of datasets.

The introduction of dataset lists in Galaxy has greatly improved handling of multiple datasets. Still, to allow users to casually compile and analyze hundreds of datasets in one go, we argue that the present Galaxy lists should be complemented with a representation where dataset collections are first-class entities in the system. The GSuite format may serve such a purpose, as it represents a collection of datasets as a tabular file in history, allowing the collection to be modified using standard Galaxy tools. Furthermore, users can even manipulate a collection by downloading the tabular file, processing it with any custom script or standard spreadsheet software, and uploading the modified file back to Galaxy.

We also present the recently published GSuite HyperBrowser (PMID:28459977), a public Galaxy instance that spearheads efficient and user-friendly analysis of patient- and cell-type-specific data (https://hyperbrowser.uio.no). For instance, a user can easily retrieve hundreds of cell-specific DNase datasets from a repository like ENCODE, fine-tune the collection using a variety of customization tools, and use the collection to assess cell specificity of a separate dataset of disease-associated locations. A regular user can perform a complete analytical scenario like this within minutes.


Presenters

Geir Kjetil Sandve

University of Oslo



Thursday June 29, 2017 11:30 - 11:50
Einstein Auditorium Le Corum, Level 0

11:50

Galaxy Genome Annotation Project: Galaxy and GMOD for Annotation, Teaching, and Genomic Databases

Slides

Authors
Eric Rasche, Center for Phage Technology, Texas A&M University (CPT)
Björn Grüning, Department of Computer Science [Freiburg]
Nathan Dunn, Lawrence Berkeley National Laboratory (LBNL)
Anthony Bretaudeau, BIPAA/GenOuest


Abstract
GMOD projects have long provided powerful open-source tools to the bioinformatics community, but have historically been hard to configure and integrate. The Galaxy Genome Annotation (GGA) group provides a highly integrated set of Dockerized GMOD projects allowing for more widespread use of these tools in new contexts for system administrators wishing to deploy the suite. Our projects include maintenance of the Galaxy-Apollo bridge tools, Galaxy-Tripal and Chado tooling, and containerized versions of various GMOD projects which are configured to easily integrate with the rest of the suite.

This talk will explore the use of this suite in the context of a real-life use case: an undergraduate phage annotation course. We will cover the GGA suite as well as various integrations, workflows, training materials, and tools that were built and made available in support of GGA.


Presenters

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University



Thursday June 29, 2017 11:50 - 12:10
Einstein Auditorium Le Corum, Level 0

12:10

Apollo in Galaxy: Increasing Opportunities for Collaborative Genome Annotation
 Slides

Authors

Nathan Dunn, Lawrence Berkeley National Lab (LBNL)
Monica Munoz-Torres, Lawrence Berkeley National Lab (LBNL)
Deepak Unni, Division of Plant Sciences, University of Missouri
Eric Rasche, Center for Phage Technology, Texas A&M University (CPT)
Eric Yao, Department of Bioengineering, University of California, Berkeley (UC Berkeley)
Ian Holmes, Department of Bioengineering, University of California, Berkeley (UC Berkeley)
Christine Elsik, Division of Plant Sciences, University of Missouri
Suzanna Lewis, Lawrence Berkeley National Lab (LBNL)


Abstract
Manual refinement of automated gene predictions using experimental evidence is a crucial step for improving the quality of a genome's annotation. Apollo, which utilizes the JBrowse genome browser, is a web-based genome annotation editor used by well over one hundred annotation projects. Annotation changes are reflected in real-time (like Google Docs), which facilitates distributed curation efforts. A single Apollo server can scale to support multiple genome projects and regulate access to multiple curators via fine-grained permissions.

Apollo has been successfully integrated with Galaxy via Docker, and externally via its web-services, allowing the community to refine predicted genome elements generated via Galaxy workflows. Annotated genomic elements may be exported as FASTA, GFF3, or as a Chado database.

We introduce two important features nearing completion. The first is variant annotation, which provides a way to annotate and visualize variants as well as to visualize the individual and combined effects of each variant on a given annotation. The second is coordinate transformation, which allows the visualization of two or more genomic regions, from the length of entire chromosomes to just a few exons, within an artificially constructed “assemblage”. This facilitates annotation of genomic features split across two or more regions of a fragmented assembly, while informing potential improvements to the genome assembly in the process. Additionally, inter- and intragenic regions can be hidden to focus on regions of interest. For example, the sequences of exons separated by thousands of base pairs can be shown adjacently.

Learn more at http://genomearchitect.org/.


Presenters

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
Lawrence Berkeley National Lab (LBNL)



Thursday June 29, 2017 12:10 - 12:30
Einstein Auditorium Le Corum, Level 0

12:30

Galaxy at the IFB: From computing to training
Slides

Presenters

Victoria Dominguez del Angel 1
Christophe Caron 2

1 : ELIXIR Training Coordinator (FRANCE), Institut Français de Bioinformatique, UMS3601 IFB-core, bat. 21, CNRS, 2 Av. de la Terrasse, 91190 Gif-sur-Yvette, France
2 : INRA


Abstract

The French Institute of Bioinformatics (IFB) is a national service infrastructure in bioinformatics constituted of 35 bioinformatics platforms spanning the entire territory. 

Its principal mission is to provide integrated and sustained bioinformatics resources and services across the life science community.

These services can be grouped into:

 

  • Data: provision of curated data collections with added value based on the biological expertise of the host laboratory
  • Tools: diffusion of innovative tools for analyzing biological data
  • Training and
  • Infrastructure.

 

The IFB infrastructure is coupled to the required computing and storage capacity in a national bioinformatics cloud. To address the most common needs, a selection of major scientific software tools was made and installed in pre-configured virtual images (cloud appliances), ready to run on the IFB’s cloud (Biosphere). IFB is also using lightweight virtualization based on Docker containers to provide bioinformatics tools and pipelines that are ready to run in the cloud or locally, so that users can personalize their virtual research environment. One of the most widely adopted images/containers comes from Galaxy.

The IFB also contributes to structuring and organizing the French bioinformatics community around emerging needs such as Galaxy platforms. At the national level, IFB members gather in the French Galaxy Working Group. The objective of this working group is to federate the French bioinformatics community of developers working on the Galaxy environment. The working group's activities relate to the following three topics:

  • Architecture and optimization that addresses operation issues in production environments, options to automate deployment tasks and monitoring solutions of Galaxy instances.
  • Development and tool integration that proposes tools and procedures to facilitate the integration of tools in Galaxy instances
  • Training: for the developers and the end-users at the national and European level.

Presenters

Victoria Dominguez del Angel

ELIXIR Training Coordinator (FRANCE) | Responsable de la cellule Communication, Formation et Valorisation | Institut Français de Bioinformatique | UMS3601 IFB-core, bat. 21 | CNRS, 2 Av. de la Terrasse | 91190 Gif-sur-Yvette, France


Thursday June 29, 2017 12:30 - 12:45
Einstein Auditorium Le Corum, Level 0

12:45

Lunch
Thursday June 29, 2017 12:45 - 13:45
Le Corum Le Corum

12:45

Arts & Crafts & CTF

Arts & Crafts BoF

GCC sure can be overwhelming sometimes! This BoF is a quiet place to do some stress-free, science-related arts and crafts.

Hack the Universe and Capture the Flag!

Saskia and Eric have also put together a Capture the Flag (CTF) event for GCC2017.  Hack the Universe is a competition where you attempt to hack into a Galaxy instance and learn about Galaxy, Galaxy administration, vulnerabilities, and good security practices.  The CTF competition will start on the first night of GCC2017, and run for a week after.

Anyone! We strongly recommend that you form teams involving at least one bioinformatician or biologist, and one computer person. Get out there, make new friends!

Interested?  See https://ctf.galaxians.org/ for more and visit Saskia and Eric at the Arts & Crafts BoF.    

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.

And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

 


Moderators

Saskia Hiltemann

Erasmus MC

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Thursday June 29, 2017 12:45 - 13:45
TBA

12:45

Birds-of-a-Feather (BoF) Flocking Session 1
Birds of a Feather are informal gatherings of people that are interested in a common topic. 

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.

And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Thursday June 29, 2017 12:45 - 13:45
Le Corum Le Corum

12:50

Reproducibility best practices for computational analyses involving large numbers of datasets BoF

The aim for this BoF is a very concrete one: to initiate a collaborative project towards a manuscript on best practices for reproducibility and transparency when doing computational analysis involving large numbers of datasets. Anyone curious or potentially interested in contributing to such a manuscript is encouraged to join!

Ensuring that omics analyses are truly transparent and reproducible is always a challenge, and even more so when analyses involve large numbers of datasets. The exact scope of a manuscript on this topic and organization of the work is open for discussion at the BoF session. How to make transparent analyses using Galaxy and dataset collections will clearly be central. How much focus to put on comparison with alternative interfaces is open for discussion. As for organization of the work, collaborative editing on an open document might serve as a main pillar, but this would also be a point of discussion at the BoF.

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Geir Kjetil Sandve

University of Oslo

Thursday June 29, 2017 12:50 - 13:55
BoF Space 1, Antigone Room Antigone Room, Level 2

12:50

Supporting the non-coder Galaxy community BoF

How can we support Galaxy community members whose (main) contribution is not coding? For example, they might be doing training (although the GTN generally covers these people), acting as "ambassadors" for Galaxy, or doing user support. Some questions:

  1. How can these people be recognised?
  2. What communication channels can enhance this part of the community?
  3. How can this community and the coder / developer community link?

Moderators

Peter Van Heusden

South African National Bioinformatics Institute (SANBI)

Thursday June 29, 2017 12:50 - 13:55
BoF Space 2, Antigone Room Antigone Room, Level 2

13:45

Galaxy Community Update
Slides

Authors

Jeremy Goecks, Oregon Health & Science University
Dan Blankenberg, Penn State University
Galaxy Community

Abstract
An update on what's been happening and what's planned for the Galaxy Project.


Presenters

Dan Blankenberg

Galaxy Project, Penn State University

Jeremy Goecks

Oregon Health Sciences University



Thursday June 29, 2017 13:45 - 14:20
Einstein Auditorium Le Corum, Level 0

13:45

Session 3: Tools and Workflows
Moderators

Nicola Soranzo

Earlham Institute

Thursday June 29, 2017 13:45 - 15:40
Einstein Auditorium Le Corum, Level 0

14:20

The Year of IUC

Authors
Björn Grüning, Uni-Freiburg (ALU)
Bérénice Batut, Uni-Freiburg (ALU)
Marius Van den Beek, Institut Curie
Daniel Blankenberg, Penn State University
Dave Bouvier, Penn State University
Anthony Bretaudeau, Institut de Génétique, Environnement et Protection des Plantes (IGEPP)
Nate Coraor, Penn State University
John Chilton, Penn State University
Peter Cock, The James Hutton Institute
Saskia Hiltemann, Erasmus University Rotterdam, Rotterdam
Youri Hoogstrate, Erasmus MC
James Johnson, University of Minnesota
Greg von Kuster, Penn State University
Lance Parsons, Princeton University
Eric Rasche, Center for Phage Technology, Texas A&M University (CPT)
Nicola Soranzo, Earlham Institute 


Abstract
Galaxy tools are a first class object in Galaxy and virtually any tool that can be run from the command line or has some kind of API can be integrated into Galaxy.

The ability to seamlessly and easily integrate tools into Galaxy spawned a large community of Galaxy tool developers and a large suite of programs and services around Galaxy tool development. The Galaxy Tool Shed (https://usegalaxy.org/toolshed) is an App Store for Galaxy tools, Cargo Port is a package mirror to make software location sustainable and Planemo is the Galaxy Tool development kit, to name a few.

The Intergalactic Utilities Commission (IUC) was founded in 2012 as a community group to define standards, to develop best-practices, to maintain tools and the services we have built as a community. The IUC is actively supporting all Galaxy tool developers and is maintaining high-quality training material to provide training events worldwide.


In this talk we will highlight the achievements of the IUC community over the last year. We will update the community on our ongoing effort to shift our dependency stack to BioConda and BioContainers, on 3 new community members, and on the regular contribution fests that we have organised since the last GCC. Moreover, we would like to talk about our plans for next year and invite everyone to join our ranks to shape the future of the Galaxy tool community.


Presenters

Björn Grüning

University of Freiburg


Thursday June 29, 2017 14:20 - 14:40
Einstein Auditorium Le Corum, Level 0

14:40

Workflow4Metabolomics: Towards an international computing infrastructure and a tools showcase for Metabolomics

Slides

Authors

  • Gildas Le Corguillé, ABiMS
  • Franck Giacomoni, PFEM
  • Pierrick Roger-Mele, Laboratory for Data Analysis and Systems Intelligence (CEA-LIST/LADIS)  
  • Christophe Duperier, PFEM  
  • Mélanie Pétéra, PFEM  
  • Yann Guitton, Laboratoire d'Etude des Résidus et Contaminants dans les Aliments - Oniris (LABERCA)  
  • Marie Tremblay-Franco, Toxalim  
  • Jean-François Martin, Toxalim  
  • Cécile Canlet, Toxalim  
  • Alexis Delabrière, Laboratory for Data Analysis and Systems Intelligence (CEA-LIST/LADIS)  
  • Etienne Thévenot, Laboratory for Data Analysis and Systems Intelligence (CEA-LIST/LADIS)  
  • Christophe Caron, Ingenum



Abstract
Metabolomics data analysis is a complex, multistep process, which is constantly evolving with the development of new analytical technologies, mathematical methods, and bioinformatics tools and databases. The Workflow4Metabolomics (W4M) project aims to develop full LC/MS, GC/MS, FIA/MS and NMR pipelines using the Galaxy framework for data analysis, including preprocessing, normalization, quality control, statistical analysis and annotation steps.

The W4M Core Team is fully involved in building tools for the metabolomics community, and it makes particular efforts on tool quality and on disseminating its work. The tools developed strive to adopt recommendations implemented by the Galaxy team and the IUC Best Practices. Wrappers are openly available on GitHub and automatically tested using Planemo on the TravisCI platform. The dependencies are managed with Conda packages. Finally, wrappers are distributed on the ToolShed. Thanks to the Galaxy community, we are able to provide ready-to-use Docker Galaxy flavors and Vagrant VMs using Ansible roles.

Meanwhile, the Workflow4Metabolomics Galaxy infrastructure (workflow4metabolomics.org) provides an online, user-friendly and high-performance environment to build, run and share metabolomics workflows for LC-MS, GC-MS, FIA/MS and NMR technologies. In parallel with providing expert and reference workflows, the W4M infrastructure is totally open to community contributions. These contributions can take different forms: i) complete integration in the W4M ecosystem, with shared input/output formats and support involvement, or ii) use of the W4M portal and infrastructure as a showcase for external developers, offering a functional version of a tool.


Presenters

Gildas Le Corguillé

CNRS-UPMC - Station Biologique de Roscoff - ABiMS



Thursday June 29, 2017 14:40 - 15:00
Einstein Auditorium Le Corum, Level 0

15:00

South Green Galaxy: a suite of tools for plant genomics

Slides

Authors

  • Jean-François Dufayard, CIRAD UMR AGAP (AGAP)  
  • Marilyne Summo, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD)  
  • Gaëtan Droc, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD)  
  • Mathieu Rouard, CIRAD UMR AGAP (AGAP)  
  • Bertrand Pitollat, CIRAD UMR AGAP (AGAP)  
  • Paul Pastor, CIRAD UMR AGAP (AGAP)  
  • Delphine Larivière, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD)  
  • Frédéric De Lamotte, INRA UMR AGAP (AGAP)  
  • Stéphanie Bocs, CIRAD UMR AGAP (AGAP)  
  • Manuel Ruiz, CIRAD UMR AGAP (AGAP)  
  • Felix Homa, CIRAD UMR AGAP (AGAP)  
  • Guillaume Martin, CIRAD UMR AGAP (AGAP)  
  • Gautier SARAH, Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales (AGAP)
  • Sébastien RAVEL, Biologie et génétique des interactions plantes-parasites pour la protection intégrée (BGPI)
  • Vincent Maillol, CIRAD UMR AGAP (AGAP)  
  • Jonathan Lorenzo, Institut Français de Bioinformatique (IFB)   
  • Alexis Dereeper, CIRAD UMR AGAP (AGAP)  


Abstract

South Green is a bioinformatics platform applied to the genomic resource analysis of southern and Mediterranean plants. The South Green web portal (http://www.southgreen.fr/) provides access to a large panel of bioinformatics resources, including its own Galaxy instance, which supports a large community of users in Montpellier, France and beyond.

In addition to the generic tools provided with the standard installation of Galaxy, the South Green Galaxy instance (http://galaxy.southgreen.fr/galaxy/) contains a large collection of exclusive tools, Galaxy wrappers and workflows designed for analyses applied to plant genomes.

It currently comprises more than 100 Galaxy wrappers and 9 pre-configured workflows designed for recurrent analyses such as NGS mapping/cleaning, RNAseq, SNP calling and filtering, Genome-Wide Association Study, basic population genetics, structural variations, metagenomics and phylogenetics. We also developed an innovative solution to graphically display the outputs of each workflow.

Home-made Galaxy wrappers have been deposited in our local/central toolshed (http://galaxy.southgreen.fr/toolshed/) or on GitHub (https://github.com/SouthGreenPlatform/galaxy-wrappers). Galaxy is extensively used to conduct capacity building activities. It is currently connected to HPC, but we are also initiating use of Docker to disseminate some workflows in the IFB cloud, thus facilitating training activities worldwide.


Presenters

Jean-François Dufayard

Researcher, CIRAD
CIRAD



Thursday June 29, 2017 15:00 - 15:20
Einstein Auditorium Le Corum, Level 0

15:20

Performing Next Generation Phylogenetic Analyses with NGPhylogeny.fr

Slides

Authors


Abstract
Phylogenetic analyses aim at reconstructing the evolutionary history of biological objects from molecules to species, and populations. Faced with the number of programs available and the difficulty for scientists to combine them, we designed in 2008 Phylogeny.fr, which has quickly become one of the most used platforms to perform phylogenetic analyses. However, due to the diversity of analyses performed (phylogeny.fr can be simultaneously used by hundreds of students or can be used through batch scripts), the number of analyses performed (50,000 per month), and the number of new phylogenetic tools available, the need to refactor Phylogeny.fr has become crucial.

In this talk, we introduce NGPhylogeny.fr (Next Generation Phylogeny.fr), developed within a Python Web framework (Django), in which we have refactored Phylogeny.fr and made it distributable by designing a scalable environment, an easy-to-use web interface based on a series of modular Galaxy workflows able to perform a very large variety of phylogenetic analyses. Moreover, we have performed a reproducibility study, to systematically compare the results obtained by the Galaxy-based NGPhylogeny.fr workflow and the original phylogeny.fr, using real datasets.

Our talk will highlight how (i) NGPhylogeny.fr can be used in a functional genomics context to quickly analyze large sets of protein superfamilies, (ii) in-depth studies can be quickly launched and (iii) NGPhylogeny.fr can be installed on a wide variety of configurations. On a more generic aspect, we will underline the benefit of designing a coupled Django-interface / workflow-Galaxy environment for end-users.


Presenters

Damien Correia

Institut Français de Bioinformatique - UMS CNRS 3601 (IFB-CORE); Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI); Laboratoire de Recherche en Informatique (LRI)



Thursday June 29, 2017 15:20 - 15:40
Einstein Auditorium Le Corum, Level 0

15:40

P01: Galaxy-docker-ansible
Poster

Authors

Ruben Vorderman  1

1 : Leiden University Medical Center  (LUMC)


Abstract
Galaxy-docker-ansible aims to reduce the installation of a Galaxy instance to running just a few commands.


Setting up a Galaxy instance requires a lot of effort. Solutions like docker-galaxy-stable and the ephemeris installer have reduced this effort considerably. Unfortunately, the amount of configuration is still immense.

Galaxy-docker-ansible is a collection of Ansible scripts that reduces the installation of Galaxy on a server to running just one command for installation and one command for provisioning. Configuration is host and group based, which allows a multitude of servers to be set up with just one command.

The project can be found at https://github.com/LUMC/galaxy-docker-ansible. It builds upon work done at https://github.com/bgruening/docker-galaxy-stable and https://github.com/galaxyproject/ephemeris to make installation of a Galaxy server very simple.


Presenters

Ruben Vorderman

Leiden University Medical Center
Currently I am working at LUMC to set up a Galaxy server that communicates with our Sun Grid Engine cluster. | We have developed a script that does an automated setup of a Galaxy instance on an Ubuntu server using the docker-galaxy-stable image and Ansible. | At LUMC I am pa...



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P03: Automated customized deployment Galaxy portal instance in IFB Cloud Computing
Authors
Sandrine Perrin 1*, Björn Grüning 2, Bryan Brancotte 1, Christophe Blanchet 1*

1 : Institut Français de Bioinformatique  (IFB)
2 : Biological Systems Analysis (ZBSA), Department of Computer Science, Albert-Ludwigs-University, Center for Biological Systems, University of Freiburg, Freiburg, Germany
* : Corresponding author


Abstract
The life science community has heterogeneous needs in terms of bioinformatics resources, software and services. It needs turnkey work environments, called appliances, and will benefit from easy maintenance of such appliances. The French Institute of Bioinformatics (IFB) developed an appliance to easily deploy a Galaxy portal instance in the IFB cloud, addressing the need for one-off analysis of potentially large sets of biological data.

The Cyclone H2020 project defined a use case on "Shared environment between cloud Galaxy portals" focusing on one-click deployment, secure authentication, and sharing of workflows, data and histories.
The appliances answering the use case leverage Björn Grüning's Docker images or related flavors. They offer one-click deployment from Biosphere with SlipStream technology (a multi-cloud application management platform) and secure authentication based on the eduGAIN identity. This authentication is used for all access to the virtual machines (no SSH key needed). Note that a user can grant access to several other users in this Galaxy instance.
Ongoing work includes sharing data and accessing shared reference data in a persistent work environment.

The main use case involves a bioinformatician who first customizes the appliance, adjusting its configuration and adding tools by editing YAML files, and users who then use the customized appliance to conduct their analyses with varying needs for computational resources.

This solution differs from the common usage of a Galaxy instance by letting the user deploy a disposable appliance with computational resources tailored to the analysis needs. It is particularly well suited to tutorial sessions and on-demand analysis.

Presenters
Sandrine Perrin

engineer, Institut français de Bioinformatique
Institut Français de Bioinformatique (IFB) | Portal : http://biosphere.france-bioinformatique.fr/


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P05: The EDAM ontology and its integration into Galaxy
Poster

Authors

Hervé Ménager 1, Jon Ison 2, Matúš Kalaš 3, Veit Schwaemmle 4, EDAM Contributors

1 : Bioinformatics and Biostatistics Hub of the C3BI, Institut Pasteur, Paris, France
2 : Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
3 : Computational Biology Unit (CBU), University of Bergen
4 : Protein Research Group, Department for Biochemistry and Molecular Biology, University of Southern Denmark


Abstract
EDAM is an ontology of well established, familiar concepts that are prevalent within bioinformatics, including types of data and data identifiers, data formats, operations, and topics. EDAM has a simple structure, and comprises a set of concepts with terms, synonyms, definitions, relations, links, and some additional information (especially for data formats).

7 consecutive stable versions of EDAM have been released since July 2015 (version 1.10), with version 1.17 being the current one at the time of the abstract submission. EDAM is developed in a participatory and transparent fashion, with a growing community of contributors. EDAM is used by multiple bioinformatics projects, including Debian Med[1] and the Common Workflow Language[2].  

Thanks to the Galaxy community, it has been integrated with Galaxy through recent modifications:
  • the annotation of Galaxy datatypes with their corresponding EDAM data and format terms,
  • the possibility to specify EDAM topics and operations in Galaxy tool definitions.
This allows for an easier integration of Galaxy and other EDAM-compatible systems, including the ELIXIR bio.tools registry of bioinformatics tools and services, for which a direct application of this mapping enables the automated registration in bio.tools of Galaxy services[3], and the semi-automated creation of Galaxy tool definitions using bio.tools metadata[4].
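
To illustrate the second point, here is a minimal sketch that parses a tool definition and lists the EDAM terms it declares. The tool id, name and the two term IDs are invented for the example, and the `edam_topics` / `edam_operations` element names are used as we understand them from the Galaxy tool XML schema; treat this as an assumption to check against the schema rather than as a reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal tool definition; only the EDAM-related elements matter here.
TOOL_XML = """
<tool id="example_aligner" name="Example aligner" version="0.1.0">
    <edam_topics>
        <edam_topic>topic_0080</edam_topic>
    </edam_topics>
    <edam_operations>
        <edam_operation>operation_0292</edam_operation>
    </edam_operations>
</tool>
"""


def edam_annotations(tool_xml):
    """Return the EDAM topic and operation IDs declared in a tool definition."""
    root = ET.fromstring(tool_xml)
    return {
        "topics": [e.text.strip() for e in root.iter("edam_topic")],
        "operations": [e.text.strip() for e in root.iter("edam_operation")],
    }


annotations = edam_annotations(TOOL_XML)
print(annotations)
```

A registry client could use such a mapping to look up or register the tool under the same EDAM terms, which is the kind of exchange the abstract describes for bio.tools.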

[1] http://debian.org/devel/debian-med
[2] http://commonwl.org/
[3] https://github.com/c3bi-pasteur-fr/regate
[4] https://github.com/bio-tools/ToolDog

Presenters
Hervé Ménager

Research Engineer, Institut Pasteur
Bioinformatics and Biostatistics Hub of the C3BI, Institut Pasteur



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P07: myFAIR Analysis: Personal FAIR Data Management and Analysis
Authors
Saskia Hiltemann 1, Rick Jansen 1, David Van Zessen 1, John Hays 2, Andrew Stubbs 1*

1 : Bioinformatics Dept., Erasmus University Medical Center  (Erasmus MC)
2 : MMIZ, Erasmus University Medical Center  (Erasmus MC)
* : Corresponding author


Abstract
In recent publications, it has been proposed that approximately 50% of pre-clinical research data is not reproducible. Whilst we cannot address all the challenges associated with these findings, we can work towards creating an ecosystem at Erasmus MC in which we enable researchers to comply with the FAIR data (Findable, Accessible, Interoperable, Re-usable) principles. The correct "long-term care" of valuable digital assets means that data can be efficiently re-used for subsequent investigations, either alone or in combination with newly generated data, to create new knowledge. To address the challenges associated with un-FAIR data, the European Union has initiated a plan for open research data in H2020, which is a requirement for all grantees and under which all research data must comply with the FAIR data principles. Our aim was to implement a secure and easy-to-use translational research application for clinical research scientists that uses existing informatics technology and services with FAIR data principles built into the design. We implemented a generic “end to end” FAIR data point and analysis architecture, myFAIR Analysis, that is applicable to any type of translational or clinical research project. myFAIR was developed using FAIR data compliant applications, including B2DROP (EUDAT), an ownCloud-based service that provides end users with Dropbox-like storage and sharing services which comply with FAIR data principles. myFAIR uses Galaxy to deliver reusable and “provenant” predefined workflows. myFAIR Analysis enables scientists to apply FAIR data principles to both their data and analysis within one single web application that will be freely available from https://github.com/ErasmusMC-Bioinformatics/myFAIR.

Presenters
Saskia Hiltemann

Erasmus MC


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P09: Wrapping tools: on the shoulders of the GCC community
Authors
Valentin Marcon 1, Nathalie Choisne 2, Veronique Jamilloux 2, Gwendoline Andres, Isabelle Luyten 2, Joelle Amselem 2, Mikael Loaec 2, Françoise Alfama-Depauw 2, Sarah Maman-Haddad 3, Melanie Petera 4, Luc Jouneau 5, Sandrine Laguerre 6, Olivier Inizan 1 

1 : MaIAGE, Institut National de la Recherche Agronomique - INRA, Centre de Jouy-en-Josas Domaine de Vilvert, France
2 : Unité de Recherche Génomique Info  (URGI), Institut National de la Recherche Agronomique - INRA, Centre de recherche de Versailles, France
3 : Génétique Physiologie et Systèmes d'Elevage  (GenPhySE), Institut national de la recherche agronomique [Toulouse], France
4 : Unité de Nutrition Humaine  (UNH), Institut national de la recherche agronomique (INRA)
5 : Unité de recherche Virologie et Immunologie Moléculaires  (VIM), Institut National de la Recherche Agronomique, INRA Centre de Recherche de Jouy-en-Josas, France
6 : l'Ingénierie des Systèmes Biologiques et des Procédés  (LISBP), Institut National des Sciences Appliquées de Toulouse, France


Abstract
This year at the Galaxy Community Conference, the Galaxy project will celebrate its 12th anniversary: twelve years in which the Accessibility, Reproducibility and Transparency (ART) of bio-analyses have been the principal concern of the developers of this framework. Today the project offers improved development methods with a set of good practices ensuring the ART nature of the analysis.

The French community has always been active in tool development. This has resulted in a large heterogeneity of wrappers, some of which have not benefited from recent advances in good practices. The goal of the project “Galaxy For Life Science” (GFLS) is to bring the recent good practices established by the Galaxy community developers to several French communities. The raw material of the project is a set of wrappers and workflows grouped into several use cases: plant science, statistical analysis, livestock, bacteria.

In this poster we present two use cases implemented in the project: statistical analysis and plant science. These use cases are interesting because the first one allowed us to fully apply a set of good practices, while we chose to make concessions for the second one. A second interesting result is that we were able to identify a generic and reproducible working method for any initiative to make tools available under Galaxy, one that matches the community's way of working.


Presenters

Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P11: Utility scripts for data manipulation in Galaxy
Authors
Simon Gladman 1*, Madison Flannery 1
1 : Melbourne Bioinformatics, The University of Melbourne  (MelBioinfo), Victoria, Australia
* : Corresponding author


Abstract
Galaxy has an excellent system for the storage, indexing and handling of large sets of reference genome data that are required by various tools. This works particularly well for eukaryotic genomes such as human and mouse. However, in the case of micro-organisms there are many thousands of reference genomes. These genomes are quite small and so are very easily indexed on the fly as required. We have developed a set of utility scripts to take bacterial genomes and build Galaxy data libraries of them by genus and/or species. The data libraries can be created on a local or remote machine easily. The utility scripts are also capable of taking an arbitrary directory structure and creating data libraries based upon it.
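
The directory-to-library idea can be sketched as follows. This is not the authors' script: `plan_libraries` is a hypothetical helper that only works out which files would go into which library folder, and the genus/species layout and file names are invented for the demo. The actual upload into Galaxy data libraries (for instance through the Galaxy API) is deliberately left out.

```python
import os
import tempfile
from collections import defaultdict


def plan_libraries(root):
    """Map each sub-directory of root (as a relative path) to its sorted file names."""
    plan = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in sorted(filenames):
            plan[rel].append(name)
    return dict(plan)


# Demo on a throwaway directory tree laid out as genus/species/genome file.
with tempfile.TemporaryDirectory() as root:
    for genus, species, fname in [
        ("Escherichia", "coli", "K12.fasta"),
        ("Escherichia", "coli", "O157.fasta"),
        ("Staphylococcus", "aureus", "NCTC8325.fasta"),
    ]:
        folder = os.path.join(root, genus, species)
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, fname), "w").close()
    plan = plan_libraries(root)

for folder in sorted(plan):
    print(folder, plan[folder])
```

Each key of the resulting plan corresponds to one library folder, so the same function covers both the genus/species case and an arbitrary directory structure.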

The second script, galaxy-fuse, is a file system creation script that makes the Galaxy histories of a particular user available on the local file system in a matching directory structure. It uses the user's Galaxy API access and BioBlend access to the Galaxy database to name the files.

This poster describes the rationale, development and distribution of the two tools.



Presenters
Simon Gladman

Bioinformatician, Melbourne Bioinformatics / University of Melbourne


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P13: Prioritizing SNPs using the Neo4j Galaxy Interactive Environment
Poster

Authors

Lose Thoba 1, Ziphozakhe Mashologu 1, Peter Van Heusden 1*, Alan Christoffels 1*
1 : South African National Bioinformatics Institute  (SANBI)
* : Corresponding author


Abstract
Graph database implementations such as Neo4j are increasingly used within the biomedical research space, e.g. for disease networks underpinned by a protein and metabolic framework. We previously developed a Galaxy datatype and an interactive environment for storing and exploring Neo4j graph databases within Galaxy. Building on this work, we generated an M. tuberculosis genomic database from multiple sources of annotation. This database follows a Chado-like schema with graph nodes named according to Sequence Ontology terms, which makes it natural for the researcher to write queries in the Cypher query language. NGS data is processed to yield novel variants that are stored in the database using a schema derived from the GA4GH variant model. Using the resultant Neo4j database and Cypher queries in the context of Mycobacterium tuberculosis drug resistance, we are able to prioritize SNPs for further experimental investigation of their association with multi-drug resistance in Mtb.

Presenters
Peter Van Heusden

South African National Bioinformatics Institute (SANBI)
Ziphozakhe Mashologu

Developer, UWC - South African National Bioinformatics Institute
South African National Bioinformatics Institute (SANBI)
Lose Thoba

South African National Bioinformatics Institute (SANBI)



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P15: The MiModD suite of tools for genetic variant mapping and identification
Poster

Authors

Wolfgang Maier 1*, Mark Seifert 1, Katharina Moos, Gregory Minevich, Oliver Hobert 2, Ralf Baumeister 1
1 : University of Freiburg
2 : Department of Biological Sciences, Columbia University
* : Corresponding author


Abstract
MiModD (http://www.celegans.de/mimodd) is a GPLv3-licensed comprehensive tool suite for variant mapping and identification. It extends ideas and concepts found in CloudMap, for which it can serve as a drop-in replacement. The package features: 
  • a fully integrated mapping-by-sequencing analysis pipeline without external dependencies 
  • multisample variant calling and filtering for improved call statistics and straightforward variant identification 
  • the NacreousMap linkage analysis and plotting engine, fully compatible with CloudMap but with improvements over it. 
The current version of MiModD is available from the Galaxy Tool Shed, but we would like to propose its inclusion on the Galaxy main server as a replacement for the deprecated CloudMap suite of tools.

Presenters
Wolfgang Maier

University of Freiburg



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P17: ProteoRE, a Galaxy-based infrastructure for interpreting and exploring mass spectrometry-based proteomics data
Authors
Lien Nguyen 1*, Maud Lacombe 1, Sandra Dérozier 2, Lisa Perus 1, Olivier Rué 2, Florence Combes 1, Christophe Caron 2, Virginie Brun 1, Valentin Loux 2, Yves Vandenbrouck 1
1 : Institut de Biosciences et Biotechnologies de Grenoble (BIG), Université de Grenoble, Commissariat à l'Énergie Atomique et aux Énergies Alternatives (CEA) - Grenoble, Centre National de la Recherche Scientifique : FR3425, 17, rue des Martyrs 38054 Grenoble cedex 9 - France
2 : Unité Mathématiques et Informatique Appliquées du Génome à l'Environnement (MAIAGE), Institut National de la Recherche Agronomique, Université Paris-Saclay, Domaine de Vilvert 78350 Jouy en Josas Cedex - France
* : Corresponding author


Abstract
Introduction
Galaxy is a well-maintained software platform providing simple interfaces to tools and online access to computational resources in a transparent way. Built upon Galaxy, the ProteoRE (Proteomics Research Environment) project is a joint effort between the French proteomics infrastructure (ProFI) and the French Institute of Bioinformatics (IFB). Its primary aim is to centrally provide the proteomics community with an online research service enabling biologists and clinicians without programming expertise to explore their proteomics data through the Web in a reproducible manner. 

Methods
Starting from proteome software output files (MaxQuant, Proline), various components have been designed driven by expertise and needs from our collaborators. These modules embedded into Galaxy components have been implemented either by reusing tools (from the Galaxy Tool Shed) or by wrapping Bioconductor packages and external code, and further beta-tested. 

Results
We have set up two use-case scenarios derived from our own research projects: interpreting a large list of identified proteins, and supporting the selection of biomarker candidates based on biochemical criteria. A first ProteoRE instance is now deployed and currently in beta-testing before public release in early 2018.

Conclusions
While Galaxy-based tools offer services for primary proteomics data analyses (e.g. MS data conversion, protein database tools, search algorithms), tools focusing on downstream analysis are still lacking. The ProteoRE platform aims to fill this gap, with the hope of promoting proteomics data in the Life Science community.

Presenters
Lien Nguyen

Institut de Biosciences et Biotechnologies de Grenoble (BIG)


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P19: A reproducible data analysis environment for next-generation sequencing on public cloud computer
Authors
Manabu Ishii 1, Matsushima Akihiro 1, Mika Yoshimura 1, Hiroki Danno 1, Itoshi Nikaido 1

1 : RIKEN ACCC Bioinformatics Research Unit, 2-1 Hirosawa, Wako, Saitama 351-0198 -  Japan

Abstract
With the progress of DNA sequencing methods, the quantity and variety of data produced continue to increase. To analyze such data, we need massive computer resources and the setup of various software and databases. Many data-analysis techniques and databases are constantly being developed. Accordingly, it takes considerable time and work to construct an analysis environment, including procurement of computers, installation of software, and construction of data-analysis pipelines.

To achieve both reproducibility and flexibility of the environment, we developed a Docker image containing Galaxy, a job scheduler, and a data-analysis pipeline. We also constructed a system for deploying the Docker image on a public cloud system such as Microsoft Azure. The deployment procedure is implemented as source code using Chef (Infrastructure as Code). The cloud computer system automatically adds and destroys computing nodes depending on job demand.

In this presentation, we will compare environment setup time, cost, pipeline reproducibility, and calculation speed between on-premise and public cloud systems. We also demonstrate that the system can be conveniently constructed from a web browser. Using this system, we have operated an analysis environment for thousands of single-cell RNA-sequencing samples in our laboratory. The idempotence of the system, including the data-analysis pipeline, has been tested using continuous integration / continuous delivery.
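
The demand-driven scaling described above can be illustrated with a small sketch. The policy below (one node per fixed number of queued jobs, clamped to a minimum and maximum) is an assumption made for the example, not the rule used by the authors, and `desired_nodes` and `scaling_action` are hypothetical names.

```python
import math


def desired_nodes(queued_jobs, jobs_per_node, min_nodes=0, max_nodes=10):
    """Number of compute nodes needed for the queue, clamped to [min_nodes, max_nodes]."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, queued_jobs, jobs_per_node, **limits):
    """Return how many nodes to add (positive) or destroy (negative)."""
    return desired_nodes(queued_jobs, jobs_per_node, **limits) - current_nodes


print(scaling_action(current_nodes=2, queued_jobs=17, jobs_per_node=4))  # scale up
print(scaling_action(current_nodes=5, queued_jobs=0, jobs_per_node=4))   # scale down
```

A real deployment would call the cloud provider's API with the returned difference and would typically add a cool-down period to avoid creating and destroying nodes in quick succession.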


Presenters
Manabu Ishii

Technical Staff, RIKEN ACCC Bioinformatics Research Unit
RIKEN ACCC Bioinformatics Research Unit


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P21: Deployment of genome databases for insects using Galaxy Genome Annotation
Authors
Anthony Bretaudeau 1*
1 : Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Institut National de la Recherche Agronomique : UMR1349, Universite de Rennes 1 : UMR1349, Agrocampus Ouest : UMR1349, Domaine de la Motte au Vicomte BP 35327, 35653 Le Rheu - France
* : Corresponding author


Abstract
BIPAA is an INRA bioinformatics platform located in Rennes (France). It is dedicated to insect genomics.

BIPAA is the home of several reference genome databases: AphidBase (aphids), LepidoDB (lepidopterans) and ParWaspDB (parasitoid wasps).

For each genome, a collection of web applications (including JBrowse, Tripal or Apollo from GMOD project) allows users to explore reference assemblies and annotations (e.g. genome browser, gene reports), to analyze them (e.g. dedicated Galaxy server), and to collect new scientific knowledge (e.g. manual curation using Apollo).

As the number of hosted genomes is quickly increasing, the need for an automatic, flexible, and scalable hosting architecture became clear. As a result, we set up a new system, based on the developments from the Galaxy Genome Annotation project (GGA, https://github.com/galaxy-genome-annotation).
Making a new genome available is now as easy as launching a collection of pre-configured dockerized web applications, and loading the reference data (genome, annotation) using a dedicated GGA Galaxy flavor instance.

This constitutes a demonstration of GGA for rapid, flexible and reproducible deployment of information systems for non-model organisms, where Galaxy is used as an orchestrator for data management.

In this poster we will present the architecture of this new system as used on BIPAA (10 genomes already online). We will also highlight the specific developments that were done, and present the planned features for the coming months.

All developments are available under a free license (MIT or AGPL) and were contributed to the GGA or GMOD GitHub repositories.



Presenters
Anthony Bretaudeau

INRA, BIPAA and GenOuest platforms
I work on insect genomics: analysis tools in Galaxy and the Galaxy Genome Annotation project. I also like to talk about bunnies and other fun animals.


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P23: Detecting somatic de-novo Transposable Element insertions with Galaxy
Poster

Authors

Marius Van Den Beek 1, Nick Riddiford, Katarzyna Siudeja, Allison Bardin

1 : Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05 - France


Abstract
Transposable Elements (TEs) are mobile genetic elements present in most eukaryotic genomes. By mobilizing to new genomic locations (transposition), transposable elements can alter gene regulatory networks by mutating genes or gene-regulatory elements. Alternatively, transposition can alter the expression of genes through local epigenetic changes at sites of new transposition events. Transposition therefore plays an important role in evolution and disease, though the frequencies and effects of somatic transposition are not well understood. Detecting and quantifying somatic transposition with current-generation short-read sequencing technologies is challenging, since most somatic transposition events are rare and will be sparsely sampled when sequencing tissues, which are composed of a pool of cells with heterogeneous transposition events.

Here we describe a Galaxy workflow that can detect germline and somatic transposition events and intersect discovered transposition events with other structural variants. The workflow accurately quantifies the evidence for and against a transposition event from multiple samples and helps researchers evaluate the extent of somatic transposition that can be detected. We have applied this workflow to 19 tumor-normal pairs of Drosophila melanogaster, where males frequently lose an X-linked tumor suppressor gene (Siudeja et al., Cell Stem Cell, 2015). This leads to the formation of a clonally expanded cell mass. Using our workflow we can (1) detect an overall increase of somatic transposition compared to control tissue, and (2) detect new transposable element insertions at the breakpoints of deletions at the tumor suppressor gene.

This demonstrates the importance of whole-genome sequencing approaches for detecting mutations in diseases.

Presenters
Marius van den Beek

Institut Curie, Paris



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P25: Galaxy and the full spectrum of needs in a small-scale cancer study
Authors
Christin Lund-Andersen 1, Ståle Nygård 2,3, Stein Larsen 4, Eivind Hovig 1,3, Kjersti Flatmark 1,3,4

1 : Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
2 : Bioinformatics Core Facility, Institute for Medical Informatics, Oslo University Hospital.
3 : University of Oslo, Oslo, Norway
4 : Department of Gastroenterological Surgery, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway


Abstract
In small research groups with limited bioinformatics capabilities, there is a need to analyze various forms of sequencing data in a simple, comprehensive, and time- and cost-effective manner. Galaxy potentially offers the ideal solution for such a situation, providing standardized workflows for a variety of analytical scenarios in a unified framework. However, in a given institutional context, there may be a variety of practical obstacles.

In the poster, we will present the analytical needs and practical obstacles faced in a concrete small-scale research project concerning a patient with peritoneal mesothelioma that responded extremely well to chemotherapy but needed additional treatment options. We performed exome, RNA and miRNA sequencing on tumor samples, and ran a variety of bioinformatics analyses, e.g. detection of somatic variants, copy number variation, mutational signatures, and differential gene and miRNA expression. While the needed bioinformatics methods are available on public Galaxy servers, one is still dependent on experience with all the methodologies involved to make appropriate choices of methods and parameters according to the specifics of the data and research questions. The required competence is available locally through collaboration with various bioinformaticians, of whom only some offer to provide their analyses through a (public or local) Galaxy platform. This may be due to personal inclinations or technical challenges such as running Galaxy within a secure environment. The poster will present the analyses performed for the case, and argue that to truly reach the potential for simplification and streamlining, it is crucial that Galaxy can be used for all the bioinformatics analyses involved.

Presenters
Christin Lund-Andersen

Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P27: ASaiM: a Galaxy-based framework to analyze raw sequencing data from microbiota
Authors
Bérénice Batut 1*, Clémence Defois 2, Kévin Gravouil 2,3,4, Jean-François Brugère 2, Eric Peyretaillade 2, Pierre Peyret 2

1 : Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
2 : Microbiologie Environnement Digestif Santé - Clermont Auvergne (MEDIS), INRA Clermont-Ferrand-Theix : UMR0454, Université Clermont Auvergne : UMR0454, INRA, Centre de Recherches de Clermont-Ferrand - Theix / 63122 Saint-Genès Champanelle - France
3 : Microorganismes : Génome et Environnement - Clermont Auvergne (LMGE), Université Clermont Auvergne : UMR6023, Centre National de la Recherche Scientifique : UMR6023, Campus des Cézeaux / 24, avenue des Landais BP 80026 / 63170 Aubière - France
4 : Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes - Clermont Auvergne (LIMOS), SIGMA Clermont, Université Clermont Auvergne : UMR6158, Centre National de la Recherche Scientifique : UMR6158, ISIMA / Campus des Cézeaux BP 10025 / 63173 AUBIERE cedex - France
* : Corresponding author


Abstract
With the development of the new generation of sequencing platforms and of metagenomics and metatranscriptomics technologies, complex microorganism communities can be studied more easily. Numerous bioinformatics tools are available for this purpose, but analyses of microbiota data remain difficult: many different tools must be combined to extract useful information. Modular, accessible, sharable and user-friendly tools are therefore urgently needed to explore microbiota data efficiently. We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses.

ASaiM provides an expertly selected collection of tools to exploit and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To help the analyses, several (customizable) workflows are included. The main workflow has been tested on two mock metagenomic datasets with controlled communities. More accurate and precise taxonomic analyses and more informative metabolic description have been obtained compared to EBI metagenomics' pipeline on the same datasets.

The available workflows are supported by tutorials and Galaxy interactive tours to guide the users through the analyses. Furthermore, an effort on documentation of ASaiM, its tools and workflows has been made (http://asaim.readthedocs.io/).

Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge, while emphasizing reproducibility, customization and effortless scale-up to larger infrastructures. ASaiM is implemented as a Galaxy Docker flavour and can be easily extended with additional tools or workflows. ASaiM thus provides a powerful framework to easily and quickly exploit microbiota data in a reproducible and transparent environment.

Presenters
Bérénice Batut

Post-doc, University of Freiburg


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P29: Developing a bioinformatics pipeline for optimization of sperm epigenome analysis in mice and men to be used for the identification of epigenetic signatures in sperm associated with environmental perturbation.
Poster

Authors

Deepak Tanwar 1, Romain Lambrot 1, Keith Siklenka  2, Mahmoud Aarabi 3, Donovan Chan 3, Clifford Librach 4, Sergey Moskovtsev 4, Jianguo Xia 1,6, Jacquetta Trasler 3, Sarah Kimmins 1,2

1 : Department of Animal Science, McGill University, Ste Anne-de-Bellevue, QC -  Canada
2 : Department of Pharmacology and Therapeutics, McGill University, Montreal, QC -  Canada
3 : Research Institute of the McGill University Health Centre at the Montreal Children's Hospital, Montreal, QC -  Canada
4 : Department of Obstetrics and Gynaecology, University of Toronto, Toronto, ON -  Canada
5 : Institute of Parasitology, McGill University – Montreal, QC, Canada 



Abstract
Sperm has a unique chromatin conformation, with the majority of somatic histones being replaced with protamines. Thus, unlike a typical ChIP-seq profile generated from targeting a histone modification, there are fewer histone peaks and these tend to be distributed over CpG (5'-C-phosphate-G-3') enriched regions. Effects of the paternal environment, including stress, diet and toxicants, have been linked to negative outcomes for offspring, including birth defects and increased risks for complex diseases. These paternal effects may occur via non-genetic inheritance, through epigenetic mechanisms including DNA methylation, post-translational modifications of histones and noncoding RNAs. We hypothesize that the sperm epigenome in men, specifically histone methylation, can influence offspring development and health. The challenge in analyzing and quantitating ChIP-seq data from sperm with currently available software is to detect and quantify differences not just in peak enrichment but also in the broad domains. Our objective is to develop the most suitable bioinformatics pipeline for semi-quantitative/quantitative comparison of histone methylation levels in sperm from fertile and infertile men of varying folate status, BMI and toxicant exposures. To perform optimal pre-processing and to address other challenges in data analysis, I am developing an efficient bioinformatics pipeline for analyzing sperm epigenome data using currently available tools (Bowtie2, Trimmomatic, Picard tools, MACS2, etc.). The pipeline aims to identify the most reliable peak calling method with appropriate parameters while taking into account the unique chromatin configuration in sperm.

Presenters
Deepak Tanwar

Graduate Research Assistant, McGill University
Department of Animal Science, McGill University



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

P31: In silico toxicity assessment in cultured rat intestinal cells deduced from cellular impedance measurements and transcriptomics data
Poster

Authors

Pooja Gupta 1,2,*, Soroush Sharbati 3, Ralf Einspanier 3, Christof Schuette 1,2, Annika Gramatke 3, Max von Kleist 1, Jutta Sharbati 3

1 : Institute of Mathematics, Freie Universität Berlin
2 : Mathematics for Life and Materials Sciences, Zuse Institute Berlin
3 : Institute of Veterinary Biochemistry, Freie Universität Berlin
* : Corresponding author

Abstract
Predicting adverse effects of chemicals with respect to their distinguishing temporal and dose-dependent mode of action denotes a formidable challenge in toxicological research. In this project, we addressed the aforementioned problems by developing a framework for predicting the toxicity of chemicals using real-time cell impedance measurements and transcriptomics data (mRNA and miRNA) obtained from rat intestinal cell line IEC-6.

Real-time cell impedance measurements provide a novel, fast and cost-effective in vitro method that may be used to probe compounds' toxicity. In order to improve the interpretability and usability of this experimental method, we developed a mathematical model for quantifying the cytotoxicity of chemicals (in terms of their 50% inhibitory concentration, IC50, and cell growth rate). Estimated IC50 values for the apparently toxic compounds were in good agreement with literature knowledge. Furthermore, a computational analysis of the transcriptomics data was undertaken to identify the "molecular mechanism of action" of the test chemicals. Our analysis showed that 2-acetylamino-fluorene and benzo-[a]-pyrene were severely genotoxic, with several genes differentially expressed even at very low doses.
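
The abstract does not state the form of the model. Purely as an illustration of how an IC50 and a growth rate can be linked, the sketch below uses a generic Hill-type inhibition curve; the function name, the default Hill coefficient and the numbers in the example are assumptions for the illustration, not the authors' model or data.

```python
def inhibited_growth_rate(concentration, r0, ic50, hill=1.0):
    """Growth rate at a given compound concentration under Hill-type inhibition.

    r0 is the uninhibited growth rate; at concentration == ic50 the rate is r0 / 2.
    """
    if concentration < 0 or ic50 <= 0:
        raise ValueError("concentration must be >= 0 and ic50 must be > 0")
    return r0 / (1.0 + (concentration / ic50) ** hill)


# Illustrative values only: untreated rate 0.04 per hour, IC50 of 5 (arbitrary units).
for dose in (0.0, 5.0, 50.0):
    print(dose, inhibited_growth_rate(dose, r0=0.04, ic50=5.0))
```

By construction the growth rate equals the untreated rate at zero dose and half of it at the IC50, which is the property that makes the IC50 a convenient summary of cytotoxicity.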

Taken together, by developing a foundation work for modeling impedance measurements and analyzing transcriptomics datasets for predicting cytotoxicity, our study contributed to the goal of improving in vitro strategies for genotoxicity testing.

Presenters

Pooja Gupta

Post doc, Freie Universität Berlin



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

D01: Integration of Linked Data into Galaxy using AskOmics
Authors
Xavier Garnier 1,2, Olivier Dameron 2, Olivier Filangi 1,3, Fabrice Legeai 1,4, Anthony Bretaudeau 1,3

1 : Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Institut National de la Recherche Agronomique : UMR1349, Université de Rennes 1 : UMR1349, Agrocampus Ouest : UMR1349, Domaine de la Motte au Vicomte, BP 35327, 35653 Le Rheu - France
2 : DYLISS (INRIA - IRISA), INRIA, Université de Rennes 1, CNRS : UMR6074, Campus de Beaulieu, 35042 Rennes cedex - France
3 : Plateforme bio-informatique GenOuest (GenOuest), IRISA/INRIA, Campus de Beaulieu, 35042 Rennes cedex - France
4 : GenScale, IRISA/INRIA, équipe GenScale, Campus de Beaulieu, 35042 Rennes cedex - France


Abstract
Experiments that produce large amounts of data are now routinely performed at the scale of a laboratory. Results of these experiments are stored in files or loaded into relational databases. However, analyzing these data typically requires combining them with other data and knowledge bases.

Semantic Web technologies such as RDF and SPARQL are key elements for combining datasets, which has led to the emergence of linked data.

AskOmics (AGPL, https://github.com/askomics/) is a web application supporting both intuitive data integration and querying. It provides a user-friendly interface to hide the technical difficulties inherent to the use of RDF and SPARQL technologies.

AskOmics automatically integrates tabular or GFF files and generates the corresponding RDF triples. It also provides a visually intuitive graph-based querying interface. Complex queries are built by selecting a sequence of nodes in a graph, and defining data filters. The corresponding SPARQL query is generated by AskOmics and executed on the selected datasets.
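To make the idea of turning a tabular file into RDF triples concrete, here is a minimal sketch. It is not AskOmics code: the namespace `BASE`, the function `rows_to_triples`, and the toy table are all hypothetical, and the real tool's URI scheme and typing rules may differ.

```python
import csv
import io

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/sketch/"


def rows_to_triples(tsv_text):
    """Turn a tab-separated table into N-Triples-style lines.

    The first column identifies the entity and its header is used as the
    entity type; every other column becomes one attribute triple per row.
    Literal values are not escaped here, which a real converter must do.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    entity_type = header[0]
    triples = []
    for row in reader:
        subject = f"<{BASE}{row[0]}>"
        triples.append(f"{subject} <{BASE}type> <{BASE}{entity_type}> .")
        for name, value in zip(header[1:], row[1:]):
            triples.append(f'{subject} <{BASE}{name}> "{value}" .')
    return triples


# Toy table with made-up identifiers.
toy_table = "gene\tchromosome\tstart\ngeneA\tchr1\t100\n"
for line in rows_to_triples(toy_table):
    print(line)
```

Once data are in this subject-predicate-object form, a graph-based interface can translate a path of selected nodes and filters into a SPARQL query over the same predicates.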

In this demo we will present the integration of AskOmics with Galaxy: Galaxy tools to integrate data into a remote AskOmics server (using Askocli, AGPL, https://github.com/askomics/askocli), and AskOmics' ability to retrieve datasets from Galaxy and export results into it (using BioBlend). This opens the door to an Interactive Environment currently under final development.
The compatibility of AskOmics with Galaxy allows users to connect their pipelines to AskOmics in a simple way and bridge the gap between Galaxy and the Semantic Web.

Presenters

Xavier Garnier

INRA, IGEPP and Dyliss Team
I am working on AskOmics, a web tool to integrate and query biological data using Semantic Web technologies, and its interaction with Galaxy



Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

15:40

D03: De novo transcriptome annotation visualization and filtering
Authors
Eduardo De Paiva Alves 1

1 : Centre for Genome-Enabled Biology and Medicine, University of Aberdeen  (CGEBM), 23 St Machar Dr, Aberdeen, UK AB24 3UU -  United Kingdom

Abstract
De novo transcriptomes lack genome-level coordinates and are frequently fragmented or contain chimeric sequences or multiple copies of the same region. These issues pose several difficulties for annotation: protein predictions may appear in multiple contigs that correspond to the same transcript, may be fragmented over multiple contigs, or, in the case of chimeric sequences, a single contig may have more than one protein prediction. We present a Galaxy tool that visualizes annotation predictions from the prediction sequence and allows the user to select the contigs that best match the predictions and export the filtered results.

Presenters

Eduardo De Paiva Alves

Centre for Genome-Enabled Biology and Medicine, University of Aberdeen (CGEBM)


Thursday June 29, 2017 15:40 - 17:00
Le Corum Le Corum

17:00

The Institute of Computational Biology: an overview
Slides

Authors

Eric Rivals, Institute of Computational Biology

Abstract
Core biological objects, such as DNA, RNA, proteins or epigenomic marks, are modeled as discrete structures (e.g., sequences, structures, motifs, etc.), and their interactions are viewed as discrete connections that form important networks (e.g., the regulatory network, where genes are logically connected with the genes that control their expression through binding at a precise location in their promoter or enhancer region). Thus, many biological systems and processes can be investigated through computational methods, which intrinsically target digital structures. The development of the life sciences increasingly requires the design of computational solutions (algorithms, models, data structures, etc.). To meet this need, the IBC was launched in 2012 to foster and catalyze multidisciplinary research on computational approaches for the life sciences. The IBC gathers more than 100 researchers from 13 institutions of the Montpellier area (and beyond), at the nexus of life sciences, computer science, and mathematics.

The wealth of data acquired in biology projects using recent technologies, like high-throughput sequencing or imaging techniques, also demands efficient and scalable bioinformatic tools.

Hence, IBC researchers aim to design algorithms, pipelines, models, inference methods and databases for tackling key computational questions at large scales, such as image processing, sequence analysis, evolutionary inference, structural modeling and data integration. These computational solutions are then applied to investigate multiple biological issues, such as the control of gene expression, genome assembly, and evolutionary or developmental questions, across a wide variety of biological models, from HIV to worms, algae, pathogens, plants (like rice) and other eukaryotic species.

In this presentation, I will give an overview of some computational developments and biological applications that have been investigated in the last five years at the IBC.

IBC is accredited and supported by eight national trustees: CNRS, INRA, INSERM, INRIA, IRD, CIRAD, SupAgro, and of course the Université de Montpellier, which hosts the IBC in a building of St Priest Campus.

Presenters


Thursday June 29, 2017 17:00 - 17:30
Einstein Auditorium Le Corum, Level 0

17:00

Session 4
Moderators

Bérénice Batut

Post-doc, University of Freiburg

Thursday June 29, 2017 17:00 - 18:10
Einstein Auditorium Le Corum, Level 0

17:30

Improving Your Text Life
Slides

Authors

James Johnson 1

1 : University of Minnesota Supercomputing Institute  (MSI), 599 Walter Library, 117 Pleasant St. SE Minneapolis, MN 55455 -  United States

Abstract
The Query Tabular Galaxy tool loads any number of tabular datasets into a new or existing SQLite database allowing the full power of a SQL query to produce a new tabular output. Long, complicated workflows of Galaxy text manipulation tools can be replaced by Query Tabular in a single step.

The Query Tabular tool provides default names for tables: t1, t2, etc. and columns: c1, c2, etc., but a user can specify meaningful names for tables and columns. When specifying names for columns, a user can choose to load only those columns that are given names.

Regex functions are added to sqlite connections so that re.search, re.match, and re.replace functions are available for use in the SQL query.

Line filters can be applied while reading tabular input files to include, exclude, or modify lines before entering the values as rows in the database table. A column replace line filter can use a regex to change a date value to the SQLite recognized format. A normalize filter can convert list fields in the input to first normal form with an individual list item per row.
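The mechanism described above can be sketched with Python's standard `sqlite3` module. This is an illustration only, not the tool's source: the SQL function names `re_search` and `re_sub`, the `normalize` helper, and the toy rows are assumptions made for this example.

```python
import re
import sqlite3


def re_search(pattern, text):
    # SQLite has no boolean type, so return 1 or 0.
    return 1 if text is not None and re.search(pattern, text) else 0


def re_sub(pattern, repl, text):
    return None if text is None else re.sub(pattern, repl, text)


def normalize(rows, col, sep=","):
    """Yield one row per item of the list-valued column `col` (first normal form)."""
    for row in rows:
        for item in row[col].split(sep):
            yield row[:col] + (item.strip(),) + row[col + 1:]


conn = sqlite3.connect(":memory:")
# Register the Python functions so they can be called from SQL.
conn.create_function("re_search", 2, re_search)
conn.create_function("re_sub", 3, re_sub)

# Default-style table and column names (t1, c1, c2) with toy rows.
conn.execute("CREATE TABLE t1 (c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [("sampleA", "06/29/2017"), ("controlB", "06/30/2017")],
)

# Keep rows whose name starts with 'sample' and rewrite the date as YYYY-MM-DD.
rows = conn.execute(
    r"SELECT c1, re_sub('(\d{2})/(\d{2})/(\d{4})', '\3-\1-\2', c2) "
    r"FROM t1 WHERE re_search('^sample', c1)"
).fetchall()
print(rows)
print(list(normalize([("g1", "a, b")], col=1)))
```

Here the date rewrite happens inside the query for brevity; the abstract describes applying such a replacement as a line filter while the input is being read, before rows reach the database.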



Presenters

James Johnson

Minnesota Supercomputing Institute, University of Minnesota



Thursday June 29, 2017 17:30 - 17:36
Einstein Auditorium Le Corum, Level 0

17:30

Lightning Talks Thursday
Moderators

Bérénice Batut

Post-doc, University of Freiburg

Thursday June 29, 2017 17:30 - 18:10
Einstein Auditorium Le Corum, Level 0

17:36

Galaxy-E : Towards an accessible, reproducible and transparent data analysis and management universe dedicated to Ecology
Slides

Authors

Valentin Chambon, Thimothée Virgoulay, Eloïse Trigodet, Marie Delannoy, Marianne Linares, Grégoire Loïs, Romain Julliard, Yvan Le Bras 1, *

 1 : MNHN CESCO & Station de Biologie Marine de Concarneau, Muséum National d'Histoire Naturelle (MNHN)
 * : Corresponding author


Abstract
Reproducing and comparing results is the basis of all scientific fields. Accessing the workflow should be the easiest step when you want to reproduce an analysis from a paper, yet today it is one of the most critical problems across scientific domains. While the situation is improving in domains related to Geographical Information Systems (GIS), thanks to initiatives like the INSPIRE directive, and in Bioinformatics, notably thanks to community efforts, the situation in Ecology remains critical.

Several factors contribute. Macro-ecology varies too much from one case to another, and standard workflows, like those used in omics data analysis, do not exist. Furthermore, after the data pre-processing steps, you need to draft quick models to optimize your workflow, so it cannot be automated easily.
In the scope of “65 Millions observers”, a French national project, we are directly confronted with these issues. We have to implement a national collaborative web platform dedicated to macro-ecological data access and analysis. We aim to facilitate and enhance participation in citizen science projects.
We assume that Galaxy is suited to our problem. Recent developments in Galaxy give us reason to dream, especially the “interactive environment” functionality. Moreover, macro-ecologists do not really work directly on databases but on files extracted from databases (csv, tabular flat files), and some bioinformaticians seem ready to work as ecoinformaticians. This paves the way for the emergence of a Galaxy-E universe, with dedicated tools and communities.




Thursday June 29, 2017 17:36 - 17:42
Einstein Auditorium Le Corum, Level 0

17:42

The Galaxy Gateway on F1000Research
Slides

Authors

Holly Murray 1

1 : F1000, Science Navigation Group, Middlesex House, 34-42 Cleveland St -  United Kingdom


Abstract
The publishing platform F1000Research offers the Galaxy research community their own publishing gateway within F1000Research, where authors can decide which research outputs (articles, posters, and slides) they wish to share and when, without editorial censorship. All articles are openly and formally peer-reviewed by invited expert referees, post-publication. Authors can revise their articles as many times as they wish, creating new, independently citable but linked versions. What's more, publishing a Software tool article is an excellent way to get credit for what you have created; the F1000Research platform, with its article versioning system, support for LaTeX submissions and proper syntax highlighting, is particularly well-suited to publishing Galaxy tools and workflows.

Presenters


Thursday June 29, 2017 17:42 - 17:48
Einstein Auditorium Le Corum, Level 0

17:48

GalaxyCat - The online catalog of Galaxy tools across instances
Slides

Authors

Julien Seiler 1 *, Stéphanie Le Gras 1

1 : Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS : UMR7104, Inserm : U964, Université de Strasbourg, Parc d'Innovation, 1 Rue Laurent Fries - BP 10142, 67404 ILLKIRCH CEDEX - France
* : Corresponding author

Abstract
Since the creation of Galaxy, many generalist or thematic instances have been opened to the community throughout the world. However, when you want to use a particular tool, it is often difficult to determine which instance to use.

GalaxyCat is an online catalog that lists the tools available on various Galaxy instances, so that a simple web interface can quickly show on which instances a given tool is usable.

The GalaxyCat package includes all scripts to automatically feed the catalog database through the command line and the web application interface.

http://galaxycat.france-bioinformatique.fr
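The core lookup such a catalog provides can be pictured as an inverted index from tool to instances. The sketch below is not GalaxyCat code: the function names, instance URLs and tool identifiers are made up, and how the per-instance tool lists are harvested is left out.

```python
from collections import defaultdict


def build_catalog(tools_by_instance):
    """Invert {instance: [tool ids]} into {tool id: {instances}}."""
    catalog = defaultdict(set)
    for instance, tools in tools_by_instance.items():
        for tool in tools:
            catalog[tool].add(instance)
    return catalog


def find_instances(catalog, tool):
    """Return the sorted list of instances offering `tool` (empty if none)."""
    return sorted(catalog.get(tool, set()))


# Hypothetical instances and tool ids, for illustration only.
toy_listing = {
    "https://galaxy-a.example.org": ["bowtie2", "macs2"],
    "https://galaxy-b.example.org": ["macs2", "query_tabular"],
}
toy_catalog = build_catalog(toy_listing)
print(find_instances(toy_catalog, "macs2"))
```

A real catalog would also need to track tool versions, since the same tool id can be installed in different versions on different instances.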

Presenters

Stéphanie Le Gras

Institut de Génétique et de Biologie Moléculaire et Cellulaire | CNRS : UMR7104, Inserm : U964, université de Strasbourg | Parc D'Innovation 1 Rue Laurent Fries - BP 10142 67404 ILLKIRCH CEDEX - France

Julien Seiler

Head of IT, IGBMC



Thursday June 29, 2017 17:48 - 17:54
Einstein Auditorium Le Corum, Level 0

17:54

Analyzing Relationships among Clinical Factors of Cancer Patients by Regression Analysis
Slides

Authors:

Sugandima Vidanagamachchi 1, Thamara Waidyarathna 1 *

 1 : Department of Computer Science  (University of Ruhuna)
 * : Corresponding author


Abstract
Regression analysis is a machine learning technique that can be used to analyze relationships among variables in datasets from randomized trials or experiments. Cancer has a critical impact on society, and the number of cancer patients is growing day by day. Analysis of their clinical and genetic data in terms of different parameters/factors is required, as it will provide insights into drug therapies and diagnosis of the disease. In this work, we found some significant relationships among different parameters, such as age, bone marrow blast percentage and overall survival months, in the Acute Myeloid Leukemia and Merged Cohort of LGG and GBM clinical datasets of the cBioPortal database, after applying simple and multiple regression analysis.
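For readers unfamiliar with the technique, a simple (one-predictor) regression can be sketched as an ordinary least-squares line fit. The function `fit_line` is illustrative, and the numbers below are invented toy values, not taken from cBioPortal or from the authors' results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented, perfectly linear toy values: age (years) vs. survival (months).
toy_age = [30, 40, 50, 60, 70]
toy_survival = [50, 44, 38, 32, 26]
slope, intercept = fit_line(toy_age, toy_survival)
print(slope, intercept)
```

Multiple regression extends the same idea to several predictors at once and is usually done with a statistics library rather than by hand.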

Presenters

Sugandima Vidanagamachchi

Department of Computer Science, University of Ruhuna, Sri Lanka



Thursday June 29, 2017 17:54 - 18:00
Einstein Auditorium Le Corum, Level 0

18:00

Birds-of-a-Feather (BoF) Flocking Session 2
Birds of a Feather are informal gatherings of people that are interested in a common topic.  

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.

And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

Thursday June 29, 2017 18:00 - 19:00
Le Corum Le Corum

18:05

GalaxyAdmins BoF

BoF Live Notes Document 

GalaxyAdmins is a group of people who are responsible for administering Galaxy instances.  We meet online and at events like GCC2017, where a lot of us happen to be.

GCC2017 will be the fifth in-person GalaxyAdmins meetup. Previous GalaxyAdmins BoFs were very well attended and have resulted in several action items, many of which have since been implemented.

This meetup will discuss plans for the coming year, GalaxyAdmins leadership, and whatever else participants want to talk about.


If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Thursday June 29, 2017 18:05 - 18:50
Lobby, Level 0 Level 0, Lecorum

18:05

Genome Annotation in Galaxy BoF

Slides

Discuss genome annotation pipelines in Galaxy. Integration of tools, current / future offerings, needs of biologists. (https://galaxy-genome-annotation.github.io/).


Moderators

Anthony Bretaudeau

INRA, BIPAA and GenOuest platforms
I work on insect genomics: analysis tools in galaxy and the galaxy genome annotation project | I also like to talk about bunnies and other fun animals.

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
Lawrence Berkeley National Lab (LBNL)

Björn Grüning

University of Freiburg

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Thursday June 29, 2017 18:05 - 18:50
BoF Space 1, Antigone Room Antigone Room, Level 2

19:00

Gala Dinner
Located halfway between Montpellier and Nîmes, the Château de Pouget (12th century, late 18th century) offers you authenticity, atmosphere and elegance.


Thursday June 29, 2017 19:00 - 23:30
Chateau du Pouget Chemin des Brus 34400 VERARGUES
 
Friday, June 30
 

08:00

Conference desk open
Friday June 30, 2017 08:00 - 18:00
Le Corum Le Corum

09:00

Session 5: Galaxy for Life Science and Medicine
Moderators

Christophe Antoniewski

Head of ARTbio bioinformatics, CNRS - Institut de Biologie Paris Seine
Institut de Biologie Paris-Seine (IBPS)

Friday June 30, 2017 09:00 - 10:40
Einstein Auditorium Le Corum, Level 0

09:10

Personalized Analysis of Cancer Data: From Genes to Pathways (and Back)
Slides w/ animation (as PDF)

Abstract:

Dr. Eytan Domany will cover his work in human genomics and computational biology principally in the cancer domain. Eytan Domany is a Professor of Computational Systems Biology in the Department of Physics and Complex Systems at the Weizmann Institute of Science in Rehovot, Israel. The main focus of his group is to mine data from large-scale experiments in biology. The work includes development of mathematical methods, their implementation in algorithms (which are incorporated in user-friendly computational tools), which are then applied to study biological data. Eytan was awarded the Sergio Lombroso Award in Cancer Research in 2016 for his research efforts.

Presenters

Eytan Domany

Professor of Computational Systems Biology in the Department of Physics and Complex Systems at the Weizmann Institute of Science in Rehovot, Israel. The main focus of his group is to mine data from large-scale experiments in biology. The work includes development of mathematical...



Friday June 30, 2017 09:10 - 10:00
Einstein Auditorium Le Corum, Level 0

10:00

Comparing hundreds of RNA-seq libraries using Galaxy and SeqResults
Slides (web)

Authors

Brad Langhorst, New England Biolabs (NEB)


Abstract
Transcript abundance, alignment rate, spike-in abundance, 5'-3' coverage, GC distribution and library diversity are all important quality factors in the context of RNA-seq experiments. We describe the infrastructure in Galaxy to assess library performance across these dimensions, the tools we use to aggregate results in a relational database, and data interpretation methods. Interactive visualization features allow bench scientists to correlate metadata with results, compare transcript levels, and select and examine individual transcript properties from the hundreds of thousands of available results.

Presenters

Brad Langhorst

New England Biolabs (NEB)



Friday June 30, 2017 10:00 - 10:20
Einstein Auditorium Le Corum, Level 0

10:20

Approaches for small RNA-seq analysis in Galaxy
Slides

Authors

Mallory Freeberg, Biology Department, Johns Hopkins University (JHU)
James Taylor, Biology Department, Johns Hopkins University (JHU)


Abstract
Small RNAs are short, non-coding RNA molecules that play conserved roles in regulating gene expression across all metazoans. These ~20-50nt RNAs typically form sequence-specific RNA-RNA interactions with protein-coding mRNAs to regulate translation rates, promote RNA degradation, and maintain inheritance of epigenetic markers. To study small RNA populations on a global scale, second-generation deep sequencing technologies are used to sequence and identify individual small RNAs among various cellular, genetic, and environmental contexts. Here, I will describe related Galaxy pipelines developed for the analysis of small RNA-seq datasets in the nematode Caenorhabditis elegans. These pipelines cover processing raw sequencing data, aligning reads to genome or small RNA references, classifying small RNA subcategories based on various features, and testing for differential expression of small RNAs. In addition to outlining the steps of small RNA-seq analyses, I will also discuss how the pipelines can be optimized for analyzing specific subclasses of small RNAs, which sometimes require specialized parameters based on the underlying small RNA biology.

Presenters

Mallory Freeberg

Galaxy Project, Johns Hopkins University
Johns Hopkins University



Friday June 30, 2017 10:20 - 10:40
Einstein Auditorium Le Corum, Level 0

10:40

Break
Friday June 30, 2017 10:40 - 11:10
Le Corum Le Corum

11:10

COMBAT TB: an integrated environment for Tuberculosis data analysis and surveillance
Slides

Authors

Peter Van Heusden, South African National Bioinformatics Institute (SANBI)
Thoba Lose, South African National Bioinformatics Institute (SANBI)
Ziphozakhe Mashologu, South African National Bioinformatics Institute (SANBI)
Alan Christoffels, South African National Bioinformatics Institute (SANBI)


Abstract
Tuberculosis (TB), caused by the bacterium M. tuberculosis, continues to be one of the leading causes of morbidity and mortality in sub-Saharan Africa. TB surveillance in Africa requires computational tools at the sites where patients are being treated to facilitate rapid analyses of mycobacterial genomes. In 2016, under the auspices of the COMBAT TB project, we developed an integrated environment (the COMBAT TB Explorer) that allows researchers to interrogate their in-house data as well as interpret their data in the context of the publicly available annotation of M. tuberculosis. We used the international Global Alliance for Genomics and Health (GA4GH) variant model to implement genetic variant storage in Neo4J. Users can import variant collections and associated phylogenetic trees from Galaxy. All user data is available from a “My Data” view that allows examination of variants, trees and associated genome browser tracks, and lists of genes associated with sequence variation. From the “My Data” view users can select variant collections and (using Galaxy) compute novel trees showing the phylogenetic relationship between samples and variant collections. A task queue allows jobs submitted via the Explorer interface to execute asynchronously, with the user notified when new data is available for viewing. The COMBAT TB environment (including the Explorer and Galaxy server environment) is available as a Docker container, with all code available on GitHub.

Presenters

Peter Van Heusden

South African National Bioinformatics Institute (SANBI)



Friday June 30, 2017 11:10 - 11:30
Einstein Auditorium Le Corum, Level 0

11:30

The RNA Workbench - reproducible, transparent, and accessible RNA research
Slides

Authors

Florian Eggenhofer, University of Freiburg (ALU)
Jörg Fallmann, University of Leipzig
Björn Grüning, University of Freiburg (ALU), Center for Biological Systems Analysis (ZBSA)  
Bérénice Batut, University of Freiburg (ALU)  
Peter Stadler, University of Leipzig, Institute for Theoretical Chemistry (TBI), Max Planck Institute for Mathematics in the Sciences (MPI)  
Rolf Backofen, University of Freiburg (ALU), Center for Biological Systems Analysis (ZBSA), Centre for Biological Signaling Studies [Freiburg] (BIOSS)


Abstract
RNA centric research is of growing importance for medicine and molecular biology. Increasing amounts of data from deep sequencing experiments create a demand for automatic analysis and interpretation solutions.

The RNA-Workbench offers a wide range of tools covering classic RNA-bioinformatics as well as RNA-seq fields. Predefined workflows for the annotation of non-coding RNAs or identification of differentially expressed genes are subsets of over 50 included tools from the categories RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. RNA specific visualisation solutions for dot-bracket plots and secondary structures are part of the workbench.

In contrast to pre-existing solutions, our community-driven approach allows us to include classic RNA-bioinformatics tools, often with the direct support of the tool authors. These contributions enable us to provide excellent documentation, training material and interactive tours demonstrating the functionality of the workbench.

Building on the Galaxy framework, the workbench offers sophisticated analyses to users without command line knowledge, while emphasising reproducibility, customization and effortless scale-up to larger infrastructures. The workbench is implemented as a Galaxy Docker flavour and is therefore easily extendable with additional tools, workflows, tours or training data that can be installed from the Galaxy ToolShed. The workbench will be further improved and maintained in an ongoing community effort.


Presenters

Florian Eggenhofer

Postdoc, University of Freiburg (ALU)
University of Freiburg (ALU)



Friday June 30, 2017 11:30 - 11:50
Einstein Auditorium Le Corum, Level 0

11:50

Interaction Hub: a comprehensive Galaxy-based chromatin folding database, browser, and analysis platform

Slides

Authors
Michael Sauria, Department of Biology, Johns Hopkins University
James Taylor, Department of Biology, Johns Hopkins University


Abstract
Chromatin architecture is recognized as an integral component of cellular differentiation, gene regulation, and epigenetic homeostasis. In recent years there has been an acceleration in the production of chromatin interaction data, primarily Hi-C data. Full utilization of the data produced from Hi-C experiments has been challenging for many researchers because of computational limitations, limited bioinformatics knowledge, and a lack of user-friendly tools. To address these challenges, we have compiled a comprehensive database of Hi-C data, supported by Galaxy and integrated with analysis and visualization tools, allowing truly open access to more than 1,500 Hi-C datasets.

To best enable use of these data, we have created a uniform processing and analysis pipeline, executed using CWL workflows and run in containerized environments. We also have developed quality metrics for Hi-C samples to help evaluate sample quality and replicate reproducibility. Each processing step is made available rather than simply endpoint data, including quality metrics for each phase. Data were all processed using HiFive, a Hi-C analysis suite available on Galaxy main.

We have also created a 2-dimensional genome browser connected to Trackster for easy data exploration within Galaxy. Samples can be directly loaded from the data library into Trackster-2D for visual assessment, comparison to one-dimensional genomic annotations, or Hi-C inter-dataset comparison. To support fast browsing and compact storage of these sparse datasets, we have also developed a multi-resolution 2-dimensional binary tree file format, allowing easy access to any level of resolution and the random access to data necessary for real-time browsing.
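The abstract does not describe the file format itself, so the following only illustrates the general idea of keeping a sparse 2D matrix at several resolutions. It is a toy in-memory sketch, not the authors' binary tree format; `coarsen`, `build_pyramid` and the bin counts are invented for this example.

```python
def coarsen(bins):
    """Merge each 2x2 block of bins into one bin, summing the counts.

    `bins` is a sparse matrix stored as {(row, col): count}; empty bins are absent.
    """
    coarse = {}
    for (i, j), count in bins.items():
        key = (i // 2, j // 2)
        coarse[key] = coarse.get(key, 0) + count
    return coarse


def build_pyramid(bins, levels):
    """Return [finest, ..., coarsest], halving the resolution at each level."""
    pyramid = [bins]
    for _ in range(levels - 1):
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid


# Toy interaction counts at the finest resolution.
toy_bins = {(0, 1): 3, (1, 0): 2, (5, 5): 4}
toy_pyramid = build_pyramid(toy_bins, levels=3)
for level, level_bins in enumerate(toy_pyramid):
    print(level, level_bins)
```

A browser can then draw a zoomed-out view from a coarse level and fetch finer levels only for the region on screen, which is the kind of access pattern a multi-resolution format is meant to serve.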


Presenters

Michael Sauria

Johns Hopkins University



Friday June 30, 2017 11:50 - 12:10
Einstein Auditorium Le Corum, Level 0

12:10

Mechanizing our biologists with Galaxy and IRIDA platforms
Slides

Authors

Philip Mabon, Public Health Agency of Canada (PHAC)


Abstract
Galaxy has been in use at Canada's National Microbiology Laboratory, our national hub for infectious diseases in public health, since 2010. Over the last few years, Galaxy has empowered our biologists to run their own large-scale analyses with collections and published standardized workflows. Our local Galaxy instances run an average of 100k jobs per month, and this number is rising fast. Combined with our IRIDA platform (designed for project and sample management as it relates to next-generation sequencing, sample metadata, and data analysis pipelines), Galaxy gives our biologists greater autonomy, which allows our bioinformatics core to focus on tool and workflow development.

From the perspective of a government research environment, we will demonstrate how a biologist's request for new tools/workflows moves through a development and deployment process into our local sandbox Galaxy instance. There the workflow is tested and refined until proven to be useful. It then continues its transition into our IRIDA platform as an actionable pipeline used in real time for infectious disease surveillance and response using high-throughput sequencing data. This overview will also highlight how we can make the most of Galaxy's flexibility to keep pace with this rapidly evolving field.


Presenters

Philip Mabon

Senior Bioinformatician, Public Health Agency of Canada



Friday June 30, 2017 12:10 - 12:30
Einstein Auditorium Le Corum, Level 0

12:40

Lunch
Friday June 30, 2017 12:40 - 13:45
Le Corum Le Corum

12:40

Arts & Crafts & CTF

Arts & Crafts BoF

GCC sure can be overwhelming sometimes! This BoF is a quiet place to do some stress free, science related, arts and crafts.

Hack the Universe and Capture the Flag!

Saskia and Eric have also put together a Capture the Flag (CTF) event for GCC2017.  Hack the Universe is a competition where you attempt to hack into a Galaxy instance and learn about Galaxy, Galaxy Administration, vulnerabilities, and good security practices.  The CTF competition will start on the first night of GCC2017, and run for a week after.

Anyone! We strongly recommend that you form teams involving at least one bioinformatician or biologist, and one computer person. Get out there, make new friends!

Interested?  See https://ctf.galaxians.org/ for more and visit Saskia and Eric at the Arts & Crafts BoF.    

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.

And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Saskia Hiltemann

Erasmus MC

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Friday June 30, 2017 12:40 - 13:45
TBA

12:40

Birds-of-a-Feather (BoF) Flocking Session 3
Birds of a Feather are informal gatherings of people that are interested in a common topic.  

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.

And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

Friday June 30, 2017 12:40 - 13:45
Le Corum Le Corum

12:40

Come Together: Exploring Proteomic Analysis within Galaxy BoF

BoF Live Discussion Document

Mass spectrometry-based proteomic analysis is evolving along with new and emerging research disciplines. For example, advances in genomics and transcriptomics have offered opportunities in multi-omics exploration of biological systems. In particular, conversion of RNASeq data into protein FASTA format greatly aids the field of proteogenomics. Moreover, functional microbiome research is being greatly helped by newer developments in metaproteomics research. There is a need for improved Galaxy workflows for these multi-omics research areas and emerging methods such as data-independent analysis. The GCC offers a great forum for the community to come together and discuss challenges, opportunities and possibilities.


Moderators

Timothy Griffin

Center for Mass Spectrometry and Proteomics, University of Minnesota

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Pratik Jagtap is a Research Assistant Professor at the Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis (USA). In 2000, he received his PhD at the Center for Cellular and Molecular Biology, Hyderabad (India). Later, during his pos...

James Johnson

Minnesota Supercomputing Institute, University of Minnesota

Friday June 30, 2017 12:40 - 13:45
Lobby, Level 0 Level 0, Le Corum

12:40

Containerized Galaxy Deployments and Advanced Testing BoF

During this buzzword-heavy BoF, we hope to gather deployers interested in a broad range of topics, from big-picture choices (such as DevOps, Containers, and Clouds) to specific technologies (for instance Ansible, Condor, and Kubernetes) and specific Galaxy communities (including docker-galaxy-stable, CloudMan, and GalaxyKickStart). This last year has seen significant sharing and splintering of resources for containerized deployments - we hope this BoF can serve as a venue to coordinate plans for the next year and maximize the reuse we can accomplish going forward.

Possible Topics Include:

 

  • Discuss plans for major Galaxy virtualization projects for the next year - including at least docker-galaxy-stable but hopefully more, such as CloudMan and GalaxyKickStart.
  • Discuss different options for deploying Galaxy to Kubernetes/Swarm/OpenShift and potentially coordinate reuse across methods.
  • Discuss multi-container deployment options being developed as part of the docker-galaxy-stable project including planned development and testing of complete deployment recipes for Kubernetes, Swarm, and OpenShift.
  • Discuss plans for major Galaxy-related Ansible efforts this year and develop plans to reduce duplication across projects.
  • Discuss our efforts to automate testing of Galaxy deployment artifacts over the last year and how they can be leveraged. Discuss plans for future testing.

Moderators

John Chilton

Galaxy Project, Penn State University

Björn Grüning

University of Freiburg

Friday June 30, 2017 12:40 - 13:45
BoF Space 1, Antigone Room Antigone Room, Level 2

12:40

Galaxy for Core Facilities BoF

Do you work with a sequencing core facility? How are you using Galaxy (QC, sample tracking, etc.)? What is missing that you need?


Moderators

Brad Langhorst

New England Biolabs (NEB)

Friday June 30, 2017 12:40 - 13:45
BoF Space 2, Antigone Room Antigone Room, Level 2

12:40

Trackster support group BoF

We will gather to discuss Trackster, Galaxy's very own genome browser. We'll discuss the benefits of Trackster and start an ongoing discussion on what features could be added to further increase the utility of this valuable feature.


Moderators

Frederik Coppens

ELIXIR Belgium & VIB

Mo Heydarian

Galaxy Project, Johns Hopkins University

Michael Sauria

Johns Hopkins University

Friday June 30, 2017 12:40 - 13:45
Lobby, Level 0 Level 0, Le Corum

13:45

ELIXIR supports Galaxy users and developers
Slides

Abstract
ELIXIR is an inter-governmental organisation which builds on existing data resources and services within Europe (www.elixir-europe.org). It follows a Hub and Nodes model, with a single Hub located alongside EMBL-EBI in Hinxton, Cambridge, UK and a growing number of ELIXIR Nodes, providing services to ELIXIR, located at centres of excellence throughout Europe.

ELIXIR is organised into five technical platforms: Compute, Data, Interoperability, Tools and Training. In 2015 a Galaxy Working Group was founded (www.elixir-europe.org/galaxy-wg), which is part of the Tools platform. We aim to further build and support the Galaxy community in Europe and to improve the integration of ELIXIR resources into Galaxy.

Presenters

Frederik Coppens

ELIXIR Belgium & VIB



Friday June 30, 2017 13:45 - 14:00
Einstein Auditorium Le Corum, Level 0

13:45

Session 7: Galaxy in today's infrastructures
Moderators

Björn Grüning

University of Freiburg

Friday June 30, 2017 13:45 - 15:20
Einstein Auditorium Le Corum, Level 0

14:00

Advancing metaproteomics research via community-based informatics development and dissemination

Slides

Authors

Pratik Jagtap, University of Minnesota (Galaxy-P)
Björn Grüning, University of Freiburg
James Johnson, University of Minnesota
Alessandro Tanca, Porto Conte Ricerche
Bart Mesuere, Ghent University
Judson Hervey, Naval Research Laboratory
Carolin Kolmeder, University of Helsinki
Jeremy Fischer, Indiana University
Thomas Doak, Indiana University
Thilo Muth, Robert Koch Institute Berlin
Dave Clements, Johns Hopkins University
Praveen Kumar, University of Minnesota
Subina Mehta, University of Minnesota
Thomas McGowan, University of Minnesota
Clemens Blank, University of Freiburg
Bernhard Renard, Robert Koch Institute Berlin
Josh Elias, Stanford University
Joel Rudney, University of Minnesota 
Timothy Griffin, University of Minnesota


Abstract
The impact of microbial gene expression ('microbiome') on human health and the environment is receiving increased attention, mostly by applying gene-centric metagenomics analytical approaches. As genes can only reveal potential functions, characterization via metaproteomics analysis (the study of expressed microbial proteins) may reveal the mechanism of microbial responses in ecosystems. However, computational metaproteomics needs deliberate, international coordination to generate robust and sustainable software that can contribute to advancing metaproteomics research.

Researchers on this project are metaproteomics informatics developers and users who communicate via a mailing list, Gitter, and GitHub to jointly define and develop the most useful tools to enable metaproteomics analysis. In December 2016, the researchers held an online Metaproteomics Contribution Fest to develop and implement metaproteomics-focused software tools. Prioritization of tasks resulted in identification of the following tool categories within the complex metaproteomics analytical pipeline:

1) Database generation tools using results from the 16S rRNA data/approach to define taxonomy lists and shotgun metagenome sequencing data (May et al., J Proteome Res. 15:2697).

2) Peptide spectral matching tools such as SearchGUI/PeptideShaker and post-processing by MetaProteomeAnalyzer.

3) Taxonomic classification and functional characterization using UniPept and packaging and testing of DIAMOND to generate outputs for MEGAN analysis.

These tools are being made accessible through the Galaxy Tool Shed and a publicly available metaproteomics gateway at NCGAS. Vetted tools and workflows are accessible and will be updated and tested on this Jetstream cloud computing infrastructure.

Our tools, workflows and resources are being promoted via publications, presentations and training workshops at scientific conferences.


Presenters

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Pratik Jagtap is a Research Assistant Professor at the Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis (USA). In 2000, he received his PhD at the Center for Cellular and Molecular Biology, Hyderabad (India). Later, during his pos...



Friday June 30, 2017 14:00 - 14:20
Einstein Auditorium Le Corum, Level 0

14:20

Making Galaxy User Interface Pluggable with Webhooks

Slides

Authors

Evgeny Anatskiy, Bioinformatics Group at Albert-Ludwigs-Universität, Freiburg
Anup Kumar, Bioinformatics Group at Albert-Ludwigs-Universität, Freiburg
Joachim Wolff, Bioinformatics Group at Albert-Ludwigs-Universität, Freiburg
Clemens Blank, Bioinformatics Group at Albert-Ludwigs-Universität, Freiburg
Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
Martin Cech, Department of Biochemistry and Molecular Biology, Penn State University
Björn Grüning, Bioinformatics Group at Albert-Ludwigs-Universität, Freiburg


Abstract
Historically, Galaxy's implementation of the user interface (UI) did not allow for the rich user experience features familiar to users of Facebook, Gmail, GitHub and other sites. To address this, Galaxy has been undergoing an architectural transformation towards an API-driven framework design. Here we describe the benefits of this shift -- the new design philosophy allows us to dramatically improve user experience by enabling features and uses that were simply not possible before.

In particular, the addition of Galaxy Webhooks -- a plug-in system designed for customization of individual Galaxy instances -- is one of the prominent developments enabled by the new architecture. Webhooks are configurable and often community-contributed pieces of code that alter the UI and provide additional features. The primary benefit to the community will be the ability to personalize Galaxy instances, tailoring them to the needs of individual groups.

Notable examples of webhooks:

  • Tool Describing Tours - providing the capability to execute a 'demo' run of any tool using the tool's own test case and data. This is the ideal medium for guiding and educating users by directly exposing them to tool parameters, inputs and outputs.
  • Overlay Search - allowing rich exploration across all objects in the Galaxy including datasets, tools, histories, workflows, libraries and so on.
  • Tool Flavoring - generating a list of installed tools that can be readily packaged into docker images mimicking the original Galaxy's toolset.

We believe webhooks represent a logical result of the project's sustained focus on building a robust, reliable framework for the integration of tools and plugins.


Presenters

Björn Grüning

University of Freiburg

Martin Čech

Dev, Galaxy Project, Penn State University



Friday June 30, 2017 14:20 - 14:40
Einstein Auditorium Le Corum, Level 0

14:40

A reproducible data analysis environment for next-generation sequencing on public cloud computer

Slides

Authors
Manabu ISHII, RIKEN ACCC Bioinformatics Research Unit
Matsushima Akihiro, RIKEN ACCC Bioinformatics Research Unit
Mika Yoshimura, RIKEN ACCC Bioinformatics Research Unit
Hiroki Danno, RIKEN ACCC Bioinformatics Research Unit
Itoshi NIKAIDO, RIKEN ACCC Bioinformatics Research Unit


Abstract
As DNA sequencing methods progress, the quantity and variety of data produced continue to increase. Analyzing such data requires massive computing resources and the setup of various software and databases. Data-analysis techniques and databases are constantly being developed. Accordingly, constructing an analysis environment takes considerable time and effort, including the procurement of computers, the installation of software, and the construction of data analysis pipelines.

To achieve both reproducibility and flexibility of the environment, we developed a Docker image containing Galaxy, a job scheduler, and a data-analysis pipeline. We also constructed a system for deploying the Docker image on a public cloud such as Microsoft Azure. The deployment procedure is implemented as source code using Chef (Infrastructure as Code). The cloud system automatically adds and destroys computing nodes depending on job demand.

In this presentation, we will compare the environment setup time, cost, pipeline reproducibility, and computation speed of on-premise and public cloud systems. We also demonstrate that the system can be constructed conveniently from a web browser. Using this system, we have operated an analysis environment for thousands of single-cell RNA-sequencing datasets in our laboratory. The idempotence of the system, including the data-analysis pipeline, has been tested using continuous integration / continuous delivery.


Presenters

Manabu Ishii

Technical Staff, RIKEN ACCC Bioinformatics Research Unit



Friday June 30, 2017 14:40 - 15:00
Einstein Auditorium Le Corum, Level 0

15:00

Developing analytical workflows for plant genomics through the integration of Galaxy and Tripal
Slides

Authors

Sean Buehler 1, Nic Herndon 1, Emily Grau 1, Ming Chen 2, Abdullah Almsaeed 2, Connor Wytko 3, Brian Soto 3, Sook Jung 3, Shawna Spoor 3, Kuangching Wang 4, Chun-Huai Cheng 3, Nick Watts 4, Lacey Sanderson 5, Jill Wegrzyn 1, Doreen Main 3, Alex Feltus 4, Margaret Staton 6, Stephen Ficklin 3, Nathan Henry 6

 1 : University of Connecticut
 2 : University of Tennessee Institute of Agriculture
 3 : Washington State University
 4 : Clemson University
 5 : University of Saskatchewan
 6 : University of Tennessee


Abstract
Species- or clade-specific genomics databases offer curated and specialized data (as well as relevant metadata) to scientists. As the quantity of data sourced from next-generation sequencing increases, the need to store, transfer, and analyze it efficiently becomes a tremendous challenge. The open-source platform Tripal connects Drupal (a content management system) and Chado [1,2] (a standardized relational database model for biological data). Today, a coalition of genomics databases implements Tripal for their online data repository needs. Recent development in Tripal is focused on achieving not just storage, but also cross-database discovery to allow delivery of data directly to the Galaxy platform [3]. Through the new Tripal Galaxy module, scientists can select custom datasets from within and across Tripal databases and import those directly to a Galaxy instance from within a Tripal repository. A team of developers representing several plant genomics databases is focused on implementing workflows for differential gene expression, variant detection, and association genetics. These workflows are provided to the public as modules for Tripal databases. Current efforts at TreeGenes [4], a repository for forest tree genomics, are focused on implementing association mapping and landscape genomics analysis in Tripal and Galaxy. This will include data integration and analytical capabilities for thousands of individual tree accessions in the form of genotype, phenotype, and environmental data.

Presenters

Sean Buehler

University of Connecticut



Friday June 30, 2017 15:00 - 15:20
Einstein Auditorium Le Corum, Level 0

15:20

P02: Pulling Galaxy's Strings
Poster

Authors

Jeffrey Miller 1

1 : The University of Iowa, 230 S. Madison St. Iowa City, IA 52242 -  United States


Abstract
Puppet allows developers and systems administrators to define a state for the operating system. Applications may also be configured according to whatever state is defined in Puppet code. When configuration drift occurs in a system, Puppet automatically restores the state to what it should be according to the code that defines that state. This poster details a way of leveraging Puppet for Galaxy infrastructure management.

Presenters

Jeffrey Miller

Sr. Systems Administrator, The University of Iowa
As a systems administrator with The University of Iowa ITS Research Services group, I primarily support the storage and computational infrastructure for the Iowa Institute of Human Genetics. Talk to me about scaling infrastructures, virtualization and automation with Puppet.



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P04: Integration of openBIS - improving metadata management in Galaxy
Authors
Aarif Mohamed Nazeer Batcha 1*, Guokun Zhang 1*, Ulrich Mansmann 1

1 : Department of Medical Informatics, Biometry and Epidemiology, University of Munich  (IBE, LMU), Marchioninistrasse 15, 81377 Munich -  Germany
* : Corresponding author 

Abstract
The NGS-Fablab, an IT infrastructure for handling Next-Generation Sequencing data, was built for researchers at the university hospital and medical faculty of the University of Munich. As one of the core modules of NGS-Fablab, Galaxy (version 2015) was deployed to provide a bioinformatics platform for biomedical research projects in many different fields, including variant calling, differential gene expression and copy number analyses. Since various types of NGS data can be provided from different locations, we noticed that the data library inside Galaxy lacks the capability to manage metadata effectively. To solve that issue, the Open Source Biology Information System (openBIS), which has a strong ability to track, annotate and share data throughout distributed research projects in the biological sciences, was introduced into NGS-Fablab. A direct connection between Galaxy and openBIS has so far not been available. We have integrated openBIS into Galaxy using Web2py (a Python-based web framework) so that the two systems communicate smoothly. In our poster, we elaborate on the technical issues in building the connection between Galaxy and openBIS, and on functionality such as querying the metadata, running Galaxy workflows and updating downstream data afterwards. Potential further development will also be discussed, such as graphically summarizing the metadata and visualizing the downstream analysis.

Presenters

Aarif Mohamed Nazeer Batcha

Department of Medical Informatics, Biometry and Epidemiology, University of Munich (IBE, LMU)

Guokun Zhang

Department of Medical Informatics, Biometry and Epidemiology, University of Munich (IBE, LMU)


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P06: A Public Galaxy platform @Pasteur used as an execution engine for web services
Poster

Authors

Fabien Mareuil 1, Olivia Doppelt-Azeroual 1, Hervé Ménager 1

1 : Hub Bioinformatique et Biostatistique, Institut Pasteur [Paris], C3BI 25-28, rue du Docteur Roux, 75724 Paris cedex 15 - France


Abstract
Galaxy (Afgan et al. 2016) is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.

Since 2013, scientists from the Institut Pasteur have had access to a Galaxy instance where they can ask to use any tool available on the Institute cluster. Today, the Galaxy@Pasteur instance is public; more than 280 tools are available and usable, with approximately 3000 jobs per month for the 695 registered users as well as 1500 jobs per month launched by anonymous users.

However, in some cases, the complexity and specificities of the required applications call for the development of a custom web interface.

Over the last 4 years, to address this problem, several web services were created around this Galaxy instance.

For these web services, Galaxy is used as an execution engine to launch a tool or a workflow on the Institut Pasteur cluster. Web services communicate with Galaxy either directly through the Galaxy API or via the BioBlend library (Sloggett et al. 2013).

With this approach, only one server open to external users needs to be managed, and it gives access to the Pasteur resources (storage and cluster power). Moreover, using Galaxy as an execution engine decreases the development effort for web services.



Presenters

Fabien Mareuil

Hub Bioinformatique et Biostatistique



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P08: GEGA: An application for secure access and analysis of EGA data with Galaxy
Authors
David Van Zessen 1, Alexander Senf 2, Youri Hoogstrate 1, Dylan Spalding 2, Andrew Stubbs 1*
1 : Bioinformatics Dept., Erasmus University Medical Center  (Erasmus MC)
2 : European Bioinformatics Institute [Hinxton]  (EMBL-EBI), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK -  United Kingdom
* : Corresponding author


Abstract
The European Genome-phenome Archive (EGA) was created to manage both access and distribution, while providing long-term archival of controlled-access bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant access to data stored in the EGA. A European research infrastructure for life-science data (ELIXIR) initiated a project (2016 Human Data Implementation Study) to determine ELIXIR requirements for secure management of controlled-access data. The project resulted in a prototype application to access EGA from Galaxy [Hoogstrate Y, et al. 2016]. We subsequently implemented a fully functional Galaxy EGA (GEGA) tool to allow end users easy and secure access to data deposited at the EGA and to run any workflow that is available in Galaxy. The data remains in an encrypted state on disk using the fuse layer developed at the European Bioinformatics Institute (EBI) to ensure data security. A new feature in GEGA is the ability for users to combine EGA and non-EGA data in the same Galaxy workflow without compromising secure access to the EGA. GEGA builds upon our FAIR data management and analysis application (myFAIR) and will be freely available for download and use from https://github.com/ErasmusMC-Bioinformatics/ega_galaxy.

Presenters

David Van Zessen

Bioinformatics Dept., Erasmus University Medical Center (Erasmus MC)


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P10: ToolDog - generating tool descriptors from the ELIXIR tool registry
Poster

Authors

Kenzo-Hugo Hillion 1*, Ivan Kuzmin 2, Hedi Peterson 2, Jon Ison 3, Hervé Ménager 1*

1 : Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), Institut Pasteur de Paris
2 : Institute of Computer Science, University of Tartu
3 : Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
* : Corresponding author


Abstract
Over the last few years, the use of bioinformatics tools has been eased by workbench systems such as Galaxy or frameworks that use the Common Workflow Language (CWL). Still, the integration of these resources in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is the incomplete description of tools, which are often missing information such as some parameters, a description or metadata.

ToolDog (Tool DescriptiOn Generator) is the main component of the Workbench Integration Enabler service of the ELIXIR bio.tools registry. The goal of this tool is to guide the integration of tools into workbench environments. To do so, ToolDog is divided into two main parts: the first part analyses the source code of the bioinformatics software with language-dedicated tools and generates a Galaxy XML or CWL tool description. The second part is dedicated to the annotation of the generated tool description using metadata provided by bio.tools. This annotator can also be used on its own to enrich existing tool descriptions with missing metadata such as the recently developed EDAM annotation.



Presenters

Kenzo-Hugo Hillion

Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI); Institut Pasteur de Paris



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P12: Integration of Alvis within the OpenMinTeD platform as Galaxy utilities
Authors
Jean-Baptiste Bohuon*, Mouhamadou Ba, Robert Bossy 1, Claire Nédellec  1

1 : Mathématique et Informatique Appliquées du Génome à l'Environnement (MaIAGE), Institut national de la recherche agronomique (INRA), MaIAGE, INRA, F78350 Jouy-en-Josas - France
* : Corresponding author


Abstract
Alvis is a high-performance Text and Data Mining (TDM) suite, tailored to the needs of information extraction from scientific publications in Life Science and Agriculture. The Alvis suite is developed at Bibliome, an INRA team specialized in the development of information extraction, ontology building and TDM methods. The Alvis library of modules includes state-of-the-art entity and relation annotation modules, some of which use machine learning algorithms and semantic resources. Application design with Alvis is done by combining and adapting many components to the task in the form of workflows. Alvis is one of the TDM platform solutions (with Gate and UIMA) that are being integrated by the OpenMinTeD community to create an open, service-oriented e-Infrastructure for TDM of scientific publications. OpenMinTeD is a European project that aims to develop a platform to foster and facilitate the use of TDM technologies by the scientific community. The community is dedicated to developing workflows enabling non-experts computer scientists to apply state-of-the-art TDM workflows. Galaxy is used by OpenMinTeD as the workflow engine and is associated with a resource registry and user interfaces within the platform. This platform enables input data, such as corpora and ontologies, to be tightly associated with specific workflows. The Bibliome team is currently integrating Alvis modules as Galaxy tools into the OpenMinTeD platform. This poster specifically presents the Alvis approach, how it is adapted to OpenMinTeD and how it is being integrated into Galaxy through specific use cases.

 



Presenters

Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P14: Du Novo: a simple, reference-free tool for turning duplex sequencing data into extremely accurate reads
Authors
Nicholas Stoler 1*, Barbara Arbeithuber 2, Wilfried Guiblet 1, Marcia Su 3, Kateryna Makova 2, Anton Nekrutenko 3*
  
1 : Department of Biochemistry and Molecular Biology, Penn State University (BMB), University Park, PA 16802 -  United States
2 : Department of Biology, Penn State University, University Park, PA 16802 -  United States
4 : The Jackson Laboratory For Genomic Medicine (JAX), Farmington, CT 06032 -  United States
* : Corresponding author 


Abstract
Next-generation technology has revolutionized sequencing in terms of the magnitude of data generated. However, the accuracy of the technology has not improved at anywhere near the rate of its throughput. For variant detection in diploid systems, the existing error rate is generally adequate. However, for detecting low-frequency variants in non-diploid systems like bacterial/viral populations, cancer genetics, somatic variation, and mitochondria/chloroplasts, the current error rates are prohibitively high. Single-molecule barcoding techniques now enable much higher accuracy, with duplex sequencing able to deliver per-base accuracies four orders of magnitude greater. The existing pipeline for processing duplex reads is based on aligning to a reference sequence, a restriction which introduces biases and prevents use in certain de novo applications. It also is sensitive to sequencing error in barcodes, which causes loss of valuable data. Here, we present Du Novo, a reference-free pipeline which can produce highly accurate reads and recover data by correcting errors in barcodes. Du Novo is based in Galaxy, allowing users to analyze their raw data with a simple graphical interface. Using simulations and published data previously analyzed with the existing pipeline, we show that Du Novo is able to identify variants at a frequency as low as 1 in 10,000. We show an application of the pipeline to reliably identify low-frequency variants in a non-diploid system, mitochondrial DNA. Du Novo is open source, and available at github.com/galaxyproject/dunovo.

Presenters

Nick Stoler

Penn State University


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P16: Galaxy-P rides the Jetstream: Cloud-based multi-omic informatics
Authors
Timothy Griffin 1*, Matthew Chambers 2, James Johnson 1, Thomas McGowan 1, Thomas Doak 3, Jeremy Fischer 3, Praveen Kumar 1, Pratik Jagtap 1

1 : University of Minnesota
2 : Vanderbilt University
3 : Indiana University
* : Corresponding author



Abstract
The collaborative Galaxy for proteomics project (Galaxy-P) has demonstrated Galaxy's value in integrating genomic and mass spectrometry (MS)-based proteomic software for multi-omic applications such as proteogenomic and metaproteomic analysis. Thus far, Galaxy-P tools and workflows have been most readily available to other users who are operating local Galaxy instances. To increase accessibility, the Galaxy-P team has partnered with the NSF-funded cloud-based cyberinfrastructure Jetstream. In the initial phase of developing the Galaxy-P/Jetstream resources, we have focused on developing an instance with tools and workflows for integrating RNA-seq and MS-based proteomics data for proteogenomics studies. A main workflow generates protein sequence databases from in-silico translation of RNA-seq data, focusing on potentially expressed proteins from RNA variants, such as single nucleotide variants and insertion-deletions. The second workflow is focused on sequence database searching utilizing the customized FASTA database from the first workflow. This workflow also includes a step where putative variant peptide sequences are searched against known proteins via BLAST-P, to confirm their novelty. More recently we have established a second instance for metaproteomics analysis, where proteins expressed by microbial communities are characterized. This instance combines software for generating protein sequence databases from metagenomics data, with tools for sequence database searching and also visualization of taxonomy and molecular functions represented by the identified meta-proteins. For both resources, we are leveraging Jetstream's ability to handle increasing memory and processor load as data-intensive analyses require. We are also using the extensibility of the Galaxy framework to continue to extend the functionality of these cloud-based instances.

Presenters

Timothy Griffin

Center for Mass Spectrometry and Proteomics, University of Minnesota


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P18: Implementation of an interface for exploring molecular diversity by proteogenomics
Authors
Yannick Cogne 1, Christine Almunia 1, Olivier Pible 1, Duarte Gouveia 2, Arnaud Chaumot 2, Olivier Geffard 2, Jean Armengaud 1*

1 : Innovative technologies for Detection and Diagnostics  (Li2D), CEA Marcoule, BP17171, F-30207 BAGNOLS-SUR-CEZE -  France
2 : Milieux aquatiques, écologie et pollutions  (UR MALY), CEMAGREF, 5 rue de la Doua, CS70077, 69626 Villeurbanne Cedex, France -  France
* : Corresponding author



Abstract
Defining molecular markers to assess health status or modes of action of chemical contaminants in any sentinel organism is an important objective in ecotoxicology. With the long-term goal of testing the biological quality of freshwater systems, we developed a biomonitoring approach based upon a widespread sentinel organism, namely the crustacean amphipod Gammarus fossarum. Biomarkers of interest from this organism, which are proteins whose abundance varies depending on the presence of pollutants, can be monitored by proteomics. G. fossarum is a so-called "non-model" species because there is no reference genomic sequence for this genus. A reference protein sequence database can be obtained by de novo assembly of mature transcript sequences with the help of proteogenomics data. To take into account the diversity of Gammarus populations and cryptic species, a combination of bioinformatics tools for assembling RNAseq data and proteomic assignment is required. We chose the Galaxy working environment with an easily portable solution using Docker, a container management program runnable under Windows. Galaxy Docker (developed by Björn Grüning) with the one-container-per-tool option provides version management as well as high reproducibility of environment variables. The implementation for evaluating strategies for exploring molecular diversity through proteogenomics, including the management of data from new RNA sequencing technologies and their assembly, will be detailed.

Presenters

Yannick Cogne

Innovative technologies for Detection and Diagnostics (Li2D)


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P20: Development and implementation of a transversal NGS & bioinformatics platform at the Belgian Institute of Public Health: Deployment of a Galaxy instance for routine use in a public health setting
Poster

Authors

Julien Van Braekel 1, Bert Bogaerts 1, Raf Winand 1, Quiang Fu 1, Sigrid De Keersmaecker 1, Nancy Roosens 1, Kevin Vanneste 1*

1 : Belgian Institute of Public Health  (WIV-ISP), Rue Juliette Wytsman 14 1050 Ixelles -  Belgium
* : Corresponding author 


Abstract
Despite being a well-established research method, the use of NGS and bioinformatics for routine analysis in a public health setting remains a challenge. The NGS & bioinformatics platform was set up at the Belgian Institute of Public Health with the aim of utilizing NGS & bioinformatics for the diagnosis, surveillance, control and characterisation of potentially harmful organisms; and to promote public health genomics by the effective integration of NGS and bioinformatics into clinical use and public health policy. The platform develops solutions and provides data acquisition and analysis tools to complement the WIV-ISP laboratories' services (including several national reference centres and laboratories); and to integrate the knowledge of genomics into public health policy. The platform has built up the capacity to generate and analyse NGS data through an in-house MiSeq and advanced bioinformatics pipelines and databases. An in-house instance of Galaxy was deployed and specifically tailored to provide user-friendly access in a routine setting to non-expert users. Both bioinformatics tools and validated standardized pipelines that generate detailed output reports are directly integrated into this Galaxy instance. Analyses are distributed over a cluster and tailored storage solutions to deal with resource-intensive applications. Relevant databases are automatically updated and offered directly through the user interface. All analyses are traced by storing relevant parameters of every executed job in specific in-house developed databases. These modifications ensure that all bioinformatics tools, pipelines, and databases can be offered as a high-quality service platform for routine analysis for both surveillance and emergency cases.


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P22: Enhancing the Multi-omics Visualization Platform (MVP) Plug-in for Galaxy-based Applications
Poster

Authors

Thomas McGowan 1,*, James Johnson 1, Pratik Jagtap 2,* 

 1 : University of Minnesota Supercomputing Institute  (MSI) 
 2 : Department of Biochemistry, Molecular Biology and Biophysics (BMBB), University of Minnesota
 * : Corresponding author


Abstract
The Multi-omics Visualization Platform (MVP) visualization plug-in for Galaxy allows users to visually explore the peptides identified by mass spectrometry. The mass spectrometry data is queried from a Galaxy SQLite dataprovider from mz.sqlite datasets generated from mzid (mzIdentML) datasets. MVP provides a variety of filters to allow the user to select peptides of interest. A selected peptide can be viewed in a variety of contexts. The peptide identification can be verified visually by viewing individual Peptide Spectral Matches (PSMs) in the Lorikeet scan viewer. The protein view shows the selected peptide aligned to an identification search protein along with the coverage of other identified peptides. If a search protein is a variant determined from RNAseq analysis of the sample, the protein view may show the variation from a reference protein. If genomic mapping is available for the protein, MVP displays bars along the protein for each exon sequence; clicking on the bar will zoom an IGV browser to that genomic location. IGV can then provide genomic context with concurrent display of RNAseq alignments, RNAseq variants, along with a proBAM or proBED alignment of the peptides identified from mass spectrometry. 

Presenters

James Johnson

Minnesota Supercomputing Institute, University of Minnesota



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P24: Annotating microRNA isoforms (isomiRs) of non-model organisms to analyze expression levels using a Galaxy workflow
Poster

Authors

Jochen Bick 1, Susanne Ulbrich 1, Stefan Bauersachs 1

1 : ETH Zurich Animal Physiology


Abstract
The analysis of small RNA-Seq data is more difficult than that of standard RNA-Seq, and tools were mainly developed for mouse or human datasets. Non-model organisms such as the pig, which have a rather poor annotation, are more challenging to analyze because the number of known miRNAs is low compared to human. In humans, a great variety of ncRNAs including miRNAs are annotated, which can be used as orthologue information for sequence annotation in other mammalian species. This can help to increase the number of annotated miRNA sequences and their various isoforms (isomiRs). This study presents a data analysis pipeline to filter, annotate, and detect miRNAs and their different isomiRs. The workflow is mainly based on standard Galaxy tools and in-house scripts. The pipeline is divided into analysis steps, starting with quality checks and adapter clipping as used in standard RNA-Seq data analysis. Afterwards, all sequences were collapsed to unique sequences with their corresponding read counts. These sequences were mapped using BLASTn-short against a collection of databases containing sequences from miRBase (precursor and canonical mature miRNAs), sequences from NCBI and Ensembl, including ncRNAs and protein-coding transcripts, as well as tRNAs and piRNA cluster sequences. This pipeline was also compared to miRDeep2 to assess differences in the total number of mapped sequences and in the accuracy of each detected isomiR. The comparison showed the benefit of also mapping all obtained sequences to rRNAs, tRNAs, and other ncRNAs to identify and eliminate false positives present in miRBase and in the miRDeep2 results.
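
The collapsing step mentioned in the abstract (reducing reads to unique sequences with their read counts) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the authors' workflow: the helper name `collapse_reads` and the example reads are hypothetical, and adapter clipping is assumed to have been done beforehand.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse reads into (sequence, count) pairs, most abundant first.

    Hypothetical helper for illustration; assumes adapters are already clipped.
    """
    # Normalise case and skip empty entries so identical reads are counted together.
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up example reads, not data from the study.
    example = [
        "TGAGGTAGTAGGTTGTATAGTT",
        "tgaggtagtaggttgtatagtt",
        "TGAGGTAGTAGGTTGTATAGT",
    ]
    for seq, n in collapse_reads(example):
        print(f"{seq}\t{n}")
```

Keeping the count next to each unique sequence means that only distinct sequences need to be mapped, while expression levels can still be recovered from the counts afterwards.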

Presenters

Jochen Bick

ETH Zürich



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P26: The Burden of Occupational Cancer in Korea
Authors
Mia Son 1

1 : Department of Preventive Medicine, School of Medicine, Kangwon National University


Abstract
1. Objectives: to produce an estimate of the current burden of occupational cancer for Korea.

2. Methods: The carcinogenic agents identified for each cancer were those classified by the International Agency for Research on Cancer (IARC) as Group 1 carcinogens. Estimates of the relative risks of occupational carcinogens for each cancer were obtained from a literature search of Asian studies, using meta-analysis. The proportion of the population exposed to each carcinogen was estimated, and the Attributable Fraction (%) for each cancer was then calculated for the target year, 2007.

3. Results: In 2007, 7.62% of all cancer deaths (n=5115; men, 10.32% (n=4387); women, 2.96% (n=728)) and 4.22% of incident cancer cases (n=6572; men, 6.81% (n=5590); women, 1.33% (n=982)) were attributable to IARC Group 1 carcinogens. Asbestos contributed the largest numbers of deaths and incident cases, followed by silica, PAHs, diesel engine exhaust, radiation, chrome and mineral oils.

4. Discussions: The numbers of deaths and incident cases due to past high occupational exposures will continue to be substantial in the near future, particularly for asbestos-related cancers.
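
The abstract does not state which formula was used for the Attributable Fraction. A common choice is Levin's formula, AF = p(RR - 1) / (1 + p(RR - 1)), where p is the proportion of the population exposed and RR is the relative risk. The sketch below assumes that formula; the function name and the input values are hypothetical and are not taken from the study.

```python
def attributable_fraction(p_exposed, relative_risk):
    """Levin's population attributable fraction.

    Assumed formula for illustration; the abstract does not confirm its use.
    """
    if not 0.0 <= p_exposed <= 1.0:
        raise ValueError("p_exposed must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    # Excess risk in the population contributed by the exposed fraction.
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical inputs: 10% of the population exposed, relative risk of 2.0.
    af = attributable_fraction(0.10, 2.0)
    print(f"Attributable fraction: {af:.2%}")
```

Multiplying the resulting fraction by the number of deaths or incident cases for a cancer gives the number attributable to that exposure.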

Presenters

Mia Son

Kangwon National University (KNU)
Department of preventive medicine, School of Medicine, Kangwon National University (KNU)


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P28: ChemFlow, chemometrics for everybody
Poster

Authors

Virginie Rossard 1, Eric Latrille 1, Fabien Gogé 2, Jean-Michel Roger 3,  Martin Ecarnot  1,  Jean-Claude Boulet 1

1 : INRA
2 : IRSTEA


Abstract
ChemFlow is a multivariate analysis software developed to provide a free user-friendly tool for researchers, teachers and students interested in chemometrics. Thanks to the Galaxy platform, ChemFlow is a secure, free, multi-user web application. It integrates codes from different languages to facilitate the fast development of new methods. This tool provides the opportunity to graphically edit processing workflows. Another feature is the ability to share these workflows and data between users. Data and processing parameters are saved and managed through the Galaxy database, which allows the traceability and the reproducibility of the scientific procedures. This software has been designed to be both a teaching resource and a research and development tool. Indeed, it also supports a MOOC: CheMoocs. ChemFlow offers classical chemometric functions, regression and discrimination tools, and methods of spectral decomposition, calibration transfer or multiblock analysis. Dedicated graphs and diagrams are used to explore results in an interactive way. ChemFlow is installed as a virtual machine on several French academic servers. With 1570 students following the MOOC, 650 ChemFlow accounts were created and the public server https://vm-chemflow.toulouse.inra.fr has managed 47000 queries via chemometric tools. Visit the FUN platform (https://www.fun-mooc.fr) for up-to-date MOOCs in October 2017: CheMoocs-basics and CheMoocs-advanced.

Presenters

Virginie Rossard

French National Institute for Agricultural Research, INRA
INRA



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P30: Pushing the limits of job flexibility on HPC
Authors
Carrie Ganote 1, Sheri Sanders, Phil Blood, Thomas Doak, Bhavya Nalagampalli Papudeshi, Asma Bankapur*, Brian Haas, Tim Tickle, Cicada Brokaw

1 : Indiana University  (IU), 107 S. Indiana Avenue Bloomington, IN 47405-7000 -  United States
* : Corresponding author


Abstract
This poster will accompany the accepted lightning talk with the same title. Indiana University, in partnership with the Broad Institute, maintains and develops a Galaxy instance devoted to Trinity CTAT tools. Trinity is a memory- and I/O-intensive program that has really pushed the limits of the hardware devoted to it; this poster will detail a few methods we've used and have started to implement to get the most out of the service.

Presenters

Carrie Ganote

Indiana University


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

P32: An odyssey into Galaxy: Quality control and quality assurance of NGS-based laboratory diagnostic tests offered by Newborn Screening Ontario
Poster

Authors

Lemuel J. Racacho  1,*

1 : Newborn Screening Ontario (NSO)
* : Corresponding author

Abstract
Rapidly evolving genomic technologies, such as next generation sequencing (NGS), are being adopted by many public health laboratories to diagnose genetic disorders. As Newborn Screening Ontario (NSO) implements NGS laboratory diagnostic testing (NGS-LDT), we are facing many challenges in defining, interpreting and monitoring various performance characteristics within the regulatory and privacy frameworks with which we must remain compliant. To date, we have evaluated all quality control (QC) checkpoints within our clinical pipeline using the available tools within a local instance of Galaxy. Additional QC nodes were later identified which enhanced our quality assurance (QA) program. One key benefit of using a local Galaxy is that it offers a simple and controlled visual environment in which variant interpreters with little to no background in programming can assess QC nodes in a clinical NGS pipeline.

Presenters

Lemuel J. Racacho

Genomics Specialist, Newborn Screening Ontario


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

D02: Improve Genome Annotation Accuracy through Manual Annotation in Apollo
Apollo Poster

Authors

Nathan Dunn 1*, Monica Munoz-Torres 1, Deepak Unni 2, Eric Rasche 3, Eric Yao 4, Ian Holmes 4, Christine Elsik 2, Suzanna Lewis 1

1 : Lawrence Berkeley National Lab  (LBNL), Lawrence Berkeley National Lab 717 Potter Street Berkeley, CA 94710 -  United States
2 : Division of Plant Sciences, University of Missouri, 52 Agriculture Lab, Columbia, MO 65211 -  United States
3 : Center for Phage Technology, Texas A&M University  (CPT), 2128 TAMU College Station, TX USA 77843-2128 -  United States
4 : Department of Bioengineering, University of California, Berkeley  (UC Berkeley), 119 California Hall Berkeley, CA 94720-1500 -  United States
* : Corresponding author


Abstract
Modern genome annotation projects have increasingly involved working with less than perfect data on large and likely geographically disparate teams with more complex workflows. Galaxy excels at managing complex workflows that drive genome annotation projects. However, work groups often skip the important step of manual annotation, which can be used to visually validate data and correct structural errors that sometimes affect 10-30% of annotations.

Apollo is a web-based manual genome annotation tool built on top of the powerful JBrowse genome viewer that can be scaled to multiple genome projects and annotators. Apollo allows for collaborative, real-time editing, similar to Google Docs, and can be integrated within annotation workflows via a full suite of web-services (http://icebox.lbl.gov/Apollo2/WebServices/). To this end, it has been integrated within Docker (https://github.com/GMOD/docker-apollo) as well as Galaxy (https://github.com/GMOD/docker-compose-galaxy-annotation) and as part of a larger consortium of annotation projects (https://github.com/galaxy-genome-annotation/).

Ongoing projects include support for variant curation and visualization of predicted effects as well as coordinate transformation. Coordinate transformation will allow collapsing of intragenic (introns) and intergenic (space between annotations) regions to focus attention on data-rich regions. Additionally, it will allow assembly of virtual scaffolds to allow annotation over poorer assemblies.

Find out more: http://genomearchitect.org.


Presenters

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
Lawrence Berkeley National Lab (LBNL)



Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

15:20

D04: KnetMiner: an application suite to integrate, search and interactively explore large knowledge networks
Authors
Ajit Singh, Monika Mistry, Marco Brandizi, Chris Rawlings, Keywan Hassani-Pak

Rothamsted Research, Harpenden, AL5 2JQ – United Kingdom


Abstract

Evaluating potential candidate genes involves numerous challenges in terms of data acquisition, integration, mining and visualisation. The KnetMiner suite of tools aims to facilitate gene discovery and enable biologists and breeders to quickly identify genes, biological processes and pathways influencing complex, polygenic traits. KnetMiner features a data integration platform (www.ondex.org) to integrate and unify information from varied data sources, be it structured or unstructured data, such as gene function annotations, protein-protein interaction data, biochemical pathways, gene expression data, citations in scientific literature and homology information from related organisms, to develop heterogeneous genome-scale knowledge networks (GSKNs).

The KnetMiner web application enables users to interrogate these GSKNs with gene lists, QTL information and trait-related keywords and quickly identify potential candidate genes and networks of associated entities to aid candidate gene discovery and hypothesis generation. This demo will showcase the KnetMiner instance for Arabidopsis. We will query the Arabidopsis knowledge network, which contains several datasets including public GWAS and protein-protein interaction data, with trait-related keywords and explore the ranked candidate genes in Gene View. We will then explore and identify overlapping gene, QTL, SNP and GWAS data in Genomaps and generate gene knowledge networks that can be interactively explored in KnetMaps with a view to identifying candidate genes involved in plausible pathways.

KnetMiner is used by different labs at Rothamsted Research and elsewhere to accelerate gene discovery pipelines for crop breeding and crop improvement. While we have so far mostly concentrated on crop species, the approaches we have taken are generic and GSKNs and KnetMiner servers can readily be built for other species as well. KnetMiner is open source and available at http://knetminer.rothamsted.ac.uk.


Presenters

Ajit Singh

Rothamsted Research, Harpenden, AL5 2JQ – United Kingdom


Friday June 30, 2017 15:20 - 16:35
Le Corum Le Corum

16:35

Using visual programming in Cervical cancer data of The Cancer Genome Atlas
Authors
Thais Hosokawa 1, José Fregnani 1, Rui Reis 1, Adriane Evangelista 1

 1 : Barretos Cancer Hospital  (BCH) , Barretos, São Paulo, Brazil


Abstract
Cervical cancer is the fourth most common cancer in women, and the seventh overall, with an estimated 528,000 new cases in 2012. It is the third most common cancer in Brazil, with an estimated 16,340 cases in 2016. In 2012 it accounted for 7.5% of cancer deaths in women.

International consortia focused on cancer genomes have been using large-scale molecular techniques, generating a huge amount of publicly available data and allowing new approaches to data analysis. TCGA has more than 2.5 petabytes of data from 11,000 tumor samples. The integration of many large-scale analyses allows a very complete view of the molecular basis of cancer. However, this requires integrative bioinformatics analysis done by programmers experienced in genomics. On the other hand, workflows can represent computational languages in an abstract manner.

Barretos Cancer Hospital took part in the consortium as a tissue sample contributor. We intend to analyze the TCGA data and compare it with data from the Brazilian population. For this we chose Galaxy because it is a free platform that allows access via the web using reproducible workflows and stores all provenance data, addressing the reproducibility problem. It can also be installed locally on a computer or a server.

For this project we want to focus on analyzing DNA sequencing, methylation and gene expression data, possibly identifying biomarkers that can characterize people at risk of cervical cancer, and on comparing molecular data with clinical-pathological characteristics. The protocol is approved by our ethics committee.


Presenters

Thais Hosokawa

Projects assistant, Barretos Cancer Hospital
Barretos Cancer Hospital (BCH)


Friday June 30, 2017 16:35 - 16:41
Einstein Auditorium Le Corum, Level 0

16:35

Lightning Talks Friday
Moderators

Virginie Rossard

French National Institute for Agricultural Research, INRA
INRA

Friday June 30, 2017 16:35 - 17:35
Einstein Auditorium Le Corum, Level 0

16:35

Session 8
Moderators

Virginie Rossard

French National Institute for Agricultural Research, INRA
INRA

Friday June 30, 2017 16:35 - 18:00
Einstein Auditorium Le Corum, Level 0

16:41

Galaxy at scale: Analyzing thousands of single cell transcriptomes
Authors
Mo Heydarian 1, Enis Afgan 1, James Taylor 1*

1 : Johns Hopkins University
* : Corresponding author

Abstract
Single cell sequencing assays are quickly being adopted in biological research. The rapid rate of standardization and optimization of such assays as single-cell RNA-sequencing (scRNA-seq) requires computational pipelines to cope with these large, complex datasets. Here we present the re-analysis of a scRNA-seq study (GSE81682) on thousands of hematopoietic cells using Galaxy. With only minor modifications to Galaxy using Cloudman and Amazon Web Services, we were able to quantify expression of over 100,000 transcripts across 3,840 individual cells. Using collections to operate on thousands of datasets allowed us to generate a standardized workflow to monitor and filter cells based on quality metrics, generate quality reports on subpopulations of cells, and produce expression tables ready for downstream analysis. This analysis demonstrates Galaxy's ability to scale and reproducibly handle complex pipelines totaling over 100,000 intermediate datasets.

Instructors

Mo Heydarian

Galaxy Project, Johns Hopkins University


Friday June 30, 2017 16:41 - 16:47
Einstein Auditorium Le Corum, Level 0

16:47

ToolDog - generating tool descriptors from the ELIXIR tool registry
Authors
Kenzo-Hugo Hillion 1, *, Ivan Kuzmin 2, Hedi Peterson 2, Jon Ison 3, Hervé Ménager 1,*

 1 : Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI) Institut Pasteur de Paris
 2 : Center of Bioinformatics, Biostatistics and Integrative Biology
 3 : Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
 * : Corresponding author

Abstract
In recent years, the use of bioinformatics tools has been eased by workbench systems such as Galaxy or frameworks that use the Common Workflow Language (CWL). Still, the integration of these resources in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is the incomplete description of tools, which are often missing information such as some parameters, a description or metadata.

ToolDog (Tool DescriptiOn Generator) is the main component of the Workbench Integration Enabler service of the ELIXIR bio.tools registry. The goal of this tool is to guide the integration of tools into workbench environments. To do so, ToolDog is divided into two main parts. The first part analyses the source code of the bioinformatics software with language-dedicated tools and generates a Galaxy XML or CWL tool description. The second part annotates the generated tool description using metadata provided by bio.tools. This annotator can also be used on its own to enrich existing tool descriptions with missing metadata such as the recently developed EDAM annotation.


Presenters

Kenzo-Hugo Hillion

Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI); Institut Pasteur de Paris


Friday June 30, 2017 16:47 - 16:53
Einstein Auditorium Le Corum, Level 0

16:53

Pushing the limits of job flexibility on HPC
Authors
Carrie Ganote  1

 1 : Indiana University (IU)


Abstract
Galaxy has an extremely flexible and versatile framework for job launching and allocation of resources. At Indiana University, we take advantage of the capabilities of Galaxy as well as adding a few tricks of our own. The following topics will be explored:

  • Running in RAM
  • Job rerunning for programs that checkpoint sensibly
  • Scaling with CPUs and memory using Galaxy Slots
  • Job preemption and hidden tools as a way to sneak tests into production

We will go into a bit of detail about our implementation as well as the problems that pushed us to explore solutions.
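
Regarding the Galaxy Slots topic listed above: Galaxy exposes the number of cores allotted to a job through the `GALAXY_SLOTS` environment variable. As a minimal sketch of how a tool script might honor it (the helper name `galaxy_slots` and the fallback value are assumptions for illustration, not Indiana University's implementation):

```python
import os


def galaxy_slots(default=1, environ=None):
    """Return the number of cores allotted to the job via GALAXY_SLOTS.

    Hypothetical helper; falls back to `default` when the variable is
    unset, not an integer, or not positive.
    """
    env = os.environ if environ is None else environ
    raw = env.get("GALAXY_SLOTS", "")
    try:
        slots = int(raw)
    except ValueError:
        return default
    return slots if slots > 0 else default


if __name__ == "__main__":
    # A tool would typically pass this value to its own thread or CPU option.
    print(f"Running with {galaxy_slots()} thread(s)")
```

Falling back to a single thread keeps the script usable outside Galaxy, where the variable is normally not set.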


Presenters

Carrie Ganote

Indiana University


Friday June 30, 2017 16:53 - 16:59
Einstein Auditorium Le Corum, Level 0

16:59

Pulling the Galaxy's Strings
Authors
Jeffrey Miller 1,*, Brenna Miller 1

 1 : The University of Iowa  (UI)
 * : Corresponding author


Abstract
Puppet allows developers and systems administrators to define a state for the operating system. Applications may also be configured according to whatever state is defined in Puppet code. When configuration drift occurs in a system, Puppet automatically restores the state to what it should be according to the code that defines that state. This talk discusses an alternative approach for managing Galaxy systems compared with the often-used Ansible management tool.

Presenters

Jeffrey Miller

Sr. Systems Administrator, The University of Iowa
As a systems administrator with The University of Iowa ITS Research Services group, I primarily support the storage and computational infrastructure for the Iowa Institute of Human Genetics. Talk to me about scaling infrastructures, virtualization and automation with Puppet.


Friday June 30, 2017 16:59 - 17:05
Einstein Auditorium Le Corum, Level 0

17:05

Using Galaxy as a platform for continuous software development
Authors
Boris Simovski 1, Geir Kjetil Sandve 1,*, Sveinung Gundersen 1

 1 : Department of Informatics, University of Oslo [Oslo]  (UiO), Oslo Norway
 * : Corresponding author


Abstract
Galaxy is typically used as a platform for disseminating already functioning bioinformatics tools to non-programmers, often by wrapping existing libraries or command-line tools in a user-friendly web interface. In Oslo, we have for nine years used Galaxy in a markedly different setting - developing novel methodology directly in the form of Galaxy tools. We have even been running code through a Galaxy interface during the initial trial-and-error phase of correcting syntax, debugging errors and shaping functionality.

Many would consider such direct development within a Galaxy interface as inconvenient. As this has nonetheless become a preferred approach for us, we believe we have evolved routines and a technical setup that represents a novel use case for the Galaxy system. Briefly, our approach brings many of the usual benefits of Galaxy in terms of tracking of executions and sharing of functionality to the setting of prototypic methodology development. Despite some previous weaknesses, we have used our approach for 21 publications on novel methodology and application of custom methodology in the form of Galaxy tools. This year we have finally updated to a modern and appropriate infrastructure, consisting of several connected git repositories, dozens of active branches and deployment based on continuous integration.

We will in the talk present the advantages we get as compared to the standard approach of initially developing code locally and testing it out by execution through the command line. We will also present potential challenges and disadvantages, delineating for whom we believe such an approach may be useful.

Presenters

Geir Kjetil Sandve

University of Oslo

Boris Simovski

University of Oslo


Friday June 30, 2017 17:05 - 17:06
Einstein Auditorium Le Corum, Level 0

17:11

Automated Generation of Complex Toolkits for Galaxy
Authors
Daniel Blankenberg  1

 1 : Penn State / Galaxy Team

Abstract
A key feature of the design of the Galaxy platform is the ease with which new tool configurations can be created and shared. Although the community has done a phenomenal job of making thousands of new tools available, it remains an arduous task to transform large tool suites. For example, the metagenomics packages QIIME (doi:10.1038/nmeth.f.303) and mothur (doi:10.1128/AEM.01541-09) contain over a hundred tools each. Efforts to add both packages have required tremendous effort and patience spanning months to years.

A completely manual process of making command-line utilities function as Galaxy tools does not scale. To address this hurdle, we must enable automatic generation of Galaxy tools. In the case of singular standalone tools, Planemo via the tool_init command can generate a starter-quality tool configuration. However, for tool generation to be successful on a large scale, software developers must take care to design well-thought-out command-line interfaces that make use of standard infrastructure components.

We present two examples of programmatic generation of tools. The first, Anvi'o (doi:10.7717/peerj.1319), is an analysis platform consisting of approximately 50 command-line tools plus an interactive visualization tool allowing users to perform metagenomic binning, characterize single-nucleotide variation, study bacterial pangenomes, predict number of bacterial genomes in a metagenomic assembly, or even remove contamination from eukaryotic assembly projects. Galaxy tool configurations, with production quality interfaces, were generated automatically for each of the Anvi'o platform's commands. A second example is a command-line utility that is able to convert any R package into a set of Galaxy tools.

Presenters

Dan Blankenberg

Galaxy Project, Penn State University


Friday June 30, 2017 17:11 - 17:17
Einstein Auditorium Le Corum, Level 0

17:17

Hack the Galaxy: Data Report
Authors
Hack the Galaxy: Data participants

Abstract
A report on what was done at the GCC2017 Hack the Galaxy: Data event this week, and plans for the future.

Friday June 30, 2017 17:17 - 17:23
Einstein Auditorium Le Corum, Level 0

17:23

Hack the Galaxy: Dev Report
Authors
Hack the Galaxy: Dev participants

Abstract
A report on what was done at the GCC2017 Hack the Galaxy: Dev event this week, and plans for the future.

Friday June 30, 2017 17:23 - 17:29
Einstein Auditorium Le Corum, Level 0

17:30

JBrowse and Circos Tool Updates
Authors
Eric Rasche 1, Saskia Hiltemann 2

1 : Center for Phage Technology, Texas A&M University  (CPT)
2 : Bioinformatics Dept., Erasmus University Medical Center  (Erasmus MC)


Abstract
Since a lightning talk at GCC2016, we have made numerous improvements to the JBrowse and Circos tools for Galaxy. JBrowse has added support for plugins and tracking metadata across time, enabling much better reproducibility. The Circos tool now supports link data and many customization options, allowing you to produce production-ready graphics.

Presenters

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University


Friday June 30, 2017 17:30 - 17:36
Einstein Auditorium Le Corum, Level 0

17:36

Parsec: Intergalactic DevOps and Analysis Automation
Authors
Eric Rasche 1

1 : Center for Phage Technology, Texas A&M University  (CPT)


Abstract
Parsec is a new tool which allows users to build simple command line pipelines based on bioblend functions, all without writing a single line of Python. It suits everything from quick access to administrative functions (adding / removing users from groups) to more complex tasks (filtering a list of histories and doing some operation on all of them). Parallel projects exist for the python-chado and python-apollo libraries. We have experimented with generating Galaxy tools in addition to the command line tools.

Presenters

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University


Friday June 30, 2017 17:36 - 17:42
Einstein Auditorium Le Corum, Level 0

17:42

GCC2018 Launch
Slides

Information and details about the 2018 Galaxy Community Conference!

Volunteers

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University


Friday June 30, 2017 17:42 - 17:50
Einstein Auditorium Le Corum, Level 0

17:50

GCC2017 Closing
Presenters

Friday June 30, 2017 17:50 - 18:00
Einstein Auditorium Le Corum, Level 0