➞ Slides
This workshop will focus on introducing the Galaxy user interface and how it can be used to analyze large datasets. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses Galaxy.
➞ Slides
This workshop will introduce the concepts behind transcriptomics with NGS data and how to analyze this data in Galaxy. Specifically, this workshop will focus on de novo transcriptome reconstruction of RNA-seq data with the following goals:
➔ Slides (doi: 10.7490/f1000research.1114389.1)
This workshop covers visualisation of large datasets using the built-in tools of Galaxy, focusing on primary next-generation sequencing (NGS) data and the resulting downstream, aggregated data. First, using a multi-omic dataset consisting of exome and transcriptome (RNA-seq) data, participants will visualise alignments, variation, expression levels, and annotations using Galaxy’s built-in genome browser, Trackster. Participants will learn how to create a genome visualisation, add data, configure data, move between a linear genome browser view and a Circos view, and generate complex genome visualisations (figures) with more than 12 NGS datasets. Second, using a processed multi-omic dataset, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualisations, participants will create a heatmap to identify patterns and potential causal factors. All visualisations will be created, saved, and shared using only Galaxy and a web browser; no data or software uploads or downloads will be necessary.
➔ Tutorial, Slides
This workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds with sufficient average length for further analysis, including genome annotation. This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.
➔ Slides (web)
➔ Usegalaxy.org Global View (web)
Want to know the big picture about what is going on inside Galaxy? This workshop will give participants a practical introduction to the Galaxy code base with a focus on changing those parts of Galaxy most often modified by local deployers and new contributors.
The workshop will include the following specific content:
➔ Slides, Virtual Machine
What is important when you set up a Galaxy server from scratch? What pitfalls might you run into? How do you interact with the potential users of the service you are going to offer, and how do you make sure that the Galaxy instance you have set up is actually used in the end? After a general introduction, several Galaxy installations are presented. The session will include some demonstrations and hands-on exercises. We will finish with a panel discussion, where we intend to discuss questions from the workshop participants.
➔ Slides, Exercises
This session will walk developers and bioinformaticians through the process of taking a working script or application and turning it into a Galaxy tool. It will also cover the basics of using Planemo, a command-line utility that assists in building and publishing Galaxy tools. We will investigate wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed, as well as insights from experienced tool developers.
➔ Tutorial
The Galaxy bioinformatics platform has emerged as a valuable resource for mass spectrometry (MS) based proteomic informatics. An active community of researchers and users, including the Galaxy for proteomics (Galaxy-P) team, continues to extend Galaxy for these applications.
This hands-on workshop will guide participants through the essential steps for using Galaxy for the analysis of MS-based proteomics data, focusing on protein identification and more advanced multi-omic applications. Workflows from emerging applications integrating genomic and proteomic data (such as proteogenomics and metaproteomics) will also be demonstrated.
In order to extend the reach of these workflows to the greater community, the Galaxy-P team has partnered with both the JetStream cyberinfrastructure resource (http://jetstream-cloud.org/) and Amazon Web Services (https://aws.amazon.com).
The workshop will be constructed to follow the steps based on the structure below:
At the end of the workshop, attendees will have a working knowledge of MS-based proteomics tools and experience in setting up basic workflows for protein identification, as well as more advanced workflows in proteogenomics and metaproteomics.
Participants will be given temporary accounts to a cloud-based Galaxy instance to participate in hands-on workshop activities.
Prerequisites
➔ Slides (web), Tutorial (web)
After metagenomic data generation, you need to extract useful information such as the taxonomic composition of your samples or the metabolic functions carried out in the sampled environment. Several tools have recently been integrated into Galaxy for metagenomic data analysis: Mothur, QIIME, MetaPhlAN, HUMAnN, FROGS, etc.
We will show in this training how to analyze metagenomic and amplicon data inside Galaxy:
➔ Slides (web)
Do you have your lab's Galaxy instance set up and configured but want to give it some more love without diving too deep into the code? This training will show you step by step how to modify some advanced but not complex parts of the installation. We will teach you how to:
➔ Conda Slides, Container Slides
This workshop is aimed at people with some experience developing tools but may also be of use to deployers who need to manage complex sets of dependencies for tools.
Galaxy tools define the applications and other dependencies they require to run using their requirements section. This training session will cover the elements of the requirements section and how Galaxy can be configured to utilize these.
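As a rough illustration of what the requirements section carries, the sketch below reads the `<requirement>` elements from a small tool XML and turns them into Conda-style `name=version` strings. The tool definition (`example_tool`, with its `samtools` and `bwa` versions) and the helper names `read_requirements` and `to_conda_specs` are hypothetical and exist only for this example. This is not how Galaxy itself resolves dependencies, only a way to see the information a resolver has to work with.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition, used only for illustration.
TOOL_XML = """
<tool id="example_tool" name="Example" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.9">samtools</requirement>
        <requirement type="package" version="0.7.17">bwa</requirement>
    </requirements>
    <command>samtools --version</command>
</tool>
"""


def read_requirements(tool_xml):
    """Return (type, name, version) tuples from a tool's requirements section."""
    root = ET.fromstring(tool_xml.strip())
    found = []
    for req in root.findall("./requirements/requirement"):
        name = (req.text or "").strip()
        found.append((req.get("type"), name, req.get("version")))
    return found


def to_conda_specs(requirements):
    """Build Conda-style 'name=version' strings for package requirements."""
    specs = []
    for req_type, name, version in requirements:
        if req_type != "package":
            continue  # only package requirements map onto Conda packages here
        specs.append(f"{name}={version}" if version else name)
    return specs


requirements = read_requirements(TOOL_XML)
conda_specs = to_conda_specs(requirements)
print(conda_specs)
```

The resulting strings are the kind of package specification one would hand to Conda with the Bioconda channel enabled.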
The current best practice for resolving these dependencies is using Conda and Bioconda, and so a substantial amount of time will be spent on these topics. We will go through the process of creating, testing, and publishing a Bioconda package. We will work through an example of connecting these packages to Galaxy.
We will also discuss how the Biocontainers project constructs Docker containers from Bioconda packages and how to emulate this process for local testing before publication. Finally, we will review approaches to leveraging these containers from Galaxy to run jobs within containers.
Prerequisites:
➔ Slides
Did my IP work? Where is my signal? How well do my replicates correlate? What might my peaks even look like? Where are my peaks (or signal) in relationship to transcription start sites (or other features)? These are common questions that biologists first pose when dealing with ChIPseq data. We will use deepTools and MACS within Galaxy to demonstrate effective methods of
➔ Slides & Tutorials
Galaxy is a great platform for teaching diverse scientific topics to a broad user base. The flexibility, reproducibility, and scalability of Galaxy make it an ideal environment for teaching and training. The Galaxy Training Network is a community initiative dedicated to high-quality Galaxy-based training around the world. One of its objectives is to support trainers with complete training material and recommendations about bioinformatics training. Templates and best training practices were defined to help trainers create new material, unify the different material, and make training materials more accessible and easy for users to learn and for teachers to teach.
This workshop will first introduce participants to the infrastructure of the GTN training materials and describe how to generate training materials following best practices. Participants will generate Galaxy Interactive Tours and create Docker Flavours intended for teaching and training sessions. The workshop will also cover best practices for running Galaxy-based workshops, focusing on how to plan a training session based on number of attendees, time constraints, resource availability, and some best practices for leading Galaxy training sessions.
➔ Slides, Tutorial
The SNiPlay workflow makes it possible to exploit high-density SNP data from a VCF file. In this training, we will show how to analyze SNP data in different ways:
SNiPlay has been integrated into Galaxy and is available via the main Tool Shed as a complete workflow.
Dereeper A, Homa F, Andres G, Sempere G, Sarah G, Hueber Y, Dufayard JF, Ruiz M. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations. Nucleic Acids Res. 2015 Jul 1;43(W1):W295-300.
Prerequisites
➔ Tutorial
A compressed, top-level review of the advanced parts of the Galaxy Administrators Course offered in Salt Lake City in November and in Melbourne in February. Given the scope of this topic, we will explain advanced concepts, point out resources, and provide guidance, tips, and tricks rather than going through the exercises in detail.
➔ Slides:
Charts (web)
Generic (web)
➔ Tutorials
Charts (web)
Generic (web)
In this age of high-throughput analysis and big data, visualisations have become an invaluable resource for the presentation and exploration of these often high-dimensional, complex, and large datasets.
While many tools in Galaxy produce static visual outputs (graphs, trees, etc), often some more interactivity is desired to aid in the exploration of these datasets. To support this need, Galaxy offers a range of visualisation options, such as Trackster for browsing genomic data and Charts for the interactive visualisation of tabular data and other datatypes.
In this workshop participants will learn how to develop such visualisations in Galaxy, more specifically:
- Develop a module within the Charts visualisation plugin using JavaScript
- Develop a simple visualisation plugin from scratch
Abstract
Reproducible data analysis requires reproducible software installation. There are many approaches to reproducible software installation – DebianMed, Docker, homebrew-science, software modules, and others. Many work well in cloud and container-enabled environments – where the researcher has full control of a virtual machine or container host and may choose whatever software installation mechanism makes sense. However, these same approaches are less appropriate at high performance computing (HPC) centers where large centralized resources mean such freedom is unavailable. On the other hand, the HPC-centric approaches do not provide options such as ready-to-run software containers ideal for the cloud. Furthermore, some approaches are built to work with command-line scripting while others are built for specific computational platforms or deployment technologies. Here we will outline an approach that covers all of these scenarios with a great deal of flexibility – allowing for the execution of the same binaries regardless of which technologies are selected. For Galaxy in particular, this approach allows the same packages and binaries to be used inside and outside of containerized environments automatically without extra annotation in Galaxy tools.
This approach to reproducibility is the combination of Bioconda and BioContainers. We will update the community on progress in Bioconda adoption and demonstrate that it has improved Galaxy dependency management for both developers and deployers. We will then focus in depth on BioContainers (containerized environments built automatically from Bioconda packages) and on how they enable containerized tool execution across all best-practice Galaxy tools without requiring extra work by tool authors or administrators.
➔ Slides (web)
Authors
Bérénice Batut, University of Freiburg
Galaxy Training Network
Dave Clements, Johns Hopkins University
Björn Grüning, University of Freiburg (ALU)
Abstract
With the advent of high-throughput platforms, life science data analysis is tightly linked to the use of bioinformatics tools, resources, and high-performance computing. However, the scientists who generate the data often do not have the knowledge required to be fully conversant with such analyses. To involve them in their own data analysis, these scientists must acquire bioinformatics vocabulary and skills through training.
Data analysis training is particularly challenging without a computational background. The Galaxy framework is addressing this problem by offering a web-based, intuitive and accessible user interface to numerous bioinformatics tools.
Recently, the Galaxy Training Network (GTN) set up a new open, collaborative, online model for delivering high-quality bioinformatics training material: http://galaxyproject.github.io/training-material.
Each of the current 12 topics provides tutorials with hands-on exercises, slides, and interactive tours. Tours are a new way to go through an entire analysis step by step inside Galaxy, in an interactive and explorative way. All material is openly reviewed and iteratively developed in one central repository by 40 contributors. Content is written in Markdown and, similarly to Software/Data Carpentry, the model separates presentation from content. In addition, the technological infrastructure needed to teach each topic is described with a list of needed tools. The data (citable via DOI) required for the hands-on exercises, time and resource estimates, and flavored Galaxy Docker images are also provided.
This material is automatically propagated to Elixir's TeSS portal. With this community effort, the GTN offers an open, collaborative, FAIR and up-to-date infrastructure for delivering high-quality bioinformatics training for scientists.
➔ Slides
Authors
Abstract
With more than 90 public and hundreds of non-public Galaxy servers, there is a growing demand for managing Galaxy application servers. Even though on the surface this is a straightforward task, in the long run or when an instance serves a large number of users, the administration requirements become significant. Examples of administration tasks include proper selection of hardware, deploying in a production-ready mode, connecting to a compute cluster or external authentication, keeping an instance up to date, etc. Cumulatively, these tasks require detailed knowledge of the Galaxy administration principles. In response, we have designed Galaxy Admin Training materials - a set of tutorials catered to current and future administrators intended to educate them about the principles of Galaxy administration.
The Galaxy Admin Training is envisioned as a multi-day interactive training workshop and/or a set of materials that can be followed in a self-paced setting. The current set of topics ranges from the introductory steps required to set up an instance of Galaxy on one's laptop, to advanced Galaxy server or cluster setup, to managing users and resources, to setting up Interactive Environments. All the materials are developed and available in a public GitHub repository (https://github.com/galaxyproject/dagobah-training), facilitating revisions and expansion of topics.
Conceptually, the materials represent an educational framework for training Galaxy server administration. Any community additions/modifications/updates to the materials can be added to the master repository via a pull request. Community involvement and reuse of the materials is encouraged, with help available from the past instructors.
➔ Slides
Authors
Geir Kjetil Sandve, Department of Informatics, University of Oslo
Boris Simovski, Department of Informatics, University of Oslo
Sveinung Gundersen, Department of Informatics, University of Oslo
Diana Domanska, Department of Informatics, University of Oslo
Christin Lund-Andersen, Department of Informatics, University of Oslo
Abstract
Biomedical investigations increasingly consider patient- and cell-type-specific data. These may be generated for a particular study or gathered from public reference collections like ENCODE and Roadmap Epigenomics, which provide hundreds of cell-specific tracks for a variety of markers. These developments make it increasingly important to offer bioinformatics users the ability to efficiently define and manage large collections of datasets.
The introduction of dataset lists in Galaxy has greatly improved handling of multiple datasets. Still, to allow users to casually compile and analyze hundreds of datasets in one go, we argue that the present Galaxy lists should be complemented with a representation where dataset collections are first-class entities in the system. The GSuite format may serve such a purpose, as it represents a collection of datasets as a tabular file in the history, allowing the collection to be modified using standard Galaxy tools. Furthermore, users can even manipulate a collection by downloading the tabular file, processing it with any custom script or standard spreadsheet software, and uploading the modified file back to Galaxy.
We also present the recently published GSuite HyperBrowser (PMID:28459977), a public Galaxy instance that spearheads efficient and user-friendly analysis of patient- and cell-type-specific data (https://hyperbrowser.uio.no). For instance, a user can easily retrieve hundreds of cell-specific DNase datasets from a repository like ENCODE, fine-tune the collection using a variety of customization tools, and use the collection to assess cell specificity of a separate dataset of disease-associated locations. A regular user can perform a complete analytical scenario like this within minutes.
Apollo has been successfully integrated with Galaxy via Docker, and externally via its web-services, allowing the community to refine predicted genome elements generated via Galaxy workflows. Annotated genomic elements may be exported as FASTA, GFF3, or as a Chado database.
We introduce two important features nearing completion. The first is variant annotation, which provides a way to annotate and visualize variants as well as to visualize the individual and combined effects of each variant on a given annotation. The second is coordinate transformation, which allows the visualization of two or more genomic regions, from the length of entire chromosomes to just a few exons, within an artificially constructed “assemblage”. This facilitates annotation of genomic features split across two or more regions of a fragmented assembly, while informing potential improvements to the genome assembly in the process. Additionally, inter- and intragenic regions can be hidden to focus on regions of interest. For example, the sequences of exons separated by thousands of base pairs can be shown adjacent to one another.
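The idea behind coordinate transformation can be sketched as a mapping between genomic positions and positions in a concatenated view. The code below is a minimal illustration under stated assumptions, not Apollo's implementation: the `Region` and `Assemblage` classes, the 0-based half-open coordinates, and the example regions on `chr1` and `scaffold_7` are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    seq: str    # sequence (chromosome or scaffold) name
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Assemblage:
    """Concatenate regions into one artificial coordinate space."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.offsets = []  # assemblage position where each region begins
        total = 0
        for region in self.regions:
            self.offsets.append(total)
            total += region.end - region.start
        self.length = total

    def to_assemblage(self, seq, pos):
        """Map a genomic position into the assemblage, or None if hidden."""
        for region, offset in zip(self.regions, self.offsets):
            if region.seq == seq and region.start <= pos < region.end:
                return offset + (pos - region.start)
        return None

    def to_genome(self, apos):
        """Map an assemblage position back to (sequence name, position)."""
        if not 0 <= apos < self.length:
            raise IndexError("position outside the assemblage")
        index = bisect_right(self.offsets, apos) - 1
        region = self.regions[index]
        return region.seq, region.start + (apos - self.offsets[index])


# Two exons far apart on chr1, plus a region from another scaffold.
view = Assemblage([
    Region("chr1", 1000, 1200),
    Region("chr1", 9000, 9150),
    Region("scaffold_7", 0, 300),
])
print(view.to_assemblage("chr1", 9000))  # first base of the second exon
print(view.to_assemblage("chr1", 5000))  # intronic position, hidden
print(view.to_genome(350))
```

In this toy view the second exon starts immediately after the first, so features that span the hidden interval, or that continue onto another scaffold, can be drawn and annotated side by side.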
Learn more at http://genomearchitect.org/.
The French Institute of Bioinformatics (IFB) is a national service infrastructure in bioinformatics made up of 35 bioinformatics platforms spanning the entire territory.
Its principal mission is to provide integrated and sustained bioinformatics resources and services to the life science community.
These services can be grouped in:
The IFB infrastructure is coupled to the required computing and storage capacity in a national bioinformatics cloud. To address the most common needs, a selection of major scientific software tools has been installed in pre-configured virtual images (cloud appliances) that are ready to run on the IFB’s cloud (biosphere). IFB also uses lightweight virtualization based on Docker containers to provide bioinformatics tools and pipelines that are ready to run in the cloud or locally, so that users can personalize their virtual research environment. One of the most widely adopted images/containers comes from Galaxy.
The IFB also contributes to structuring and organizing the French bioinformatics community around emerging needs such as Galaxy platforms. At the national level, IFB members gather in the French Galaxy Working Group. The objective of this working group is to federate the French bioinformatics community of developers working on the Galaxy environment. The working group's activities relate to the following three topics:
Arts & Crafts BoF
GCC sure can be overwhelming sometimes! This BoF is a quiet place to do some stress-free, science-related arts and crafts.
Hack the Universe and Capture the Flag!
Saskia and Eric have also put together a Capture the Flag (CTF) event for GCC2017. Hack the Universe is a competition where you attempt to hack into a Galaxy instance and learn about Galaxy, Galaxy administration, vulnerabilities, and good security practices. The CTF competition will start on the first night of GCC2017 and run for a week after.
Anyone! We strongly recommend that you form teams involving at least one bioinformatician or biologist, and one computer person. Get out there, make new friends!
Interested? See https://ctf.galaxians.org/ for more and visit Saskia and Eric at the Arts & Crafts BoF.
And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.
The aim for this BoF is a very concrete one: to initiate a collaborative project towards a manuscript on best practices for reproducibility and transparency when doing computational analysis involving large numbers of datasets. Anyone curious or potentially interested in contributing to such a manuscript is encouraged to join!
Ensuring that omics analyses are truly transparent and reproducible is always a challenge, and even more so when analyses involve large numbers of datasets. The exact scope of a manuscript on this topic and organization of the work is open for discussion at the BoF session. How to make transparent analyses using Galaxy and dataset collections will clearly be central. How much focus to put on comparison with alternative interfaces is open for discussion. As for organization of the work, collaborative editing on an open document might serve as a main pillar, but this would also be a point of discussion at the BoF.
How can we support Galaxy community members whose (main) contribution is not coding? For example, they might be doing training (although the GTN generally covers these people), acting as "ambassadors" for Galaxy, or doing user support. Some questions:
➔ Slides (web)
Authors
Björn Grüning, Uni-Freiburg (ALU)
Bérénice Batut, Uni-Freiburg (ALU)
Marius Van den Beek, Institut Curie
Daniel Blankenberg, Penn State University
Dave Bouvier, Penn State University
Anthony Bretaudeau, Institut de Génétique, Environnement et Protection des Plantes (IGEPP)
Nate Coraor, Penn State University
John Chilton, Penn State University
Peter Cock, The James Hutton Institute
Saskia Hiltemann, Erasmus University Rotterdam, Rotterdam
Youri Hoogstrate, Erasmus MC
James Johnson, University of Minnesota
Greg von Kuster, Penn State University
Lance Parsons, Princeton University
Eric Rasche, Center for Phage Technology, Texas A&M University (CPT)
Nicola Soranzo, Earlham Institute
Abstract
Galaxy tools are first-class objects in Galaxy, and virtually any tool that can be run from the command line or has some kind of API can be integrated into Galaxy.
The ability to seamlessly and easily integrate tools into Galaxy spawned a large community of Galaxy tool developers and a large suite of programs and services around Galaxy tool development. To name a few: the Galaxy Tool Shed (https://usegalaxy.org/toolshed) is an App Store for Galaxy tools, Cargo Port is a package mirror that makes software locations sustainable, and Planemo is the Galaxy tool development kit.
The Intergalactic Utilities Commission (IUC) was founded in 2012 as a community group to define standards, to develop best-practices, to maintain tools and the services we have built as a community. The IUC is actively supporting all Galaxy tool developers and is maintaining high-quality training material to provide training events worldwide.
In this talk we will highlight the achievements of the IUC community over the last year. We will update the community on our ongoing effort to shift our dependency stack to Bioconda and BioContainers, on 3 new community members, and on the regular contribution fests that we have organised since the last GCC. Moreover, we would like to talk about our plans for next year and invite everyone to join our ranks to shape the future of the Galaxy tool community.
➔ Slides
Authors
Abstract
Metabolomics data analysis is a complex, multistep process, which is constantly evolving with the development of new analytical technologies, mathematical methods, and bioinformatics tools and databases. The Workflow4Metabolomics (W4M) project aims to develop full LC/MS, GC/MS, FIA/MS and NMR pipelines using the Galaxy framework for data analysis, including preprocessing, normalization, quality control, statistical analysis and annotation steps.
The W4M Core Team is fully involved in building tools for the metabolomics community and makes a particular effort on tool quality and on disseminating its work. The tools developed strive to adopt the recommendations made by the Galaxy team and the IUC Best Practices. Wrappers are openly available on GitHub and automatically tested using Planemo on the TravisCI platform. The dependencies are managed with Conda packages. Finally, wrappers are distributed on the ToolShed. Thanks to the Galaxy community, we are able to provide a ready-to-use Docker Galaxy flavor and a Vagrant VM built using Ansible roles.
Meanwhile, the Workflow4Metabolomics Galaxy infrastructure (workflow4metabolomics.org) provides an online, user-friendly and high-performance environment to build, run and share metabolomics workflows for LC-MS, GC-MS, FIA/MS and NMR technologies. In parallel with providing expert and reference workflows, the W4M infrastructure is fully open to community contributions. Contributions can take different forms: i) complete integration in the W4M ecosystem, with shared input/output formats and support involvement, or ii) use of the W4M portal and infrastructure as a showcase for external developers, offering a functional version of a tool.
➔ Slides
Authors
Abstract
South Green is a bioinformatics platform applied to the genomic resource analysis of southern and Mediterranean plants. The South Green web portal (http://www.southgreen.fr/) provides access to a large panel of bioinformatics resources, including its own Galaxy instance, which supports a large community of users in Montpellier, France and beyond.
In addition to the generic tools provided with the standard installation of Galaxy, the South Green Galaxy instance (http://galaxy.southgreen.fr/galaxy/) contains a large collection of exclusive tools, Galaxy wrappers and workflows designed for analyses applied to plant genomes.
It currently comprises more than 100 Galaxy wrappers and 9 pre-configured workflows designed for recurrent analyses such as NGS mapping/cleaning, RNAseq, SNP calling and filtering, Genome-Wide Association Study, basic population genetics, structural variations, metagenomics and phylogenetics. We also developed an innovative solution to graphically display the outputs of each workflow.
Home-made Galaxy wrappers have been deposited in our local/central toolshed (http://galaxy.southgreen.fr/toolshed/) or on GitHub (https://github.com/SouthGreenPlatform/galaxy-wrappers). Galaxy is extensively used to conduct capacity building activities. It is currently connected to HPC resources, and we are also starting to use Docker to disseminate some workflows in the IFB cloud, thus facilitating training activities worldwide.
➔ Slides
Authors
Abstract
Phylogenetic analyses aim at reconstructing the evolutionary history of biological objects from molecules to species, and populations. Faced with the number of programs available and the difficulty for scientists to combine them, we designed in 2008 Phylogeny.fr, which has quickly become one of the most used platforms to perform phylogenetic analyses. However, due to the diversity of analyses performed (phylogeny.fr can be simultaneously used by hundreds of students or can be used through batch scripts), the number of analyses performed (50,000 per month), and the number of new phylogenetic tools available, the need to refactor Phylogeny.fr has become crucial.
In this talk, we introduce NGPhylogeny.fr (Next Generation Phylogeny.fr), developed within a Python Web framework (Django), in which we have refactored Phylogeny.fr and made it distributable by designing a scalable environment, an easy-to-use web interface based on a series of modular Galaxy workflows able to perform a very large variety of phylogenetic analyses. Moreover, we have performed a reproducibility study, to systematically compare the results obtained by the Galaxy-based NGPhylogeny.fr workflow and the original phylogeny.fr, using real datasets.
Our talk will highlight how (i) NGPhylogeny.fr can be used in a functional genomics context to quickly analyze large sets of protein superfamilies, (ii) in-depth studies can be quickly launched and (iii) NGPhylogeny.fr can be installed on a wide variety of configurations. On a more generic aspect, we will underline the benefit of designing a coupled Django-interface / workflow-Galaxy environment for end-users.
Setting up a Galaxy instance requires a lot of effort. Solutions like docker-galaxy-stable and the ephemeris installer have reduced this effort considerably. Unfortunately, the amount of configuration is still immense.
Galaxy-docker-ansible is a collection of Ansible scripts that simplifies the installation of Galaxy on a server to running just one command for installation and one command for provisioning. Configuration is host and group based, which allows a multitude of servers to be set up with just one command.
The project can be found at https://github.com/LUMC/galaxy-docker-ansible. It builds upon work done at https://github.com/bgruening/docker-galaxy-stable and https://github.com/galaxyproject/ephemeris to make installation of a Galaxy server very simple.
The French community has always been active in tool development. This has resulted in a large heterogeneity of wrappers, some of which have not benefited from recent advances in good practices. The goal of the project “Galaxy For Life Science” (GFLS) is to bring to several French communities the recent good practices established by the Galaxy developer community. The raw material of the project is a set of wrappers and workflows grouped into several use cases: plant science, statistical analysis, livestock, and bacteria.
In this poster we present two use cases implemented in the project: statistical analysis and plant science. These use cases are interesting because we were able to fully apply a set of good practices to the first one, while we chose to make concessions for the second. A second interesting result is that we were able to identify a generic and reproducible working method, matching the community's way of working, for any initiative that aims to make tools available in Galaxy.
The second script - galaxy-fuse - is a file system creation script that makes the Galaxy histories of a particular user available on the local file system in a matching directory structure. It uses the Galaxy user's API and BioBlend access to the Galaxy database to name the files.
This poster describes the rationale, development and distribution of the two tools.
This constitutes a demonstration of GGA for rapid, flexible and reproducible deployment of information systems for non-model organisms, where Galaxy is used as an orchestrator for data management.
In this poster we will expose the architecture of this new system as used on BIPAA (10 genomes already online). We will also highlight the specific developments that were done, and present the planned features for the coming months.
All developments are available under a free license (MIT or AGPL) and were contributed to the GGA or GMOD GitHub repositories.
The Query Tabular tool provides default names for tables: t1, t2, etc. and columns: c1, c2, etc., but a user can specify meaningful names for tables and columns. When specifying names for columns, a user can choose to load only those columns that are given names.
Regex functions are added to sqlite connections so that re.search, re.match, and re.replace functions are available for use in the SQL query.
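As a rough illustration of how regex functions can be attached to an SQLite connection, the sketch below uses `create_function` from Python's standard `sqlite3` module. The SQL-visible names `re_search` and `re_sub`, the helper functions, and the sample rows are assumptions made for this example; they are not taken from the tool's source.

```python
import re
import sqlite3


def regex_search(pattern, value):
    """Return 1 when `pattern` matches anywhere in `value`, else 0."""
    if value is None:
        return 0
    return 1 if re.search(pattern, str(value)) else 0


def regex_sub(pattern, replacement, value):
    """Return `value` with every match of `pattern` replaced."""
    if value is None:
        return None
    return re.sub(pattern, replacement, str(value))


def connect_with_regex(path=":memory:"):
    """Open an SQLite connection with the two regex helpers registered."""
    conn = sqlite3.connect(path)
    # The names visible from SQL are hypothetical choices for this sketch.
    conn.create_function("re_search", 2, regex_search)
    conn.create_function("re_sub", 3, regex_sub)
    return conn


def demo_rows():
    """Build a tiny table with default-style names (t1, c1, c2) and query it."""
    conn = connect_with_regex()
    conn.execute("CREATE TABLE t1 (c1 TEXT, c2 TEXT)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?)",
        [("geneA", "12/31/2016"), ("geneB", "01/15/2017"), ("other", "n/a")],
    )
    # Keep rows whose c1 starts with "gene" and rewrite MM/DD/YYYY dates
    # into the YYYY-MM-DD form that SQLite date functions understand.
    query = (
        "SELECT c1, re_sub(?, ?, c2) FROM t1 "
        "WHERE re_search(?, c1) ORDER BY c1"
    )
    params = (r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", r"^gene")
    rows = conn.execute(query, params).fetchall()
    conn.close()
    return rows
```

Passing the patterns as bound parameters avoids having to escape backslashes inside the SQL string.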
Line filters can be applied while reading tabular input files to include, exclude, or modify lines before entering the values as rows in the database table. A column replace line filter can use a regex to change a date value to the SQLite recognized format. A normalize filter can convert list fields in the input to first normal form with an individual list item per row.
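The line filters described above can be pictured with a minimal sketch, assuming tab-separated input and a comma-separated list field. The function names, the `#` comment prefix, and the sample data are hypothetical and only illustrate the idea of excluding lines and normalizing a list column into one item per row.

```python
def exclude_comment_lines(lines, prefix="#"):
    """Drop lines starting with `prefix` before they are parsed into rows."""
    return (line for line in lines if not line.startswith(prefix))


def normalize_rows(rows, list_column, separator=","):
    """Yield one output row per item found in the list-valued column."""
    for row in rows:
        for item in str(row[list_column]).split(separator):
            new_row = list(row)
            new_row[list_column] = item.strip()
            yield new_row


def read_tabular(lines, list_column):
    """Apply both filters to tab-separated lines and return the rows."""
    kept = exclude_comment_lines(lines)
    rows = (line.rstrip("\n").split("\t") for line in kept)
    return list(normalize_rows(rows, list_column))
```

For example, a line such as `p1<TAB>GO:1, GO:2` becomes two rows, one per list item, which is the shape a first-normal-form table expects.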
Several issues are related: macro-ecology varies too much from one case to another. Standard workflows, such as those found in omics data analysis, do not exist. Furthermore, after the data pre-processing steps, you need to draft quick models to optimize your workflow, so it cannot be automated easily.
Within the scope of the French national project “65 Millions observers”, we are directly confronted with these issues. We have to implement a national collaborative web platform dedicated to macro-ecological data access and analysis. We aim to facilitate and enhance participation in citizen science projects.
We believe that Galaxy is well suited to our problem. Recent developments in Galaxy, especially the “interactive environment” functionality, allow us to dream. Moreover, macro-ecologists do not really work directly on databases but on database extraction files (CSV, tabular flat files), and some bioinformaticians seem ready to work as ecoinformaticians. This paves the way for the emergence of a Galaxy-E universe, with dedicated tools and communities.
GalaxyCat is an online catalog that lists all the tools available on various Galaxy instances, so that a simple web interface can be used to quickly find the instances on which a given tool is usable.
The GalaxyCat package includes all scripts to automatically feed the catalog database through the command line and the web application interface.
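The core lookup such a catalog offers, finding the instances that provide a given tool, can be sketched as an inverted index. This is only an illustration of the idea, not GalaxyCat's implementation; the function names, instance names, and tool names below are hypothetical.

```python
from collections import defaultdict


def build_catalog(instance_tools):
    """Invert {instance: [tools]} into {tool: sorted list of instances}."""
    catalog = defaultdict(set)
    for instance, tools in instance_tools.items():
        for tool in tools:
            # Tool names are lower-cased so lookups are case-insensitive.
            catalog[tool.lower()].add(instance)
    return {tool: sorted(instances) for tool, instances in catalog.items()}


def find_instances(catalog, tool_name):
    """Return the instances offering `tool_name`, or an empty list."""
    return catalog.get(tool_name.lower(), [])


# Hypothetical data: two instances with partially overlapping tool sets.
example = build_catalog(
    {
        "instance-a": ["bowtie2", "FastQC"],
        "instance-b": ["fastqc", "mafft"],
    }
)
```

With this data, `find_instances(example, "FastQC")` lists both instances, while an unknown tool yields an empty list.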
http://galaxycat.france-bioinformatique.fr
BoF Live Notes Document
GalaxyAdmins is a group of people that are responsible for administering Galaxy instances. We meet online and at events like GCC2017, where a lot of us happen to be.
GCC2017 will be the fifth in-person GalaxyAdmins meetup. Previous GalaxyAdmins BoFs were very well attended and have resulted in several action items, many of which have since been implemented.
This meetup will discuss plans for the coming year, GalaxyAdmins leadership, and whatever else participants want to talk about.
And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.
Slides
Discuss genome annotation pipelines in Galaxy. Integration of tools, current / future offerings, needs of biologists. (https://galaxy-genome-annotation.github.io/).
The RNA-Workbench offers a wide range of tools covering classic RNA-bioinformatics as well as RNA-seq fields. Predefined workflows for the annotation of non-coding RNAs or identification of differentially expressed genes are subsets of over 50 included tools from the categories RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. RNA specific visualisation solutions for dot-bracket plots and secondary structures are part of the workbench.
In contrast to pre-existing solutions, our community-driven approach allows us to include classic RNA-bioinformatics tools, often with the direct support of the tool authors. These contributions enable us to provide excellent documentation, training material and interactive tours demonstrating the functionality of the workbench.
Building on the Galaxy framework, the workbench offers sophisticated analyses to users without command line knowledge, while emphasising reproducibility, customization and effortless scale-up to larger infrastructures. The workbench is implemented as a Galaxy Docker flavour and is therefore easily extendable with additional tools, workflows, tours or training data that can be installed from the Galaxy ToolShed. The workbench will be further improved and maintained in an ongoing community effort.
➔ Slides
Authors
Michael Sauria, Department of Biology, Johns Hopkins University
James Taylor, Department of Biology, Johns Hopkins University
Abstract
Chromatin architecture is recognized as an integral component of cellular differentiation, gene regulation, and epigenetic homeostasis. In recent years there has been an acceleration in the production of chromatin interaction data, primarily Hi-C data. Full utilization of the data produced from Hi-C experiments has been challenging for many researchers because of computational limitations, limited bioinformatics knowledge, and a lack of user-friendly tools. To address these challenges, we have compiled a comprehensive database of Hi-C data, supported by Galaxy, and integrated with analysis and visualization tools allowing truly open access to more than 1,500 Hi-C datasets.
To best enable use of these data, we have created a uniform processing and analysis pipeline, executed using CWL workflows and run in containerized environments. We also have developed quality metrics for Hi-C samples to help evaluate sample quality and replicate reproducibility. Each processing step is made available rather than simply endpoint data, including quality metrics for each phase. Data were all processed using HiFive, a Hi-C analysis suite available on Galaxy main.
We have also created a 2-dimensional genome browser connected to Trackster for easy data exploration within Galaxy. Samples can be directly loaded from the data library into Trackster-2D for visual assessment, comparison to one-dimensional genomic annotations, or for Hi-C inter-dataset comparison. In order to support fast browsing and compact storage of these sparse datasets, we also have developed a multi-resolution 2-dimensional binary tree file format, allowing easy access to any level of resolution and the random access to data necessary for real-time browsing.
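The general idea of keeping several precomputed resolutions of a sparse contact map can be sketched as follows. This is not the authors' binary tree file format; it is a minimal in-memory illustration that assumes contacts are stored as a dictionary keyed by hypothetical `(row_bin, col_bin)` pairs.

```python
from collections import defaultdict


def coarsen(contacts, factor=2):
    """Merge bins of a sparse contact map by `factor` along both axes."""
    merged = defaultdict(int)
    for (row, col), count in contacts.items():
        merged[(row // factor, col // factor)] += count
    return dict(merged)


def build_levels(contacts, n_levels):
    """Return a list of sparse maps, finest first, each at half the resolution."""
    levels = [dict(contacts)]
    for _ in range(n_levels - 1):
        levels.append(coarsen(levels[-1]))
    return levels


# Hypothetical counts: only non-zero bins are stored.
levels = build_levels({(0, 0): 1, (0, 1): 2, (1, 1): 3, (4, 5): 4}, 3)
```

A browser zoomed far out would read a coarse level and only fetch the finest level for the region currently in view, which is why storing each level separately helps real-time browsing.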
From the perspective of a government research environment, we will demonstrate how we move from a biologist's request for new tools or workflows, through development, to deployment in our local sandbox Galaxy instance. There the workflow is tested and refined until it proves useful. It then moves into our IRIDA platform as an actionable pipeline used in real time for infectious disease surveillance and response using high-throughput sequencing data. This overview will also highlight how we can make the most of Galaxy's flexibility to keep pace with this rapidly evolving field.
Arts & Crafts BoF
GCC sure can be overwhelming sometimes! This BoF is a quiet place to do some stress-free, science-related arts and crafts.
Hack the Universe and Capture the Flag!
Saskia and Eric have also put together a Capture the Flag (CTF) event for GCC2017. Hack the Universe is a competition where you attempt to hack into a Galaxy instance and learn about Galaxy, Galaxy administration, vulnerabilities, and good security practices. The CTF competition will start on the first night of GCC2017 and run for a week after.
Anyone! We strongly recommend that you form teams involving at least one bioinformatician or biologist, and one computer person. Get out there, make new friends!
Interested? See https://ctf.galaxians.org/ for more and visit Saskia and Eric at the Arts & Crafts BoF.
BoF Live Discussion Document
Mass spectrometry-based proteomic analysis is evolving along with new and emerging research disciplines. For example, advances in genomics and transcriptomics have offered opportunities in multi-omics exploration of biological systems. In particular, conversion of RNA-seq data into protein FASTA format greatly aids the field of proteogenomics. Moreover, functional microbiome research is being greatly helped by newer developments in metaproteomics research. There is a need for improved Galaxy workflows for these multi-omics research areas and for emerging methods such as data-independent analysis. The GCC offers a great forum for the community to come together and discuss challenges, opportunities and possibilities.
During this buzzword-heavy BoF, we hope to gather deployers interested in a broad range of topics, from big picture choices (such as DevOps, containers, and clouds) to specific technologies (for instance Ansible, Condor, and Kubernetes) and specific Galaxy communities (including docker-galaxy-stable, CloudMan, and GalaxyKickStart). This last year has seen significant sharing and splintering of resources for containerized deployments. We hope this BoF can serve as a venue to coordinate plans for the next year and maximize the reuse we can accomplish going forward.
Possible Topics Include:
Do you work with a sequencing core facility? How are you using Galaxy (QC, sample tracking, etc.)? What features are you missing?
We will gather to discuss Trackster, Galaxy's very own genome browser. We'll discuss the benefits of Trackster and start an ongoing discussion on what features could be added to further increase the utility of this valuable feature.
➔ Slides
Authors
Researchers on this project are metaproteomics informatics developers and users who communicate via mailing list, gitter, and GitHub to jointly define and develop the most useful tools to enable metaproteomics analysis. In December 2016, the researchers held an online Metaproteomics Contribution Fest to develop and implement metaproteomics-focused software tools. Prioritization of tasks resulted in identification of the following tool categories within the complex metaproteomics analytical pipeline:
1) Database generation tools using results from the 16S rRNA data/approach to define taxonomy lists and shotgun metagenome sequencing data (May et al J Proteome Res. 15:2697).
2) Peptide spectral matching tools such as SearchGUI/PeptideShaker and post-processing by MetaProteomeAnalyzer.
3) Taxonomic classification and functional characterization using UniPept and packaging and testing of DIAMOND to generate outputs for MEGAN analysis.
These tools are being made accessible through the Galaxy ToolShed and a publicly available metaproteomics gateway at NCGAS. Vetted tools and workflows are accessible and will be updated and tested on this Jetstream cloud computing infrastructure.
Our tools, workflows and resources are being promoted via publications, presentations and training workshops at scientific conferences.
➔ Slides (doi: 10.7490/f1000research.1114475.1)
Authors
In particular, the addition of Galaxy Webhooks -- a plug-in system designed for customization of individual Galaxy instances -- is one of the prominent developments enabled by the new architecture. Webhooks are configurable and often community-contributed pieces of code that alter the UI and provide additional features. The primary benefit to the community will be the ability to personalize Galaxy instances, tailoring them to the needs of individual groups.
Notable examples of webhooks:
We believe webhooks represent a logical result of the project's sustained focus on building a robust, reliable framework for integration of tools and plugins.
➔ Slides
Authors
Manabu ISHII, RIKEN ACCC Bioinformatics Research Unit
Matsushima Akihiro, RIKEN ACCC Bioinformatics Research Unit
Mika Yoshimura, RIKEN ACCC Bioinformatics Research Unit
Hiroki Danno, RIKEN ACCC Bioinformatics Research Unit
Itoshi NIKAIDO, RIKEN ACCC Bioinformatics Research Unit
Abstract
As DNA sequencing methods progress, both the quantity and the variety of data produced continue to increase. Analyzing such data requires massive computing resources and the setup of various software and databases. Many data-analysis techniques and databases are constantly being developed. Accordingly, constructing an analysis environment takes a great deal of time and work, including procuring computers, installing software, and building data-analysis pipelines.
To achieve both reproducibility and flexibility of the environment, we are developing a Docker image that bundles Galaxy, a job scheduler, and a data-analysis pipeline. We have also built a system for deploying the Docker image on a public cloud such as Microsoft Azure. The deployment procedure is implemented as source code using Chef (Infrastructure as Code). The cloud system automatically adds and destroys computing nodes depending on job demand.
In this presentation, we will compare environment setup time, cost, pipeline reproducibility, and calculation speed between on-premise and public cloud systems. We will also demonstrate that the system can be conveniently constructed from a web browser. Using this system, we have operated an analysis environment for thousands of single-cell RNA-sequencing samples in our laboratory. The system, including the data-analysis pipeline, has been tested for idempotence using continuous integration / continuous delivery.
Since 2013, scientists from the Institut Pasteur have had access to a Galaxy instance where they can ask to use any tool available on the Institute cluster. Today, the Galaxy@Pasteur instance is public; more than 280 tools are available and usable, with approximately 3000 jobs per month for the 695 registered users as well as 1500 jobs per month launched by anonymous users.
However, in some cases, the complexity and specificities of the required applications call for the development of a custom web interface.
Over the last 4 years, to address this issue, several web services have been created around this Galaxy instance:
MetaGenSense (Correia et al. 2015) (https://metagensense.web.pasteur.fr), an online LIMS which launches workflows to analyse metagenomic data.
NGPhylogeny.fr, an updated version of the well-known Phylogeny.fr website (GCC17 Talk by Damien Correia: Performing Next Generation Phylogenetic Analyses with NGPhylogeny.fr)
Booster (http://booster.c3bi.pasteur.fr/)
Shaman, (Quereda et al. 2016) a SHiny application for Metagenomic ANalysis (http://shaman.c3bi.pasteur.fr/, in progress)
For these web services, Galaxy is used as an execution engine to launch a tool or a workflow on the Institut Pasteur cluster. Web services communicate with Galaxy either directly through the Galaxy API or via the BioBlend library (Sloggett et al. 2013).
This approach means that only one server open to external users has to be managed, and it gives access to the Pasteur resources (storage and cluster power). Moreover, using Galaxy as an execution engine decreases the development effort for web services.
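To make the "Galaxy as an execution engine" idea concrete, here is a minimal sketch of how a web service might prepare a workflow invocation request for the Galaxy API using only the Python standard library. The server URL, API key, and identifiers are placeholders, the function name is hypothetical, and the request is only built, not sent. The payload shape is an assumption based on Galaxy's workflow invocation endpoint and should be checked against the API documentation of the target Galaxy release.

```python
import json
from urllib.request import Request


def build_invocation_request(galaxy_url, api_key, workflow_id, history_id, inputs):
    """Prepare (but do not send) a POST request that invokes a workflow."""
    url = f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/invocations"
    # Assumed payload shape: target history plus a mapping of workflow
    # input steps to existing datasets.
    payload = {"history_id": history_id, "inputs": inputs}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Placeholder values only; none of these identifiers refer to a real server.
request = build_invocation_request(
    "https://galaxy.example.org/",
    "SECRET_KEY",
    "wf123",
    "hist456",
    {"0": {"src": "hda", "id": "data789"}},
)
```

A service would then send the request with `urllib.request.urlopen(request)` and poll the returned invocation for completion. BioBlend wraps the same kind of call behind a Python client, which is why either route works for the web services listed above.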
ToolDog (Tool DescriptiOn Generator) is the main component of the Workbench Integration Enabler service of the ELIXIR bio.tools registry. The goal of this tool is to guide the integration of tools into workbench environments. To do so, ToolDog is divided into two main parts: the first part analyses the source code of the bioinformatics software with language-dedicated tools and generates a Galaxy XML or CWL tool description. The second part is dedicated to the annotation of the generated tool description using metadata provided by bio.tools. This annotator can also be used on its own to enrich existing tool descriptions with missing metadata such as the recently developed EDAM annotation.
Apollo is a web-based manual genome annotation tool built on top of the powerful JBrowse genome viewer that can be scaled to multiple genome projects and annotators. Apollo allows for collaborative, real-time editing, similar to Google Docs, and can be integrated within annotation workflows via a full suite of web-services (http://icebox.lbl.gov/Apollo2/WebServices/). To this end, it has been integrated within Docker (https://github.com/GMOD/docker-apollo) as well as Galaxy (https://github.com/GMOD/docker-compose-galaxy-annotation) and as part of a larger consortium of annotation projects (https://github.com/galaxy-genome-annotation/).
Ongoing projects include support for variant curation and visualization of predicted effects, as well as coordinate transformation. Coordinate transformation will allow collapsing of intragenic (introns) and intergenic (space between annotations) regions to focus attention on data-rich regions. Additionally, it will allow assembly of virtual scaffolds to enable annotation over poorer assemblies.
Find out more: http://genomearchitect.org.
Evaluating potential candidate genes involves numerous challenges in terms of data acquisition, integration, mining and visualisation. The KnetMiner suite of tools aims to facilitate gene discovery and enable biologists and breeders to quickly identify genes, biological processes and pathways influencing complex, polygenic traits. KnetMiner features a data integration platform (www.ondex.org) to integrate and unify information from varied data sources, be it structured or unstructured data, such as gene function annotations, protein-protein interaction data, biochemical pathways, gene expression data, citations in scientific literature and homology information from related organisms, to develop heterogeneous genome-scale knowledge networks (GSKNs).
The KnetMiner web application enables users to interrogate these GSKNs with gene lists, QTL information and trait-related keywords and quickly identify potential candidate genes and networks of associated entities to aid candidate gene discovery and hypothesis generation. This demo will showcase the KnetMiner instance for Arabidopsis. We will query the Arabidopsis knowledge network, which contains several datasets including public GWAS and protein-protein interaction data, with trait-related keywords and explore the ranked candidate genes in Gene View. We will then explore and identify overlapping gene, QTL, SNP and GWAS data in Genomaps and generate gene knowledge networks that can be interactively explored in KnetMaps with a view to identifying candidate genes involved in plausible pathways.
KnetMiner is used by different labs at Rothamsted Research and elsewhere to accelerate gene discovery pipelines for crop breeding and crop improvement. While we have so far mostly concentrated on crop species, the approaches we have taken are generic and GSKNs and KnetMiner servers can readily be built for other species as well. KnetMiner is open source and available at http://knetminer.rothamsted.ac.uk.
International consortia focused on cancer genomes have been using large-scale molecular techniques, generating a huge amount of publicly available data and enabling new approaches to data analysis. TCGA holds more than 2.5 petabytes of data from 11,000 tumor samples. Integrating many large-scale analyses gives a very complete view of the molecular basis of cancer. However, this requires integrative bioinformatics analysis, done by programmers experienced in genomics. On the other hand, workflows make it possible to represent computational analyses in an abstract manner.
Barretos Cancer Hospital took part in the consortium as a tissue sample provider. We intend to analyze the TCGA data and compare it with the Brazilian population. For this we chose Galaxy because it is a free platform that is accessible over the web and uses reproducible workflows that store all the provenance data, solving the reproducibility problem. It can also be installed locally on a computer or a server.
For this project we want to focus on analyzing DNA sequencing, methylation and gene expression data, possibly identifying biomarkers that can characterize people at risk for cervical cancer, as well as comparing molecular data with clinical-pathological characteristics. The protocol has been approved by our ethics committee.
topics will be explored:
We will go into a bit of detail about our implementation as well as the problems that pushed us to explore solutions.