The 2017 Galaxy Community Conference (GCC2017) is being held in Montpellier, France, 26-30 June.  GCC2017 will include keynotes and accepted talks, poster sessions, demos, birds-of-a-feather meetups, exhibitors, and plenty of networking opportunities. There will also be three days of pre-conference activities, including hackathons and training. If you work in data-intensive biomedical research, there is no better place than GCC2017 to present your work and to learn from others.

Friday, June 30 • 15:20 - 16:35
P14: Du Novo: a simple, reference-free tool for turning duplex sequencing data into extremely accurate reads

Nicholas Stoler 1*, Barbara Arbeithuber 2, Wilfried Guiblet 1, Marcia Su 3, Kateryna Makova 2, Anton Nekrutenko 3*
1 : Department of Biochemistry and Molecular Biology, Penn State University (BMB), University Park, PA 16802 -  United States
2 : Department of Biology, Penn State University, University Park, PA 16802 -  United States
4 : The Jackson Laboratory For Genomic Medicine (JAX), Farmington, CT 06032 -  United States
* : Corresponding author 

Next-generation technology has revolutionized sequencing in terms of the volume of data generated. However, the accuracy of the technology has not improved at anywhere near the same rate as its throughput. For variant detection in diploid systems, the existing error rate is generally adequate. However, for detecting low-frequency variants in non-diploid systems like bacterial/viral populations, cancer genetics, somatic variation, and mitochondria/chloroplasts, the current error rates are prohibitively high. Single-molecule barcoding techniques now enable much higher accuracy, with duplex sequencing able to deliver per-base accuracies four orders of magnitude greater. The existing pipeline for processing duplex reads is based on aligning to a reference sequence, a restriction that introduces biases and prevents use in certain de novo applications. It is also sensitive to sequencing errors in barcodes, which cause loss of valuable data. Here, we present Du Novo, a reference-free pipeline that can produce highly accurate reads and recover data by correcting errors in barcodes. Du Novo is based in Galaxy, allowing users to analyze their raw data with a simple graphical interface. Using simulations and published data previously analyzed with the existing pipeline, we show that Du Novo is able to identify variants at a frequency as low as 1 in 10,000. We show an application of the pipeline to reliably identify low-frequency variants in a non-diploid system, mitochondrial DNA. Du Novo is open source and available at github.com/galaxyproject/dunovo.
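
To make the two ideas in the abstract more concrete (recovering reads by correcting barcode errors, and building consensus reads without a reference), here is a minimal Python sketch. It is not Du Novo's actual implementation: the function names, the Hamming-distance threshold, and the simple majority vote are all assumptions chosen for illustration, and the sketch omits the duplex step of comparing consensus sequences from the two strands of each original molecule.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcodes(barcodes, max_dist=1):
    """Map each barcode to a more abundant barcode within max_dist mismatches.

    Hypothetical helper: barcodes are visited from most to least abundant,
    so rare (likely erroneous) barcodes fold into common ones.
    """
    ranked = [bc for bc, _ in Counter(barcodes).most_common()]
    mapping = {}
    for bc in ranked:
        target = bc
        for kept in ranked:
            if kept == bc:
                break  # only barcodes ranked above this one are candidates
            is_root = mapping[kept] == kept  # avoid chaining corrections
            if is_root and len(kept) == len(bc) and hamming(kept, bc) <= max_dist:
                target = kept
                break
        mapping[bc] = target
    return mapping


def consensus(seqs):
    """Per-position majority vote; 'N' where no base has a strict majority."""
    out = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        out.append(base if n * 2 > len(column) else "N")
    return "".join(out)


def build_families(reads):
    """Group (barcode, sequence) pairs by corrected barcode and call consensus."""
    mapping = correct_barcodes([bc for bc, _ in reads])
    families = defaultdict(list)
    for bc, seq in reads:
        families[mapping[bc]].append(seq)
    return {bc: consensus(seqs) for bc, seqs in families.items()}


# Toy input: the read tagged "AAAT" differs from "AAAA" by one barcode base,
# so it is rescued into that family instead of being discarded.
reads = [("AAAA", "ACGT"), ("AAAA", "ACGT"), ("AAAT", "ACGA"), ("CCCC", "TTTT")]
result = build_families(reads)
```

In this toy input, the read with barcode `AAAT` joins the `AAAA` family, and its single sequencing error in the last base is outvoted by the other two reads. A real pipeline would also need to handle reads of unequal length (for example, by aligning reads within a family) and base quality scores.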

Nick Stoler

Penn State University

Friday June 30, 2017 15:20 - 16:35 CEST
Le Corum
