➔ Poster
Authors: Jochen Bick 1, Susanne Ulbrich 1, Stefan Bauersachs 1
1: ETH Zurich, Animal Physiology
Abstract
The analysis of small RNA-Seq data is more difficult than that of standard RNA-Seq data, and most tools were developed for mouse or human datasets. Non-model organisms such as the pig, which are rather poorly annotated, are more challenging to analyze because the number of known miRNAs is low compared to human. In humans, a great variety of ncRNAs, including miRNAs, are annotated, and these can be used as orthologue information for sequence annotation in other mammalian species. This can help to increase the number of annotated miRNA sequences and their various isoforms (isomiRs). This study presents a data analysis pipeline to filter, annotate, and detect miRNAs and their different isomiRs. The workflow is mainly based on standard Galaxy tools and in-house scripts. The pipeline is divided into separate analysis steps. First, read quality was checked and the adapter sequence was clipped, as is also done in standard RNA-Seq data analysis. Afterwards, all reads were collapsed to unique sequences with their corresponding read counts. These sequences were mapped using BLASTn-short against a collection of databases containing sequences from miRBase (precursor and canonical mature miRNAs), sequences from NCBI and Ensembl, including ncRNAs and protein-coding transcripts, as well as tRNAs and piRNA cluster sequences. The pipeline was also compared to miRDeep2 with respect to the total number of mapped sequences and the accuracy of each detected isomiR. The comparison showed the benefit of also mapping all obtained sequences to rRNAs, tRNAs, and other ncRNAs in order to identify and eliminate false positives present in miRBase and in the miRDeep2 results.
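The collapsing step described above can be illustrated with a minimal Python sketch. It is not the authors' in-house script or a Galaxy tool; the function names `collapse_reads` and `to_fasta` and the `seqN_xCOUNT` header format are hypothetical choices made for illustration. It assumes adapter-clipped reads are already available as plain sequence strings.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse reads to unique sequences with read counts.

    Hypothetical helper: `reads` is any iterable of adapter-clipped
    sequence strings. Empty entries are skipped and case is normalized.
    """
    counts = Counter(seq.strip().upper() for seq in reads if seq.strip())
    # Most abundant first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_fasta(collapsed):
    """Write collapsed sequences as FASTA text.

    The header format (>seqN_xCOUNT) is an assumption for this sketch;
    it stores the read count in the identifier so it survives mapping.
    """
    lines = []
    for index, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">seq{index}_x{count}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    example_reads = [
        "TGAGGTAGTAGGTTGTATAGTT",
        "tgaggtagtaggttgtatagtt",
        "TAGCTTATCAGACTGATGTTGA",
    ]
    print(to_fasta(collapse_reads(example_reads)))
```

Carrying the read count in the sequence identifier means each unique sequence only needs to be mapped once, while its abundance can still be recovered from the mapping output.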