Poster (doi: 10.7490/f1000research.1114353.1)
Authors: Lose Thoba 1, Ziphozakhe Mashologu 1, Peter Van Heusden 1*, Alan Christoffels 1*
1: South African National Bioinformatics Institute (SANBI)
*: Corresponding author
Abstract
Graph database implementations such as Neo4j are increasingly used in biomedical research, e.g. for disease networks underpinned by protein and metabolic frameworks. We previously developed a Galaxy datatype and an interactive environment for storing and exploring Neo4j graph databases within Galaxy. Building on this work, we generated an M. tuberculosis genomic database from multiple sources of annotation. The database follows a Chado-like schema in which graph nodes are named according to Sequence Ontology terms, which makes it natural for researchers to query it using the Cypher query language. NGS data are processed to identify novel variants, which are stored in the database using a schema derived from the GA4GH variant model. By applying Cypher queries to the resulting Neo4j database in the context of Mycobacterium tuberculosis drug resistance, we were able to prioritize SNPs for further experimental investigation of their association with multidrug resistance in Mtb.
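As a minimal sketch of the kind of query described above, the Python below pairs a Cypher query string with a plain-Python equivalent so that the logic can run without a database. The node labels (`gene` and `SNV`, named after Sequence Ontology terms), the `LOCATED_IN` relationship, the `known_resistance` flag and the variant property names (loosely following GA4GH Variant fields such as `reference_bases` and `alternate_bases`) are all hypothetical choices for illustration, not the authors' actual schema. The gene coordinates and variants in the usage example are toy values.

```python
from dataclasses import dataclass

# Hypothetical Cypher query: labels follow Sequence Ontology term names
# (gene, SNV); LOCATED_IN and known_resistance are assumed, not the real schema.
PRIORITIZE_SNPS_CYPHER = """
MATCH (v:SNV)-[:LOCATED_IN]->(g:gene)
WHERE g.name IN $resistance_genes
  AND NOT coalesce(v.known_resistance, false)
RETURN g.name AS gene, v.start AS start,
       v.reference_bases AS ref, v.alternate_bases AS alt
ORDER BY gene, start
"""


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class SNV:
    # Field names loosely mirror GA4GH Variant fields (an assumption).
    start: int
    reference_bases: str
    alternate_bases: str
    known_resistance: bool = False


def prioritize_snps(genes, snvs, resistance_genes):
    """Plain-Python equivalent of PRIORITIZE_SNPS_CYPHER, for illustration.

    Returns (gene, start, ref, alt) tuples for SNVs that fall inside a
    resistance-associated gene and are not already flagged as known.
    """
    hits = []
    for g in genes:
        if g.name not in resistance_genes:
            continue
        for v in snvs:
            # An SNV lies in a gene if its position is within [start, end).
            if g.start <= v.start < g.end and not v.known_resistance:
                hits.append((g.name, v.start, v.reference_bases, v.alternate_bases))
    # Same ordering as the Cypher ORDER BY gene, start.
    return sorted(hits)


if __name__ == "__main__":
    # Toy coordinates, not real M. tuberculosis positions.
    genes = [Gene("rpoB", 100, 200), Gene("katG", 300, 400), Gene("gyrA", 500, 600)]
    snvs = [
        SNV(150, "C", "T"),
        SNV(160, "A", "G", known_resistance=True),
        SNV(350, "G", "C"),
        SNV(550, "T", "A"),
        SNV(250, "A", "C"),
    ]
    for hit in prioritize_snps(genes, snvs, {"rpoB", "katG"}):
        print(hit)
```

With the toy data, only the unflagged SNVs inside `rpoB` and `katG` are returned. Against a live database, the same query string could be passed to the official Neo4j Python driver's `session.run`, supplying `resistance_genes` as a query parameter.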