Using Genome Query Language to uncover genetic variation.

Title: Using Genome Query Language to uncover genetic variation.
Publication Type: Journal Article
Year of Publication: 2013
Authors: Kozanitis C, Heiberg A, Varghese G, Bafna V
Journal: Bioinformatics
Date Published: 2013 Jun 10
ISSN: 1367-4811
iDASH Category: Genomics
Abstract:
MOTIVATION: With high-throughput DNA sequencing costs dropping below $1000 per human genome, data storage, retrieval and analysis have become the major bottlenecks in biological studies. To address these large-data challenges, we advocate a clean separation between evidence collection and inference in variant calling. We define and implement a Genome Query Language (GQL) that allows rapid collection of the evidence needed for calling variants.
RESULTS: We present several use cases that showcase GQL for complex evidence collection, such as gathering evidence for large structural variations. Typical GQL queries can be written in 5-10 lines of high-level code and search large datasets (100 GB) in minutes. We also demonstrate its complementarity with other variant calling tools: popular variant callers can achieve an order-of-magnitude speed-up by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple datasets. Separating evidence collection from inference frees variant detection tools from data-intensive evidence collection and lets them focus on statistical inference.
AVAILABILITY: GQL can be downloaded from http://cseweb.ucsd.edu/~ckozanit/gql.
CONTACT: ckozanit@ucsd.edu or vbafna@cs.ucsd.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
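As a rough illustration of the evidence/inference split described in the abstract, the Python sketch below keeps two stages apart. The first collects evidence for a large deletion: discordant read pairs whose mapped span exceeds an expected insert size, merged into overlapping clusters. The second is a separate calling step. This is not GQL syntax and not the authors' implementation. `ReadPair`, `collect_discordant`, `merge_intervals`, `call_deletions`, `max_insert` and `min_support` are hypothetical names chosen for this example, and each read pair is simplified to its outer mapped coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cluster record: (chromosome, start, end, number of supporting pairs)
Cluster = Tuple[str, int, int, int]


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical simplified read pair: outer coordinates of both mapped mates."""
    chrom: str
    start: int  # leftmost mapped position of the pair
    end: int    # rightmost mapped position of the pair


def collect_discordant(pairs: List[ReadPair], max_insert: int) -> List[ReadPair]:
    """Evidence collection: keep pairs whose mapped span exceeds the expected insert size."""
    return [p for p in pairs if p.end - p.start > max_insert]


def merge_intervals(pairs: List[ReadPair]) -> List[Cluster]:
    """Evidence collection: merge overlapping spans per chromosome, counting support."""
    clusters: List[Cluster] = []
    for p in sorted(pairs, key=lambda p: (p.chrom, p.start)):
        if clusters and clusters[-1][0] == p.chrom and p.start <= clusters[-1][2]:
            chrom, start, end, support = clusters[-1]
            clusters[-1] = (chrom, start, max(end, p.end), support + 1)
        else:
            clusters.append((p.chrom, p.start, p.end, 1))
    return clusters


def call_deletions(clusters: List[Cluster], min_support: int) -> List[Cluster]:
    """Inference: a separate step that decides which evidence clusters to report."""
    return [c for c in clusters if c[3] >= min_support]


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 100, 5400),
        ReadPair("chr1", 150, 5450),
        ReadPair("chr1", 200, 5500),
        ReadPair("chr1", 1000, 1400),  # concordant: span 400
        ReadPair("chr2", 300, 700),    # concordant: span 400
    ]
    evidence = merge_intervals(collect_discordant(pairs, max_insert=1000))
    print(call_deletions(evidence, min_support=3))  # [('chr1', 100, 5500, 3)]
```

Because the calling step only sees the compact evidence clusters, a different statistical model could replace `call_deletions` without rescanning the reads, which is the kind of decoupling the abstract argues for.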
DOI: 10.1093/bioinformatics/btt250
Alternate Journal: Bioinformatics
PubMed ID: 23751181