DBP Tools

Listed below are tools developed by the iDASH team to enable analyses conducted within iDASH’s Driving Biological Projects (DBPs).

Click on the name in the left-hand column to download or access the tool.

Name Description

Genome Query Language (GQL)

Genome Query Language (GQL) allows efficient querying of genomic fragment data to uncover evidence for variation in the sampled genomes. GQL does not replace other variant calling tools; it complements them. It focuses on collecting the evidence that any inference tool can use to make custom inferences. By providing a simple interface for extracting the required evidence from raw data stored in the cloud, GQL frees variant callers from having to handle large data efficiently themselves. GQL speeds up and simplifies existing variant calling tools, and a cloud-based parallel GQL implementation could yield larger speedups. Furthermore, analysts can examine and visualize the evidence for each variant, independent of the tool used to identify it.

Platform: Written in C++ with some additional Python modules

Datasets that can be used: Any human dataset in BAM format, provided that no reads appear with multiple alignments. GQL was tested with datasets produced by the latest Illumina technology.

Tutorial: Instructions, syntax, and semantics

Citation: Kozanitis C, Bafna V, and Varghese G. “Querying the Genome with GQL,” NCBC Showcase 2012. (poster)

MAGI: MicroRNA Analysis in GPU Infrastructure

A web service for microRNA analysis in a GPU environment.

Platform: Ubuntu

Datasets that can be used: Any miRNA-Seq FASTQ files

Tutorial: http://elgar.ucsd.edu/software/magi/

Citation: Kim J, Levy E, Ferbrache A, et al. "MAGI: a Node.js web service for fast microRNA-Seq analysis in a GPU infrastructure," Bioinformatics, 2014 Oct;30(19):2826-7. doi: 10.1093/bioinformatics/btu377. Epub 2014 Jun 6.

Observational Cohort Event Analysis and Notification System (OCEANS)

OCEANS aims to develop novel statistical methods to detect adverse event signals in observational cohort data from electronic health records, clinical registries, and administrative databases. This toolbox was developed to enable rapid integration and dissemination of statistical modules that support retrospective and prospective medical product surveillance, whether static or automated in real time.

Platform: Java and C#

Dataset that can be used: http://sourceforge.net/projects/oceans/files/OutcomePrediction.csv/download

Citation: Matheny M, Nookala L, Eden S, Govindarajulu U, Normand S-L, Cope R, Ohno-Machado L, and Resnic FS. “OCEANS: Observational Cohort Event Analysis and Notification System,” AMIA 2011 Annual Symposium. (poster)

SenSed: Sensing Sedentary (and physical activity behavior)

Note: Click the “Data” tab, then the “Public” folder, to download the .zip file.

This application collects accelerometer and gyroscope data from smartphone sensors to classify walking, running, and sitting activities. The program processes the raw sensor data with a decision tree algorithm developed through machine learning. The app has two screens: an activity recognition feedback screen that is updated every minute, and a daily accumulated activity screen. The app also includes a basic goal-setting feature to encourage activity accumulation during the day.

Platform: Web service

Citation: Norman GJ, Wu W, Ramirez ER, Dasgupta S, and Peterson C. “Wireless Technology for Health Behavior Change Measurement & Intervention,” 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2012. (presentation)
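
The per-minute classification and daily accumulation described for SenSed can be illustrated with a small sketch. This is not the app's actual model: the real decision tree was learned from data and also uses gyroscope readings, whereas the tree below is hand-written, uses accelerometer samples only, and relies on hypothetical threshold values (STILL_MAX_SPREAD, WALK_MAX_SPREAD) and hypothetical function names chosen purely for illustration.

```python
import math
from collections import Counter
from statistics import pstdev

# Hypothetical thresholds (m/s^2) on the spread of acceleration magnitude.
# A trained tree would learn its own split points from labeled data.
STILL_MAX_SPREAD = 0.5
WALK_MAX_SPREAD = 4.0


def magnitude_spread(samples):
    """Standard deviation of acceleration magnitude over one window."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return pstdev(magnitudes)


def classify_window(samples):
    """Two-level decision tree over a single feature (illustrative only)."""
    spread = magnitude_spread(samples)
    if spread < STILL_MAX_SPREAD:
        return "sitting"
    if spread < WALK_MAX_SPREAD:
        return "walking"
    return "running"


def daily_totals(windows):
    """Count one-minute windows per activity, as a daily summary would."""
    return Counter(classify_window(w) for w in windows)


# Synthetic one-minute windows of (x, y, z) accelerometer samples.
sitting = [(0.0, 0.0, 9.81)] * 6
walking = [(0.0, 0.0, 8.8), (0.0, 0.0, 10.8)] * 3
running = [(0.0, 0.0, 4.0), (0.0, 0.0, 16.0)] * 3

totals = daily_totals([sitting, sitting, walking, running])
print(dict(totals))
```

Each window here stands in for one minute of sensor data, matching the one-minute update interval of the feedback screen; the counter plays the role of the daily accumulated activity view.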

WebGLORE 2.0

(web service version of GLORE: Grid binary LOgistic REgression)

WebGLORE is a web service that lets biomedical researchers build a global predictive logistic regression model without sharing data. The tool uses a distributed Newton-Raphson algorithm and an easy-to-use interface to exchange aggregated statistics from participating institutions; because these statistics are less privacy sensitive than the raw data, they help overcome regulatory barriers. The results are guaranteed to be as accurate as if the model had been trained on combined raw data in a central repository. WebGLORE is the first tool of its kind to execute iterative optimization procedures over the network in real time. Meaningful use of WebGLORE can improve statistical power, speed up discovery, and make a difference in applications where sample size matters.

Platform: Web service

Datasets that can be used: tab-delimited flat file

Tutorial: http://dbmi-engine.ucsd.edu/webglore2/instructions.html

Citation: Jiang W, Li P, Wang S, Wu Y, Xue M, Ohno-Machado L, and Jiang X. "WebGLORE: a Web service for Grid LOgistic REgression." Bioinformatics 2013 Dec 15;29(24):3238-40. PMID: 24072732. PMCID: PMC3842761

Wu Y, Jiang X, Kim J, and Ohno-Machado L. “Grid binary Logistic Regression (GLORE): building shared models without sharing data,” Journal of the American Medical Informatics Association, 19(5):758-64, 2012. PMID: 22511014. PMCID: PMC3422844

Wang S, Jiang X, Cui L, Cheng S, and Ohno-Machado L. “EXpectation Propagation based LOgistic REgRession (EXPLORER): Distributed Privacy Preserving Online Model Learning,” J Biomed Inform. 2013 Jun;46(3):480-96. PMID: 23562651. PMCID: PMC3676314
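
The distributed Newton-Raphson idea behind WebGLORE can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not WebGLORE's implementation: each site computes only its gradient and Hessian contributions for the current coefficients, and a coordinator sums them and takes one Newton-Raphson step. All function names (local_statistics, solve, fit) and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_statistics(X, y, beta):
    """Run at one site: return aggregated gradient and Hessian terms only."""
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += (yi - p) * xi[j]
            for k in range(d):
                hess[j][k] += w * xi[j] * xi[k]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(sites, dim, iterations=25, tol=1e-10):
    """Coordinator: sum per-site statistics, then take a Newton-Raphson step."""
    beta = [0.0] * dim
    for _ in range(iterations):
        grad = [0.0] * dim
        hess = [[0.0] * dim for _ in range(dim)]
        for X, y in sites:
            g, h = local_statistics(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data: each row is [intercept, feature]; labels are binary.
site_a = ([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]], [0, 0, 1, 0])
site_b = ([[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]], [0, 1, 1, 1])

beta_distributed = fit([site_a, site_b], dim=2)
beta_pooled = fit([(site_a[0] + site_b[0], site_a[1] + site_b[1])], dim=2)
print(beta_distributed, beta_pooled)
```

Because the gradient and Hessian of the logistic log-likelihood are sums over records, adding the per-site contributions gives the same update as pooling the rows in one place, which is why the two fits agree up to floating-point rounding.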