Genomics

Listed below are tools developed by the iDASH team for genomics.

Click on the name in the left-hand column to download or access the tool.

Name Description

absCNseq

absCNseq is an R package that infers tumor purity and ploidy from next-generation sequencing data.

Platform: Linux

Datasets that can be used: Example datasets are included in the tar file.

Citation: Bao L, Pu M, Messer K. "AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next generation sequencing data." Bioinformatics. 2014 Jan 13. PMID: 24389661

CUDA-miRanda

CUDA-miRanda is a fast microRNA target identification algorithm that aligns short nucleotide sequences ( nucleotides) against longer reference sequences (e.g., 20k nucleotides). CUDA-miRanda takes advantage of massively parallel computing on GPUs using NVIDIA’s Compute Unified Device Architecture (CUDA). It can report multiple alignments and the corresponding traceback sequences for any given query-reference pair, with up to a 166x speedup compared to regular CPUs.
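
To make the idea of aligning a short query against a longer reference concrete, here is a minimal CPU-only sketch in Python. It assumes a plain Smith-Waterman-style local alignment with a linear gap penalty; the function name and the scoring values are hypothetical and are not taken from CUDA-miRanda, which uses its own scoring and runs on the GPU.

```python
def local_alignment_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_position) of the best local alignment.

    end_position is the 1-based index in `reference` where the best
    alignment ends. Scoring values are illustrative assumptions only.
    """
    prev = [0] * (len(reference) + 1)
    best_score, best_end = 0, 0
    for q in query:
        curr = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if q == r else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


score, end = local_alignment_score("ACGT", "TTACGTTT")
print(score, end)
```

Each cell depends only on its left, upper, and upper-left neighbours, which is the kind of regular structure that parallel hardware can exploit.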

Platform: Linux / Windows

Datasets that can be used: TargetScan database

Citation: Wang S, Kim J, Jiang X, Ohno-Machado L. "GAMUT: GPU Accelerated MicroRNA analysis to Uncover Target genes through CUDA-miRanda." BMC Med Genomics. 2014;7 Suppl 1:S9. PMID: 25077821

IDEPI (IDentify EPItopes)

IDEPI is a domain-specific, extensible software library for supervised learning of models that relate genotype to phenotype for HIV-1 and other organisms. IDEPI makes use of open source libraries for machine learning, sequence alignment, sequence manipulation, and parallelization, and provides a programming interface that lets users engineer sequence features and select machine learning algorithms appropriate for their application.

Platform: Linux

Datasets that can be used: Example datasets can be found here

Citation: N Lance Hepler, Konrad Scheffler, Steven Weaver, Ben Murrell, Douglas D Richman, Dennis R Burton, Pascal Poignard, Davey M Smith, Sergei L Kosakovsky Pond. “IDEPI: rapid prediction of HIV-1 antibody epitopes and other phenotypic features from sequence data using a flexible machine learning platform.” PLoS Comput Biol. 2014 Sep 25;10(9):e1003842. PMID: 25254639. PMCID: PMC4177671

MAAMD: A Workflow to Standardize Meta-Analyses and Comparison of Affymetrix Microarray Data

MAAMD is a fast, automated workflow that enables researchers with minimal programming skills to standardize their Affymetrix microarray data analysis and make inter-dataset comparisons. MAAMD embeds two freely available stand-alone software tools, R and AltAnalyze, as well as Bioconductor packages such as GEOquery and arrayQualityMetrics. The inputs of MAAMD are user-editable CSV files, which contain sample information and parameters describing the locations of input files and required tools. The output of MAAMD is a structured folder containing the microarray data and analyzed results.

Platform: Linux / Macintosh

Datasets that can be used: Any Affymetrix platform microarray dataset

REPREVER: REPeat REsolVER - find and reconstruct extra copies given copy number gain regions

Reprever aims to identify both the breakpoints where extra duplicons are inserted into the donor genome and the actual sequence of each duplicon. Reprever resolves ambiguous mapping signatures from existing homologs, repetitive elements and sequencing errors to identify breakpoints. At each breakpoint, Reprever reconstructs the inserted sequence using guided assembly based on a profile hidden Markov model (PHMM). Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression.

Platform: Linux / Macintosh

Datasets that can be used: BAM files with copy number variation information

Tutorial: http://sourceforge.net/p/reprever/wiki/Home/

Citation: Kim, S., Medvedev, P., Paton, T.A., and Bafna, V. “REPREVER: Resolving low-copy duplicated sequences using template driven assembly.” Nucleic Acids Res. 2013 Jul;41(12):e128. PMID: 23658221. PMCID: PMC3695505

VIRMID (VIRtual MIcroDissection for SNP calling)

VIRMID is a Java-based variant caller designed for disease-control matched samples. Virmid is also specialized for identifying potential within-individual contamination where the disease sample cannot be purified enough. While the SNP calling rate is severely compromised by this heterogeneity, Virmid can uncover SNPs with low allele frequency by considering the level of contamination (alpha). The important features of Virmid are a) accurate estimation of the proportion of control sample in a (mixed) disease sample and b) improved SNP and somatic mutation calling with regard to the estimated proportion.
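
To illustrate why contamination lowers the observed allele frequency, the following Python sketch computes the expected variant allele fraction for a given alpha. It is a simplified illustration, not Virmid's statistical model: it assumes a diploid locus with no copy number change and a mutation that is absent from the control sample, and the function name is hypothetical.

```python
def expected_variant_fraction(alpha, disease_allele_fraction=0.5):
    """Expected fraction of reads carrying a somatic variant.

    alpha: proportion of control sample mixed into the disease sample.
    disease_allele_fraction: variant fraction in pure disease cells
        (0.5 for a heterozygous mutation, 1.0 for a homozygous one).
    Assumes the variant is absent from the control sample.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * disease_allele_fraction


# A heterozygous mutation at increasing contamination levels.
for alpha in (0.0, 0.3, 0.6, 0.9):
    print(alpha, expected_variant_fraction(alpha))
```

Under these assumptions the expected fraction shrinks linearly with alpha, so a caller that expects about 0.5 for a heterozygous site will tend to miss variants in heavily contaminated samples unless alpha is taken into account.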

Platform: Linux / Macintosh

Datasets that can be used: Matched pair of cancer/normal sequence alignment files (BAM)

Tutorial: http://sourceforge.net/p/virmid/wiki/Home/

Citation: Kim, S., Jeong, K., Bhutani, K., Lee, J., Patel, A., Scott, E., Nam, H., Lee, H., Gleeson, J., Bafna, V. “Virmid: accurate detection of somatic mutations with sample impurity inference.” Genome Biol. 2013 Aug;14(8):R90. PMID: 23987214. PMCID: PMC4054681

WESSIM: Whole Exome Sequencing SIMulator using in silico exome capture

WESSIM is a simulator for targeted resequencing, generally known as exome sequencing. Wessim generates a set of artificial DNA fragments for next-generation sequencing (NGS) read simulation. In targeted resequencing, the genomic regions used to generate DNA fragments are constrained to only a part of the entire genome; they are usually exons and/or a few introns and untranslated regions (UTRs).
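
The constraint described above can be sketched in Python as sampling fragment start positions only from a list of target intervals. This is an illustrative simplification and not Wessim's implementation: the function name, the fixed fragment length, and the rule that a fragment must lie entirely inside one target are all assumptions made for the example.

```python
import random


def sample_fragments(genome, targets, n_fragments, frag_len, seed=0):
    """Sample fragments that lie entirely inside target regions.

    genome: reference sequence as a string.
    targets: list of (start, end) half-open intervals, e.g. exons.
    Returns a list of (start, sequence) tuples.
    """
    rng = random.Random(seed)
    # Keep only targets long enough to hold a whole fragment.
    eligible = [(s, e) for s, e in targets if e - s >= frag_len]
    if not eligible:
        raise ValueError("no target region can hold a fragment")
    # Weight each target by its number of valid start positions.
    weights = [e - s - frag_len + 1 for s, e in eligible]
    fragments = []
    for _ in range(n_fragments):
        s, e = rng.choices(eligible, weights=weights, k=1)[0]
        start = rng.randint(s, e - frag_len)  # inclusive on both ends
        fragments.append((start, genome[start:start + frag_len]))
    return fragments


genome = "ACGT" * 25  # toy 100 bp reference
targets = [(10, 30), (60, 90)]  # hypothetical capture regions
for start, seq in sample_fragments(genome, targets, 3, 8):
    print(start, seq)
```

Weighting by the number of valid start positions makes every admissible fragment equally likely, so longer targets contribute proportionally more fragments.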

Platform: Linux / Macintosh / Windows

Datasets that can be used: Genome sequence (FASTA) with exome capture library information

Tutorial: http://sak042.github.io/Wessim/

Citation: Kim, S., Jeong, K., Bafna, V. “Wessim: a whole-exome sequencing simulator based on in silico exome capture.” Bioinformatics. 2013 Apr 15;29(8):1076-7. PMID: 23413434. PMCID: PMC3624799

Whole Genome RVista (WGRV)

This comparative genomic tool enables the user to query a long list of genes (e.g., a set of coregulated genes from an RNA-sequencing experiment, or a set of pathway-related genes) for statistical enrichment of transcription factor binding sites that are evolutionarily conserved between two distantly related species and lie within a defined window (up to 5 kb) upstream of the transcriptional start site.

This tool has a number of innovative features. It has been robustly tested on both published datasets (Zambon AC et al. PNAS 2005;102:8561-6; Ling H et al. Circ Res 2013;112(6):935-44) and unpublished datasets, and several of its features indicate that it performs better than existing tools designed for similar analyses.
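
As an illustration of what testing a gene list for enrichment of a binding site can look like, the Python sketch below computes a one-sided hypergeometric p-value. The description above does not say which statistic WGRV uses, so the hypergeometric test is an assumption here, and the function name and gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background_total, background_hits,
                       query_total, query_hits):
    """One-sided hypergeometric p-value, P(X >= query_hits).

    background_total: genes in the background set.
    background_hits: background genes with the conserved site.
    query_total: genes in the user's query list.
    query_hits: query genes with the conserved site.
    """
    upper = min(background_hits, query_total)
    denom = comb(background_total, query_total)
    tail = sum(
        comb(background_hits, k)
        * comb(background_total - background_hits, query_total - k)
        for k in range(query_hits, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 20000 background genes, 400 with the site;
# 150 query genes, 12 of them with the site.
print(enrichment_p_value(20000, 400, 150, 12))
```

A small p-value means the site appears in the query list more often than expected from the background rate alone; testing many transcription factors at once would additionally call for a multiple-testing correction.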

Platform: Web service

Datasets that can be used: Example datasets can be found here

Tutorial: http://genome.lbl.gov/cgi-bin/WGRVistaInputCommon.pl

Citation: Zambon AC et al. PNAS 2005;102:8561-6. Ling H et al. Circ Res 2013;112(6):935-44.
Tool features, test datasets and user manual: Dubchak I et al. Bioinformatics 2013;29(16):2059-61.