<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>19</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author><author><style face="normal" font="default" size="100%">Deutsch, Alin</style></author><author><style face="normal" font="default" size="100%">Heiberg, Andrew</style></author><author><style face="normal" font="default" size="100%">Kozanitis, Christos</style></author><author><style face="normal" font="default" size="100%">Ohno-Machado, Lucila</style></author><author><style face="normal" font="default" size="100%">Varghese, George</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Abstractions for Genomics: Or, which way to the Genomic Information Age?</style></title><secondary-title><style face="normal" font="default" size="100%">Communications of the ACM</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2013</style></year></dates><language><style face="normal" font="default" size="100%">eng</style></language></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Zakov, Shay</style></author><author><style face="normal" font="default" size="100%">Kinsella, Marcus</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">An algorithmic approach for breakage-fusion-bridge detection in tumor genomes.</style></title><secondary-title><style face="normal" font="default" size="100%">Proc Natl Acad Sci U S A</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Proc. Natl. Acad. Sci. 
U.S.A.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2013</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2013 Apr 2</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">110</style></volume><pages><style face="normal" font="default" size="100%">5546-51</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Breakage-fusion-bridge (BFB) is a mechanism of genomic instability characterized by the joining and subsequent tearing apart of sister chromatids. When this process is repeated during multiple rounds of cell division, it leads to patterns of copy number increases of chromosomal segments as well as fold-back inversions where duplicated segments are arranged head-to-head. These structural variations can then drive tumorigenesis. BFB can be observed in progress using cytogenetic techniques, but generally BFB must be inferred from data such as microarrays or sequencing collected after BFB has ceased. Making correct inferences from this data is not straightforward, particularly given the complexity of some cancer genomes and BFB's ability to generate a wide range of rearrangement patterns. Here we present algorithms to aid the interpretation of evidence for BFB. We first pose the BFB count-vector problem: given a chromosome segmentation and segment copy numbers, decide whether BFB can yield a chromosome with the given segment counts. We present a linear time algorithm for the problem, in contrast to a previous exponential time algorithm. We then combine this algorithm with fold-back inversions to develop tests for BFB. We show that, contingent on assumptions about cancer genome evolution, count vectors and fold-back inversions are sufficient evidence for detecting BFB. 
We apply the presented techniques to paired-end sequencing data from pancreatic tumors and confirm a previous finding of BFB as well as identify a chromosomal region likely rearranged by BFB cycles, demonstrating the practicality of our approach.</style></abstract><issue><style face="normal" font="default" size="100%">14</style></issue></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Kozanitis, Christos</style></author><author><style face="normal" font="default" size="100%">Saunders, Chris</style></author><author><style face="normal" font="default" size="100%">Kruglyak, Semyon</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author><author><style face="normal" font="default" size="100%">Varghese, George</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Compressing genomic sequence fragments using SlimGene.</style></title><secondary-title><style face="normal" font="default" size="100%">J Comput Biol</style></secondary-title><alt-title><style face="normal" font="default" size="100%">J. Comput. 
Biol.</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Algorithms</style></keyword><keyword><style  face="normal" font="default" size="100%">Data Compression</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2011</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2011 Mar</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">18</style></volume><pages><style face="normal" font="default" size="100%">401-13</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">With the advent of next generation sequencing technologies, the cost of sequencing whole genomes is poised to go below $1000 per human individual in a few years. As more and more genomes are sequenced, analysis methods are undergoing rapid development, making it tempting to store sequencing data for long periods of time so that the data can be re-analyzed with the latest techniques. The challenging open research problems, huge influx of data, and rapidly improving analysis techniques have created the need to store and transfer very large volumes of data. Compression can be achieved at many levels, including trace level (compressing image data), sequence level (compressing a genomic sequence), and fragment-level (compressing a set of short, redundant fragment reads, along with quality-values on the base-calls). 
We focus on fragment-level compression, which is the pressing need today. Our article makes two contributions, implemented in a tool, SlimGene. First, we introduce a set of domain specific loss-less compression schemes that achieve over 40× compression of fragments, outperforming bzip2 by over 6×. Including quality values, we show a 5× compression using less running time than bzip2. Second, given the discrepancy between the compression factor obtained with and without quality values, we initiate the study of using &quot;lossy&quot; quality values. Specifically, we show that a lossy quality value quantization results in 14× compression but has minimal impact on downstream applications like SNP calling that use the quality values. Discrepancies between SNP calls made between the lossy and loss-less versions of the data are limited to low coverage areas where even the SNP calls made by the loss-less version are marginal.</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Ohno-Machado, Lucila</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author><author><style face="normal" font="default" size="100%">Boxwala, Aziz A</style></author><author><style face="normal" font="default" size="100%">Chapman, Brian E</style></author><author><style face="normal" font="default" size="100%">Chapman, Wendy W</style></author><author><style face="normal" font="default" size="100%">Chaudhuri, Kamalika</style></author><author><style face="normal" font="default" size="100%">Day, Michele E</style></author><author><style face="normal" font="default" size="100%">Farcas, Claudiu</style></author><author><style face="normal" font="default" size="100%">Heintzman, Nathaniel D</style></author><author><style face="normal" 
font="default" size="100%">Jiang, Xiaoqian</style></author><author><style face="normal" font="default" size="100%">Kim, Hyeoneui</style></author><author><style face="normal" font="default" size="100%">Kim, Jihoon</style></author><author><style face="normal" font="default" size="100%">Matheny, Michael E</style></author><author><style face="normal" font="default" size="100%">Resnic, Frederic S</style></author><author><style face="normal" font="default" size="100%">Vinterbo, Staal A</style></author></authors><translated-authors><author><style face="normal" font="default" size="100%">and the iDASH team</style></author></translated-authors></contributors><titles><title><style face="normal" font="default" size="100%">iDASH: integrating data for analysis, anonymization, and sharing.</style></title><secondary-title><style face="normal" font="default" size="100%">J Am Med Inform Assoc</style></secondary-title><alt-title><style face="normal" font="default" size="100%">J Am Med Inform Assoc</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2011</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2011 Nov 10</style></date></pub-dates></dates><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. 
Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Kinsella, Marcus</style></author><author><style face="normal" font="default" size="100%">Harismendy, Olivier</style></author><author><style face="normal" font="default" size="100%">Nakano, Masakazu</style></author><author><style face="normal" font="default" size="100%">Frazer, Kelly A</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Sensitive gene fusion detection using ambiguously mapping RNA-Seq read pairs.</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Bioinformatics</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Cell Line</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Expression Profiling</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Fusion</style></keyword><keyword><style  face="normal" font="default" 
size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Male</style></keyword><keyword><style  face="normal" font="default" size="100%">Prostatic Neoplasms</style></keyword><keyword><style  face="normal" font="default" size="100%">RNA, Messenger</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, RNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2011</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2011 Apr 15</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">27</style></volume><pages><style face="normal" font="default" size="100%">1068-75</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">MOTIVATION: Paired-end whole transcriptome sequencing provides evidence for fusion transcripts. However, due to the repetitiveness of the transcriptome, many reads have multiple high-quality mappings. Previous methods to find gene fusions either ignored these reads or required additional longer single reads. This can obscure up to 30% of fusions and unnecessarily discards much of the data.

RESULTS: We present a method for using paired-end reads to find fusion transcripts without requiring unique mappings or additional single read sequencing. Using simulated data and data from tumors and cell lines, we show that our method can find fusions with ambiguously mapping read pairs without generating numerous spurious fusions from the many mapping locations.

AVAILABILITY: A C++ and Python implementation of the method demonstrated in this article is available at http://exon.ucsd.edu/ShortFuse.

CONTACT: mckinsel@ucsd.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.</style></abstract><issue><style face="normal" font="default" size="100%">8</style></issue></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Lo, Christine</style></author><author><style face="normal" font="default" size="100%">Bashir, Ali</style></author><author><style face="normal" font="default" size="100%">Bansal, Vikas</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Strobe sequence design for haplotype assembly.</style></title><secondary-title><style face="normal" font="default" size="100%">BMC Bioinformatics</style></secondary-title><alt-title><style face="normal" font="default" size="100%">BMC Bioinformatics</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Algorithms</style></keyword><keyword><style  face="normal" font="default" size="100%">Computational Biology</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">Haplotypes</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword><keyword><style  face="normal" font="default" size="100%">Sequence Analysis, DNA</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2011</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2011</style></date></pub-dates></dates><volume><style face="normal" font="default" 
size="100%">12 Suppl 1</style></volume><pages><style face="normal" font="default" size="100%">S24</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">BACKGROUND: Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together of alleles at heterozygous sites is sufficient to phase, or separate the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype.

RESULTS: We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length.

CONCLUSIONS: Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies.</style></abstract></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Brinza, Dumitru</style></author><author><style face="normal" font="default" size="100%">Schultz, Matthew</style></author><author><style face="normal" font="default" size="100%">Tesler, Glenn</style></author><author><style face="normal" font="default" size="100%">Bafna, Vineet</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">RAPID detection of gene-gene interactions in genome-wide association studies.</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Bioinformatics</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Databases, Genetic</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Expression</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome-Wide Association Study</style></keyword><keyword><style  face="normal" font="default" size="100%">Genomics</style></keyword><keyword><style  face="normal" font="default" size="100%">Polymorphism, Single Nucleotide</style></keyword><keyword><style  face="normal" font="default" size="100%">Software</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2010</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2010 Nov 15</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">26</style></volume><pages><style face="normal" font="default" size="100%">2856-62</style></pages><language><style face="normal" font="default" 
size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">MOTIVATION: In complex disorders, independently evolving locus pairs might interact to confer disease susceptibility, with only a modest effect at each locus. With genome-wide association studies on large cohorts, testing all pairs for interaction confers a heavy computational burden, and a loss of power due to large Bonferroni-like corrections. Correspondingly, limiting the tests to pairs that show a marginal effect at either locus also reduces power. Here, we describe an algorithm that discovers interacting locus pairs without explicitly testing all pairs, or requiring a marginal effect at each locus. The central idea is a mathematical transformation that maps 'statistical correlation between locus pairs' to 'distance between two points in a Euclidean space'. This enables the use of geometric properties to identify proximal points (correlated locus pairs), without testing each pair explicitly. For large datasets (∼10^6 SNPs), this reduces the number of tests from 10^12 to 10^6, significantly reducing the computational burden, without loss of power. The speed of the test allows for correction using permutation-based tests. The algorithm is encoded in a tool called RAPID (RApid Pair IDentification) for identifying paired interactions in case-control GWAS.

RESULTS: We validated RAPID with extensive tests on simulated and real datasets. On simulated models of interaction, RAPID easily identified pairs with small marginal effects. On the benchmark disease datasets from the Wellcome Trust Case Control Consortium, RAPID ran in about 1 CPU-hour per dataset and identified many significant interactions. In many cases, the interacting loci were known to be important for the disease, but were not individually associated in the genome-wide scan.

AVAILABILITY: http://bix.ucsd.edu/projects/rapid.</style></abstract><issue><style face="normal" font="default" size="100%">22</style></issue></record></records></xml>