The transformative role of data repositories in high-throughput proteomics


Fri Mar 21, 2014


Nuno Bandeira
Skaggs School of Pharmacy and Pharmaceutical Sciences


Data Sharing


Tandem mass spectrometry is the technology of choice in high-throughput proteomics. It enables the identification and quantification of tens of thousands of peptides and proteins per experiment, and hundreds of millions of spectra are generated worldwide every day. Despite significant achievements, however, the dominant computational paradigm for automated peptide identification still (1) ignores the prior knowledge contained in billions of spectra in the public domain, (2) processes every new spectrum in isolation, as if it were the first and only spectrum ever acquired, and (3) matches each spectrum against exponentially large search spaces of all possible variants of post-translationally modified peptide sequences. These fundamental limitations cause the large majority of spectra (sometimes over 90%) to be discarded as unidentified. They also severely restrict biomedical research that relies on high-throughput analysis of complex samples such as altered cancer proteomes, metaproteomes and aberrant genomes/transcriptomes. In contrast, our Mass spectrometry Interactive Virtual Environment (MassIVE) platform builds on the many millions of publicly available spectra to overcome the limitations of the dominant analysis workflows. We show how this approach represents a departure towards a community-centric paradigm. In this paradigm, algorithms and statistical models focus primarily on using spectrum matching to map newly acquired raw data to a worldwide ‘spectral cloud’ and to connect labs within and across multiple fields of research.



Nuno Bandeira is an Assistant Professor at the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. He was trained in computer science and bioinformatics, and his research focuses on developing algorithms for several purposes: interpreting proteomics and metabolomics mass spectrometry data from endogenous and digested peptides, discovering and localizing post-translational modifications, and studying protein-protein interactions. His work also covers sequencing non-linear peptides with unknown amino acids and characterizing microbial, marine, reptile and plant natural products. Dr. Bandeira is also the Executive Director of the UCSD/NIH Biomedical Research Center for Computational Mass Spectrometry. There he leads a team of developers who created and implemented the ProteoSAFe platform for computational proteomics. The platform has already enabled more than 2,200 users worldwide to search over 1 billion mass spectra in more than 25,000 search jobs. The ultimate goal of his lab is to enable reliable, reproducible and automated identification of any class of molecules suitable for analysis with mass spectrometry instruments.


IMPORTANT NOTICE: This ReadyTalk service includes a feature that allows audio and any documents and other materials exchanged or viewed during the session to be recorded.  By joining this session, you automatically consent to such recordings. If you do not consent to the recording, discuss your concerns with the meeting host prior to the start of the recording or do not join the session. Please note that any such recordings may be subject to discovery in the event of litigation.