The Tranche data repository: Progress made and lessons learned from 24.4 TB of data, 1,612 users and 12,458 depositions.


Fri Oct 19, 2012


Philip Andrews
University of Michigan


Data Sharing





High-throughput data production in proteomics and other post-genome disciplines has created many challenges in data sharing, annotation, and dissemination. Successful systems require as much thought and effort on the social aspects of engineering as on the software itself. Close relationships with data generators and users are crucial, as is the recognition that developing a database for a static field is quite different from developing one for research fields in which data types, qualities, and relationships change constantly. What happens after initial development is equally important: models for long-term maintenance and continued development are challenging to implement and require flexibility and significant institutional investment. The Tranche data repository provides public access to the large, complex, and expensive-to-generate data sets common in proteomics. This support for data sharing reinforces the peer review process in proteomics and allows data to be reused through centralized databases and to be cross-correlated and aggregated across data sets. The Tranche project is a distributed, free-to-use, open-source system specifically dedicated to alleviating these problems and designed to be a community resource that interfaces readily with existing databases and computational resources. Tranche and its associated projects offer several direct benefits to researchers. Together they provide a secure resource for archiving and annotating large data sets while inherently maintaining data integrity and provenance, provide a proper citation for data, allow a high degree of compliance with annotation standards, and allow facile access to public data sets.



Dr. Andrews is a Professor in the departments of Bioinformatics, Chemistry, and Biological Chemistry at the University of Michigan. He received his B.S. degree in Chemistry at the Georgia Institute of Technology and his Ph.D. in Biochemistry at Purdue University with Dr. Larry Butler. His graduate and post-doctoral work was in enzymology and protein chemistry at Purdue University with Dr. Jack Dixon. Initial post-doctoral research dealt with the post-translational processing associated with the maturation of peptide hormones. Subsequent research has focused on the roles of post-translational modifications in modulating the functions of proteins and on structural mass spectrometry. A continuing effort of the laboratory has been the development and application of computational methods, new chemistries, and mass spectrometric techniques for proteomics. Recent work in Dr. Andrews’ laboratory has focused on the molecular architecture of organelles as well as chemistries for phosphoprotein analysis, methods for quantitative proteomics, approaches to improving interaction maps, and new technology crosslinkers for monitoring protein interactions.

Proteome informatics projects include the development of new tools for de novo sequence analysis, spectral clustering (Bonanza), an information management system for proteomics (PRIME), assessment of search results, and specialized tools for viewing and processing proteomics data (MSExpedite). The Andrews laboratory has also been active in the development of standards and supports Open Access Data through the Tranche dissemination system and several projects associated with the resource.

