Privacy-Preserved Sharing and Analysis of Human Genomic Data

The rapid development of Next Generation Sequencing (NGS) technologies has significantly reduced the cost of producing DNA data. As a result, genome sequencing may soon become a routine tool for clinical diagnosis and therapy selection. In the meantime, the demand for large-scale meta-analysis of human genomic data from patients with various diseases is expected to grow substantially in the near future. However, the effort to meet this demand has not benefited sufficiently from the progress in sequencing technologies, for two reasons: storing and analyzing NGS data requires massive computational resources, and researchers must go through complicated procedures to access the data, which are in place to protect the privacy of human subjects.

To address these challenges and facilitate secure and convenient DNA data sharing, we propose to study and develop a suite of innovative and transformative techniques aimed at achieving practical and cost-effective genomic data protection. Using these techniques, an NIH data center can offer a centralized analysis service on the genome data it hosts, execute the analysis programs submitted by data users, and control the release of analysis outcomes to ensure the privacy of DNA donors. Our techniques will also help the center outsource, in a highly privacy-preserving manner, the computation tasks it lacks the resources to handle to computing systems rented locally and remotely. The proposed research will be conducted in close collaboration with iDASH, a National Center for Biomedical Computing for “integrating Data for Analysis, Anonymization and Sharing”, using its infrastructure and data.

Privacy-Preserved Sharing and Analysis of Human Genomic Data is funded by the National Human Genome Research Institute, part of the National Institutes of Health (grant R01HG007078).

Principal Investigators

Dr. Haixu Tang is an associate professor in the School of Informatics and Computing, and a co-Director of the Center for Genomics and Bioinformatics at Indiana University Bloomington. He received his Ph.D. in Molecular Biology from the Shanghai Institute of Biochemistry, Chinese Academy of Sciences in 1998, and worked as a postdoctoral associate at the University of Southern California and as a project scientist at the University of California, San Diego before joining Indiana University. Dr. Tang has been working on algorithmic problems in bioinformatics for over 20 years, and has worked on genome privacy problems since 2008. He was a recipient of the NSF CAREER award in 2007, and an outstanding junior faculty award from Indiana University in 2009.


Dr. XiaoFeng Wang is an associate professor in the School of Informatics and Computing at Indiana University, Bloomington. He received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University in 2004, and has since been a faculty member at IU. Dr. Wang is a recognized, active researcher in system and data security. His work focuses on cloud and mobile security, and data privacy (particularly the privacy challenges in large-scale analysis and dissemination of human genomic data). He is a recipient of the 2011 Award for Outstanding Research in Privacy Enhancing Technologies (the PET Award) and the Best Practical Paper Award at the 32nd IEEE Symposium on Security and Privacy. His work frequently receives attention from the media, including CNN, MSNBC, Slashdot, CNet, and PC World. He served as the director for the Security Informatics program at IU in 2010.