SHARE: system design and case studies for statistical health information release.

Title: SHARE: system design and case studies for statistical health information release.
Publication Type: Journal Article
Year of Publication: 2012
Authors: Gardner J, Xiong L, Xiao Y, Gao J, Post AR, Jiang X, Ohno-Machado L
Journal: J Am Med Inform Assoc
Date Published: 2012 Oct 11
ISSN: 1527-974X
iDASH Category: Privacy Technology
Abstract: OBJECTIVES: We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. MATERIALS AND METHODS: SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. RESULTS: Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback-Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. CONCLUSIONS: SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
DOI: 10.1136/amiajnl-2012-001032
Alternate Journal: J Am Med Inform Assoc
PubMed ID: 23059729
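
Illustrative note: the abstract mentions differentially private histogram release and two utility measures, Kullback-Leibler divergence and relative error. The sketch below shows one common way such quantities can be computed. It uses the basic Laplace mechanism, which is a standard textbook technique and is not claimed to be the method SHARE implements. The function names, the smoothing constant, the relative-error floor, and the example counts are all hypothetical, and the sensitivity of 1 assumes each patient record falls in exactly one histogram cell under add/remove-one-record neighbouring datasets.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(counts, epsilon, rng):
    # Assumption: adding or removing one record changes exactly one
    # cell count by 1, so the L1 sensitivity is 1 and scale = 1/epsilon.
    scale = 1.0 / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Clamping at zero is post-processing and does not weaken the guarantee.
    return [max(0.0, v) for v in noisy]


def kl_divergence(original, released, smoothing=1e-9):
    # KL(P || Q) with P from the original counts and Q from the released
    # counts. `smoothing` (hypothetical choice) keeps Q strictly positive.
    p_total = sum(original)
    q_total = sum(released) + smoothing * len(released)
    kl = 0.0
    for p_count, q_count in zip(original, released):
        p = p_count / p_total
        q = (q_count + smoothing) / q_total
        if p > 0:
            kl += p * math.log(p / q)
    return kl


def relative_error(true_answer, noisy_answer, floor=1.0):
    # `floor` (hypothetical choice) avoids division by zero for empty queries.
    return abs(noisy_answer - true_answer) / max(true_answer, floor)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [120, 45, 300, 8, 0, 67]  # made-up cell counts, not real data
    released = dp_histogram(counts, epsilon=1.0, rng=rng)
    print("released:", released)
    print("KL divergence:", kl_divergence(counts, released))
    print("relative error of total:", relative_error(sum(counts), sum(released)))
```

With a smaller `epsilon` the noise scale grows, so both utility measures would be expected to worsen. With many sparse cells the noise can dominate the true counts, which is consistent with the abstract's remark that higher dimensional data remains challenging.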