Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text.

Title: Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text.
Publication Type: Journal Article
Year of Publication: 2014
Authors: South, BR, Mowery, D, Suo, Y, Leng, J, Ferrández, Ó, Meystre, SM, Chapman, WW
Journal: J Biomed Inform
Volume: 50
Pagination: 162-72
Date Published: 2014 Aug
ISSN: 1532-0480
iDASH Category: Natural Language Processing
Abstract: <p>The Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor method requires removal of 18 types of protected health information (PHI) from clinical documents before they are considered "de-identified" for research use. Human review of PHI elements in a large corpus of clinical documents can be tedious and error-prone, and multiple annotators may be required to consistently redact information for each PHI class. Automated de-identification has the potential to improve annotation quality and reduce annotation time. One such approach is machine-assisted annotation, which combines de-identification system outputs, used as pre-annotations, with an interactive annotation interface so that annotators "curate" existing PHI annotations rather than annotating raw clinical documents from "scratch". To assess whether machine-assisted annotation improves the reliability and accuracy of the reference standard and reduces annotation effort, we conducted an annotation experiment. In this annotation study, we assessed the generalizability of the VA Consortium for Healthcare Informatics Research (CHIR) annotation schema and guidelines applied to a corpus of publicly available clinical documents called MTSamples. Specifically, our goals were to (1) characterize a heterogeneous corpus of clinical documents manually annotated for risk-ranked PHI and other annotation types (clinical eponyms and person relations), (2) evaluate how well annotators apply the CHIR schema to the heterogeneous corpus, (3) compare whether machine-assisted annotation (experiment) improves annotation quality and reduces annotation time relative to manual annotation (control), and (4) assess the change in reference standard coverage as each annotator's annotations are added.</p>
DOI: 10.1016/j.jbi.2014.05.002
Alternate Journal: J Biomed Inform
PubMed ID: 24859155
Grant List: 7R01GM090187 / GM / NIGMS NIH HHS / United States
U54 HL 108460 / HL / NHLBI NIH HHS / United States
