Tutorial I. Workbench

 

 

For demonstration purposes, five reports from the MTSamples are stored in the results folder of the NLPEW distribution.

 

 

Three startup parameter files reference these reports:

  1. startup.properties UTOPAZ references two sets of annotations produced by the UimaTopaz NLP tool; one set was generated without the rules that recognize negation phrases, so that it deliberately contains false positive errors,
  2. startup.properties UTOPAZ / KNOWTATOR Mixed compares UTopaz annotations against two documents annotated by hand using Knowtator, and
  3. references UIMA drug files.

To view an annotation set, first copy the corresponding parameter file to startup.properties, then run the Workbench as described in the Configuration and Startup directions.




 
Comparing Two Topaz Pipelines
  

  1. The workbench compares annotations from two annotators whose identities are denoted by the firstAnnotator and secondAnnotator parameters in the startup.properties file. The startup.properties UTOPAZ file included in the distribution designates these values as primary and secondary, respectively, but users are free to change these names.
  2. The output_directory parameter names the directory that holds the data as triples: a clinical report plus the first and second annotation files for that report. For instance, output_directory may contain a clinical document 1.txt and two corresponding annotation files, 1_primary_.xml and 1_secondary_.xml.
  3. At startup, the workbench looks for files ending in ".xml" whose names contain the first or second annotator id. Each annotation file records the annotator id as well as the name of the source clinical document; for instance, 1_primary_.xml names primary as its annotator and 1.txt as its source document.
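The file-naming convention in steps 2 and 3 can be illustrated with a short Python sketch. This is not the Workbench's own code: the function name group_annotation_files is hypothetical, and the sketch assumes that the report name is the part of the annotation file name before the first underscore, as in the 1_primary_.xml example. The Workbench itself reads the annotator id and source document name from inside each XML file.

```python
from pathlib import Path


def group_annotation_files(output_directory, first_annotator, second_annotator):
    """Group each clinical report with its first and second annotation files.

    Assumption (not confirmed by the Workbench documentation): annotation
    files are named "<report stem>_<annotator id>_.xml".
    """
    directory = Path(output_directory)
    triples = {}
    for xml_path in sorted(directory.glob("*.xml")):
        # Split "1_primary_.xml" into the report stem "1" and the rest.
        stem, _, rest = xml_path.name.partition("_")
        report = directory / f"{stem}.txt"
        if not report.exists():
            continue  # no matching clinical report, so skip this file
        entry = triples.setdefault(
            stem, {"report": report, "first": None, "second": None}
        )
        # Substring matching mirrors "names contain the annotator id";
        # it would be ambiguous if one id were a substring of the other.
        if first_annotator in rest:
            entry["first"] = xml_path
        elif second_annotator in rest:
            entry["second"] = xml_path
    return triples
```

Calling group_annotation_files("results", "primary", "secondary") on a folder laid out as in step 2 would return one entry per report, each holding the report path and its two annotation files. The folder name "results" is only an example.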

 
 

Create UIMA Aggregate Pipeline for Type Modeler Tool (TMT)

 

Create a UIMA aggregate pipeline for the Type Modeler Tool (TMT). It should contain two components:

  1. Your UIMA pipeline, and
  2. Users/johndoe/Desktop/EvaluationWorkbenchFolder/desc/WorkbenchTypeModelExtractor.xml

Create a UIMA aggregate pipeline for annotation generation. It should contain:

  1. Your UIMA pipeline, and
  2. Users/johndoe/Desktop/EvaluationWorkbenchFolder/desc/WorkbenchResultsGenerator.xml

Run the TMT aggregate pipeline to create the UIMA/Workbench type model file.

 




 

Comparing Topaz / Knowtator Pipelines

TBA
 

 

 

Once the program opens, a user may observe results of the two annotation pipelines by holding down the Ctrl key and moving the mouse over the main window.

 

 

1. Toggle between Unified Medical Language System (UMLS) Concept Unique Identifier (CUI) and concept descriptions

 

 

  1. Hold down the Ctrl key and highlight the desired concept.
  2. Release the Ctrl key, move the mouse to the desired document, then press the Ctrl key again to view the selected document in the document display pane.

 

 

  1. The selected document appears in the document display pane.
  2. Navigate to the concept using the scroll bar.

 

 

  1. The primary document is visible in the document display pane.




Description of Workbench Evaluation Measures

 

 

  1. TP (True Positive): Found in both primary annotation and secondary annotation
  2. FP (False Positive): Found only in secondary annotation
  3. TN (True Negative): Not found in either annotation
  4. FN (False Negative): Found only in primary annotation
  5. Accuracy: (TP + TN) / (TP + TN + FP + FN)
  6. PPV (Positive Predictive Value): Equivalent to precision = TP / (TP + FP)
  7. Sensitivity (or True Positive Rate): TP / P = TP / (TP + FN)
  8. NPV (Negative Predictive Value): TN / (TN + FN)
  9. Specificity (or True Negative Rate): TN / N = TN / (FP + TN) = 1 - FPR
  10. Cohen's kappa: A measure of the reliability between two annotation systems or raters that uses the observed agreement (TP and TN) and takes into account the agreement expected by chance. Cohen's kappa calculates the chance of YES/NO responses for each rater or system separately and then multiplies those to get the chance agreement.
  11. Scott's Pi: A measure of the reliability between two annotation systems or raters that uses the observed agreement (TP and TN) and takes into account the agreement expected by chance. Scott's Pi pools the estimates for random YES/NO responses across both raters. This difference from Cohen's kappa may lead to different reliability values depending on the bias and prevalence of responses.
  12. F-measure: An evaluation metric that does not use the observed TN matches. This is useful when true negatives cannot be determined reliably (for example, when no complete ground truth is available), or when there are so many possible items that TN is very high.
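As a worked illustration of the measures listed above, the following Python sketch computes them from the four counts. It is not the Workbench's implementation: the function name evaluation_measures is hypothetical, the F-measure is assumed to be the balanced F1 score (the text does not say which variant the Workbench reports), and undefined ratios are returned as NaN by choice.

```python
def evaluation_measures(tp, fp, tn, fn):
    """Compute agreement measures from TP, FP, TN and FN counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a measure is undefined.
        return numerator / denominator if denominator else float("nan")

    total = tp + fp + tn + fn
    accuracy = ratio(tp + tn, total)  # observed agreement

    # Chance of a YES response from each annotator on its own.
    p_yes_primary = ratio(tp + fn, total)    # primary marked the item
    p_yes_secondary = ratio(tp + fp, total)  # secondary marked the item

    # Cohen's kappa: multiply the per-annotator YES and NO chances.
    chance_kappa = (p_yes_primary * p_yes_secondary
                    + (1 - p_yes_primary) * (1 - p_yes_secondary))

    # Scott's Pi: pool the YES chance across both annotators first.
    pooled_yes = (p_yes_primary + p_yes_secondary) / 2
    chance_pi = pooled_yes ** 2 + (1 - pooled_yes) ** 2

    return {
        "accuracy": accuracy,
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "npv": ratio(tn, tn + fn),
        "specificity": ratio(tn, tn + fp),
        "cohens_kappa": ratio(accuracy - chance_kappa, 1 - chance_kappa),
        "scotts_pi": ratio(accuracy - chance_pi, 1 - chance_pi),
        # Assumed balanced F1; note that TN does not appear in it.
        "f_measure": ratio(2 * tp, 2 * tp + fp + fn),
    }


measures = evaluation_measures(tp=40, fp=10, tn=45, fn=5)
```

With these example counts the two annotators have different YES rates (0.45 and 0.50), so the chance agreement used by Scott's Pi is slightly higher than the one used by Cohen's kappa, and Scott's Pi comes out slightly lower.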