
Lexical Tools

Results of optimized set

I. The optimized set
As a result, we concluded that case 10.1 is the final optimized set of SD-Rules for the Lexicon 2015 corpus. It includes 76 of the 101 candidate SD-Rules and reaches:

  • system accuracy rate: 95.22%
  • system coverage rate: 95.70%
  • system performance: 1.9093
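The list above does not define "system performance". The reported value appears consistent with the sum of the accuracy rate (read here as system precision) and the coverage rate (read here as system recall). The check below is a sketch that assumes this definition. The small gap of 0.0001 is within the rounding of the published figures.

```python
# Reported results for case 10.1 (Lexicon 2015), expressed as fractions.
accuracy_rate = 0.9522        # system accuracy rate (assumed to be precision)
coverage_rate = 0.9570        # system coverage rate (assumed to be recall)
reported_performance = 1.9093

# Assumption: system performance = accuracy rate + coverage rate.
computed = accuracy_rate + coverage_rate

# Each rate is rounded to 0.0001 (error <= 0.00005), and so is the reported
# performance, so the values can legitimately differ by up to 0.00015.
consistent = abs(computed - reported_performance) <= 1.5e-4
print(f"{computed:.4f}", consistent)  # prints 1.9092 True
```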

This set of SD-rules is expected to reach the same system performance when it is applied to other English corpora under the assumption that:

  • the characteristics of derivations are consistent between the Lexicon and the general English domain in which the rules are applied.
    That is, the Lexicon is considered a representative subset (in terms of derivations) of general English. Please refer to future work regarding this assumption.

II. The methodology
This approach finds the best set of SD-rules from a set of known candidate SD-rules. In theory, a complete set of SD-Rules can be approached as more SD-rules are evaluated and added. This methodology provides a systematic way to:

  • measure system performance
  • evaluate new SD-rules
  • obtain the set of SD-rules that meets a user-specified target minimum accuracy rate (system performance)
  • choose between parent and child SD-Rules to maximize system precision and recall rates.
    • In general, a parent rule has higher recall, while a child rule has higher precision.
    • This method provides a good way to choose between a parent rule and its child rule(s).
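The selection step can be sketched as follows. This is a minimal sketch under stated assumptions, not the Lexical Tools implementation. Each candidate rule is summarized by hypothetical counts of the correct and total derivation pairs it generates. Rules are assumed not to overlap. System precision (accuracy) is computed as correct/generated, and system recall (coverage) as correct over a hypothetical count of known derivations. Rules are added in descending order of precision and selection stops at the first rule that would push cumulative precision below the target. `RuleStats` and `select_rules` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    name: str      # hypothetical rule label
    correct: int   # derivation pairs generated by the rule that are valid
    total: int     # all derivation pairs generated by the rule


def select_rules(rules, known_derivations, target_precision):
    """Greedily add rules (highest precision first) until the cumulative
    system precision would drop below target_precision.

    Assumes rules generate disjoint derivation pairs, so counts simply add.
    Returns (selected_rules, precision, recall, precision + recall).
    """
    ordered = sorted((r for r in rules if r.total > 0),
                     key=lambda r: r.correct / r.total, reverse=True)
    selected = []
    correct = total = 0
    for rule in ordered:
        new_correct = correct + rule.correct
        new_total = total + rule.total
        if new_correct / new_total < target_precision:
            break  # stop at the first rule that breaks the target
        selected.append(rule)
        correct, total = new_correct, new_total
    precision = correct / total if total else 0.0
    recall = correct / known_derivations
    return selected, precision, recall, precision + recall


if __name__ == "__main__":
    # Made-up counts for illustration only.
    candidates = [RuleStats("A", 90, 100), RuleStats("B", 50, 51),
                  RuleStats("C", 30, 40)]
    chosen, p, r, perf = select_rules(candidates, known_derivations=200,
                                      target_precision=0.9)
    print([rule.name for rule in chosen])  # prints ['B', 'A']
```

In this toy run, adding rule C would lower cumulative precision to 170/191 (about 0.89), below the 0.9 target, so only B and A are kept. The same comparison can be applied when a parent rule and its child rules are alternative candidates.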

III. The target precision and recall rate (95%)

The system precision and system recall curves of the final set intersect (the optimization point) at 95%. For noise reduction, we also smoothed both curves with a simple moving average over window sizes of 3, 5, and 7 rules, and found that the intersections are all around 95% (see diagram below). Smoothing the data captures the overall characteristics of the set while leaving out noise. Accordingly, our target minimum accuracy rate of 95% is a good choice for obtaining the optimized (or close to optimal) set of SD-rules.
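The smoothing and intersection check can be sketched as follows. This is a minimal sketch with two assumptions. First, it uses a centered moving average whose window is truncated at the ends of the curve. Second, it locates the crossing by linear interpolation between adjacent points. The original analysis may have used a trailing window or a different crossing rule. The curve values in the usage example are invented for illustration and are not the Lexicon 2015 data.

```python
def moving_average(values, window):
    """Centered simple moving average; near the ends the window is truncated."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def find_crossover(precision, recall):
    """Return (x, y) where the precision and recall curves first meet, or None.

    x is a (possibly fractional) position on the rule axis; between two
    adjacent points the curves are treated as straight lines.
    """
    diffs = [p - r for p, r in zip(precision, recall)]
    for i, d in enumerate(diffs):
        if d == 0:
            return float(i), precision[i]
        if i > 0 and diffs[i - 1] * d < 0:
            # Fraction of the way from point i-1 to point i where the gap is zero.
            t = diffs[i - 1] / (diffs[i - 1] - d)
            y = precision[i - 1] + t * (precision[i] - precision[i - 1])
            return (i - 1) + t, y
    return None


if __name__ == "__main__":
    # Illustrative curves: precision falls and recall rises as rules are added.
    precision = [1.0, 0.99, 0.97, 0.95, 0.93, 0.90]
    recall = [0.80, 0.86, 0.91, 0.95, 0.97, 0.98]
    for window in (1, 3, 5, 7):
        point = find_crossover(moving_average(precision, window),
                               moving_average(recall, window))
        print(window, point)
```

A window of 1 leaves the curves unsmoothed. Comparing the crossing point across windows 3, 5, and 7 shows whether the intersection is stable, which is the kind of evidence this section relies on.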