Lexical Tools

Example - Add SD-Rules Derived from factD

The original Lexical Tools collects 4,467 SD-pairs, of which 4,110 are suffix SD-pairs. These SD-pairs can be used to derive possible SD-rules by following the same approach as in the nomD session:

  • Identify possible SD-rules by stripping the shared starting characters from each valid SD-pair generated from factD.
  • Select high frequency SD-rules to add to SD-rules set:

    Possible SD-rule from factD      Root  Related     Notes
    $|noun|less$|adj|131|131         Yes   None        Selected
    $|verb|$ion|noun|111|111         Yes   Duplicated  Not selected
    ist$|noun|y$|noun|63|63          Yes   None        Selected
    $|adj|ally$|adv|58|58            Yes   None        Selected
      => ic$|adj|ically$|adv is used instead
      => need to verify the root stats
    $|noun|ful$|adj|58|58            Yes   None        Selected
    c$|adj|s$|noun|54|54             Yes   None        Selected
      => ic$|adj|is$|noun is used instead
      => need to verify the root stats
    on$|noun|ve$|adj|38|38           Yes   None        Not selected due to low frequency (coverage)
    ...                              ...   ...         Not selected due to low frequency (coverage)

  • Apply the same procedure used to add SD-rules from nomD to obtain the optimized set, using the optimized set of 2.3.4 as the new baseline. This task involves:
    • Retrieve all raw SD-pairs of the five selected SD-rules above from the Lexicon (2013)
    • Tag the raw SD-pairs
    • Get stats of the SD-pairs of these five SD-rules
    • Add them to the SD-rule set and find the optimized set
    • The total number of valid SD-pairs (TotalYes) must be calculated as the total number of valid SD-pairs from all parent-rules.

    The iterative results are shown as follows:

    Candidate rule added in each iteration:

    ID                           New Candidate Rule                                              Total Yes
    2.3.4 (prev. optimized set)                                                                  39,197
    2.3.4.1                      12|99.95%|1931|1930|1|0|ic$|adj|ically$|adv|2013|ORG_FACT|SELF  41,127 = 39,197 + 1930
    2.3.4.2                      15|99.64%|559|557|2|0|$|noun|less$|adj|2013|ORG_FACT|SELF       41,684 = 41,127 + 557
    2.3.4.3                      40|95.63%|504|482|22|0|ist$|noun|y$|noun|2013|ORG_FACT|SELF     42,166 = 41,684 + 482
    2.3.4.4                      49|91.70%|277|254|23|0|ic$|adj|is$|noun|2013|ORG_FACT|SELF      42,420 = 42,166 + 254
    2.3.4.5                      55|89.93%|139|125|14|0|$|noun|ful$|adj|2013|ORG_FACT|SELF       42,545 = 42,420 + 125

    Resulting optimized set and system performance:

    ID       Total Rule No.  Rule No.  A. Rate  Occr.  Yes  No  Tbd  SD-Rule          Status  Source    Notes  Sys A. Rate  Sys C. Rate  Sys. Perf  Notes
    2.3.4    90              68        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.05%       94.60%       1.8965     Baseline
    2.3.4.1  91              69        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.28%       94.85%       1.9013     Better
    2.3.4.2  92              70        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.34%       94.92%       1.9026     Better
    2.3.4.3  93              71        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.35%       94.98%       1.9032     Better
    2.3.4.4  94              72        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.32%       95.01%       1.9033     Better
    2.3.4.5  95              73        60.66%   183    111  72  0    ar$|adj|e$|noun  2013    ORG_RULE  SELF   95.30%       95.02%       1.9033     Best
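The first step in the list above, stripping the shared starting characters from each SD-pair to obtain a candidate SD-rule and then counting how often each rule occurs, can be sketched in Python as follows. This is only an illustration: the tuple layout, the function name `derive_sd_rule`, the sample pairs, and the frequency threshold are assumptions made for the example and are not taken from factD or from the Lexical Tools code.

```python
from collections import Counter
from os.path import commonprefix


def derive_sd_rule(base, base_cat, derived, derived_cat):
    """Strip the shared starting characters of an SD-pair and
    return a suffix rule in the form  suffix$|cat|suffix$|cat."""
    prefix_len = len(commonprefix([base, derived]))
    return f"{base[prefix_len:]}$|{base_cat}|{derived[prefix_len:]}$|{derived_cat}"


# Hypothetical SD-pairs (term, category, term, category); not real factD data.
pairs = [
    ("care", "noun", "careless", "adj"),
    ("hope", "noun", "hopeless", "adj"),
    ("basic", "adj", "basically", "adv"),
    ("biologist", "noun", "biology", "noun"),
]

# Count how many pairs produce each candidate rule.
rule_counts = Counter(derive_sd_rule(*pair) for pair in pairs)

# Keep only rules at or above an illustrative frequency threshold.
MIN_FREQUENCY = 2  # assumed cut-off, chosen only for this toy data
selected = [rule for rule, count in rule_counts.most_common() if count >= MIN_FREQUENCY]

for rule, count in rule_counts.most_common():
    print(rule, count)
```

With real data the threshold would be chosen from the frequency distribution, as in the selection table, rather than fixed in advance.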

From the above results, all five selected SD-rules (those with the highest frequency and precision from factD) improved the system performance. Thus, all five SD-rules are added to the SD-rule set. Please note that the SD-rules ic$|adj|ically$|adv and ic$|adj|is$|noun are suggested in place of their root parent-rules $|adj|ally$|adv and c$|adj|s$|noun, respectively. Both root parent-rules should be re-evaluated by this system.
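The Total Yes values and the per-rule accuracy rates reported for the five candidate rules can be re-derived from their counts. The sketch below assumes that a rule's accuracy rate is its Yes count divided by its occurrence count, which reproduces the reported percentages but is not stated explicitly in this section; the function and variable names are illustrative only.

```python
BASELINE_TOTAL_YES = 39197  # Total Yes of the 2.3.4 optimized set

# (SD-rule, occurrences, yes, no) as reported for iterations 2.3.4.1 to 2.3.4.5
candidates = [
    ("ic$|adj|ically$|adv", 1931, 1930, 1),
    ("$|noun|less$|adj", 559, 557, 2),
    ("ist$|noun|y$|noun", 504, 482, 22),
    ("ic$|adj|is$|noun", 277, 254, 23),
    ("$|noun|ful$|adj", 139, 125, 14),
]


def running_totals(baseline, rules):
    """Return (rule, accuracy %, cumulative Total Yes) for each added rule."""
    total = baseline
    rows = []
    for rule, occurrences, yes, _no in rules:
        total += yes  # each new rule contributes its valid SD-pairs
        accuracy = round(100 * yes / occurrences, 2)  # assumed definition
        rows.append((rule, accuracy, total))
    return rows


rows = running_totals(BASELINE_TOTAL_YES, candidates)
for rule, accuracy, total in rows:
    print(f"{rule:22} {accuracy:6.2f}%  {total:,}")
```

The final cumulative value corresponds to the Total Yes of iteration 2.3.4.5.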

The table above shows the iterative results of adding the new rules derived from factD one at a time. With all five rules added, the optimized set includes 73 of the 95 SD-rules and reaches a coverage rate of 95.02% and a system performance of 1.9033, with an accuracy rate of 95.30%. The diagram below shows the system accuracy and coverage curves of this optimized set.