Lexical Tools

Comparison of the Optimized Sets for 2014 and 2015

I. From 2014 to 2015:

The 2014 optimized set is based on 2013 SD-Rule data and serves as the baseline for 2015. Fifteen new SD-Rules were added to the 2014 SD-Rule set for evaluation and used for the 2015 release. Of these, 11 were evaluated as good rules in the optimized set, 2 as bad rules, and 2 as duplicates (child rules of existing rules). In addition, 2 child rules are used in the optimized set in place of the proposed rules.

SD-Rule                 Precision  Instances  Source       Results

Good Rules
se$|verb|zation$|noun   100.00%    1108       NOM_D        Good SD-Rule
sation$|noun|ze$|verb   100.00%    1071       NOM_D        Good SD-Rule
ility$|noun|le$|adj     99.94%     1625       NOM_D        Good SD-Rule
$|adj|ally$|adv         99.08%     2072       ORG_D        Good SD-Rule
ce$|noun|t$|adj         98.82%     847        NOM_D        Child rule nce$|noun|nt$|adj is used
cy$|noun|t$|adj         98.77%     406        NOM_D        Good SD-Rule
e$|verb|ion$|noun       98.76%     2336       NOM_D        Good SD-Rule
c$|adj|s$|noun          91.46%     281        ORG_D        Child rule ic$|adj|is$|noun is used
e$|verb|ing$|noun       91.43%     210        Suggestions  Good SD-Rule
ian$|adj|ia$|noun       86.31%     263        Suggestions  Duplicated, parent rule an$|adj|a$|noun is used
al$|adj|us$|noun        84.35%     262        Suggestions  Good SD-Rule
es$|noun|ic$|adj        73.91%     23         Suggestions  Good SD-Rule

Bad Rules
$|noun|ize$|verb        59.05%     442        Suggestions  Bad SD-Rule
ian$|noun|ia$|noun      0.36%      274        Suggestions  Duplicated, parent rule an$|noun|a$|noun is a bad SD-Rule
es$|noun|ic$|noun       0.00%      19         Suggestions  Bad SD-Rule
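
The rules in this table appear to follow the pattern suffix$|category|suffix$|category, where each half names a word ending and a part of speech. The sketch below assumes that reading of the notation and shows how one rule could be parsed and applied in the forward direction. The names SDRule, parse_rule and apply_rule are hypothetical helpers for illustration, not part of the Lexical Tools API.

```python
from typing import NamedTuple, Optional, Tuple


class SDRule(NamedTuple):
    in_suffix: str
    in_cat: str
    out_suffix: str
    out_cat: str


def parse_rule(text: str) -> SDRule:
    """Split 'se$|verb|zation$|noun' into four fields (assumed layout)."""
    in_suffix, in_cat, out_suffix, out_cat = text.split("|")
    # The trailing '$' marks the end of the word; drop it to get the bare suffix.
    return SDRule(in_suffix.rstrip("$"), in_cat, out_suffix.rstrip("$"), out_cat)


def apply_rule(rule: SDRule, word: str, cat: str) -> Optional[Tuple[str, str]]:
    """Return (derived word, category), or None if the rule does not match."""
    if cat != rule.in_cat or not word.endswith(rule.in_suffix):
        return None
    # Slice by length so that an empty suffix ('$') leaves the word intact.
    stem = word[: len(word) - len(rule.in_suffix)]
    return stem + rule.out_suffix, rule.out_cat


print(apply_rule(parse_rule("se$|verb|zation$|noun"), "normalise", "verb"))
# ('normalization', 'noun')
print(apply_rule(parse_rule("$|adj|ally$|adv"), "basic", "adj"))
# ('basically', 'adv')
```

Whether a generated pair is actually valid is what the precision column measures; this sketch only generates candidates and does no validation.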

II. Comparison of SD-Rule set:

Item                       2014                 2015
Total Unique Rules         96                   101
Total Good Rules           73                   76
Opti. System Precision     95.30%               95.22%
Opti. System Recall        95.01%               95.70%
Opti. System Performance   1.9031               1.9093
Cutoff Rule                ar$|adj|e$|noun      ar$|adj|e$|noun
Optimized Set              2014 Optimized Set   2015 Optimized Set
Optimized Diagram
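
The performance figures in this comparison look like the sum of precision and recall, each expressed as a fraction. That definition is inferred from the numbers and is not stated on this page, so the check below is only a sketch under that assumption; summed_performance is a hypothetical helper name.

```python
# Values copied from the comparison table (precision and recall in percent).
reported = {
    2014: {"precision": 95.30, "recall": 95.01, "performance": 1.9031},
    2015: {"precision": 95.22, "recall": 95.70, "performance": 1.9093},
}


def summed_performance(precision_pct: float, recall_pct: float) -> float:
    """Assumed definition: precision + recall, each as a fraction of 1."""
    return round((precision_pct + recall_pct) / 100, 4)


for year, row in reported.items():
    computed = summed_performance(row["precision"], row["recall"])
    print(year, computed, row["performance"])
```

Under this assumption the 2014 value is reproduced exactly (1.9031), and the 2015 value comes out as 1.9092 against the reported 1.9093, a gap that would be consistent with the published percentages having been rounded.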

For the optimal set:

  • The optimized sets for 2014 and 2015 are similar; see the SD-Rule rank mapping for details.
  • All good rules in 2014 are also in 2015.
  • The 2014 optimal set has 96 SD-Rules, 73 of which are good.
  • The 2015 optimal set has 101 SD-Rules, 76 of which are good.

III. Transaction Details:

The SD-Rule transactions are described below:

  • Baseline SD-Rule count:

  • Good SD-Rule count in the optimal set:
    • 2014 has 73 good rules, while 2015 has 76 good rules in the optimal set.
    • All 73 good SD-Rules in 2014 are good rules in 2015. They may be identical, or replaced by their parent rules or child rules.
    • In the evaluation, 11 of the 15 new rules are good. Why does the total number of good SD-Rules increase by only 3 (from 73 to 76) rather than by 11 (from 73 to 84)? The reason is the parent-child relationships among the rules; see the SD-Rule rank mapping for details. The changes are summarized below:

      Type             2014  2015  Details
      No Change        65    65    ...
      Parent-1-Child   4     4     2014 rule -> 2015 rule:
                                   02: ability$|noun|able$|adj -> 09: ility$|noun|le$|adj
                                   08: ic$|adj|ically$|adv -> 15: $|adj|ally$|adv
                                   21: ency$|noun|ent$|adj -> 19: cy$|noun|t$|adj
                                   55: ion$|noun|ional$|adj -> 70: $|noun|al$|adj
      Parent-2-Child   4     2     2014 rules -> 2015 rule:
                                   16: ance$|noun|ant$|adj, 18: ence$|noun|ent$|adj -> 18: nce$|noun|nt$|adj
                                   10: ate$|verb|ation$|noun, 63: se$|verb|sion$|noun -> 20: e$|verb|ion$|noun
      New in 2015      0     5     02: se$|verb|zation$|noun
                                   03: sation$|noun|ze$|verb
                                   45: e$|verb|ing$|noun
                                   61: al$|adj|us$|noun
                                   67: es$|noun|ic$|adj
      Total            73    76
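
      The arithmetic behind the change from 73 to 76 can be checked directly from the counts by type. The snippet below is only a tally of the numbers in this summary; the variable names are illustrative.

```python
# (2014 count, 2015 count) per transaction type, copied from the summary table.
counts = {
    "No Change": (65, 65),
    "Parent-1-Child": (4, 4),
    "Parent-2-Child": (4, 2),
    "New in 2015": (0, 5),
}

total_2014 = sum(before for before, _ in counts.values())
total_2015 = sum(after for _, after in counts.values())
net_by_type = {name: after - before for name, (before, after) in counts.items()}

print(total_2014, total_2015)  # 73 76
print(net_by_type)
```

      The five new rules add 5, and the two Parent-2-Child merges each replace two 2014 rules with one 2015 rule, removing 2, which gives the net gain of 3.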

    • The following table shows the transactions for the 15 newly proposed SD-Rules in 2015.

      ID      Proposed New Rule      Source   Results  Rank & Rule 2015           Rank & Rule 2014             Type              Count Change  Accu. Count

      Computer Generated SD-Rules
      01-CG1  se$|verb|zation$|noun  nomD     Good     02: se$|verb|zation$|noun  None                         New in 2015       +1            74
      02-CG2  sation$|noun|ze$|verb  nomD     Good     03: sation$|noun|ze$|verb  None                         New in 2015       +1            75
      03-CG3  ility$|noun|le$|adj    nomD     Good     09: ility$|noun|le$|adj    02: ability$|noun|able$|adj  Parent-1-Child    +0            75
      04-CG4  $|adj|ally$|adv        orgD     Good     15: $|adj|ally$|adv        08: ic$|adj|ically$|adv      Parent-1-Child    +0            75
      05-CG5  nce$|noun|nt$|adj      nomD     Good     18: nce$|noun|nt$|adj      16: ance$|noun|ant$|adj      Parent-2-Child    -1            74
                                                                                  18: ence$|noun|ent$|adj
      06-CG6  cy$|noun|t$|adj        nomD     Good     19: cy$|noun|t$|adj        21: ency$|noun|ent$|adj      Parent-1-Child    +0            74
      07-CG7  e$|verb|ion$|noun      nomD     Good     20: e$|verb|ion$|noun      10: ate$|verb|ation$|noun    Parent-2-Child    -1            73
                                                                                  63: se$|verb|sion$|noun
      08-CG8  c$|adj|s$|noun         orgD     Good     43: ic$|adj|is$|noun       41: ic$|adj|is$|noun         Child             +0            73

      Expert-Suggested SD-Rules
      09-ES1  e$|verb|ing$|noun      Experts  Good     45: e$|verb|ing$|noun      None                         New in 2015       +1            74
      10-ES2  al$|adj|us$|noun       Experts  Good     61: al$|adj|us$|noun       None                         New in 2015       +1            75
      11-ES3  es$|noun|ic$|adj       Experts  Good     67: es$|noun|ic$|adj       None                         New in 2015       +1            76
      12-ES4  $|noun|ize$|verb       Experts  Bad      78: $|noun|ize$|verb       None                         New               +0            76
      13-ES5  es$|noun|ic$|noun      Experts  Bad      101: es$|noun|ic$|noun     None                         New               +0            76
      14-ES6  ian$|adj|ia$|noun      Experts  Good     57: a$|noun|an$|adj        53: a$|noun|an$|adj          Duplicated-Child  +0            76
      15-ES7  ian$|noun|ia$|noun     Experts  Bad      99: a$|noun|an$|noun       93: a$|noun|an$|noun         Duplicated-Child  +0            76
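
      The Accu. Count column reads as a running total that starts from the 73 good rules of 2014 and adds each rule's Count Change in turn. A minimal sketch of that tally, with the changes copied from the table and illustrative variable names:

```python
from itertools import accumulate

BASELINE_GOOD = 73  # good SD-Rules in the 2014 optimal set

# (rule ID, count change) for the 15 proposed rules, in table order.
changes = [
    ("01-CG1", +1), ("02-CG2", +1), ("03-CG3", 0), ("04-CG4", 0),
    ("05-CG5", -1), ("06-CG6", 0), ("07-CG7", -1), ("08-CG8", 0),
    ("09-ES1", +1), ("10-ES2", +1), ("11-ES3", +1), ("12-ES4", 0),
    ("13-ES5", 0), ("14-ES6", 0), ("15-ES7", 0),
]

# accumulate(..., initial=...) yields the baseline first; skip it with [1:].
running = list(accumulate((delta for _, delta in changes), initial=BASELINE_GOOD))[1:]

for (rule_id, delta), total in zip(changes, running):
    print(f"{rule_id}  {delta:+d}  {total}")
```

      The final value is 76, the number of good rules reported for the 2015 optimal set.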

    • In the evaluation process, we removed two proposed new rules (ES6 and ES7) because they are child rules of existing rules. After normalization (alphabetic order and use of the root parent rule), they are duplicates of existing rules. Thus, we did not analyze the parent-child hierarchy for these two rules. Should we analyze them in future releases?
    • In our process, we analyze the parent-child hierarchy only for SD-Rules whose parent and child rules co-exist in the collected set, because the analysis is very expensive. Should we modify the process as follows?
      • Normalize all SD-Rules to their root parent rules.
      • Analyze the parent-child hierarchy for all SD-Rules.

      In 2015, we have 14 parent rules. With the modified process, there would be 101 parent rules, which is very expensive.
    • 2015 has 10 more root parent rules.
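
In every parent-child pair listed on this page, the child rule is the parent rule with the same extra letters placed in front of both suffixes (for example, "ab" turns ility/le into ability/able). The sketch below tests for that relationship after putting both halves of a rule in alphabetic order. The ordering key (suffix, then category) and the helper names parse, normalize_order and child_prefix are assumptions made for illustration; the actual normalization used in the evaluation may differ.

```python
from typing import Optional, Tuple

Rule = Tuple[str, str, str, str]  # (suffix1, cat1, suffix2, cat2)


def parse(text: str) -> Rule:
    """Split 'ian$|adj|ia$|noun' into bare suffixes and categories."""
    s1, c1, s2, c2 = text.split("|")
    return (s1.rstrip("$"), c1, s2.rstrip("$"), c2)


def normalize_order(rule: Rule) -> Rule:
    """Put the two halves in alphabetic order (assumed key: suffix, category)."""
    first, second = sorted([(rule[0], rule[1]), (rule[2], rule[3])])
    return (first[0], first[1], second[0], second[1])


def child_prefix(child: Rule, parent: Rule) -> Optional[str]:
    """Return the letters that turn parent into child, or None if unrelated."""
    c, p = normalize_order(child), normalize_order(parent)
    if (c[1], c[3]) != (p[1], p[3]):
        return None  # categories must match on both sides
    if not (c[0].endswith(p[0]) and c[2].endswith(p[2])):
        return None
    prefix1 = c[0][: len(c[0]) - len(p[0])]
    prefix2 = c[2][: len(c[2]) - len(p[2])]
    # A non-empty, identical prefix on both suffixes marks a child rule.
    return prefix1 if prefix1 and prefix1 == prefix2 else None


print(child_prefix(parse("ian$|adj|ia$|noun"), parse("an$|adj|a$|noun")))
# i
print(child_prefix(parse("ability$|noun|able$|adj"), parse("ility$|noun|le$|adj")))
# ab
```

This is a pairwise test only. Finding the root parent of every rule would mean repeating it across the whole rule set and re-evaluating each candidate parent, which is where the cost noted above comes from.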

In conclusion, the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?