
Lexical Tools

SD-Rules Optimization Models and Procedures

  • Optimize a set of SD-rules
    Keep the good rules and remove the bad rules from a known set of rules:
    • The original set of 97 SD-Rules in the Lexical Tools is used as the baseline
    • Duplicated SD-Rules are removed (none were found in this set of 97)
    • Evaluate all parent-child rules to find the optimized set
      For each parent-rule, choose between the parent-rule and its child-rules by the following steps:
      • All child-rules are temporarily removed (the parent-rule is kept)
      • Decompose the parent-rule into child-rules, grandchild-rules, etc.
        Decompose a child-rule recursively only if it is a potentially good rule:
        • its accuracy rate is higher than that of the root parent-rule
        • its coverage rate is higher than 40% of that of the root parent-rule. 40% is the default value and can be adjusted.
      • Evaluate the child-rules of one parent-rule at a time
        A child-rule should be evaluated only if:
        • It has a higher accuracy rate than its parent-rule.
          Otherwise, just ignore the child-rule and use its parent-rule, because the parent-rule has a better accuracy rate and coverage rate than the child-rule.
        • It has more than 35% of the coverage of its parent-rule.
          Every rule in the optimized set has to be a good rule, so child-rules should have good coverage. 35% is the default value and can be adjusted.
        • Then compare the system performance of the parent-rule with that of the child-rules, and choose the better option:
          • the one with higher system performance
          • the one with more rules if the system performance is the same
    • Find the optimized SD-Rule set with the best system performance by superposing (combining) the better choice from each parent-child comparison
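
The parent-child selection steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Lexical Tools implementation: the `Rule` record, the function names, and the `performance` callback are hypothetical, and the accuracy and coverage fields are assumed to be precomputed from tagged SD-pairs. How "system performance" is measured is not defined here, so the caller supplies it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Rule:
    """Hypothetical record for one SD-rule and its tagging stats."""
    name: str
    accuracy: float  # assumed: share of matched SD-pairs tagged as valid
    coverage: float  # assumed: number of SD-pairs the rule matches
    children: List["Rule"] = field(default_factory=list)


def worth_decomposing(child: Rule, root: Rule, ratio: float = 0.40) -> bool:
    """Recurse into a child only if it beats the root parent-rule's accuracy
    and keeps more than `ratio` of the root parent-rule's coverage."""
    return child.accuracy > root.accuracy and child.coverage > ratio * root.coverage


def evaluable_children(parent: Rule, ratio: float = 0.35) -> List[Rule]:
    """Child-rules that pass both the accuracy check and the coverage check."""
    return [
        c for c in parent.children
        if c.accuracy > parent.accuracy and c.coverage > ratio * parent.coverage
    ]


def choose(parent: Rule,
           performance: Callable[[Sequence[Rule]], float],
           ratio: float = 0.35) -> List[Rule]:
    """Return either [parent] or its qualifying child-rules."""
    children = evaluable_children(parent, ratio)
    if not children:
        return [parent]
    parent_score = performance([parent])
    child_score = performance(children)
    if child_score > parent_score:
        return children
    if child_score == parent_score and len(children) > 1:
        return children  # tie: prefer the option with more rules
    return [parent]


def optimized_set(parents: Sequence[Rule],
                  performance: Callable[[Sequence[Rule]], float]) -> List[Rule]:
    """Combine the better choice from every parent-child comparison."""
    result: List[Rule] = []
    for parent in parents:
        result.extend(choose(parent, performance))
    return result
```

Both thresholds are kept as parameters because the text describes 40% and 35% as adjustable defaults.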

  • Add a new rule to a set of SD-rules

    The same model and procedures as above can be used to evaluate new SD-rules. If a new SD-rule is proposed for addition to an (optimized) set of SD-rules, the procedures are as follows:

    • Check if the new rule is a duplicated rule;
      if so, there is no need to evaluate the new rule.
    • Check if the new rule is a child-rule;
      if so, use the same parent-child evaluation procedures as above.
    • Check if the new rule is a parent-rule;
      if so, use the same parent-child evaluation procedures as above.
    • Otherwise:
      Evaluate it by comparing system performance.
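
The four cases above can be sketched as a small classifier. Everything in it is an assumption made for illustration: the `SdRule` tuple layout, the function names, and in particular the parent-child test, which here treats a rule as a child when it has the same categories as another rule and longer suffixes that end with that rule's suffixes. The actual Lexical Tools rule format and parent-child definition may differ.

```python
from typing import Iterable, NamedTuple


class SdRule(NamedTuple):
    """Hypothetical SD-rule layout: suffix and category on each side."""
    in_suffix: str
    in_cat: str
    out_suffix: str
    out_cat: str


def is_child_of(child: SdRule, parent: SdRule) -> bool:
    """Assumed relation: same categories, and the child's suffixes are
    more specific versions (extensions) of the parent's suffixes."""
    return (
        child != parent
        and child.in_cat == parent.in_cat
        and child.out_cat == parent.out_cat
        and child.in_suffix.endswith(parent.in_suffix)
        and child.out_suffix.endswith(parent.out_suffix)
    )


def classify_new_rule(new_rule: SdRule, rule_set: Iterable[SdRule]) -> str:
    """Return 'duplicate', 'child', 'parent', or 'other' for a proposed rule."""
    rules = list(rule_set)
    if new_rule in rules:
        return "duplicate"  # no evaluation needed
    if any(is_child_of(new_rule, r) for r in rules):
        return "child"      # use the parent-child evaluation
    if any(is_child_of(r, new_rule) for r in rules):
        return "parent"     # use the parent-child evaluation
    return "other"          # compare system performance directly
```

The checks run in the same order as the list above, so a rule that is both a child of one existing rule and a parent of another is reported as a child in this sketch.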

    The evaluation procedures need the tagging stats of the matching SD-pairs from the Lexicon:

    • Get all raw SD-pairs that match the new SD-Rule from the Lexicon
    • Tag the raw SD-pairs
    • These data can be derived from the existing set if the new rule is a child-rule.
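
As a rough illustration of the last point, the sketch below derives a child-rule's tagging stats by filtering the pairs already tagged for its parent-rule. The pair layout, the function names, the definition of accuracy as valid pairs over matched pairs, and the sample words and tags used in any call are assumptions made for this example, not data or definitions from the Lexicon.

```python
from typing import List, Tuple

# Assumed layout: (source word, target word, tagged as valid?)
TaggedPair = Tuple[str, str, bool]


def matches(pair: TaggedPair, in_suffix: str, out_suffix: str) -> bool:
    """True if both words carry the rule's suffixes on the same stem."""
    src, tgt, _ = pair
    if not (src.endswith(in_suffix) and tgt.endswith(out_suffix)):
        return False
    # Slice by length so that an empty suffix leaves the word unchanged.
    return src[:len(src) - len(in_suffix)] == tgt[:len(tgt) - len(out_suffix)]


def tagging_stats(pairs: List[TaggedPair]) -> Tuple[int, int, float]:
    """Return (matched pairs, valid pairs, accuracy rate)."""
    total = len(pairs)
    valid = sum(1 for _, _, is_valid in pairs if is_valid)
    accuracy = valid / total if total else 0.0
    return total, valid, accuracy


def child_stats(parent_pairs: List[TaggedPair],
                in_suffix: str, out_suffix: str) -> Tuple[int, int, float]:
    """Reuse the parent's tagged pairs instead of tagging the child's again."""
    subset = [p for p in parent_pairs if matches(p, in_suffix, out_suffix)]
    return tagging_stats(subset)
```

Because every pair matched by a child-rule is also matched by its parent-rule under this assumed relation, no new tagging is needed for the child; a rule with no parent in the set would still require fresh raw pairs and tags.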

The following figure shows the SD-Rules model and optimization procedures.