Lexical Tools

Optimizing 2015 SD-Rule Set - Parent Rules

I. Find all candidate child rules for 14 parent rules

  • DIR: ${SUFFIXD_DIR}
  • Inputs:
    • ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesCheck/decompose/sdPairs.data
      shell> sort -u ../../../data/suffixD.yesNo.data > ./suffixD.yesNo.data.uSort
      shell> flds 1,2,4,5,7 ./suffixD.yesNo.data.uSort > suffixD.yesNo.data.uSort.1.2.4.5.7
      shell> ln -sf ./suffixD.yesNo.data.uSort.1.2.4.5.7 sdPairs.data
    • ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesCheck/decompose/sdRule.data
      => Add all 14 parent SD-Rules to this file
      => Go through them one by one by commenting out (#) the other 13
  • Program:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSdRule ${YEAR}
    7
    40 (min. occurrence rate - for decompose)
    => Need to have enough coverage for further decomposition on child rules
    25 (35) (min. coverage rate - for candidate child)
    => Need to have enough coverage to be a qualified child rule
  • Outputs:
    • sdRules.decompose.out

      A child rule must have a higher accuracy rate (precision) than its root parent rule and must meet the min. coverage rate (recall). Manually look through the output file sdRules.decompose.out and search for "<= Candidate". These candidates are child rules that match the following criteria:

      • the accuracy rate (precision) is higher than parent-rule
      • the coverage rate (recall) is higher than 35% (or the specified number)
    • Rename the output file with the rule number and rule name:
      shell> mv sdRules.decompose.out sdRules.decompose.out.no.rule
    • such as: shell> mv sdRules.decompose.out sdRules.decompose.out.1.X-ally
  • Repeat this process for all 14 parent rules.
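
The manual search for "<= Candidate" described above can be sketched as a small shell helper. This is only an illustration: the function name list_candidates is hypothetical, and it assumes every qualifying line in the output file contains the literal marker "<= Candidate".

```shell
# Hypothetical helper: print every line (with its line number) that the
# decompose step flagged as a candidate child rule.
# Assumption: flagged lines contain the literal text "<= Candidate".
list_candidates() {
  # -F treats the marker as a fixed string, -n prefixes line numbers
  grep -n -F -e '<= Candidate' "$1"
}

# Example call (file name taken from the renaming step above):
#   list_candidates sdRules.decompose.out.1.X-ally
```

The helper prints nothing and returns a non-zero status when a file has no flagged lines, which makes it easy to spot parent rules without any qualified child.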

II. Replace 14 parent rules by selected candidate child SD-Rules for optimized set

  • DIR: ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesOptimum/
    • Create a new directory
      shell>mkdir 1.X-ally
  • Inputs:
    • Update sdRules.stats.in by replacing the 1st parent rule with its candidate child rules
      shell>cd 1.X-ally
      shell>cp ../0.baseline/sdRules.stats.in .
      => Copy all candidate child rules from ../../SdRulesCheck/decompose/sdRules.decompose.out.1.X-ally to this file
      Update the following:
      • Change the rank (1st field) to 241 (original rank + child level)
      • Move the accuracy rate (precision) to the 2nd field
      • Add 0 as the 6th field (tbd no.)
      • Change fields 11~13 to ${YEAR}|DECOMPOSE|CHILD
      • Comment out (#) the parent/child rules that are not under test
      #24|99.08%|2072|2053|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
      241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD
      #242|99.95%|1949|1948|1|0|ic$|adj|ically$|adv|2015|DECOMPOSE|CHILD
      
  • Program - Get the optimal Set:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSdRule ${YEAR}
    1
    others
    1.X-ally
    46950 <= from baseline
  • Outputs directory:
    • ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesOptimum/1.X-ally
    -- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.21%|95.50%|1.9071|44835|47089
    
  • Repeat this process for all candidate child rules.
  • Repeat this process for all parent rules.
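
Because the entries in sdRules.stats.in are edited by hand, a quick consistency check on each line may help catch slips. The sketch below is not part of the Lexical Tools distribution. The function name check_stats_line is hypothetical, and the field meanings (3rd field = 4th field + 5th field, 2nd field = 4th field / 3rd field as a percentage) are inferred from the sample lines shown above rather than documented.

```shell
# Hypothetical sanity check for one sdRules.stats.in entry.
# Assumptions (inferred from the sample lines, not from documentation):
#   13 pipe-separated fields, field3 = field4 + field5,
#   field2 = 100 * field4 / field3, formatted with two decimals and "%".
check_stats_line() {
  printf '%s\n' "$1" | awk -F'|' '
    {
      sub(/^#/, "", $1)   # a leading "#" only marks a rule as not under test
      if (NF != 13)      { print "expected 13 fields, got " NF; exit 1 }
      if ($3 <= 0)       { print "field 3 must be positive"; exit 1 }
      if ($4 + $5 != $3) { print "field 4 + field 5 != field 3"; exit 1 }
      want = sprintf("%.2f%%", 100 * $4 / $3)
      if ($2 != want)    { print "precision " $2 " != " want; exit 1 }
      print "ok"
    }'
}

check_stats_line '241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD'
```

The rule line is passed in single quotes so that the shell does not expand the "$" characters in the suffix fields.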

III. Results

Please refer to the optimization log for the details of each step in these parent-child rule optimization processes.

The final optimized set of SD-Rules includes 101 unique parent/self/child SD-Rules. They are sorted in descending order of precision (= relevant, retrieved No. / retrieved No.) and then by retrieved No. rate. The top 76 SD-Rules are used as the optimized SD-Rule set, which reaches 95.22% system (accumulated) precision and 95.70% system (accumulated) recall, with a system performance of 1.9093. The total valid instance number is 46950.

-- Total line no: 147
-- Total comment no: 46
-- Total Sd-Rule no: 101
---------------------------------------
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.22%|95.70%|1.9093|44933|47187
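
The system figures in the summary line can be reproduced from its last two counts. The sketch below rests on a reading inferred from the numbers, not stated by the tool: 44933 is taken as the accumulated relevant, retrieved No., 47187 as the accumulated retrieved No., recall is taken as relevant, retrieved No. divided by the total valid instance number (46950), and system performance as precision plus recall. The function name sd_system_metrics is hypothetical.

```shell
# Hypothetical recomputation of the accumulated system metrics.
# Arguments: relevant-retrieved count, retrieved count, total valid instances.
# Assumption: performance = precision + recall (inferred from the log values).
sd_system_metrics() {
  awk -v rel="$1" -v ret="$2" -v total="$3" 'BEGIN {
    p = rel / ret      # system (accumulated) precision
    r = rel / total    # system (accumulated) recall
    printf "%.2f%%|%.2f%%|%.4f\n", 100 * p, 100 * r, p + r
  }'
}

sd_system_metrics 44933 47187 46950   # prints 95.22%|95.70%|1.9093
```

Under the same assumptions, the counts from the 1.X-ally step (44835, 47089, 46950) give 95.21%|95.50%|1.9071, which matches that step's summary line.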

IV. Post-Process

Update ${SUFFIXD_DIR}/data/${YEAR}/dataOrg/sdRules.data.${NEXT_YEAR} by:

  • adding new candidate child rules with better system performance