Optimizing 2015 SD-Rule Set - Parent Rules
I. Find all candidate child rules for 14 parent rules
shell> sort -u ../../../data/suffixD.yesNo.data > ./suffixD.yesNo.data.uSort
shell> flds 1,2,4,5,7 ./suffixD.yesNo.data.uSort > suffixD.yesNo.data.uSort.1.2.4.5.7
shell> ln -sf ./suffixD.yesNo.data.uSort.1.2.4.5.7 sdPairs.data
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
7
40 (min. occurrence rate - for decompose)
25 (35) (min. coverage rate - for candidate child)
A child rule must have a higher accuracy rate (precision) than the root parent rule and must meet the min. coverage rate (recall). Manually look through the output file sdRules.decompose.out and search for "<= Candidate"; the lines marked this way are the child rules that satisfy both criteria.
shell> mv sdRules.decompose.out sdRules.decompose.out.no.rule
shell> mv sdRules.decompose.out sdRules.decompose.out.1.X-ally
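The manual search for candidates can also be scripted. The sketch below assumes only that candidate lines in the decompose output contain the literal marker "<= Candidate"; the function names list_candidates and count_candidates are hypothetical helpers, not part of the SD-Rule tools. It works on the output file under either its original or its renamed file name.

```shell
# List every line flagged as a candidate child rule, with its line number.
# Only the marker string "<= Candidate" comes from the procedure above;
# the helper names are hypothetical.
list_candidates() {
    # -F: match the marker literally; -n: prefix each hit with its line number
    grep -n -F -- '<= Candidate' "$1"
}

# Print how many lines carry the marker (prints 0 when none are flagged).
count_candidates() {
    grep -c -F -- '<= Candidate' "$1"
}
```

For example, `list_candidates sdRules.decompose.out` prints the flagged lines so they can be reviewed before choosing which child rules to keep.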
II. Replace 14 parent rules by selected candidate child SD-Rules for optimized set
shell> mkdir 1.X-ally
shell> cd 1.X-ally
shell> cp ../0.baseline/sdRules.stats.in .
#24|99.08%|2072|2053|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD
#242|99.95%|1949|1948|1|0|ic$|adj|ically$|adv|2015|DECOMPOSE|CHILD
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
1
others
1.X-ally
46950 (from baseline)
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.21%|95.50%|1.9071|44835|47089
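The last five fields of the summary line can be cross-checked by hand. The sketch below assumes, based only on the figures shown, that the final two fields are the relevant retrieved and the retrieved instance counts, that precision = relevant retrieved / retrieved, that recall = relevant retrieved / total valid instances (the 46950 entered from the baseline), and that the performance figure is precision + recall. The helper name sd_system_metrics is hypothetical.

```shell
# Recompute system precision, recall, and performance from three counts.
# Assumed meanings (inferred from the figures, not documented here):
#   $1 = relevant retrieved instances (second-to-last field of the summary line)
#   $2 = retrieved instances          (last field of the summary line)
#   $3 = total valid instances        (value entered from the baseline)
sd_system_metrics() {
    awk -v rel="$1" -v ret="$2" -v valid="$3" 'BEGIN {
        precision = rel / ret
        recall    = rel / valid
        printf "%.2f%%|%.2f%%|%.4f\n", precision * 100, recall * 100, precision + recall
    }'
}

# Counts taken from the summary line above and the baseline value.
sd_system_metrics 44835 47089 46950   # prints 95.21%|95.50%|1.9071
```

The printed values match the precision, recall, and performance fields of the summary line, which supports the assumed definitions.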
III. Results
Please refer to the optimization log for the details of each step in the optimization of these parent-child rules.
The final optimized set contains 101 unique parent/self/child SD-Rules. They are sorted in descending order of precision (= relevant and retrieved No. / retrieved No.) and then by retrieved No. rate. The top 76 SD-Rules are used as the optimized SD-Rule set; together they reach 95.22% system (accumulated) precision and 95.70% system (accumulated) recall, for a system performance of 1.9093. The total valid instance number is 46950.
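The ordering described above can be approximated with the standard sort utility. This is a sketch under stated assumptions, not the method used by the SD-Rule tools: it assumes that records are pipe-delimited, that field 2 holds the rule precision and field 3 the retrieved count, and that ties in precision are broken by the larger retrieved count. The helper name sort_sd_rules is hypothetical.

```shell
# Order rule records by precision (field 2), then by retrieved count (field 3),
# both descending. Field positions and the tie-break are assumptions.
sort_sd_rules() {
    # LC_ALL=C keeps "." as the decimal point; -n ignores the trailing "%"
    LC_ALL=C sort -t '|' -k2,2nr -k3,3nr "$@"
}

# Example with the three -ally records quoted in step II.
printf '%s\n' \
    '#24|99.08%|2072|2053|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT' \
    '#242|99.95%|1949|1948|1|0|ic$|adj|ically$|adv|2015|DECOMPOSE|CHILD' \
    '241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD' |
    sort_sd_rules
# Resulting order of the first field: 241, #242, #24
```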
-- Total line no: 147
-- Total comment no: 46
-- Total Sd-Rule no: 101
---------------------------------------
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.22%|95.70%|1.9093|44933|47187
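The three totals in the log satisfy 147 - 46 = 101, which suggests that the Sd-Rule count is the number of lines that are not comments. The sketch below reproduces that tally for any rule file, assuming that a comment is any line starting with "#" and that every other line is one rule record. The helper name count_sd_lines is hypothetical.

```shell
# Summarize a rule file: total lines, comment lines, and remaining rule lines.
# Assumptions: comments start with "#"; every other line is one rule record.
count_sd_lines() {
    awk '
        /^#/ { comments++ }
        END  {
            printf "-- Total line no: %d\n", NR
            printf "-- Total comment no: %d\n", comments
            printf "-- Total Sd-Rule no: %d\n", NR - comments
        }
    ' "$1"
}
```

Comparing this tally with the totals in the log is a quick way to confirm that no rule line was commented out or duplicated by accident while editing.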
IV. Post-Process
Update ${SUFFIXD_DIR}/data/${YEAR}/dataOrg/sdRules.data.${NEXT_YEAR} by: