
The SPECIALIST Lexicon

SpVar - Phonetic Tests

I. Introduction

SpVarNorm is used to find spVars in a corpus by normalizing lexical information. Another characteristic of spVars is that they share the same pronunciation, so several phonetic algorithms were tested for this task.
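To make the phonetic idea concrete, here is a minimal Python sketch that groups words by their American Soundex code (Soundex is one of the algorithms named in the unit tests below). It is not the SPECIALIST implementation: the function names `soundex` and `group_by_key` are hypothetical, and the sketch treats y like a vowel, as some Soundex implementations do.

```python
from collections import defaultdict

# American Soundex digit classes; letters not listed (vowels, y, h, w) get no digit.
_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of a word (non-letters ignored)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's digit suppresses an identical next digit
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:  # adjacent letters with the same digit are coded once
                digits.append(code)
            prev = code
        elif c not in "hw":
            prev = ""  # vowels (and y here) separate letters, so a repeated digit is kept
        # h and w are transparent: prev is left unchanged
    return (first.upper() + "".join(digits) + "000")[:4]


def group_by_key(words, key=soundex):
    """Group words sharing a phonetic key; each multi-word group is a candidate spVar set."""
    groups = defaultdict(list)
    for w in words:
        groups[key(w)].append(w)
    return {k: v for k, v in groups.items() if len(v) > 1}


words = ["color", "colour", "leukemia", "leukaemia", "pneumonia", "pneumonitis", "fever"]
print(group_by_key(words))
# {'C460': ['color', 'colour'], 'L250': ['leukemia', 'leukaemia'], 'P555': ['pneumonia', 'pneumonitis']}
```

The last group shows why a phonetic key alone is aggressive: pneumonia and pneumonitis share the code P555 but are different terms, not spelling variants.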

II. Processes

  • Most phonetic algorithms are very aggressive, resulting in high recall but low precision.
  • We performed two types of tests: unit tests and application tests.
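The results tables list an Edit Distance of 2 for every Step 2 method but do not say exactly how that threshold is applied. Assuming it bounds the spelling difference between a term and a phonetically matched candidate, the Python sketch below shows how such a bound can trim an aggressive phonetic match. The names `edit_distance` and `filter_candidates` are hypothetical, and the input pairs are assumed to have already matched on a phonetic key (each pair shares an American Soundex code).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def filter_candidates(pairs, max_distance=2):
    """Keep only pairs whose spellings differ by at most max_distance edits."""
    return [(a, b) for a, b in pairs
            if edit_distance(a.lower(), b.lower()) <= max_distance]


# Hypothetical pairs that already share a phonetic key.
pairs = [("color", "colour"), ("leukemia", "leukaemia"), ("pneumonia", "pneumonitis")]
print(filter_candidates(pairs))  # [('color', 'colour'), ('leukemia', 'leukaemia')]
```

Pneumonia and pneumonitis are 3 edits apart, so a bound of 2 drops that pair while keeping the two genuine spelling variants: the bound trades a little recall for precision.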

III. Results

  • Unit Tests:
    We tested all of the above algorithms. Soundex, RefinedSoundex, Caverphone 1.0, and Cologne are too aggressive and result in poor precision.

  • Application Tests:
    • First Test:
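      Column key: ret-rel = retrieved and relevant (true positives); ret-irrel = retrieved but irrelevant (false positives); notRet-rel = relevant but not retrieved (false negatives); notRet-irrel = neither retrieved nor relevant (true negatives). Sample No. is the sum of the four count columns; Notes gives the run time.
      Step  Method   Edit Dist.  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy  Notes
      0     GoldStd  N/A         867,728     379,269  0          0           488,459       1.0000     1.0000  1.0000  1.0000    1 min.
      1     Norm     N/A         867,728     305,309  3,495      73,960      484,964       0.9887     0.8050  0.8874  0.9107    1 min.
      2     M2ES     2           867,728     364,396  64,622     14,873      423,837       0.8494     0.9608  0.9017  0.9084    2 hr.
      2     M3ES     2           867,728     364,870  64,371     14,399      424,088       0.8500     0.9620  0.9026  0.9092    2 hr.
      2     C2ES     2           867,728     366,788  96,116     12,481      392,343       0.7924     0.9671  0.8711  0.8748    2 hr.
      2     M2CES    2           867,728     363,609  57,259     15,660      431,200       0.8640     0.9587  0.9089  0.9160    2 hr.
      2     M3CES    2           867,728     363,593  56,953     15,676      431,506       0.8646     0.9587  0.9092  0.9163    2 hr.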

    • Second Test (using the new spVarNorm):
      Step  Method   Edit Dist.  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy  Notes
      0     GoldStd  N/A         867,728     379,269  0          0           488,459       1.0000     1.0000  1.0000  1.0000    1 min.
      1     Norm     N/A         867,728     305,309  3,495      73,960      484,964       0.9887     0.8050  0.8874  0.9107    1 min.
      2     M2ES     2           867,728     363,956  62,979     15,313      425,480       0.8525     0.9596  0.9029  0.9098    113 min.
      2     M3ES     2           867,728     364,447  62,726     14,822      425,733       0.8532     0.9609  0.9038  0.9106    114 min.
      2     C2ES     2           867,728     366,574  94,654     12,695      393,805       0.7948     0.9665  0.8723  0.8763    133 min.
      2     M2CES    2           867,728     363,222  55,618     16,047      432,841       0.8672     0.9577  0.9102  0.9174    96 min.
      2     M3CES    2           867,728     363,203  55,310     16,066      433,149       0.8678     0.9576  0.9105  0.9177    97 min.

    • M2CES and M3CES have very similar results. Considering recall and software maturity, the M2CES model is used on the distilled MEDLINE n-gram set to retrieve LMS candidates.
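Reading ret-rel, ret-irrel, notRet-rel, and notRet-irrel as true positives, false positives, false negatives, and true negatives reproduces the Precision, Recall, F1, and Accuracy columns of the tables above. A minimal Python sketch of that arithmetic follows; the `Counts` class is a hypothetical helper, not part of the SPECIALIST tools.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts, named after the columns of the results tables."""
    ret_rel: int        # retrieved and relevant (true positives)
    ret_irrel: int      # retrieved but irrelevant (false positives)
    not_ret_rel: int    # relevant but not retrieved (false negatives)
    not_ret_irrel: int  # neither retrieved nor relevant (true negatives)

    def precision(self) -> float:
        return self.ret_rel / (self.ret_rel + self.ret_irrel)

    def recall(self) -> float:
        return self.ret_rel / (self.ret_rel + self.not_ret_rel)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)  # harmonic mean of precision and recall

    def accuracy(self) -> float:
        total = self.ret_rel + self.ret_irrel + self.not_ret_rel + self.not_ret_irrel
        return (self.ret_rel + self.not_ret_irrel) / total


# Step 1 (Norm) row of the results tables.
norm = Counts(305_309, 3_495, 73_960, 484_964)
print(f"{norm.precision():.4f} {norm.recall():.4f} {norm.f1():.4f} {norm.accuracy():.4f}")
# prints 0.9887 0.8050 0.8874 0.9107
```

Applied to the M2CES counts from the second test (363,222 / 55,618 / 16,047 / 432,841), the same methods give 0.8672, 0.9577, 0.9102, and 0.9174, matching that row. The sketch also makes the trade-off in the tables concrete: the Step 2 phonetic methods raise ret-rel, lifting recall from about 0.80 to about 0.96, while their extra ret-irrel counts pull precision down from about 0.99.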