
The SPECIALIST Lexicon

SpVar Normalization Development Notes

I. Introduction

An iterative process was developed to improve the precision and recall of the SpVarNorm algorithm:

  • Run SpVarNorm on Lexicon.2015
  • Check all false-positive instances
  • Enhance the SpVarNorm algorithm, then repeat these steps

II. Process

  • Enhanced Norm to increase precision by removing the genitive only at the end of a term (not anywhere in the term). This version was used in the final submission of the AMIA paper.

  • Test on false positives (to increase precision), using the AMIA final submission as the baseline
    Step  Methods                Edit Distance  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy  Notes
    0     GoldStd                N/A            867,728     379,269  0          0           488,459       1.0000     1.0000  1.0000  1.0000    1 Min.
    1     Baseline (AMIA-Final)  N/A            867,728     305,309  3,495      73,960      484,964       0.9887     0.8050  0.8874  0.9107    1 Min.
    1.1   Genitive SpVars        N/A            867,728     303,818  1,759      75,451      486,700       0.9942     0.8011  0.8873  0.9110    1 Min.
    1.2   Dash SpVars            The number of false positives is very small (199), so no enhanced algorithm is implemented.
    1.3   Space SpVars           The number of false positives is very small (41), so no enhanced algorithm is implemented.
    1.4   Mixed case SpVars      These false positives are actually valid (TP); they were flagged because of errors in the gold standard.

  • Test without removing the genitive at all in Norm
    Step  Methods                            Edit Distance  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy
    0     GoldStd                            N/A            867,728     379,776  0          0           487,952       1.0000     1.0000  1.0000  1.0000
    1     Norm                               N/A            867,728     315,241  10,520     64,535      477,432       0.9677     0.8301  0.8936  0.9135
    1.1   Norm (no genitive removal at all)  N/A            867,728     302,580  1,620      77,196      486,332       0.9947     0.7967  0.8848  0.9092
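
The genitive-handling variants compared in this section (removal anywhere in the term, removal only at the end of the term, no removal) can be illustrated with a minimal Python sketch. The function names and regular expressions below are hypothetical. They assume that a genitive is marked by a word-final 's or s', and they are not the actual Norm or SpVarNorm implementation.

```python
import re

# Assumption: a genitive marker is "'s", or an apostrophe directly after "s".
# These patterns are illustrative only, not the rules used by Norm.
_GENITIVE_AT_END = re.compile(r"(?:'s|(?<=s)')$")
_GENITIVE_ANYWHERE = re.compile(r"(?:'s|(?<=s)')(?=\s|$)")


def strip_genitive_at_end(term: str) -> str:
    """Remove a genitive marker only when it closes the whole term."""
    return _GENITIVE_AT_END.sub("", term)


def strip_genitive_anywhere(term: str) -> str:
    """Remove a genitive marker at the end of any word in the term."""
    return _GENITIVE_ANYWHERE.sub("", term)


if __name__ == "__main__":
    for term in ["Alzheimer's", "Alzheimer's disease", "patients'"]:
        print(repr(term),
              repr(strip_genitive_at_end(term)),
              repr(strip_genitive_anywhere(term)))
```

Under these assumptions, the end-only variant leaves a term such as "Alzheimer's disease" untouched, while the anywhere variant rewrites it, so the end-only variant conflates fewer distinct terms.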

III. Discussion

  • We want a Norm with very high precision, even if the recall is lower.
  • The recall can then be improved by applying a phonetic algorithm, such as Metaphone or Caverphone.
  • If the precision is already low at the beginning (Norm), it will keep dropping when a phonetic algorithm is applied.
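
The precision and recall trade-off discussed above can be recomputed from the counts reported in Section II. The sketch below assumes that ret-rel, ret-irrel, notRet-rel, and notRet-irrel correspond to true positives, false positives, false negatives, and true negatives. The class and variable names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts, named after the table columns in Section II."""
    ret_rel: int        # retrieved and relevant (assumed: true positives)
    ret_irrel: int      # retrieved but irrelevant (assumed: false positives)
    not_ret_rel: int    # relevant but not retrieved (assumed: false negatives)
    not_ret_irrel: int  # not retrieved, not relevant (assumed: true negatives)

    @property
    def precision(self) -> float:
        return self.ret_rel / (self.ret_rel + self.ret_irrel)

    @property
    def recall(self) -> float:
        return self.ret_rel / (self.ret_rel + self.not_ret_rel)

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)

    @property
    def accuracy(self) -> float:
        total = (self.ret_rel + self.ret_irrel
                 + self.not_ret_rel + self.not_ret_irrel)
        return (self.ret_rel + self.not_ret_irrel) / total


# Counts copied from the "no genitive removal" test in Section II.
norm = Counts(315_241, 10_520, 64_535, 477_432)
norm_no_genitive_removal = Counts(302_580, 1_620, 77_196, 486_332)

if __name__ == "__main__":
    for name, c in [("Norm", norm),
                    ("Norm, no genitive removal", norm_no_genitive_removal)]:
        print(f"{name}: P={c.precision:.4f} R={c.recall:.4f} "
              f"F1={c.f1:.4f} Acc={c.accuracy:.4f}")
```

With these counts, dropping genitive removal raises precision while lowering recall, which is the direction the discussion prefers before a phonetic step is added.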