The SPECIALIST Lexicon

Spelling Variant Patterns - ES (Edit Distance and Sorted Distance)

I. Introduction

After normalization and MES (Metaphone, Edit distance & Sorted distance) are applied, more terms (98.5% in Lexicon.2014) are identified and grouped as spelling variants. However, spelling variants such as:

  • Abkhasian|Abkhazian
  • aptha|aphtha
  • ensoul|insoul
  • toxic edema|toxic oedema
  • vinleucinol|vinleukinol
  • wholly|wholely
  • zygapophyseal|zygapophysial
  • ...
are not identified in the previous steps (1-2). The following algorithm is used to identify further spelling variants.

II. Algorithm Details

For terms that are not identified in any spelling variant group, the ES algorithm checks two properties:

  • Limited edit distance (1: 80.26%, 2: 95.41%)
  • The smallest sorted distance:
    • All terms are sorted in alphabetical order. The difference between the indexes of two terms is their sorted distance.
    • The smaller the sorted distance between two terms, the more likely it is that these terms are spelling variants.
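
The two checks above can be sketched in Python. This is a minimal illustration, not the Lexicon's actual implementation. The function names (`edit_distance`, `sorted_distance`, `best_es_match`) and the `max_edit` parameter are hypothetical. The sketch assumes that terms are already normalized, that edit distance means plain Levenshtein distance, and that candidates within the edit-distance limit are ranked by the smallest sorted distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sorted_distance(ordered_terms, a, b):
    """Difference between the indexes of a and b in an alphabetically sorted list."""
    return abs(ordered_terms.index(a) - ordered_terms.index(b))


def best_es_match(term, terms, max_edit=2):
    """Return (candidate, edit distance, sorted distance) for the candidate that
    is within max_edit edits of term and has the smallest sorted distance,
    or None if no candidate is within the limit (assumed selection rule)."""
    ordered = sorted(set(terms) | {term})
    best = None
    for other in ordered:
        if other == term:
            continue
        d = edit_distance(term, other)
        if d > max_edit:
            continue  # fails the limited-edit-distance check
        sd = sorted_distance(ordered, term, other)
        # Prefer the smallest sorted distance; break ties by edit distance.
        if best is None or (sd, d) < (best[2], best[1]):
            best = (other, d, sd)
    return best


terms = ["aphtha", "aptha", "ensoul", "insoul", "wholely", "wholly", "zincemia"]
print(best_es_match("wholly", terms))
print(best_es_match("zincemia", terms))
```

In this small list, `wholly` pairs with `wholely`, and `zincemia` has no candidate because no other term is within two edits. In a full lexicon many more terms sit between any two entries, so real sorted distances would be far larger than in this toy list.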

III. Algorithm Examples

Edit Distance = 1

Example term                 Norm                         Metaphone      Edit Distance
Abkhasian|Abkhazian          abkhasian|abkhazian          ABKHXN|ABKHSN  1
aptha|aphtha                 aptha|aphtha                 AP0|AF0        1
ensoul|insoul                ensoul|insoul                ENSL|INSL      1
toxic edema|toxic oedema     toxicedema|toxicoedema       TKSSTM|TKSKTM  1
vinleucinol|vinleukinol      vinleucinol|vinleukinol      FNLSNL|FNLKNL  1
wholly|wholely               wholly|wholely               WL|WLL         1
zincemia|zincaemia           zincemia|zincaemia           SNSM|SNKM      1
zygapophyseal|zygapophysial  zygapophyseal|zygapophysial  SKPFSL|SKPFXL  1

Edit Distance = 2

Example term                            Norm                                   Metaphone        Edit Distance
disconnexion|disconnection              disconnexion|disconnection             TSKNKSN|TSKNKXN  2
racketball|racquetball                  racketball|racquetball                 RKTBL|RKKTBL     2
subtly|subtlely                         subtly|subtlely                        SBTL|SBTLL       2
type 4 collagenase|type IV collagenase  typefourcollagenase|typeivcollagenase  TPKLJNS|TPFKLJN  2
woadwax|woadwaxen                       woadwax|woadwaxen                      WTWKS|WTWKSN     2