
The SPECIALIST Lexicon

Phonetic Exceptions - Heuristic Rules

Introduction

The Double Metaphone and Caverphone 2.0 phonetic algorithms are used to identify whether terms have the same pronunciation. However, the precision of these algorithms is not 100%. Heuristic rules are derived from Lexicon.2015 to correct these exceptions (false positives). They were developed as follows:

  • This list was developed to retrieve inflectional spelling variants from Lexicon.2015
  • Terms that match the IRREG patterns are retrieved. These are terms that have the same EUI, POS, inflections, phonetic codes, etc.
  • They are sent to linguists, who tag them as valid or invalid spelling variants (spVars)
  • Over 100 heuristic rules are derived based on the tagging results

    src-suffix | tar-suffix | Flag

PhoneticExceptionPattern.java

  • These heuristic rules are read in and loaded into a Map:
    • Key (String): src-suffix|tar-suffix
    • Value: PhoneticExceptionObj (src-suffix|tar-suffix|flag)
  • Each input pair is checked in both the forward and backward directions and assigned a flag:
    • IRREG_NO: invalid (different pronunciation)
    • IRREG_YES: valid (same pronunciation)
    • IRREG_TBD: not covered by the current heuristic rules; a new rule needs to be added
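
The load-and-check steps above can be sketched in Java as follows. This is a minimal illustration, not the actual PhoneticExceptionPattern.java: the class name PhoneticExceptionSketch, the method check, and the sample rules are hypothetical, and the map value is simplified to the flag alone rather than a PhoneticExceptionObj. How suffixes are matched against a term pair is also an assumption: the sketch strips a shared stem and tries successively longer suffix pairs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the rule lookup; not the SPECIALIST Lexicon source code.
class PhoneticExceptionSketch {
    enum Flag { IRREG_NO, IRREG_YES, IRREG_TBD }

    // Key: "src-suffix|tar-suffix"; value simplified to the flag only (assumption).
    private final Map<String, Flag> rules = new HashMap<>();

    // Each rule line is assumed to look like "src-suffix|tar-suffix|flag".
    PhoneticExceptionSketch(List<String> ruleLines) {
        for (String line : ruleLines) {
            String[] fields = line.split("\\|");
            if (fields.length != 3) {
                continue; // skip malformed lines
            }
            rules.put(fields[0] + "|" + fields[1], Flag.valueOf(fields[2]));
        }
    }

    // Assumed matching strategy: find the longest common prefix of the two terms,
    // then try suffix pairs from shortest to longest, in both directions.
    Flag check(String a, String b) {
        int p = 0;
        while (p < a.length() && p < b.length() && a.charAt(p) == b.charAt(p)) {
            p++;
        }
        for (int i = p; i >= 0; i--) {
            String s1 = a.substring(i);
            String s2 = b.substring(i);
            Flag flag = rules.get(s1 + "|" + s2);      // forward direction
            if (flag == null) {
                flag = rules.get(s2 + "|" + s1);       // backward direction
            }
            if (flag != null) {
                return flag;
            }
        }
        return Flag.IRREG_TBD; // no rule covers this pair yet
    }

    public static void main(String[] args) {
        // Made-up rules for illustration only; they are not taken from Lexicon.2015.
        PhoneticExceptionSketch sketch = new PhoneticExceptionSketch(
                List.of("ise|ize|IRREG_YES", "a|ae|IRREG_NO"));
        System.out.println(sketch.check("realise", "realize"));  // IRREG_YES
        System.out.println(sketch.check("realize", "realise"));  // IRREG_YES (backward match)
        System.out.println(sketch.check("formula", "formulae")); // IRREG_NO
        System.out.println(sketch.check("cat", "dog"));          // IRREG_TBD
    }
}
```

Because a rule is tried in both directions, each suffix pair only needs to be stored once. Pairs that come back as IRREG_TBD are the candidates for new rules.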