CSpell

Dictionaries from the SPECIALIST Lexicon

I. Introduction

The SPECIALIST Lexicon is a large syntactic lexicon of biomedical and general English. All lexical items are reviewed and verified by linguists. Several dictionaries are generated from the Lexicon to serve different needs.

II. Generation

  • Source Code: lexCheck2016/sources/gov/nih/nlm/nls/lexCheck/Api/ToDicVarsApi.java
  • The annual release of the Lexicon is used as input
  • Generated in the pre-process
  • Output: lexiconDic.data
  • All lexical records in the Lexicon are converted to DicVar with 7 fields:

    • Word: case sensitive
    • POS:
        • adj (1)
        • adv (2)
        • aux (4)
        • compl (8)
        • conj (16)
        • det (32)
        • modal (64)
        • noun (128)
        • prep (256)
        • pron (512)
        • verb (1024)
    • Inflection:
        • base (1)
        • comparative (2)
        • superlative (4)
        • plural (8)
        • presPart (16)
        • past (32)
        • pastPart (64)
        • pres3s (128)
        • positive (256)
        • singular (512)
        • infinitive (1024)
        • pres123p (2048)
        • pastNeg (4096)
        • pres123pNeg (8192)
        • pres1s (16384)
        • past1p23pNeg (32768)
        • past1p23p (65536)
        • past1s3sNeg (131072)
        • pres1p23p (262144)
        • pres1p23pNeg (524288)
        • past1s3s (1048576)
        • pres (2097152)
        • pres3sNeg (4194304)
        • presNeg (8388608)
    • Source (EUI): EUI
    • AcrAbb Flag: true or false
    • properNoun Flag: true or false
    • spVar Flag: true or false

    * The unique flag from inflVar is not used. It is set to false when all properties are the same but the types of inflectional rules differ.
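
The POS values listed above are all powers of two, which suggests they can be handled as bit flags. The following is a minimal sketch under that assumption; the class PosFlags and its methods are hypothetical names for illustration and are not taken from ToDicVarsApi.java. The same approach would apply to the Inflection values with their own name table.

```java
import java.util.ArrayList;
import java.util.List;

final class PosFlags {
    // POS names in the order listed above; the name at index i has value 2^i.
    private static final String[] NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    /** Returns the numeric value of one POS name, e.g. "noun" -> 128. */
    static long valueOf(String name) {
        for (int i = 0; i < NAMES.length; i++) {
            if (NAMES[i].equals(name)) {
                return 1L << i;
            }
        }
        throw new IllegalArgumentException("Unknown POS: " + name);
    }

    /** Combines several POS names into one value with bitwise OR (assumed usage). */
    static long encode(String... names) {
        long value = 0L;
        for (String name : names) {
            value |= valueOf(name);
        }
        return value;
    }

    /** Lists the POS names whose bit is set in the given value. */
    static List<String> decode(long value) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((value & (1L << i)) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("verb"));        // 1024
        System.out.println(encode("adj", "noun"));  // 129
        System.out.println(decode(129L));           // [adj, noun]
    }
}
```

Whether a single DicVar record ever carries more than one POS bit is not stated here, so the combined value 129 is only an illustration of the encoding.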

III. Output

  • Directory: ${PRE_PROCESS}/data/Lexicon/${YEAR}/outData/Dic

    The following dictionaries are generated:

    Dictionary          Description
    lexicon.all.dic     All terms, case sensitive
    lexicon.mw.dic      Multiwords, case sensitive
    lexicon.sw.dic      Single words, case sensitive
    lexicon.nw.dic      Non-words (unigrams found only in mw, not in sw)
    lexicon.ew.dic      Element words (= unigram = sw + nw), case sensitive
    lexicon.aa.dic      Abbreviations or acronyms, case sensitive
    lexicon.pn.dic      Proper nouns, case sensitive
    lexicon.sv.dic      Spelling variants, case sensitive
    lexicon.paa.dic     Pure aa (= aa - en)
    lexicon.en.dic      English words (= all - pn - aa), case sensitive
    lexicon.swEn.dic    English words that are also single words
    lexicon.noAa.dic    English words and proper nouns (= en + pn = all - aa), used to check element words in split
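
The set relations in the descriptions above (en = all - pn - aa, paa = aa - en, noAa = en + pn) can be sketched with plain Java sets. This is an illustration only: the Entry class, the sample records, and the reading that en is built from records carrying neither flag are assumptions, not the actual generation code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DicSets {
    /** Hypothetical, reduced record: a word plus two of the DicVar flags. */
    static final class Entry {
        final String word;
        final boolean acrAbb;
        final boolean properNoun;

        Entry(String word, boolean acrAbb, boolean properNoun) {
            this.word = word;
            this.acrAbb = acrAbb;
            this.properNoun = properNoun;
        }
    }

    /** Invented sample data; "max" appears both as a plain word and as an abbreviation. */
    static List<Entry> sample() {
        return Arrays.asList(
            new Entry("max", false, false),
            new Entry("max", true, false),
            new Entry("NIH", true, false),
            new Entry("Smith", false, true));
    }

    /** aa: words from records flagged as abbreviation or acronym. */
    static Set<String> aa(List<Entry> all) {
        Set<String> out = new TreeSet<>();
        for (Entry e : all) {
            if (e.acrAbb) out.add(e.word);
        }
        return out;
    }

    /** pn: words from records flagged as proper noun. */
    static Set<String> pn(List<Entry> all) {
        Set<String> out = new TreeSet<>();
        for (Entry e : all) {
            if (e.properNoun) out.add(e.word);
        }
        return out;
    }

    /** en = all - pn - aa, taken here at the record level (an assumption). */
    static Set<String> en(List<Entry> all) {
        Set<String> out = new TreeSet<>();
        for (Entry e : all) {
            if (!e.acrAbb && !e.properNoun) out.add(e.word);
        }
        return out;
    }

    /** paa = aa - en: abbreviations that are not also English words. */
    static Set<String> paa(List<Entry> all) {
        Set<String> out = aa(all);
        out.removeAll(en(all));
        return out;
    }

    /** noAa = en + pn. */
    static Set<String> noAa(List<Entry> all) {
        Set<String> out = en(all);
        out.addAll(pn(all));
        return out;
    }

    public static void main(String[] args) {
        List<Entry> all = sample();
        System.out.println(en(all));    // [max]
        System.out.println(paa(all));   // [NIH]
        System.out.println(noAa(all));  // [Smith, max]
    }
}
```

In this toy data "max" is removed from paa because it also occurs as an ordinary English word, which is the case the "pure aa" dictionary appears intended to separate out.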

IV. Notes

  • Handles the possessive ('s) when checking whether a word is in the dictionary
  • Source code: DictionaryBasedSpellChecker.java
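
One way to picture the possessive handling noted above is to retry the lookup with a trailing 's removed. The sketch below assumes that approach; PossessiveLookup and its methods are hypothetical and do not reproduce DictionaryBasedSpellChecker.java.

```java
import java.util.Set;

final class PossessiveLookup {
    /** Removes a trailing 's if present; otherwise returns the word unchanged. */
    static String stripPossessive(String word) {
        if (word.length() > 2 && word.endsWith("'s")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    /** True if the word, or its form without the possessive 's, is in the dictionary. */
    static boolean contains(Set<String> dictionary, String word) {
        return dictionary.contains(word)
            || dictionary.contains(stripPossessive(word));
    }

    public static void main(String[] args) {
        Set<String> dic = Set.of("doctor");                 // invented sample entry
        System.out.println(contains(dic, "doctor's"));      // true
        System.out.println(contains(dic, "doctors"));       // false
    }
}
```

The lookup stays case sensitive, matching the dictionaries described in section III.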