
The SPECIALIST Lexicon

Definition of word count

I. What is a word?

  • Words:
    A word has a part of speech, inflections, and a meaning.
  • Word boundary:
    Spaces (or tabs) are usually used as word boundaries in NLP.
  • Single words:
    A single word contains no space (or tab). Examples are "saw", "ice-cream", "clubfoot", and "club-foot".
  • Multiwords:
    A multiword is a word (it has a part of speech and a meaning) that includes a space. Examples are "ice cream", "club foot", and "drop-foot gait".
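The distinction between single words and multiwords can be sketched in a few lines of Python. This is only an illustration of the space/tab rule described above, not code from the Lexicon tools; the helper name `is_multiword` is hypothetical.

```python
def is_multiword(term: str) -> bool:
    """Return True if the term contains a space or tab (hypothetical helper)."""
    trimmed = term.strip()  # ignore leading and trailing whitespace
    return " " in trimmed or "\t" in trimmed


# Example terms taken from the definitions above.
terms = [
    "saw", "ice-cream", "clubfoot", "club-foot",
    "ice cream", "club foot", "drop-foot gait",
]

single_words = [t for t in terms if not is_multiword(t)]
multiwords = [t for t in terms if is_multiword(t)]

print(single_words)
print(multiwords)
```

Note that a hyphen is not a word boundary under this rule, so "ice-cream" is a single word while "ice cream" is a multiword.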

II. Word count (How many words are in the SPECIALIST Lexicon?)

  • The word count is the number of distinct words.
  • By definition, a word has a part of speech, inflections, and a meaning.
    • saw|noun|E0054443 is a different word from saw|verb|E0054444 and saw|verb|E0055007
    • All words with different categories and inflections are considered different words. However, a word with the inflection "base" can duplicate another inflection of the same word and should not be counted twice for the categories adj, adv, verb (also pres1p23p), aux, modal, and noun. The table below illustrates the duplicated cases for the inflection "base".
      Category     InflVar - Inflection     Unique word?  Notes
      compl (8)    base (1)                 true
      conj (16)    base (1)                 true
      det (32)     base (1)                 true
      prep (256)   base (1)                 true
      pron (512)   base (1)                 true
      adj (1)      base (1)                 false         positive (256) = base (1)
      adv (2)      base (1)                 false         positive (256) = base (1)
      verb (1024)  base (1)                 false         infinitive (1024) = base (1)
                   pres1p23p (262144)       false         infinitive (1024) = pres1p23p (262144)
      aux (4)      base (1)                 false         infinitive (1024) = base (1)
                   have - pres123p (2048)   false         have: infinitive (1024) = pres123p (2048)
      modal (64)   base (1)                 false         pres (2097152) = base (1)
      noun (128)   base (1)                 false         singular (512) = base (1), e.g. paper, fish, sheep
                                                          plural (8) = base (1), e.g. police, fish, sheep
  • For a corpus that has no category or inflection information, only the spelling (form) is taken into consideration, as in an n-gram set.
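The counting rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the Lexicon statistics: the record layout `(form, category, inflection, eui)`, the names `DUPLICATE_PAIRS`, `count_words`, and `count_spellings`, and the sample records with placeholder identifiers (`EUI-A`, `EUI-B`, `EUI-C`) are all hypothetical. The sketch also leaves out the special case for the aux "have".

```python
# (category, inflection) pairs that the table marks as "Unique word? = false".
# The aux "have" special case is deliberately omitted from this sketch.
DUPLICATE_PAIRS = {
    ("adj", "base"),
    ("adv", "base"),
    ("verb", "base"),
    ("verb", "pres1p23p"),
    ("aux", "base"),
    ("modal", "base"),
    ("noun", "base"),
}


def count_words(records):
    """Count distinct words, skipping inflections that duplicate another one."""
    unique = {
        (form, cat, infl, eui)
        for form, cat, infl, eui in records
        if (cat, infl) not in DUPLICATE_PAIRS
    }
    return len(unique)


def count_spellings(records):
    """Count distinct spellings only, as for a corpus without categories."""
    return len({form for form, _cat, _infl, _eui in records})


# Illustrative records for the spelling "saw"; the identifiers are placeholders.
records = [
    ("saw", "noun", "base", "EUI-A"),
    ("saw", "noun", "singular", "EUI-A"),
    ("saw", "verb", "base", "EUI-B"),
    ("saw", "verb", "infinitive", "EUI-B"),
    ("saw", "verb", "pres1p23p", "EUI-B"),
    ("saw", "verb", "past", "EUI-C"),
]

print(count_words(records))      # category- and inflection-aware count
print(count_spellings(records))  # spelling-only count
```

With these sample records, the category-aware count treats "saw" as three words (one noun and two verbs), while the spelling-only count treats it as one.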

III. Lexicon Stats