
The SPECIALIST Lexicon

Multiword Candidates Generation Processes:
SpVar Matcher with Frequency in the Distilled Medline N-gram Set

N-grams that match SpVar patterns are a good source of multiword candidates. More than 10 SpVar (spelling variant) types were developed to identify spVars in a given corpus.

  • For example, if the terms

    • bloodpressure
    • blood pressure
    • blood-pressure
    • tradeoff
    • trade off
    • trade-off
    all occur in a corpus, they match the spVar types (SVT_SPACE|SVT_PUNC_DASH) in the spVar model and are therefore good candidates for LMWs.
  • A frequency filter (WC) is added to this list for frequency analysis.
  • Matcher SpVar: Steps 60-61A (08.MatcherSpVar)
  • Some candidates are automatically tagged [AUTO_YES|AUTO_NO].
  • The highest-frequency strategy should be applied.
  • This process was not as productive as expected and has not been used since 2016.

  • Generated files:
    Distilled MEDLINE N-gram Set   Candidate Files   Status   Notes
    2015                                             Done     Tag [Y|N]
    2016+                                            N/A      Postponed due to limited resources
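The matching and frequency steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Lexicon's Matcher SpVar code: it assumes the n-grams arrive as a dict mapping each n-gram to a frequency (standing in for WC), models only the two types SVT_SPACE and SVT_PUNC_DASH with a simple space/dash comparison (the real spVar model has 10+ types), and reads the "highest-frequency strategy" as choosing the most frequent form in each variant group. The helper names `spvar_key`, `spvar_types`, `find_candidates` and the `min_freq` threshold are hypothetical, and the frequencies in the usage example are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical labels named after the two spVar types mentioned above;
# the real spVar model defines 10+ types with its own matching logic.
SVT_SPACE = "SVT_SPACE"
SVT_PUNC_DASH = "SVT_PUNC_DASH"


def spvar_key(term: str) -> str:
    """Collapse a term to a key that ignores spaces and dashes."""
    return term.replace(" ", "").replace("-", "")


def spvar_types(a: str, b: str) -> set[str]:
    """Approximate which of the two spVar types relate terms a and b.

    Returns an empty set when a == b, when the terms differ in something
    other than spaces and dashes, or when neither check below fires.
    """
    if a == b or spvar_key(a) != spvar_key(b):
        return set()
    types: set[str] = set()
    # With dashes removed, any remaining difference is in spaces.
    if a.replace("-", "") != b.replace("-", ""):
        types.add(SVT_SPACE)
    # With spaces removed, any remaining difference is in dashes.
    if a.replace(" ", "") != b.replace(" ", ""):
        types.add(SVT_PUNC_DASH)
    return types


def find_candidates(
    ngram_freqs: dict[str, int], min_freq: int
) -> list[tuple[str, dict[str, list[str]]]]:
    """Group n-grams into space/dash variant sets and pick a preferred form.

    N-grams below the hypothetical min_freq threshold are dropped before
    grouping; groups with fewer than two surviving members are skipped.
    """
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for term, freq in ngram_freqs.items():
        if freq >= min_freq:
            groups[spvar_key(term)].append((term, freq))

    candidates = []
    for members in groups.values():
        if len(members) < 2:
            continue  # no spelling variant survived the frequency filter
        # Assumed highest-frequency strategy: most frequent form first,
        # ties broken alphabetically.
        members.sort(key=lambda tf: (-tf[1], tf[0]))
        preferred = members[0][0]
        variants = {
            term: sorted(spvar_types(preferred, term)) for term, _ in members[1:]
        }
        candidates.append((preferred, variants))
    return candidates


if __name__ == "__main__":
    # Illustrative frequencies only, not real Distilled MEDLINE counts.
    freqs = {
        "blood pressure": 500,
        "blood-pressure": 40,
        "bloodpressure": 3,
        "trade-off": 80,
        "trade off": 20,
        "tradeoff": 30,
        "heart rate": 300,
    }
    for preferred, variants in find_candidates(freqs, min_freq=5):
        print(preferred, variants)
    # blood pressure {'blood-pressure': ['SVT_PUNC_DASH', 'SVT_SPACE']}
    # trade-off {'tradeoff': ['SVT_PUNC_DASH'], 'trade off': ['SVT_PUNC_DASH', 'SVT_SPACE']}
```

Grouping on a key with spaces and dashes stripped finds every variant of a term in one pass over the n-grams, instead of comparing all pairs. Note that "bloodpressure" drops out of its group here because its made-up frequency falls below the threshold, which is the kind of effect the frequency filter is meant to have.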