
The SPECIALIST Lexicon

Introduction - Build the SPECIALIST Lexicon

Background

The SPECIALIST Lexicon is widely used for part-of-speech (POS) tagging, indexing, information retrieval, concept mapping, and other tasks in many Natural Language Processing (NLP) projects, such as the Lexical Tools, MetaMap, SemRep, the UMLS Metathesaurus, and ClinicalTrials.gov. A new systematic methodology has been developed to identify single words and multiwords in MEDLINE through the use of element words. The results show accelerated growth of the Lexicon, particularly in the number of multiword records. Hence, improvements in recall and precision can be anticipated in NLP projects that use the SPECIALIST Lexicon and its applications.
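The text does not spell out how element words lead to multiword candidates. As a minimal sketch, assuming a multiword candidate is any contiguous 2- or 3-word sequence that contains the element word and recurs in the text, the following Python counts such sequences in a few sample titles. The function names, the tokenization rule, and the `min_count` threshold are hypothetical illustrations, not the actual LexBuild implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters/digits/hyphens
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())


def multiword_candidates(texts, element_word, max_len=3, min_count=2):
    """Count contiguous n-grams (2..max_len tokens) that contain element_word.

    Assumption: a candidate is any such n-gram seen at least min_count
    times; this is an illustration, not the LexBuild rule.
    """
    target = element_word.lower()
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if target in gram:
                    counts[" ".join(gram)] += 1
    # Keep only candidates that recur, most frequent first
    return [(g, c) for g, c in counts.most_common() if c >= min_count]


titles = [
    "Acute kidney injury after cardiac surgery.",
    "Risk factors for acute kidney injury in children.",
    "Chronic kidney disease progression.",
]
for gram, count in multiword_candidates(titles, "kidney"):
    print(count, gram)
```

With these three titles, the recurring phrases "acute kidney", "kidney injury", and "acute kidney injury" surface as candidates for review, while one-off sequences such as "chronic kidney disease" are filtered out by `min_count`.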

LexBuild Processes

  • The Lexicon is built by linguists through a web-based, computer-aided tool, LexBuild. A list of high-frequency element words is generated by computer programs from MEDLINE abstracts and titles. These element words are reviewed by linguists to:
    • add new lexical records if no exact or close match is found in LexBuild
    • update existing lexical records if related records are found through a close match

    All lexical records (single words or multiwords) associated with these element words are reviewed using the Essie search engine, Google Scholar, dictionaries, and other resources during the LexBuild process.
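The two computational steps above (generating a high-frequency element-word list and deciding whether a term calls for a new record or an update) can be sketched in Python. This is a minimal sketch under stated assumptions: citations are (title, abstract) pairs, the lexicon is represented as a set of lowercase base forms, and "close match" is approximated with `difflib` string similarity at a cutoff of 0.8. The function names, the stopword list, and the cutoff are hypothetical; the actual LexBuild matching rules are not described here.

```python
import re
from collections import Counter
from difflib import get_close_matches


def element_word_frequencies(citations, stopwords, top_n=10):
    """Rank single-word tokens from (title, abstract) pairs by frequency."""
    counts = Counter()
    for title, abstract in citations:
        text = f"{title} {abstract}".lower()
        for token in re.findall(r"[a-z][a-z\-]*", text):
            if token not in stopwords:  # hypothetical stopword filter
                counts[token] += 1
    return counts.most_common(top_n)


def triage(term, lexicon_bases, cutoff=0.8):
    """Suggest an action for a term against a set of lowercase base forms.

    Returns ("exists", base) for an exact match, ("update", base) for a
    close match, or ("add", None) when nothing is close enough.
    Assumption: difflib similarity stands in for LexBuild's close match.
    """
    key = term.lower()
    if key in lexicon_bases:
        return ("exists", key)
    close = get_close_matches(key, sorted(lexicon_bases), n=1, cutoff=cutoff)
    if close:
        return ("update", close[0])
    return ("add", None)


citations = [
    ("Acute kidney injury", "Kidney injury is common after surgery."),
    ("Kidney transplantation outcomes", "Outcomes of kidney transplantation."),
]
stopwords = {"is", "after", "of"}
lexicon = {"kidney", "nephropathy", "renal failure"}

for word, freq in element_word_frequencies(citations, stopwords, top_n=3):
    print(freq, word, triage(word, lexicon))
```

In this sketch a close match such as "kidneys" against "kidney" only suggests an update; as described above, linguists still confirm every record against external resources before it enters the Lexicon.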