The SPECIALIST Lexicon

Lexicon Test - Establish the Gold Standard

Introduction

The SPECIALIST Lexicon is a good corpus for testing the spVar model. It includes spelling variants of base forms as well as inflectional spelling variants.

Model (GetGoldStdFromLex.java)

  • Inputs:
    • inflVars.data:
      inflVar|cat|infl|EUI|base|citation
    • LRSPL:
      EUI|SpVar|citation
    • inflSpVars.data:
      inflSpVar
  • Outputs:
    • goldStd.data
      inflVar|spVar tag

      where:

      • inflVar: lowercased inflVar, unique
      • spVar tag: true|false
    • Lex.terms.out (all terms from Lexicon)
  • Algorithm:
    • Go through inflVars.data
    • Tag true if the EUI is in the EUI set of base spVars (from LRSPL)
    • Tag true if the term is an inflSpVar (from inflSpVars.data)
    • If an inflVar exists in multiple lexRecords (EUIs), it is tagged as true if any one of them has spVars
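
The tagging steps above can be sketched roughly as follows. This is a minimal illustration and not the actual GetGoldStdFromLex.java source: the class name GoldStdSketch, the method buildGoldStd, the pipe-delimited field layout, and the sample rows (including the EUIs) are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the gold-standard tagging; not the original program.
class GoldStdSketch {

    // Returns a map of lowercased, unique inflVar -> spVar tag (true|false).
    // Assumes each input line is pipe-delimited:
    //   inflVars.data : inflVar|cat|infl|EUI|base|citation
    //   LRSPL         : EUI|SpVar|citation
    //   inflSpVars    : inflSpVar
    static Map<String, Boolean> buildGoldStd(List<String> inflVarLines,
                                             List<String> lrsplLines,
                                             List<String> inflSpVarLines) {
        // EUIs of records that have base spelling variants (from LRSPL).
        Set<String> spVarEuis = new HashSet<>();
        for (String line : lrsplLines) {
            String[] fields = line.split("\\|", -1);
            if (!fields[0].isEmpty()) {
                spVarEuis.add(fields[0]);
            }
        }

        // Terms known to be inflectional spelling variants.
        Set<String> inflSpVars = new HashSet<>();
        for (String line : inflSpVarLines) {
            inflSpVars.add(line.trim().toLowerCase(Locale.ROOT));
        }

        // Go through inflVars.data and tag each term.
        Map<String, Boolean> goldStd = new TreeMap<>();
        for (String line : inflVarLines) {
            String[] fields = line.split("\\|", -1);
            if (fields.length < 4) {
                continue; // skip malformed rows
            }
            String inflVar = fields[0].toLowerCase(Locale.ROOT);
            String eui = fields[3];
            boolean tag = spVarEuis.contains(eui) || inflSpVars.contains(inflVar);
            // An inflVar seen under several EUIs is true if any one of them is true.
            goldStd.merge(inflVar, tag, Boolean::logicalOr);
        }
        return goldStd;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; not real Lexicon data.
        List<String> inflVars = List.of(
            "color|noun|base|E0000001|color|color",
            "colors|noun|plural|E0000001|color|color",
            "table|noun|base|E0000002|table|table");
        List<String> lrspl = List.of("E0000001|colour|color");
        List<String> inflSpVars = List.of();

        // Print in the goldStd.data layout: inflVar|spVar tag
        buildGoldStd(inflVars, lrspl, inflSpVars)
            .forEach((term, tag) -> System.out.println(term + "|" + tag));
    }
}
```

Merging tags with a logical OR keeps one entry per lowercased inflVar while honoring the rule that a single record with spVars is enough to tag the term true.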

  • What is missing:
    The following spelling variants are missed by this program (false negatives). These missing spVars are not included in the gold standard (final submission) for the AIAM.2016 multiword paper.