CSpell

Factor Process

I. Introduction

We examine each error type (training set) to:

  • Find the cause of error types
  • Fix the error types
    • Enhance the data or algorithm if it is a generic pattern
    • Correct the gold standard (if that is the cause)
  • Rerun the program

II. Detailed Process

  • Tokens are not in Brat annotation data (correct spelling)
    • A2.2.1. Not in checkDic, corrected wrong, by dictionary
      => Need to add these words to checkDic
      => Combo-6 (best performance) is chosen as the base model for further enhancement

      Model & Enhancement   TP|Ret|Rel    P        R        F
      Combo-6               543|737|814   0.7368   0.6671   0.7002
      Add Shorthand

    • A2.2.2. Not in checkDic, corrected wrong, by preCorrection
      => TBD (this analysis focuses on dictionary-based correction first)

  • Tokens are in Brat annotation data (spelling error)
    • B1.1. PreCorr (T)
    • B1.2. PreCorr (F)
    • B2.1. DicCorr (T)
    • B2.2. DicCorr (F)
      • B2.2.1. Not detect, real-word (error tag)
      • B2.2.2. Not detect, spelling error (non-word)
      • B2.2.3. Detect, not candidates by edit-distance
      • B2.2.4. Detect, not candidates by suggestion Dic
      • B2.2.5. Detect, not candidates by multi-corrections
      • B2.2.6. Detect, candidates, wrong (not top) rank
      • B2.2.7. Detect, candidates, wrong top rank
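
The P, R, and F values reported for Combo-6 can be reproduced from its TP|Ret|Rel counts. The sketch below assumes that "Ret" is the number of corrections the system returned and "Rel" is the number of errors in the gold standard; the page does not define these abbreviations, so this reading is an assumption, although the arithmetic matches the reported figures. The function name `precision_recall_f1` is hypothetical and is not part of CSpell.

```python
def precision_recall_f1(tp: int, retrieved: int, relevant: int):
    """Return (precision, recall, F1) from raw counts.

    Assumed meaning of the counts (not defined in the source page):
      tp        - corrections that match the gold standard
      retrieved - all corrections made by the system ("Ret")
      relevant  - all errors annotated in the gold standard ("Rel")
    """
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Counts taken from the Combo-6 row.
p, r, f = precision_recall_f1(543, 737, 814)
print(f"{p:.4f}|{r:.4f}|{f:.4f}")  # prints 0.7368|0.6671|0.7002
```

Rerunning the same calculation after each fix gives a like-for-like comparison against the Combo-6 baseline.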