
CSpell

Ensemble Method Score

I. Introduction

An ensemble method was implemented in CSpell for comparison. The original equation is:

Ensemble Score = 0.15 * (Context Score) + 0.25 * (Frequency Score) + 0.2 * (Orthographic Score)

where:

  • Orthographic Score = (Edit Distance Score + Phonetic Similarity Score + Overlap Similarity Score)

CSpell's implementation differs from the original in the following ways:

  • The overlap similarity is implemented slightly differently.
  • The word frequency score uses a different equation.
  • The context score uses dual embeddings, the input matrix (syn0) and the output matrix (syn1n), to predict words, instead of a single embedding from the input matrix (syn0).
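The weighted sum above can be sketched in Python as follows. This is a minimal illustration, not CSpell's code: the function names are hypothetical, each component score is assumed to be already normalized to [0, 1], and the dual-embedding context score is an assumption modeled on a word2vec CBOW-style prediction (the averaged syn0 vectors of the context words, dotted with the candidate's syn1n vector and passed through a sigmoid).

```python
import math
from typing import Sequence

# Weights from the original ensemble equation.
W_CONTEXT = 0.15
W_FREQUENCY = 0.25
W_ORTHOGRAPHIC = 0.2


def orthographic_score(edit_distance: float, phonetic: float, overlap: float) -> float:
    """Orthographic Score: plain sum of the three similarity scores."""
    return edit_distance + phonetic + overlap


def dual_embedding_context_score(
    context_syn0: Sequence[Sequence[float]], candidate_syn1n: Sequence[float]
) -> float:
    """Hypothetical context score using both embedding matrices.

    Assumption (not confirmed by CSpell): context word vectors come from the
    input matrix (syn0), the candidate vector comes from the output matrix
    (syn1n), and the score is sigmoid(mean(context vectors) . candidate vector).
    """
    if not context_syn0:
        raise ValueError("need at least one context word vector")
    dim = len(candidate_syn1n)
    # Average the syn0 vectors of the context words, dimension by dimension.
    mean = [sum(vec[i] for vec in context_syn0) / len(context_syn0) for i in range(dim)]
    # Dot product with the candidate's syn1n vector, squashed into (0, 1).
    dot = sum(m * c for m, c in zip(mean, candidate_syn1n))
    return 1.0 / (1.0 + math.exp(-dot))


def ensemble_score(context: float, frequency: float, orthographic: float) -> float:
    """Ensemble Score = 0.15*context + 0.25*frequency + 0.2*orthographic."""
    return W_CONTEXT * context + W_FREQUENCY * frequency + W_ORTHOGRAPHIC * orthographic


if __name__ == "__main__":
    ortho = orthographic_score(0.9, 0.8, 0.7)  # 2.4
    ctx = dual_embedding_context_score([[0.5, -0.2], [0.1, 0.4]], [0.3, 0.6])
    print(round(ensemble_score(0.9, 0.8, ortho), 4))  # 0.815
    print(round(ensemble_score(ctx, 0.8, ortho), 4))
```

With every component at its maximum of 1, the score is 0.15 + 0.25 + 0.2 × 3 = 1.0, so under the [0, 1] assumption the ensemble score also stays within [0, 1].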

II. Results

Non-word correction tests on the development set, comparing ranking modes with the 1-to-1 and Split function modes:

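Ranking Mode   Raw data (TP|Ret|Rel)   Performance (P|R|F1)
Orthographic   592|769|774             0.7698|0.7649|0.7673
Frequency      534|770|774             0.6935|0.6899|0.6917
Context        446|554|774             0.8051|0.5762|0.6713
Ensemble       586|769|774             0.7620|0.7571|0.7596

TP = correct corrections, Ret = corrections returned, Rel = expected corrections; P = precision, R = recall.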
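Each Raw data entry reads as three counts, here called TP (correct corrections), Ret (corrections returned) and Rel (expected corrections); the Performance entries then follow as precision = TP / Ret, recall = TP / Rel, and F1 = 2PR / (P + R). The short Python sketch below recomputes them from the table's counts; the names `RAW_DATA` and `prf` are only illustrative.

```python
# Raw data per ranking mode as (TP, Ret, Rel), copied from the table above.
RAW_DATA = {
    "Orthographic": (592, 769, 774),
    "Frequency": (534, 770, 774),
    "Context": (446, 554, 774),
    "Ensemble": (586, 769, 774),
}


def prf(tp: int, ret: int, rel: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) computed from raw counts."""
    precision = tp / ret
    recall = tp / rel
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    for mode, counts in RAW_DATA.items():
        p, r, f1 = prf(*counts)
        # Same P|R|F1 layout as the Performance column, to four decimals.
        print(f"{mode:<13}{p:.4f}|{r:.4f}|{f1:.4f}")
```

This reproduces the listed values to four decimals, except the Context F1, which comes out as 0.6717 rather than the listed 0.6713.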

From the results:

  • The ensemble ranks better than word frequency or context alone (F1 0.7596 vs. 0.6917 and 0.6713), despite the implementation differences in its ranking components.
  • The ensemble scores slightly below orthographic ranking alone (F1 0.7673).