Spelling Variant Patterns - Normalization
I. Introduction
Normalization can be used to find a group of spelling variants from a list of words (such as N-grams). Java programs include:
II. Development Notes
SpVarNorm is tested on Lexicon.2015. All false positives are retrieved and analyzed to improve the precision of the algorithm. Please see the SpVarNorm development notes for details.
III. Algorithm Details
Description | Rule | Example
---|---|---
Convert non-ASCII Unicode to ASCII | |
Synonym substitution | |
Spelling variant substitution | |
Rank substitution | |
Number substitution | |
Roman numeral substitution | |
Punctuation | |
Genitive | Process this operation only when the matching pattern is not at the end of the term |
Lower case | |
Remove space | |
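The steps listed above can be illustrated with a minimal Java sketch. This is not the SpVarNorm implementation: the class name `SpVarNormSketch`, the methods `normalize` and `group`, the regular expressions, and the order of operations are all assumptions made for illustration. The substitution steps (synonym, spelling variant, rank, number, Roman numeral) are left out because their rule tables are not given here, and the genitive step simply strips every `'s` without applying any position condition. Terms that share a normalized key are treated as candidate spelling variants of one another.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; names and rules are hypothetical, not SpVarNorm's.
class SpVarNormSketch {

    // Reduce a term to a normalized key using a subset of the listed steps.
    static String normalize(String term) {
        // Non-ASCII to ASCII (crude approximation): decompose accented
        // characters, then drop every character outside the ASCII range.
        String s = Normalizer.normalize(term, Normalizer.Form.NFKD)
                .replaceAll("[^\\p{ASCII}]", "");
        // Genitive (assumed rule): strip 's when it is not followed by a letter.
        // Done before punctuation removal so the apostrophe is still present.
        s = s.replaceAll("'s(?![A-Za-z])", "");
        // Punctuation: remove ASCII punctuation characters.
        s = s.replaceAll("\\p{Punct}", "");
        // Lower case.
        s = s.toLowerCase(Locale.ROOT);
        // Remove space.
        return s.replaceAll("\\s+", "");
    }

    // Group terms by normalized key, preserving the input order.
    static Map<String, List<String>> group(List<String> terms) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String term : terms) {
            groups.computeIfAbsent(normalize(term), k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = List.of(
                "beta blocker", "Beta-Blocker", "Sj\u00F6gren's syndrome",
                "Sjogren syndrome", "aspirin");
        // Print only the groups that contain more than one term.
        group(terms).forEach((key, members) -> {
            if (members.size() > 1) {
                System.out.println(key + " -> " + members);
            }
        });
    }
}
```

Keeping the normalized string as a map key means each term is processed once and grouping is a single pass over the word list. A real implementation would apply the substitution tables before the punctuation and case steps.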