Unicode Normalization
Lexical Tools uses the Unicode normalization algorithm to convert equivalent Unicode character sequences into a single, consistent form. If you are not interested in how the software works, you may skip this page.
Decomposition:
For example: [ ˚ :U+02DA RING ABOVE] = [ :U+0020 SPACE] + [ ̊ :U+030A COMBINING RING ABOVE]
For example: [ Å :U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE] = [ A :U+0041 LATIN CAPITAL LETTER A] + [ ̊ :U+030A COMBINING RING ABOVE]
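The two decompositions above can be reproduced with a minimal sketch. It assumes Python's standard `unicodedata` module as a stand-in; this is not the Lexical Tools implementation, and the helper `code_points` is a hypothetical name introduced only for display. The first mapping (Å) is a canonical decomposition, while the second (˚) is a compatibility decomposition, so it only appears under a compatibility form.

```python
import unicodedata


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Canonical decomposition (NFD): U+00C5 -> U+0041 + U+030A
a_ring = "\u00C5"
canonical = unicodedata.normalize("NFD", a_ring)

# Compatibility decomposition (NFKD): U+02DA -> U+0020 + U+030A
ring_above = "\u02DA"
compat = unicodedata.normalize("NFKD", ring_above)

# NFD applies canonical mappings only, so U+02DA is left unchanged
ring_above_nfd = unicodedata.normalize("NFD", ring_above)

print(code_points(canonical))       # U+0041 U+030A
print(code_points(compat))          # U+0020 U+030A
print(code_points(ring_above_nfd))  # U+02DA
```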
Composition:
For example: [ A :U+0041 LATIN CAPITAL LETTER A] + [ ̊ :U+030A COMBINING RING ABOVE] = [ Å :U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE]
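As a rough illustration of composition, the sketch below recombines the base letter and the combining mark into the single precomposed character. It again assumes Python's standard `unicodedata` module, not the Lexical Tools code itself.

```python
import unicodedata

# Base letter A (U+0041) followed by COMBINING RING ABOVE (U+030A)
decomposed = "A\u030A"

# Canonical composition (NFC) merges the pair into U+00C5
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00C5")            # True
```

Composition only uses canonical mappings: normalizing a space followed by U+030A to NFC does not produce U+02DA, because that mapping is a compatibility one.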
Unicode Normalization:
The Unicode Normalization Forms define four forms of normalized text. The D, C, KD, and KC forms (NFD, NFC, NFKD, and NFKC) differ both in whether they are the result of an initial canonical or compatibility decomposition, and in whether the decomposed text is recomposed with canonical composed characters wherever possible.
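To make the difference between the four forms concrete, here is a small sketch that normalizes one sample string under each form. It assumes Python's standard `unicodedata` module; the sample string is chosen for illustration (Å has a canonical decomposition, and the ligature U+FB01 has only a compatibility decomposition) and does not come from Lexical Tools.

```python
import unicodedata

# U+00C5 (A with ring above) followed by U+FB01 (LATIN SMALL LIGATURE FI)
sample = "\u00C5\uFB01"

forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFD", "NFC", "NFKD", "NFKC")
}

for form, text in forms.items():
    print(form, " ".join(f"U+{ord(ch):04X}" for ch in text))

# NFD  U+0041 U+030A U+FB01          canonical decomposition only
# NFC  U+00C5 U+FB01                 canonical decomposition, then composition
# NFKD U+0041 U+030A U+0066 U+0069   compatibility decomposition only
# NFKC U+00C5 U+0066 U+0069          compatibility decomposition, then composition
```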
Example:
From the above, we can use the normalization algorithm to: