Lexical Tools

Lvg: 2005~2007 strip diacritics

Strip Diacritics:

  • Strip Diacritics using Unicode Normalization D:

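    As discussed in the Unicode Normalization section, Normalization Form D (NFD) can be used to strip diacritics. NFD decomposes each precomposed accented character into its base letter followed by one or more combining marks, so removing the combining marks leaves only the base letter. Sample characters whose diacritics are stripped by this method are listed below.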

    Numeric Entity  Unicode  Symbol  Description                   Stripped Character
    192             \u00c0   À       Capital A, grave accent       A
    193             \u00c1   Á       Capital A, acute accent       A
    194             \u00c2   Â       Capital A, circumflex accent  A
    195             \u00c3   Ã       Capital A, tilde              A
    196             \u00c4   Ä       Capital A, umlaut             A
    197             \u00c5   Å       Capital A, ring               A
    199             \u00c7   Ç       Capital C, cedilla            C
    200             \u00c8   È       Capital E, grave accent       E
    201             \u00c9   É       Capital E, acute accent       E
    202             \u00ca   Ê       Capital E, circumflex accent  E
    203             \u00cb   Ë       Capital E, umlaut             E
    204             \u00cc   Ì       Capital I, grave accent       I
    205             \u00cd   Í       Capital I, acute accent       I
    206             \u00ce   Î       Capital I, circumflex accent  I
    207             \u00cf   Ï       Capital I, umlaut             I
    209             \u00d1   Ñ       Capital N, tilde              N
    210             \u00d2   Ò       Capital O, grave accent       O
    211             \u00d3   Ó       Capital O, acute accent       O
    212             \u00d4   Ô       Capital O, circumflex accent  O
    213             \u00d5   Õ       Capital O, tilde              O
    214             \u00d6   Ö       Capital O, umlaut             O
    217             \u00d9   Ù       Capital U, grave accent       U
    218             \u00da   Ú       Capital U, acute accent       U
    219             \u00db   Û       Capital U, circumflex accent  U
    220             \u00dc   Ü       Capital U, umlaut             U
    221             \u00dd   Ý       Capital Y, acute accent       Y
    224             \u00e0   à       Small a, grave accent         a
    225             \u00e1   á       Small a, acute accent         a
    226             \u00e2   â       Small a, circumflex accent    a
    227             \u00e3   ã       Small a, tilde                a
    228             \u00e4   ä       Small a, umlaut               a
    229             \u00e5   å       Small a, ring                 a
    231             \u00e7   ç       Small c, cedilla              c
    232             \u00e8   è       Small e, grave accent         e
    233             \u00e9   é       Small e, acute accent         e
    234             \u00ea   ê       Small e, circumflex accent    e
    235             \u00eb   ë       Small e, umlaut               e
    236             \u00ec   ì       Small i, grave accent         i
    237             \u00ed   í       Small i, acute accent         i
    238             \u00ee   î       Small i, circumflex accent    i
    239             \u00ef   ï       Small i, umlaut               i
    241             \u00f1   ñ       Small n, tilde                n
    242             \u00f2   ò       Small o, grave accent         o
    243             \u00f3   ó       Small o, acute accent         o
    244             \u00f4   ô       Small o, circumflex accent    o
    245             \u00f5   õ       Small o, tilde                o
    246             \u00f6   ö       Small o, umlaut               o
    249             \u00f9   ù       Small u, grave accent         u
    250             \u00fa   ú       Small u, acute accent         u
    251             \u00fb   û       Small u, circumflex accent    u
    252             \u00fc   ü       Small u, umlaut               u
    253             \u00fd   ý       Small y, acute accent         y
    255             \u00ff   ÿ       Small y, umlaut               y

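  • Strip diacritics by user definition:

    Users may define their own diacritic-stripping mappings. This is needed for characters such as the ones below, which have no canonical decomposition in Unicode and are therefore left unchanged by Normalization D. The current default definitions in Lvg are as follows: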

    Numeric Entity  Unicode  Symbol  Description                         Stripped Character
    216             \u00d8   Ø       Latin Capital Letter O With Stroke  O
    248             \u00f8   ø       Latin Small Letter O With Stroke    o
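The two approaches above can be combined as in the following Python sketch. It is a minimal illustration using the standard library `unicodedata` module, not Lvg's own implementation: the function names, the `USER_DEFINED_MAP` table, the choice to drop characters of category Mn (nonspacing mark) after NFD, and the choice to apply user mappings before NFD are all assumptions made for this example.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical user-defined mappings mirroring the two Lvg defaults listed
# above. These characters have no canonical decomposition, so NFD alone
# would leave them unchanged.
USER_DEFINED_MAP: Dict[str, str] = {
    "\u00d8": "O",  # Ø, Latin Capital Letter O With Stroke
    "\u00f8": "o",  # ø, Latin Small Letter O With Stroke
}


def strip_diacritics_nfd(text: str) -> str:
    """Strip diacritics via Normalization Form D (assumed approach).

    NFD splits a precomposed character such as U+00C0 into a base letter
    plus combining marks; nonspacing marks (category Mn) are then dropped.
    A final NFC pass recomposes anything else NFD split apart (e.g. Hangul).
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", base_only)


def strip_diacritics(text: str, user_map: Optional[Dict[str, str]] = None) -> str:
    """Apply user-defined single-character mappings, then NFD stripping."""
    mapping = USER_DEFINED_MAP if user_map is None else user_map
    # str.maketrans accepts a dict keyed by single characters.
    mapped = text.translate(str.maketrans(mapping))
    return strip_diacritics_nfd(mapped)


if __name__ == "__main__":
    print(strip_diacritics_nfd("\u00c5ngstr\u00f6m caf\u00e9"))  # Angstrom cafe
    print(strip_diacritics_nfd("\u00d8resund"))  # Øresund (no decomposition)
    print(strip_diacritics("\u00d8resund"))      # Oresund (user mapping)
```

Applying the user map first lets a user definition override the NFD result even for characters that do decompose; whether Lvg resolves such overlaps the same way is not stated in this section.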