Multiwords: Normalization
I. Why Normalization?
The same term can appear in many different forms in MEDLINE, varying in genitive, punctuation, and case. For example, "diabetes mellitus" appears in the following n-gram terms from MEDLINE:
Normalization (abstracting away from genitive, punctuation, and case) is applied to n-gram terms so that these variant forms can be grouped for further review and analysis. In addition, the word count of a normalized n-gram term reflects the true frequency of usage of that term.
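The grouping idea described above can be illustrated with a minimal Python sketch. It is not the normalization program used for this project; the function names `normalize` and `group_counts`, the regular expressions, and the sample counts are all assumptions made for illustration. It only shows how stripping genitive markers, punctuation, and case lets variant n-grams collapse to one key whose counts can be summed.

```python
import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Illustrative normalization: abstract away from case, genitive, and punctuation."""
    text = term.lower()
    # Drop the genitive marker "'s" at the end of a word (e.g. "alzheimer's").
    text = re.sub(r"'s\b", "", text)
    # Drop a trailing apostrophe after a plural "s" (e.g. "patients'").
    text = re.sub(r"s'(?=\s|$)", "s", text)
    # Replace any remaining punctuation with a space, then collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def group_counts(ngram_counts: dict) -> dict:
    """Sum the counts of all n-grams that share the same normalized form."""
    grouped = defaultdict(int)
    for term, count in ngram_counts.items():
        grouped[normalize(term)] += count
    return dict(grouped)


# Hypothetical variants and counts, invented for this example (not MEDLINE data).
sample = {
    "Diabetes Mellitus": 3,
    "diabetes mellitus,": 2,
    "diabetes-mellitus": 1,
}
grouped = group_counts(sample)
```

With this made-up input, all three variants share the normalized key "diabetes mellitus", so their counts are added together into a single frequency for the term.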
II. Normalization
III. Normalization Usage in N-gram to generate (multi)words
We used normalization as follows: