Text Categorization

About Text Categorization

This version of Text Categorization is written entirely in Java and can handle UTF-8 text. It includes five command-line tools along with Java APIs.

Journal Descriptor Indexing (JDI) is used as an automatic indexing method that supplements, and can substitute for, manual indexing. It is also used in several NLM NLP projects to increase accuracy by identifying citations.

JDI has been extended to perform Semantic Type (ST) indexing (STI). STI uses JDI as the basis for ranking STs: each ST is ranked by the similarity between the journal descriptor (JD) indexing of the target text and the JD indexing of that ST's document. An ST document is the set of UMLS Metathesaurus concepts assigned to an ST. STI is applied to Word Sense Disambiguation (WSD). If the senses of an ambiguous word are expressed as STs, STI can be performed on the context surrounding the word (phrase, sentence, or paragraph), in the expectation that the correct STs for the word will rank higher in the ST indexing of the context than the other candidate STs. StWsd is built on STI.
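The ST ranking step can be illustrated with a minimal Java sketch. It is not the Text Categorization API: the class `StRankSketch`, its methods, and all scores are hypothetical. Cosine similarity is assumed here as the similarity measure; the text above does not specify which measure STI actually uses. Each JD indexing is modeled as a map from JD name to score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ST ranking; not part of the Text Categorization package.
class StRankSketch {

    // Cosine similarity between two sparse JD score vectors (JD name -> score).
    // Assumption: cosine is used only for illustration.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty or all-zero vector is similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Orders ST names from most to least similar to the target text's JD indexing.
    static List<String> rank(Map<String, Double> targetJd,
                             Map<String, Map<String, Double>> stDocJd) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> e : stDocJd.entrySet()) {
            scores.put(e.getKey(), cosine(targetJd, e.getValue()));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((String st) -> scores.get(st)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up JD indexing of the context around an ambiguous word such as "cold".
        Map<String, Double> context = new HashMap<>();
        context.put("Pulmonary Medicine", 0.8);
        context.put("Environmental Health", 0.1);

        // Made-up JD indexing of two candidate ST documents.
        Map<String, Double> disease = new HashMap<>();
        disease.put("Pulmonary Medicine", 0.9);
        disease.put("Environmental Health", 0.2);
        Map<String, Double> phenomenon = new HashMap<>();
        phenomenon.put("Pulmonary Medicine", 0.1);
        phenomenon.put("Environmental Health", 0.9);

        Map<String, Map<String, Double>> candidates = new HashMap<>();
        candidates.put("Disease or Syndrome", disease);
        candidates.put("Natural Phenomenon or Process", phenomenon);

        // The first entry is the sense the sketch would select for WSD.
        System.out.println(rank(context, candidates));
    }
}
```

For WSD, only the STs that express the candidate senses of the ambiguous word need to be passed to `rank`; the top-ranked ST is then taken as the most likely sense for that context.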