Text Categorization

Semantic Types Indexing (Real-Time)

STRI (Semantic Type Real-Time Indexing) uses statistical associations between the words in a training set of MEDLINE citations and a small set of 135 categories (semantic types, STs) in the Semantic Network in NLM's UMLS Metathesaurus. As in STI, an ST-document is created for each ST, comprising the UMLS Metathesaurus strings that belong to that ST. Unlike STI, STRI calculates the ST scores in real time instead of reading precalculated scores. The procedure is briefly as follows:

  • Calculate the JDI (Journal Descriptor Indexing) scores (Words|JD|Wc|Dc) for the input text or MeSH terms.
  • Read in the ST-JD scores (ST|JD|Wc|Dc) from a file.
  • Calculate the vector similarity between the JDI scores of the input (Words|JD|Wc|Dc) and the ST-JD scores (ST|JD|Wc|Dc) using the cosine coefficient to obtain the ST scores (Words|ST|Wc|Dc).
  • Sort and display the results.
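
The scoring and sorting steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the STRI program itself: the class name StriSketch, the methods cosine and rank, and the labels JD_A, JD_B, JD_C, ST_X, and ST_Y are all hypothetical, and each vector is assumed to hold a single numeric score per JD (the actual tool keeps separate word-count and document-count scores, and reads the ST-JD scores from a file rather than building them in memory).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ranks STs by cosine similarity to an input JD-score vector.
class StriSketch {

    // Cosine coefficient between two sparse vectors keyed by JD label.
    // Returns 0.0 if either vector has zero length.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every ST against the input vector and sorts by descending score.
    public static List<Map.Entry<String, Double>> rank(
            Map<String, Double> inputJdScores,
            Map<String, Map<String, Double>> stJdScores) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : stJdScores.entrySet()) {
            double score = cosine(inputJdScores, e.getValue());
            ranked.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
        }
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Assumed input: JD scores already computed for the input text.
        Map<String, Double> input = new HashMap<>();
        input.put("JD_A", 3.0);
        input.put("JD_B", 4.0);

        // Assumed ST-JD scores; in STRI these would be read from a file.
        Map<String, Double> stX = new HashMap<>();
        stX.put("JD_A", 3.0);
        stX.put("JD_B", 4.0);
        Map<String, Double> stY = new HashMap<>();
        stY.put("JD_C", 1.0);

        Map<String, Map<String, Double>> stJd = new HashMap<>();
        stJd.put("ST_X", stX);
        stJd.put("ST_Y", stY);

        for (Map.Entry<String, Double> e : rank(input, stJd)) {
            System.out.println(e.getKey() + "|" + e.getValue());
        }
    }
}
```

Sparse maps are used here because an input typically has nonzero scores for only some JDs; a dense array indexed by JD would work equally well if every vector covers the full JD set.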

The STRI program, along with the MEDLINE Tokenizer, is used to index MEDLINE records on:
  • Text: phrases, titles, abstracts, and combinations of titles and abstracts
  • MeSH: starred MeSH headings and subheadings

I. Java Software Components:

II. Programs: