Text Categorization

STRI: MeSH

  • Description:

    Reads the input MeSH terms (Medical Subject Headings: Main Headings, MH, and Subheadings, SH) and performs ST real-time indexing based on document counts. Main Headings and Subheadings are separated by '|'. Subheadings can be given either as two-letter abbreviations or as full names.

  • Inputs:
    • MeSH terms, with starred MH and SH separated by '|'
    • or a file, such as 9801.2004.MH.in
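A minimal sketch of how one input line might be tokenized, assuming each line is a '|'-separated list of MeSH terms, a "starred" term carries a leading '*', and two-letter Subheading abbreviations are expanded to full names. The function name tokenize_mesh and the small SH_ABBREVIATIONS map are hypothetical illustrations, not the tool's actual code or table.

```python
# Hypothetical subset of MeSH Subheading (qualifier) abbreviations; the tool's own table may differ.
SH_ABBREVIATIONS = {"DI": "diagnosis", "DT": "drug therapy", "GE": "genetics", "TH": "therapy"}


def tokenize_mesh(line):
    """Split one '|'-separated input line into (term, starred) pairs.

    Assumption: a starred term is marked by a leading '*'.
    """
    tokens = []
    for raw in line.split("|"):
        term = raw.strip()
        if not term:
            continue  # skip empty fields, e.g. from a trailing '|'
        starred = term.startswith("*")
        term = term.lstrip("*").strip()
        # Expand a two-letter Subheading abbreviation; full names are kept as given.
        term = SH_ABBREVIATIONS.get(term, term)
        tokens.append((term, starred))
    return tokens


print(tokenize_mesh("*Breast Neoplasms|DI|therapy|"))
```

For the sample line above, this returns [('Breast Neoplasms', True), ('diagnosis', False), ('therapy', False)], so abbreviated and full-name Subheadings end up in the same form before filtering.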

  • Algorithm:
    • Pre-Processes (Input Filter):
      • Tokenize all MeSH terms (MH/SH) from the input
      • Filter out illegal MeSH terms (those not in the Mh-Jd Table or the Sh-Jd Table)
      • Assign the legal MeSH terms
    • Processes:
    • Post-Processes (Output Filter):
      • Print out Input MeSH
      • Print out Legal MeSH
      • Details for output filter
      • Number of score entries to display
      • No output message
      • ST candidate
      • Cluster option
      • Use alphabetical order for STs that have the same score
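The input filter and the tie-breaking rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: MH_JD_TABLE and SH_JD_TABLE are hypothetical in-memory stand-ins for the Mh-Jd and Sh-Jd Tables (only membership is checked), the ST names and scores in the example are made up, and how the real scores are derived from document counts is not shown. The optional top_n parameter is a hypothetical analogue of the "number of score entries to display" option.

```python
# Hypothetical stand-ins for the Mh-Jd and Sh-Jd Tables; only term membership is used here.
MH_JD_TABLE = {"breast neoplasms", "mammography"}
SH_JD_TABLE = {"diagnosis", "therapy"}


def filter_legal(terms):
    """Keep terms found in either table (case-insensitive), preserving input order."""
    return [t for t in terms if t.lower() in MH_JD_TABLE or t.lower() in SH_JD_TABLE]


def rank_sts(scores, top_n=None):
    """Sort (ST, score) pairs by descending score, breaking ties alphabetically by ST name.

    top_n (hypothetical) limits how many entries are returned; None returns all.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


print(filter_legal(["Breast Neoplasms", "Foo", "diagnosis"]))
print(rank_sts({"Neoplastic Process": 0.8, "Finding": 0.5, "Diagnostic Procedure": 0.8}))
```

Sorting on the composite key (-score, name) applies both rules in a single pass: higher scores come first, and STs with equal scores fall back to alphabetical order, so "Diagnostic Procedure" precedes "Neoplastic Process" at 0.8.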

  • Sample commands:
    > stri -imh -p
    => index MeSH terms from standard input, with a prompt
    
    > stri -imh -i:9801.2004.MH.in -o:9801.2004.MH.out
    => index MeSH terms from the file 9801.2004.MH.in and write the results to the file 9801.2004.MH.out
    

  • Sample Outputs:
    • a file, such as 9801.2004.MH.out