Text Categorization

JDI: MeSH

  • Description:

    Reads MeSH terms (Medical Subject Headings: Main Headings, MH, and Subheadings, SH) from the input and performs Journal Descriptor (JD) indexing based on document counts. Main Headings and Subheadings are separated by '|'. Subheadings can be given either as two-letter abbreviations or as full names.
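
    Because a Subheading may arrive in either form, a reader of this input has to map both forms to one canonical name. The minimal Python sketch below shows one way to do that. The table `SH_ABBREVIATIONS` and the function `normalize_subheading` are hypothetical; the four entries are real MeSH subheading abbreviations but only a tiny illustrative subset, and the tool's actual SH table is not shown in this document.

```python
from typing import Dict, Optional

# Hypothetical SH table: a tiny, illustrative subset mapping two-letter
# abbreviations to full subheading names (not the tool's real table).
SH_ABBREVIATIONS: Dict[str, str] = {
    "DI": "diagnosis",
    "GE": "genetics",
    "ME": "metabolism",
    "TH": "therapy",
}

# Reverse index so full names are also accepted, case-insensitively.
SH_FULL_NAMES: Dict[str, str] = {name.lower(): name for name in SH_ABBREVIATIONS.values()}


def normalize_subheading(sh: str) -> Optional[str]:
    """Return the full SH name for an abbreviation or a full name, or None if unknown."""
    key = sh.strip()
    # Try the two-letter abbreviation first (case-insensitive).
    if key.upper() in SH_ABBREVIATIONS:
        return SH_ABBREVIATIONS[key.upper()]
    # Otherwise treat the input as a full name.
    return SH_FULL_NAMES.get(key.lower())
```

    With this sketch, `normalize_subheading("DI")` and `normalize_subheading("Diagnosis")` both return `"diagnosis"`, while an unknown value such as `"XX"` returns `None`.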

  • Inputs:
    • MeSH terms, in which MHs (possibly starred) and SHs are separated by '|'
    • a file, such as 9801.2004.MH.in
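
    One way to read such an input entry is sketched below in Python. Since the exact line syntax is not spelled out here, the sketch assumes that each line holds one MH followed by its SHs, all separated by '|', and that a leading '*' marks a starred term. The names `parse_line`, `MeshEntry`, and `MeshToken` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeshToken:
    name: str
    starred: bool  # True if the token carried a leading '*' (assumed star marker)


@dataclass
class MeshEntry:
    main_heading: MeshToken
    subheadings: List[MeshToken] = field(default_factory=list)


def parse_token(raw: str) -> MeshToken:
    """Strip whitespace and a leading '*', remembering whether it was present."""
    text = raw.strip()
    starred = text.startswith("*")
    return MeshToken(text.lstrip("*").strip(), starred)


def parse_line(line: str) -> MeshEntry:
    """Parse one assumed input line of the form 'MH|SH|SH...'."""
    # Split on '|' and skip empty pieces (e.g. a trailing '|').
    tokens = [parse_token(t) for t in line.split("|") if t.strip()]
    if not tokens:
        raise ValueError("empty MeSH line")
    # The first token is the Main Heading; the rest are Subheadings.
    return MeshEntry(tokens[0], tokens[1:])
```

    For example, `parse_line("*Neoplasms|DI|*therapy")` yields the starred main heading `Neoplasms` and the subheadings `DI` (not starred) and `therapy` (starred).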

  • Algorithm:
    • Pre-Processes (Input Filter):
      • Tokenize all MeSH terms (MH/SH) in the input
      • Filter out illegal MeSH terms (those not in the MH-JD table or the SH-JD table)
      • Assign the legal MeSH terms for indexing
    • Processes:
      • Perform JD indexing on the legal MeSH terms, based on document counts
    • Post-Processes (Output Filter):
      • Print the input MeSH terms
      • Print the legal MeSH terms

      • Output filter options:
        • Number of scored entries to display
        • No output message
        • JD candidates
        • Cluster option
        • Use alphabetical order for JDs that have the same score
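
    The filter, scoring, and ranking steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: `MH_JD_COUNTS` is a hypothetical MH-JD table with made-up document counts, and the scoring rule (for each legal term, the fraction of its documents that fall under each JD, averaged over all legal terms) is an assumed document-count scheme. An SH-JD table would be handled the same way.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical MH-JD table: for each MH, the number of training documents
# under each JD. The counts are made up for illustration.
MH_JD_COUNTS: Dict[str, Dict[str, int]] = {
    "Neoplasms": {"Oncology": 75, "Pathology": 25},
    "Liver": {"Gastroenterology": 50, "Pathology": 50},
}


def filter_legal(terms: List[str], table: Dict[str, Dict[str, int]]) -> List[str]:
    """Input filter: keep only terms that appear in the JD table."""
    return [t for t in terms if t in table]


def score_jds(terms: List[str], table: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Assumed scheme: average over legal terms of count(term, JD) / total(term)."""
    legal = filter_legal(terms, table)
    if not legal:
        return {}
    sums: Dict[str, float] = defaultdict(float)
    for term in legal:
        counts = table[term]
        total = sum(counts.values())
        for jd, count in counts.items():
            sums[jd] += count / total
    # A JD a term never occurs with contributes 0 for that term.
    return {jd: s / len(legal) for jd, s in sums.items()}


def rank_jds(scores: Dict[str, float], top_n: int) -> List[Tuple[str, float]]:
    """Output filter: highest score first, ties in alphabetical order, keep top_n."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

    With these made-up counts, `score_jds(["Neoplasms", "Liver", "Bogus"], MH_JD_COUNTS)` drops the illegal term `Bogus` and gives Oncology and Pathology 0.375 each and Gastroenterology 0.25; `rank_jds(scores, 2)` then lists Oncology before Pathology because tied JDs are ordered alphabetically.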

  • Sample commands:
    > jdi -imh -p
    => index MeSH terms read from standard input, with a prompt
    
    > jdi -imh -i:9801.2004.MH.in -o:9801.2004.MH.out
    => index MeSH terms from the file 9801.2004.MH.in and write the results to the file 9801.2004.MH.out
    

  • Sample Outputs:
    • a file, such as 9801.2004.MH.out