Text Categorization

JDI: MeSH

Description:
Reads the input MeSH (Medical Subject Headings, MH|SH) and performs JD indexing based on document count. Main Headings (MH) and Subheadings (SH) are separated by '|'. Subheadings can be given either as two-letter abbreviations or as full names.
Inputs:
- MeSH terms; starred MHs and SHs are separated by '|'
- a file, such as 9801.2004.MH.in
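
As a rough illustration of the input format, the sketch below splits one '|'-separated input line into individual MeSH terms. It is not the tool's own code: `tokenize_mesh` is a hypothetical helper, and it assumes that a star is written as a leading '*' and can be dropped before lookup.

```python
def tokenize_mesh(line):
    """Split a '|'-separated line into MeSH terms (MH or SH).

    Assumption for this sketch: a starred term carries a leading '*',
    which is removed so the bare term can be looked up later.
    """
    terms = []
    for raw in line.split("|"):
        term = raw.strip().lstrip("*").strip()
        if term:  # skip empty pieces such as those from a trailing '|'
            terms.append(term)
    return terms
```

For example, `tokenize_mesh("*Neoplasms|genetics")` yields the two terms `Neoplasms` and `genetics`.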
Algorithm:
- Pre-Processes (Input Filter):
  - Tokenize all MeSHs (SH/MH) from the input
  - Filter out illegal MeSHs (those not in the Mh-Jd Table or the Sh-Jd Table)
  - Assign legal MeSHs
- Processes:
  - Get the JD scores for each legal MeSH from the DB (MH_JD_SCORES table and SH_JD_SCORES table)
  - Calculate the average score over all legal MeSHs
- Post-Processes (Output Filter):
  - Print out Input MeSH
  - Print out Legal MeSHs
  - Output filter option details
  - Score entries display number
  - No output message
  - JD candidate
  - Cluster option
  - Use alphabetical order for JDs that have the same score
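
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the jdi implementation: the two dictionaries are hypothetical in-memory stand-ins for the MH_JD_SCORES and SH_JD_SCORES tables, the scores in them are made up, `lookup` and `index_terms` are hypothetical names, and a JD missing for a given term is assumed to contribute a score of 0.

```python
from collections import defaultdict

# Hypothetical stand-ins for the DB tables; all scores are invented.
MH_JD_SCORES = {
    "Neoplasms": {"Neoplasms": 0.5, "Genetics, Medical": 0.25},
}
SH_JD_SCORES = {
    "genetics": {"Genetics, Medical": 0.25, "Neoplasms": 0.0},
}


def lookup(term):
    """Return the JD scores for a term, or None if the term is illegal."""
    if term in MH_JD_SCORES:
        return MH_JD_SCORES[term]
    return SH_JD_SCORES.get(term)


def index_terms(terms, top_n=None):
    """Filter legal terms, average their JD scores, and rank the JDs."""
    # Input filter: keep only terms found in one of the two tables.
    legal = [t for t in terms if lookup(t) is not None]
    if not legal:
        return legal, []

    # Sum each JD's score across the legal terms.
    totals = defaultdict(float)
    for term in legal:
        for jd, score in lookup(term).items():
            totals[jd] += score

    # Average over the number of legal terms.
    averaged = [(jd, total / len(legal)) for jd, total in totals.items()]

    # Output filter: highest score first, alphabetical order on ties,
    # then keep at most top_n entries (all of them when top_n is None).
    averaged.sort(key=lambda item: (-item[1], item[0]))
    return legal, averaged[:top_n]
```

Calling `index_terms(["Neoplasms", "genetics", "Not A Heading"])` drops the third term as illegal and averages over the remaining two. With these invented scores both JDs tie, so the alphabetical rule decides their order.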

Sample commands:

> jdi -imh -p
=> index input MeSH from standard input with prompt

> jdi -imh -i:9801.2004.MH.in -o:9801.2004.MH.out
=> index MeSH from the file 9801.2004.MH.in and send the results to the file 9801.2004.MH.out

Sample Outputs:
- a file, such as 9801.2004.MH.out