JDI: Text
- Description:
Reads the input text (a title, abstract, or phrase) and performs Journal Descriptor (JD) indexing based on:
- word frequency counts
- document counts for each word
- Inputs:
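- a text entered from standard input (a title, abstract, or phrase)
- a file, such as 9801.2004.TI.in (MEDLINE titles)
- a file, such as 9801.2004.AB.in (MEDLINE abstracts)
- a file, such as 9801.2004.TIAB.in (MEDLINE titles and abstracts)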
- Algorithm:
- Pre-Process (Input Filter):
- Tokenize the input text (term) into words
- Apply the word extraction filter (if the input is a MEDLINE TI or AB field)
- Apply the acronym filter (TBD)
- Filter out words that are not legal
- Filter out duplicate words if the unique flag is true
- Assign the final words for processing
- Process:
- Retrieve the JD scores for each legal word in the text from the WORD_JD_SCORES table in the database
- Calculate the average JD scores for the text
- Post-process (Output Filter):
- Print the input text (term)
- Output filter details:
- Number of score entries to display
- No output message
- Cluster option
- JD candidates
- List JDs that have the same score in alphabetical order (e.g., "taylor", "assault")
- Sample commands:
> jdi -p
=> index a text from standard input, with a prompt
> jdi -d -i:9801.2004.TI.in -o:9801.2004.TI.out
=> index the text in the file 9801.2004.TI.in and write the results to the file 9801.2004.TI.out
- Sample outputs:
- a file, such as 9801.2004.TI.out
- a file, such as 9801.2004.AB.out
- a file, such as 9801.2004.TIAB.out
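The pre-process, process, and post-process steps listed under Algorithm can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the JDI implementation: the in-memory WORD_JD_SCORES dictionary stands in for the database table, its words, JDs, and scores are made up, the tokenizer and the is_legal rule are hypothetical, and the word extraction and acronym filters are omitted. The text score for each JD is taken as the average of the word-JD scores over all legal words, and ties are broken alphabetically as described above.

```python
import re
from collections import defaultdict

# Hypothetical in-memory stand-in for the WORD_JD_SCORES table:
# word -> {journal descriptor: word-JD score}. Values are illustrative only.
WORD_JD_SCORES = {
    "cardiac": {"Cardiology": 0.5},
    "heart": {"Cardiology": 0.5, "Physiology": 0.25},
    "brain": {"Neurology": 0.5, "Physiology": 0.25},
}


def is_legal(word):
    # Hypothetical rule: a word is legal if the score table has an entry for it.
    return word in WORD_JD_SCORES


def preprocess(text, unique=False):
    """Pre-process: tokenize, drop words that are not legal, optionally de-duplicate."""
    # Assumed tokenizer: lowercase alphanumeric runs. The word extraction
    # and acronym filters are not modeled here.
    words = re.findall(r"[a-z0-9]+", text.lower())
    words = [w for w in words if is_legal(w)]
    if unique:
        # dict.fromkeys keeps the first occurrence of each word, in order.
        words = list(dict.fromkeys(words))
    return words


def average_jd_scores(words):
    """Process: average each JD's score over all words in the text."""
    if not words:
        return {}
    totals = defaultdict(float)
    for word in words:
        for jd, score in WORD_JD_SCORES[word].items():
            totals[jd] += score
    # A word with no score for a JD contributes 0 to that JD's average.
    return {jd: total / len(words) for jd, total in totals.items()}


def rank_jds(scores, top_n=None):
    """Post-process: sort by descending score; equal scores in alphabetical order."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    words = preprocess("Cardiac heart and brain", unique=True)
    for jd, score in rank_jds(average_jd_scores(words)):
        print(f"{score:.4f}|{jd}")
```

With this toy table, "Neurology" and "Physiology" receive the same average score for the sample text, so the sort key (negated score, then JD name) lists "Neurology" first. The top_n parameter plays the role of the score entries display number; the printed "score|JD" layout is only for the demo.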