PreProcess: ST-JDs Table
- Description:
JDI is applied to the St-Documents to produce the St-JD scores table. The JD score vector includes:
- word count score
- document count score
- Input:
- Java File & Algorithm:
Run JDI (using the latest word-JD table) on the St-Documents of each ST to get:
- Word count score
- Document count score
The default input filter options of JDI should be used. The settings are as follows:
- Remove stopwords
- Use restrictwords
- Use normalized signal filter between 2 ~ 510754
=> Please note that the default max. signal in JDI.2008 is 645881 (not 510754). This is because an SCR (SCR-44) made this change after the STI table was generated. Together with the 5 stop word changes (SCR-43), this causes minor differences in the stJdsTables for ftcn, neop, and orgf.
=> The max. signal must include cancer, blood, and risk, and exclude function and therapy. Susanne suggests using "cancer" as the upper limit since it is not a stop word.
- Use min. word count of 2
- Use min. document count of 2
- Use min. length of 3
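
The filter settings above can be read as a single predicate over each word. The following is a minimal sketch only, not the JDI implementation: the class `InputFilterSketch`, the `WordStat` type, and its field names are hypothetical. It assumes that "restrictwords" acts as an allow-list, that the signal range is inclusive at both ends, and that the statistics for each word are already available. The sample numbers in `main` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the input filter settings; not the actual JDI code.
final class InputFilterSketch {
    // Limits taken from the settings listed above.
    static final long MIN_SIGNAL = 2;
    static final long MAX_SIGNAL = 510754;
    static final int MIN_WORD_COUNT = 2;
    static final int MIN_DOC_COUNT = 2;
    static final int MIN_LENGTH = 3;

    // Hypothetical per-word statistics; the field names are assumptions.
    static final class WordStat {
        final String word;
        final long signal;
        final int wordCount;
        final int docCount;

        WordStat(String word, long signal, int wordCount, int docCount) {
            this.word = word;
            this.signal = signal;
            this.wordCount = wordCount;
            this.docCount = docCount;
        }
    }

    // Returns true only if the word satisfies every setting.
    static boolean passes(WordStat w, Set<String> stopWords, Set<String> restrictWords) {
        return !stopWords.contains(w.word)          // remove stopwords
            && restrictWords.contains(w.word)       // assumed: restrictwords is an allow-list
            && w.signal >= MIN_SIGNAL               // assumed: inclusive lower bound
            && w.signal <= MAX_SIGNAL               // assumed: inclusive upper bound
            && w.wordCount >= MIN_WORD_COUNT
            && w.docCount >= MIN_DOC_COUNT
            && w.word.length() >= MIN_LENGTH;
    }

    // Keeps the words that pass the filter, preserving input order.
    static List<String> filter(List<WordStat> words, Set<String> stopWords,
                               Set<String> restrictWords) {
        List<String> kept = new ArrayList<>();
        for (WordStat w : words) {
            if (passes(w, stopWords, restrictWords)) {
                kept.add(w.word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        Set<String> restrict = new HashSet<>(Arrays.asList("the", "protein", "gene", "ab"));
        // Invented values, for illustration only.
        List<WordStat> words = Arrays.asList(
            new WordStat("protein", 100, 5, 3),
            new WordStat("the", 100, 50, 30),
            new WordStat("ab", 100, 5, 3),
            new WordStat("gene", 1, 5, 3));
        System.out.println(filter(words, stop, restrict));
    }
}
```

In this sample only "protein" survives: "the" is a stop word, "ab" is shorter than 3 characters, and "gene" has a signal below 2.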
- Output Files:
- stJdsTable.txt
ST | TUI | Wc Scores | Dc Score | Jdid | Jd Name
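
As an illustration of the field layout of stJdsTable.txt, a line could be read as sketched below. This is a hedged example, not part of JDI: the class `StJdsRowSketch` is hypothetical, and it assumes one record per line, exactly six "|"-separated fields in the order listed above, and numeric scores that parse as doubles. Whitespace around the separator is trimmed, since the exact spacing in the file is not stated here.

```java
// Hypothetical reader for one line of stJdsTable.txt; not the actual JDI code.
final class StJdsRowSketch {
    final String st;
    final String tui;
    final double wcScore;   // word count score (assumed numeric)
    final double dcScore;   // document count score (assumed numeric)
    final String jdId;
    final String jdName;

    StJdsRowSketch(String st, String tui, double wcScore, double dcScore,
                   String jdId, String jdName) {
        this.st = st;
        this.tui = tui;
        this.wcScore = wcScore;
        this.dcScore = dcScore;
        this.jdId = jdId;
        this.jdName = jdName;
    }

    // Splits on "|" and trims each field; rejects lines without exactly 6 fields.
    static StJdsRowSketch parse(String line) {
        String[] f = line.split("\\|", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new StJdsRowSketch(
            f[0].trim(),
            f[1].trim(),
            Double.parseDouble(f[2].trim()),
            Double.parseDouble(f[3].trim()),
            f[4].trim(),
            f[5].trim());
    }

    public static void main(String[] args) {
        // Invented sample line, for illustration only.
        StJdsRowSketch row = parse("st | tui | 0.25 | 0.5 | jdid | jd name");
        System.out.println(row.st + " " + row.jdName + " " + row.wcScore + " " + row.dcScore);
    }
}
```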
- Notes: