PreProcess: Word-ST Table
- Description:
A table (file) that stores the Word-ST scores is generated and loaded into a DB table.
This table is then used to perform ST indexing on phrases. There are two types of scores:
- word count
- document count
- Input:
- Java File & Algorithm:
- Read in WC and DC scores for all Word-Jdid from wordJdidWcDcTable (wordJdidWcDcTable.txt)
- The JD scores are not sorted
- A JD score is omitted from the table if it is 0
- Read in WC and DC scores for all ST-Jdid from stJdsTable (stJdsTable.txt)
- The JD scores are sorted
- A JD score is in the table even if it is 0
- Calculate the cosine coefficient between the Word-Jdid and ST-Jdid vectors, separately for WC and DC, to form the Word-St-Wc-Dc tables
- Make sure all JD vectors have the same number of components (use 0 for any JD score omitted from the Word-Jdid table)
- Print out the tables
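The cosine step above can be sketched as follows. This is a minimal illustration, not the actual program: the class name `WordStCosine`, the methods `toDense` and `cosine`, and the sample scores are all hypothetical. It assumes that each Jdid can be used as a 0-based index into the JD vector, and that a vector of all zeros yields a coefficient of 0.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; names and sample values are illustrative only.
class WordStCosine {

    // Expand a sparse JD-score map (zero scores omitted, entries in any order)
    // into a dense vector with one component per JD. Missing JDs stay at 0,
    // so every vector ends up with the same number of components.
    // Assumption: each Jdid is a 0-based index smaller than numJds.
    static double[] toDense(Map<Integer, Double> sparse, int numJds) {
        double[] dense = new double[numJds];
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return dense;
    }

    // Cosine coefficient: dot(a, b) / (|a| * |b|).
    // Assumption: returns 0 when either vector has zero length (norm).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("JD vectors differ in size");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Word side: unsorted, zero score for JD 1 is simply absent.
        Map<Integer, Double> wordWc = new HashMap<>();
        wordWc.put(2, 4.0);
        wordWc.put(0, 3.0);

        // ST side: sorted by JD, zeros would be present explicitly.
        double[] stWc = {1.0, 2.0, 2.0};

        double score = cosine(toDense(wordWc, stWc.length), stWc);
        System.out.printf(Locale.ROOT, "%.4f%n", score); // prints 0.7333
    }
}
```

The same two calls would be repeated with the DC vectors to obtain the document score for each Word-ST pair.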
- Output Files:
- wordStTable.txt
Word | ST index | ST Abbreviation | TUI | Word scores | Document scores
- Notes: