Text Categorization

PreProcess - STRI

This page describes the preprocessing tasks that generate the input files for ST real-time indexing. STRI (Semantic Type Real-Time Indexing) is based on the JDI methodology. It calculates the average JD scores for the input text from JDI, computes the vector similarity between the JDI scores and the ST-JD scores using the cosine coefficient, and then prints the ST ranks and ST scores in decreasing order of ST score.
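
The three steps above (average the JD scores, compare by cosine coefficient, rank by decreasing score) can be illustrated with a minimal Python sketch. It is not the STRI implementation and does not use the JDI API. The function names, the dictionary layout, and the toy scores are all assumptions made for illustration: word_jd stands in for the Word-JD scores and st_jd for the ST-JD table.

```python
import math


def average_jd_scores(words, word_jd):
    """Average the JD score vectors of the words that have JD scores."""
    totals = {}
    known = 0
    for word in words:
        scores = word_jd.get(word)
        if scores is None:
            continue  # assumption: words without JD scores are skipped
        known += 1
        for jd, score in scores.items():
            totals[jd] = totals.get(jd, 0.0) + score
    if known == 0:
        return {}
    return {jd: total / known for jd, total in totals.items()}


def cosine(u, v):
    """Cosine coefficient of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_semantic_types(words, word_jd, st_jd):
    """Return (rank, ST, score) tuples in decreasing order of score."""
    text_vector = average_jd_scores(words, word_jd)
    scored = [(st, cosine(text_vector, vec)) for st, vec in st_jd.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, st, score) for rank, (st, score) in enumerate(scored, 1)]


# Toy data (hypothetical values, not taken from any real JDI table).
word_jd = {
    "heart": {"Cardiology": 0.9, "Neurology": 0.1},
    "brain": {"Cardiology": 0.1, "Neurology": 0.9},
}
st_jd = {
    "ST_A": {"Cardiology": 1.0, "Neurology": 0.0},
    "ST_B": {"Cardiology": 0.0, "Neurology": 1.0},
}

for rank, st, score in rank_semantic_types(["heart", "heart", "brain"], word_jd, st_jd):
    print(rank, st, round(score, 4))
```

Storing each vector as a dictionary keyed by JD keeps the sketch independent of any fixed JD ordering. An actual implementation would obtain the word scores through the JDI API rather than from an in-memory dictionary.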

JDI scores (Word-JD-Wc-Dc)
- The scores are retrieved from the WordJdidWcDc table generated in the JDI preprocess. A JDI API method should be used instead of accessing this table directly.
ST-JD table
- The ST-JD table was generated in the STI preprocess as follows:
- Run JDI on the stDocuments (St-words) to get the ST-JD table
- Use words as the input to JDI