Text Categorization

PreProcess - STRI

This page describes the pre-process tasks that generate the input files for the ST real-time index. STRI (Semantic Type Real-Time Indexing) is based on the JDI methodology. It calculates the average JD scores for the text from JDI, computes the vector similarity between the JDI scores and the ST-JD scores using the cosine coefficient, and then prints the ST ranks and ST scores in decreasing order of ST score.

  • JDI scores (Word-JD-Wc-Dc)
    • The scores are retrieved from the WordJdidWcDc table generated in the JDI pre-process. Use a JDI API method to read them instead of accessing this table directly.
  • ST-JD table
    • The ST-JD table is generated in the STI pre-process as follows:
    • Run JDI on the stDocuments (St-words) to get the ST-JD table
    • Use words as the input to JDI
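
The scoring steps described above (average the JD scores, compare against each ST's JD scores with the cosine coefficient, sort by decreasing score) can be sketched roughly as follows. This is only an illustrative sketch, not the actual STRI code: the function names, the dictionary layout, and the sample JD and ST labels are all hypothetical, and it assumes that a JD missing from a word's scores counts as 0 when averaging.

```python
import math


def average_jd_scores(word_vectors):
    """Average per-word JD score vectors into a single JD vector for the text.

    Assumption: a JD absent from a word's vector contributes 0 to the average.
    """
    if not word_vectors:
        return {}
    totals = {}
    for vec in word_vectors:
        for jd, score in vec.items():
            totals[jd] = totals.get(jd, 0.0) + score
    return {jd: total / len(word_vectors) for jd, total in totals.items()}


def cosine(a, b):
    """Cosine coefficient between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no overlap can be measured against an empty vector
    return dot / (norm_a * norm_b)


def rank_semantic_types(text_vector, st_jd):
    """Return (rank, ST, score) tuples in decreasing order of ST score."""
    scored = [(st, cosine(text_vector, vec)) for st, vec in st_jd.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, st, score) for rank, (st, score) in enumerate(scored, start=1)]


# Hypothetical sample data: JD scores for two words, and a tiny ST-JD table.
word_vectors = [
    {"JD_A": 1.0, "JD_B": 0.0},
    {"JD_A": 0.0, "JD_B": 1.0},
]
st_jd = {
    "ST_X": {"JD_A": 1.0},
    "ST_Y": {"JD_A": 1.0, "JD_B": 1.0},
}

text_vector = average_jd_scores(word_vectors)
ranking = rank_semantic_types(text_vector, st_jd)
for rank, st, score in ranking:
    print(f"{rank}|{st}|{score:.4f}")
```

In practice the per-word JD scores would come from a JDI API call and the ST vectors from the ST-JD table, rather than from in-memory dictionaries like these.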