Text Categorization

Semantic Type - Word Sense Disambiguation

STI (Semantic Type Indexing) is built on the JDI (Journal Descriptor Indexing) methodology: the ST scores of a text are calculated by averaging the scores of its words in the Word-ST table. STI can therefore be used for word sense disambiguation (WSD) by selecting the candidate semantic type with the highest score. STWSD is a tool developed for this purpose.
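The averaging idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory Word-ST table that maps each word to per-ST scores (the words, ST labels, and numbers are made up), averages each ST's score over the text words found in the table (an ST missing for a word counts as 0), and picks the best candidate ST. It is not STI's actual implementation or data.

```python
from collections import defaultdict

# Hypothetical Word-ST table: word -> {ST abbreviation: score}.
# All entries are invented for illustration only.
WORD_ST_TABLE = {
    "cold": {"dsyn": 0.40, "npop": 0.30},
    "virus": {"dsyn": 0.60, "virs": 0.90},
    "weather": {"npop": 0.80},
}


def average_st_scores(words, table):
    """Average each ST's score over the words found in the table."""
    known = [w for w in words if w in table]
    if not known:
        return {}
    totals = defaultdict(float)
    for w in known:
        for st, score in table[w].items():
            totals[st] += score
    # An ST absent for a word contributes 0 to that word's share.
    return {st: total / len(known) for st, total in totals.items()}


def best_st(words, candidates, table):
    """Select the candidate ST with the highest average score."""
    scores = average_st_scores(words, table)
    return max(candidates, key=lambda st: scores.get(st, 0.0))


print(best_st("the cold virus spread".split(), ["dsyn", "npop"], WORD_ST_TABLE))  # → dsyn
```

Here only "cold" and "virus" are in the table, so dsyn averages (0.40 + 0.60) / 2 = 0.50 while npop averages 0.30 / 2 = 0.15.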

I. Input

  • Input text
    The sentence or paragraph containing the ambiguous word(s)
  • Ambiguous word
    The target ambiguous word
  • ST candidates
    The possible Semantic Types, in abbreviated form
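These three inputs, together with the "Check WSD inputs" step listed under Algorithm, can be sketched as a small record plus a validation function. The class and function names and the set of legal STs below are hypothetical; a real check would use the full list of semantic types known to STI.

```python
from dataclasses import dataclass, field

# Hypothetical subset of legal ST abbreviations (illustration only).
LEGAL_STS = {"aapp", "dsyn", "npop", "orga", "virs"}


@dataclass
class WsdInput:
    in_text: str          # sentence or paragraph containing the ambiguous word(s)
    ambiguous_word: str   # the target ambiguous word
    st_candidates: list[str] = field(default_factory=list)  # candidate ST abbreviations


def check_wsd_input(inp: WsdInput, legal_sts: set[str] = LEGAL_STS) -> None:
    """Raise ValueError if the text is empty or any candidate is not a legal ST."""
    if not inp.in_text.strip():
        raise ValueError("input text is empty")
    illegal = [st for st in inp.st_candidates if st not in legal_sts]
    if illegal:
        raise ValueError(f"illegal ST candidates: {illegal}")


check_wsd_input(WsdInput("The cold virus spread.", "cold", ["dsyn", "npop"]))  # passes silently
```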

II. Output

  • The selected Semantic Type, in abbreviated form

III. Algorithm

  • Find variants of ambiguous word
    • Use Lexical Tools fruitful variants flow
    • Remove fruitful variants that contain punctuation (words in STI contain no punctuation)
    • Unify (remove duplicates) and sort the remaining fruitful variants
  • Check WSD inputs
    • Check whether the input text is empty
    • Check that the ST candidates are legal STs
  • Find forced legal words
    • Tokenize all variants into words
    • Unify (remove duplicates) and sort the words
  • Find the ST with the highest score
    • Filter the input text with the default input filter (or simply tokenize it into words)
    • Get the STI scores (DC and WC)
    • Use the combined score system
    • Select the candidate ST with the highest score
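The variant, forced-word, and scoring steps above can be tied together in a rough sketch. Several parts are assumptions, not the STWSD implementation: the fruitful variants are passed in as a precomputed list (Lexical Tools itself is not called); the Word-ST table with (DC, WC) score pairs is hypothetical; the "default input filter" is stood in for by a hypothetical minimum-length rule that forced legal words bypass; and the "combined score" is assumed to be the mean of the averaged DC and WC scores. The input checks are omitted.

```python
import re

# Hypothetical Word-ST table: word -> {ST: (DC score, WC score)}.
WORD_ST_TABLE = {
    "cold": {"dsyn": (0.40, 0.50), "npop": (0.30, 0.20)},
    "colds": {"dsyn": (0.45, 0.55)},
    "virus": {"dsyn": (0.60, 0.70), "virs": (0.90, 0.80)},
    "winter": {"npop": (0.70, 0.60)},
}
MIN_LEN = 5  # hypothetical stand-in for the default input filter


def clean_variants(fruitful_variants):
    """Drop variants containing punctuation, then unify and sort."""
    return sorted({v for v in fruitful_variants if not re.search(r"[^A-Za-z0-9 ]", v)})


def forced_legal_words(variants):
    """Tokenize all variants into words, then unify and sort."""
    return sorted({w.lower() for v in variants for w in v.split()})


def filter_input(in_text, forced):
    """Tokenize the text; keep tokens passing the filter or forced as legal."""
    tokens = re.findall(r"[a-z0-9]+", in_text.lower())
    return [t for t in tokens if len(t) >= MIN_LEN or t in forced]


def combined_scores(words, table):
    """Average DC and WC scores per ST, then combine them (assumed: mean)."""
    known = [w for w in words if w in table]
    if not known:
        return {}
    sums = {}
    for w in known:
        for st, (dc, wc) in table[w].items():
            s_dc, s_wc = sums.get(st, (0.0, 0.0))
            sums[st] = (s_dc + dc, s_wc + wc)
    n = len(known)
    return {st: (s_dc / n + s_wc / n) / 2 for st, (s_dc, s_wc) in sums.items()}


def stwsd(in_text, fruitful_variants, st_candidates, table=WORD_ST_TABLE):
    """Return the candidate ST with the highest combined score."""
    variants = clean_variants(fruitful_variants)
    forced = forced_legal_words(variants)
    words = filter_input(in_text, forced)
    scores = combined_scores(words, table)
    return max(st_candidates, key=lambda st: scores.get(st, 0.0))


text = "The cold virus kept her home in winter."
variants = ["cold", "colds", "cold's", "common cold"]  # hypothetical variant list
print(stwsd(text, variants, ["dsyn", "npop"]))  # → dsyn
```

In this example "cold's" is dropped for its apostrophe, and "cold" survives the length filter only because it is a forced legal word; the kept words are cold, virus, and winter, giving dsyn a combined score of about 0.37 against 0.30 for npop.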