The SPECIALIST Lexicon

Antonym Generation for Tt Model (TtSet)

This program is used for the TtSet, the training data set used in 2021 to identify the type of models. Theoretically, it does not need to be re-run annually. In practice, we still run steps 40 and 42-44 from 2022 onward to ensure the quality of this set.

shell> cd ${ANTONYM_DIR}/bin
shell> GetAntonyms ${YEAR}

  • TT model: Training and Test Set
    Columns: Option | Description | Input | Output | Notes
    Option 40
    Description:
    • Collect antonyms in the training and test set and retag their source from [TT] to [CC|SN]
    • TtSet.CollectAntonyms.java
    • TtSet.RetagSrcOnAntRaw.java
    Input:
    • ${TT_DIR}/input/antonymSource.data (use 2021)
    • ${ML_DIR}/input/3-gram.${YEAR}.30.core (previous_year)
      Use shell> 06.NGramUtil ${PREV_YEAR}, option 3.
    Output:
    • ./output/PreCand/antonymTtSet.data.TT
    • ./output/PreCand/antonymTtSet.data
    Notes:
    • If this is the first run:
      • shell> mkdir ./output/PreCand
      • link ${ML_DIR}/input/3-gram.${YEAR}.30.core
        => need to run option 3 on ${LMW}/bin/06.NGramUtil ${PREV_YEAR} first
    • Retag [TT] to sources of [LEX|SD|PD|CC|SN]
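
The first-run notes for option 40 could be scripted roughly as follows. This is a minimal sketch, not part of the Lexicon tools: the helper name `setup_precand` is hypothetical, and because the notes do not say where 06.NGramUtil writes the 3-gram file, that location is passed in as an argument rather than guessed. The sketch also assumes that "link" means a symbolic link.

```shell
# Hypothetical helper (not part of the Lexicon tools): first-run setup for option 40.
# $1: path of the 3-gram core file generated by option 3 of 06.NGramUtil.
#     The notes do not give this location, so the caller must supply it.
# Expects ML_DIR and YEAR to be set, as in the notes above.
setup_precand() {
    ngram_src=$1
    ngram_link="${ML_DIR}/input/3-gram.${YEAR}.30.core"

    # The 3-gram file exists only after 06.NGramUtil option 3 has been run.
    if [ ! -f "$ngram_src" ]; then
        echo "3-gram file not found: $ngram_src (run 06.NGramUtil option 3 first)" >&2
        return 1
    fi

    # Create the output directory; -p makes this safe to repeat.
    mkdir -p ./output/PreCand

    # Assumption: a symbolic link is wanted; -f replaces a stale link.
    ln -sf "$ngram_src" "$ngram_link"
}
```

Call it from the directory that holds ./output, passing the path of the generated 3-gram file as the only argument. It returns a non-zero status without touching anything if that file is missing.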
    Option 41
    • No need for release!
    Option 42
    Description:
    • Get antonym candidates from TtSet Collections
    • TtSet.GenAntCandFromTtSet
    Input:
    • ${TT_DIR}/output/antonymTtSet.data
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ${LEX_DIR}/input/inflVars.data
    • ${ANT_DIR}/input/domain.data
    Output:
    • ./output/Cand/antCandTtSet.data
    • ./output/Cand/antCandTtSet.data.tbd
    • ./output/Cand/antCandTtSet.data.tag
    • ./output/candTagged/antCandTtSet.data.tag.tagged
    Notes:
    • The TBD file should contain 0 records (or only the following known exceptions):
      post|E0049060|pre|EUI_TBD|noun|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC
      post|E0049061|pre|EUI_TBD|verb|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC
      convert to:
      post|E0049060|pre|EUI_NONE|noun|N|NA|O|DOMAIN_NONE|CC
      post|E0049061|pre|EUI_NONE|verb|N|NA|O|DOMAIN_NONE|CC
    • If the TBD file contains anything other than the 2 exceptions above, send it to linguists to tag
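
The check on the TBD file for option 42 can be sketched as below. This is an illustration only, not part of the Lexicon tools: the function names `unknown_tbd` and `resolve_known_tbd` are hypothetical, and the sketch assumes one pipe-delimited record per line, as in the exception records listed in the notes.

```shell
# The two known exception records, exactly as listed in the notes for option 42.
KNOWN_TBD_NOUN='post|E0049060|pre|EUI_TBD|noun|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC'
KNOWN_TBD_VERB='post|E0049061|pre|EUI_TBD|verb|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC'

# Hypothetical helper: print every record in TBD file $1 that is NOT a known exception.
# Any output is what would go to the linguists for tagging.
# -F fixed strings, -x whole-line match, -v invert the match.
unknown_tbd() {
    grep -v -x -F -e "$KNOWN_TBD_NOUN" -e "$KNOWN_TBD_VERB" "$1"
}

# Hypothetical helper: convert the two known exceptions in file $1 to their tagged form.
# All other records pass through unchanged.
resolve_known_tbd() {
    awk -F'|' -v OFS='|' -v noun="$KNOWN_TBD_NOUN" -v verb="$KNOWN_TBD_VERB" '
        $0 == noun || $0 == verb {
            # Fields 4, 6, 7, 8, 9 hold the EUI, canonical, type, negation, and domain tags.
            $4 = "EUI_NONE"; $6 = "N"; $7 = "NA"; $8 = "O"; $9 = "DOMAIN_NONE"
        }
        { print }
    ' "$1"
}
```

If `unknown_tbd ./output/Cand/antCandTtSet.data.tbd` prints nothing, the TBD file holds only the known exceptions (or is empty) and nothing needs to be sent for tagging.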
    Option 43
    Description:
    • Validate and fix tags of antonym candidates (TT)
    • Antonym.ValidateTaggedCand.java
    Input:
    • ./output/candTagged/antCandTtSet.data.tag.tagged
    • ${ANT_DIR}/input/domain.data
    Output:
    • ./output/candTagged/antCandTtSet.data.tag.fixed
    Notes:
    • Append tagged candidates to antCandTtSet.data.tag.tagged
      post|E0049060|pre|EUI_NONE|noun|N|NA|O|DOMAIN_NONE|CC
      post|E0049061|pre|EUI_NONE|verb|N|NA|O|DOMAIN_NONE|CC
    • Run this step until the tagged and fixed files are the same (they should be the same from 2022 onward)
      • The fixed file contains the auto-fixes of [TYPE_TBD] to [NA] and [DOMAIN_TBD] to [DOMAIN_NONE].
      • Manually fix the known exceptions (2).
      • Manually copy the fixed file to the tagged file
    • Manually copy antCandTtSet.data.tag.tagged to antCandTtSet.data.tag.tagged.${YEAR}
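
The manual check and copy steps after option 43 might look like the sketch below. It is not part of the Lexicon tools: the helper name `check_tagged_fixed` and its "same"/"rerun" output are hypothetical, and it assumes "the same" means byte-identical files. Option 43 itself is still run through GetAntonyms.

```shell
# Hypothetical helper (not part of the Lexicon tools): the manual check after option 43.
# $1: directory holding the tagged and fixed files (defaults to ./output/candTagged).
# Expects YEAR to be set. Prints "same" or "rerun".
check_tagged_fixed() {
    dir=${1:-./output/candTagged}
    tagged="$dir/antCandTtSet.data.tag.tagged"
    fixed="$dir/antCandTtSet.data.tag.fixed"

    if cmp -s "$tagged" "$fixed"; then
        # Files are identical: keep a copy of the tagged file for this year.
        cp "$tagged" "$tagged.${YEAR}"
        echo "same"
    else
        # Files differ: copy the fixed file over the tagged file, then run option 43 again.
        # Review the fixed file (including the two known exceptions) before calling this.
        cp "$fixed" "$tagged"
        echo "rerun"
    fi
}
```

A result of "rerun" means option 43 should be run again before the year-stamped copy is made. A result of "same" means antCandTtSet.data.tag.tagged.${YEAR} has been written.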
    Option 44
    Description:
    • Update the release antonyms tagged file from TT
    • Antonym.UpdateAllTaggedFile
    Input:
    • ./output/candTagged/antCandTtSet.data.tag.tagged.${YEAR}
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ${ANT_DIR}/input/domain.data
    Output:
    • ${ANT_DIR}/input/antCand.data.tag.updated
    Notes:
    • This step automatically updates the tag file of all antonym candidates
    • Manually copy antCand.data.tag.updated to antCand.data.tag.updated.TT
    • The output file is used to generate antonym and negation files for the release.
    • Re-run steps 40-44 until all steps pass
    • From 2023 onward, TT should need only one run to pass steps 40-44.