SPECIALIST Lexicon

Antonym Generation for Tt Model (TtSet)

This program is used for the TtSet, the training data set used in 2021 to identify the type of models. Theoretically, it does not need to be re-run annually. In practice, we still run steps 40 and 42-44 from 2022 onward to ensure the quality of this set.

shell> cd ${ANTONYM_DIR}/bin
shell> GetAntonyms ${YEAR}

TT model: Training and Test Set

Option	Description	Input	Output	Notes	Option
40	Collect antonyms in the training and test set and retag their source from [TT] to [CC\|SN]. TtSet.CollectAntonyms.java, TtSet.RetagSrcOnAntRaw.java	${TT_DIR}/input/antonymSource.data (use 2021) ${ML_DIR}/input/3-gram.${YEAR}.30.core (previous year; use `shell> 06.NGramUtil ${PREV_YEAR}`, option 3)	./output/PreCand/antonymTtSet.data.TT ./output/PreCand/antonymTtSet.data	On the first run: shell> mkdir ./output/PreCand, then link ${ML_DIR}/input/3-gram.${YEAR}.30.core (this requires running option 3 of ${LMW}/bin/06.NGramUtil ${PREV_YEAR} first). Retags [TT] to sources of [LEX\|SD\|PD\|CC\|SN].	40
41	Not needed for the release.				41
42	Get antonym candidates from TtSet collections. TtSet.GenAntCandFromTtSet	${TT_DIR}/output/antonymTtSet.data ${ANT_DIR}/input/antCand.data.tag.${YEAR} ${LEX_DIR}/input/inflVars.data ${ANT_DIR}/input/domain.data	./output/Cand/antCandTtSet.data ./output/Cand/antCandTtSet.data.tbd ./output/Cand/antCandTtSet.data.tag ./output/candTagged/antCandTtSet.data.tag.tagged	The TBD file should contain 0 records (or only the following known exceptions): `post\|E0049060\|pre\|EUI_TBD\|noun\|CANON_TBD\|TYPE_TBD\|NEG_TBD\|DOMAIN_TBD\|CC post\|E0049061\|pre\|EUI_TBD\|verb\|CANON_TBD\|TYPE_TBD\|NEG_TBD\|DOMAIN_TBD\|CC` Convert these to: `post\|E0049060\|pre\|EUI_NONE\|noun\|N\|NA\|O\|DOMAIN_NONE\|CC post\|E0049061\|pre\|EUI_NONE\|verb\|N\|NA\|O\|DOMAIN_NONE\|CC` If the TBD file contains any records other than the 2 above, send it to linguists for tagging.	42
43	Validate and fix tags of antonym candidates (TT). Antonym.ValidateTaggedCand.java	./output/candTagged/antCandTtSet.data.tag.tagged ${ANT_DIR}/input/domain.data	./output/candTagged/antCandTtSet.data.tag.fixed	Append tagged candidates to antCandTtSet.data.tag.tagged: `post\|E0049060\|pre\|EUI_NONE\|noun\|N\|NA\|O\|DOMAIN_NONE\|CC post\|E0049061\|pre\|EUI_NONE\|verb\|N\|NA\|O\|DOMAIN_NONE\|CC` Run this step until the tagged and fixed files are the same (they should already be the same from 2022 onward). The fixed file contains the auto-fixes of [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE]. Manually fix the known exceptions (2). Manually copy the fixed file to the tagged file. Manually copy antCandTtSet.data.tag.tagged to antCandTtSet.data.tag.tagged.${YEAR}.	43
44	Update the release antonym tagged file from TT. Antonym.UpdateAllTaggedFile	./output/candTagged/antCandTtSet.data.tag.tagged.${YEAR} ${ANT_DIR}/input/antCand.data.tag.${YEAR} ${ANT_DIR}/input/domain.data	${ANT_DIR}/input/antCand.data.tag.updated	This step automatically updates the tag file of all antonym candidates. Manually copy antCand.data.tag.updated to antCand.data.tag.updated.TT. The output file is used to generate the antonym and negation files for the release. Re-run steps 40-44 until all steps pass. From 2023 onward, TT should need only one run to pass steps 40-44.	44
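
The manual handling of the TBD file described in the notes for option 42 can be sketched as a small script. This is only an illustration, not part of the Lexicon tools: the function names are hypothetical, and it assumes the records on disk use a plain `|` separator (the `\|` in the table above being an escape). The placeholder-to-default mapping is read directly off the two example records.

```python
# Sketch only: the helper names are hypothetical, and the plain "|" separator
# is an assumption based on the example records in the option 42 notes.

# Default value for each *_TBD placeholder, taken from the converted examples.
TBD_DEFAULTS = {
    "EUI_TBD": "EUI_NONE",
    "CANON_TBD": "N",
    "TYPE_TBD": "NA",
    "NEG_TBD": "O",
    "DOMAIN_TBD": "DOMAIN_NONE",
}

# The two known exception records that may legitimately appear in the TBD file.
KNOWN_EXCEPTIONS = {
    "post|E0049060|pre|EUI_TBD|noun|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC",
    "post|E0049061|pre|EUI_TBD|verb|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC",
}


def resolve_tbd_record(record):
    """Replace every *_TBD placeholder field in one record with its default."""
    fields = record.split("|")
    return "|".join(TBD_DEFAULTS.get(field, field) for field in fields)


def split_tbd_records(lines):
    """Split TBD-file lines into (converted known exceptions, other records).

    The second list holds records that are not known exceptions; per the
    notes, those are the ones to send to linguists for tagging.
    """
    converted, for_linguists = [], []
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        if record in KNOWN_EXCEPTIONS:
            converted.append(resolve_tbd_record(record))
        else:
            for_linguists.append(record)
    return converted, for_linguists
```

Calling `split_tbd_records` on the lines of antCandTtSet.data.tbd would return the converted records to append to the tagged file, plus any remaining records. If the second list is empty, nothing needs to go to the linguists.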
