Option | Description | Input | Output | Notes | Option
|
---|
40 |
- Collect antonyms in the training and test set and retag their source from [TT] to [CC|SN]
- TtSet.CollectAntonyms.java
- TtSet.RetagSrcOnAntRaw.java
|
- ${TT_DIR}/input/antonymSource.data (use 2021)
- ${ML_DIR}/input/3-gram.${YEAR}.30.core (previous year)
Generate it with shell> 06.NGramUtil ${PREV_YEAR}, option 3.
|
- ./output/PreCand/antonymTtSet.data.TT
- ./output/PreCand/antonymTtSet.data
|
- On the first run:
- shell> mkdir ./output/PreCand
- link ${ML_DIR}/input/3-gram.${YEAR}.30.core
=> requires running option 3 of ${LMW}/bin/06.NGramUtil ${PREV_YEAR} first
- Retag [TT] to sources of [LEX|SD|PD|CC|SN]
| 40
|
41 |
|
|
|
| 41
|
42 |
- Get antonym candidates from TtSet Collections
- TtSet.GenAntCandFromTtSet
|
- ${TT_DIR}/output/antonymTtSet.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${LEX_DIR}/input/inflVars.data
- ${ANT_DIR}/input/domain.data
|
- ./output/Cand/antCandTtSet.data
- ./output/Cand/antCandTtSet.data.tbd
- ./output/Cand/antCandTtSet.data.tag
- ./output/candTagged/antCandTtSet.data.tag.tagged
|
- The TBD file should have 0 records (or only the following known exceptions):
post|E0049060|pre|EUI_TBD|noun|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC
post|E0049061|pre|EUI_TBD|verb|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC
Convert these to:
post|E0049060|pre|EUI_NONE|noun|N|NA|O|DOMAIN_NONE|CC
post|E0049061|pre|EUI_NONE|verb|N|NA|O|DOMAIN_NONE|CC
- If the TBD file contains anything other than the 2 records above, send it to linguists for tagging
| 42
|
43 |
- Validate and fix tags of antonym candidates (TT)
- Antonym.ValidateTaggedCand.java
|
- ./output/candTagged/antCandTtSet.data.tag.tagged
- ${ANT_DIR}/input/domain.data
|
- ./output/candTagged/antCandTtSet.data.tag.fixed
|
- Append tagged candidates to antCandTtSet.data.tag.tagged
post|E0049060|pre|EUI_NONE|noun|N|NA|O|DOMAIN_NONE|CC
post|E0049061|pre|EUI_NONE|verb|N|NA|O|DOMAIN_NONE|CC
- Run this step until the tagged and fixed files are the same (they should already be the same from 2022 onward)
- The fixed file contains the auto-fixes of [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE], respectively.
- Manually fix the 2 known exceptions.
- Manually copy the fixed file to the tagged file
- Manually copy antCandTtSet.data.tag.tagged to antCandTtSet.data.tag.tagged.${YEAR}
| 43
|
44 |
- Update the release antonym tagged file from TT
- Antonym.UpdateAllTaggedFile
|
- ./output/candTagged/antCandTtSet.data.tag.tagged.${YEAR}
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
|
- ${ANT_DIR}/input/antCand.data.tag.updated
|
- This step automatically updates the tag file of all antonym candidates
- Manually copy antCand.data.tag.updated to antCand.data.tag.updated.TT
- The output file is used to generate antonym and negation files for the release.
- Re-run steps 40-44 until all steps pass
- From 2023 onward, TT should need only one run to pass steps 40-44.
| 44
|
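
The TBD triage in step 42 and the tagged-versus-fixed check in step 43 can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the actual tool chain: the function names `triage_tbd` and `files_identical` are hypothetical, and the only conversions the sketch knows are the two exception records listed in the Notes for step 42.

```python
import filecmp

# The two known exception records from step 42, mapped to their converted form.
# Any other TBD record is assumed to need manual tagging by linguists.
KNOWN_EXCEPTIONS = {
    "post|E0049060|pre|EUI_TBD|noun|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC":
        "post|E0049060|pre|EUI_NONE|noun|N|NA|O|DOMAIN_NONE|CC",
    "post|E0049061|pre|EUI_TBD|verb|CANON_TBD|TYPE_TBD|NEG_TBD|DOMAIN_TBD|CC":
        "post|E0049061|pre|EUI_NONE|verb|N|NA|O|DOMAIN_NONE|CC",
}


def triage_tbd(lines):
    """Split TBD records into (converted known exceptions, records for linguists)."""
    converted = []
    for_linguists = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line in KNOWN_EXCEPTIONS:
            converted.append(KNOWN_EXCEPTIONS[line])
        else:
            for_linguists.append(line)
    return converted, for_linguists


def files_identical(path_a, path_b):
    """Return True when both files have byte-for-byte identical content."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

In this sketch, an empty second list from `triage_tbd` would mean nothing has to go to the linguists, and `files_identical` applied to the `.tag.tagged` and `.tag.fixed` files would signal when step 43 no longer needs to be repeated.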