Option | Description | Input | Output | Notes | Option
|
---|
1 |
- generate aPairs from tagged candidates
- Antonym.GenAPairsFromTagCand.java
|
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- ${LEX_DIR}/input/LRSPL
|
|
- This program generates aPairs with all spVars
- This program removes duplicated aPairs by spVars from different sources
- The result includes some duplicated aPairs that differ only in the order of the pair across different sources. These are taken care of in step 3.
- This is the antonym file that contains unique aPairs.
- manually copy aPairs.data to aPairs.data.${YEAR}
| 1
|
2 |
- generate negation cue words from tagged candidates
- Antonym.GenNegCueWordsFromTagCand.java
|
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${LEX_DIR}/input/LRSPL
|
- ./output/negCueWords.data
|
- This is the negation cue word file (unique).
- manually copy negCueWords.data to negCueWords.data.${YEAR}
| 2
|
3 |
- Generate the antonym release file from the results of step 1 (DB table for Lexical Tools)
- Antonym.GenAntFromAPairs.java
|
- ./output/aPairs.data.${YEAR}
|
- ./output/antonyms.data
- ./output/antonyms.data.tagConflict
=> Must be 0. If not:
- send ./output/antonyms.data.tagConflict to a linguist to re-tag the same aPairs.
- manually fix ./input/antCand.data.tag
- re-run steps 1, 2, and 3 (update aPairs.data.${YEAR}) until the tag conflict count is 0
- then fix the tag duplicates.
- ./output/antonyms.data.tagDuplicate
=> Must be 0. If not:
- manually review and fix ./input/antCand.data.tag
- delete the duplicates (by keeping the smaller EUI as EUI-1) from the ./input/antonyms.data.tagDuplicate
- re-run steps 1, 2, and 3 (update aPairs.data.${YEAR}) until the tag duplicate count is 0
- then fix the src duplicates.
- ./output/antonyms.data.srcConflict
- The program automatically fixes the src according to the following priority order (LEX > SD > PD > CC > SN) when the same aPair is tagged from multiple sources
- The fixes are applied to the input (./output/aPairs.data.${YEAR}) and the results are written to the output (./output/antonyms.data).
- All src conflicts are recorded in the log file ./output/antonyms.data.srcConflict
- In general, no extra action is needed because the program resolves conflicts by reassigning the src and removing the entries that are not needed. However, the following can be spot-checked:
- review conflicts in the log file ./output/antonyms.data.srcConflict
- check src conflicts in the input file ./output/aPairs.data.${YEAR}
- check that only 1 src is used in the output file ./output/antonyms.data
- all fixed conflicts are kept in the known exceptions for reference (./output/antonyms.data.srcConflict.${YEAR}.${NO}.known).
- known source conflicts history:
Year | Exception No.
|
---|
2023 | 3
| 2024 | 8
| 2025 | 68
|
|
- This is the antonym release (also used as the DB table for Lexical tools).
- manually copy antonyms.data to antonyms.data.${YEAR}
| 3
|
5 |
- Get stats on tagged antonym candidate file
|
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
|
- ./output/analysis/antCand.data.tag.stats
- ./output/analysis/domain.out.cand
|
- If running for the first time: shell> mkdir ${OUTPUT}/analysis
- Generate stats and domains from the tagged antonym candidate file
| 5
|
6 |
- Get stats on canonical antonym from tagged candidate file
|
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
|
- ./output/analysis/antCand.data.tag.canon.stats
- ./output/analysis/domain.out.cand.canon
|
- Generate stats and domains from the canonical antonyms in the tagged file
| 6
|
7 |
- Get stats on antonym file
|
|
- ./output/antonyms.data.2-10
- ./output/analysis/antonym.data.stats
- ./output/analysis/domain.out.antonym
|
- Generate stats and domains from antonym file
- This file is used to update antonym growth.
| 7
|
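
The src conflict handling described in step 3 can be illustrated with a minimal Java sketch. It is not the code of Antonym.GenAntFromAPairs.java. The class name `SrcPrioritySketch` and the methods `pickSrc` and `pairKey` are hypothetical, and the sketch assumes that each src is a plain string and that EUIs are fixed-width strings, so that string comparison matches the "smaller EUI as EUI-1" rule. Only the priority order (LEX > SD > PD > CC > SN) is taken from the notes above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SrcPrioritySketch {
    // Priority order from the step 3 notes: LEX > SD > PD > CC > SN.
    private static final List<String> PRIORITY =
        Arrays.asList("LEX", "SD", "PD", "CC", "SN");

    /** Returns the highest-priority src among those tagged for one aPair. */
    public static String pickSrc(Collection<String> srcs) {
        String best = null;
        int bestRank = Integer.MAX_VALUE;
        for (String src : srcs) {
            int rank = PRIORITY.indexOf(src);
            if (rank < 0) {
                throw new IllegalArgumentException("Unknown src: " + src);
            }
            if (rank < bestRank) {
                bestRank = rank;
                best = src;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No src given");
        }
        return best;
    }

    /**
     * Hypothetical order-independent key for an aPair: the smaller EUI
     * is placed first, so (A, B) and (B, A) map to the same key.
     * Assumes fixed-width EUI strings.
     */
    public static String pairKey(String eui1, String eui2) {
        return eui1.compareTo(eui2) <= 0
            ? eui1 + "|" + eui2
            : eui2 + "|" + eui1;
    }

    public static void main(String[] args) {
        // Same aPair tagged from three sources: LEX has the highest priority.
        System.out.println(pickSrc(Arrays.asList("CC", "LEX", "SN")));
        // The two EUIs are made-up values for illustration only.
        System.out.println(pairKey("E0000200", "E0000100"));
    }
}
```

Grouping aPairs by such a key before calling `pickSrc` would leave one src per pair, which is the condition the spot checks in step 3 look for in ./output/antonyms.data.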