The SPECIALIST Lexicon

Antonym - Processes for Annual Release and Stats Reports

I. Set Up

  • base directory: ${ANTONYM_DIR}
  • binary scripts: ./bin
  • data: ./data
    • 0.Antonym
  • Prerequisites:
    Updates on aPairs from LEX, SD, PD, (TT), CC, and SN must be complete.
shell> cd ${ANTONYM_DIR}/bin
shell> GetAntonyms ${YEAR}

II. Processes

  • Generate aPairs, negation cue words, and antonym files
    Option | Description | Input | Output | Notes
    Option 1
    • generate aPairs from tagged candidates
    • Antonym.GenAPairsFromTagCand.java
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ${ANT_DIR}/input/domain.data
    • ${LEX_DIR}/input/LRSPL
    • ./output/aPairs.data
    • This program generates aPairs with all spVars.
    • This program removes aPairs duplicated by spVars from different sources.
    • The result still includes some duplicate aPairs that differ only in the order of the pair across sources. These are handled in Step 3.
    • This is the antonym file that contains unique aPairs.
    • manually copy aPairs.data to aPairs.data.${YEAR}
    Option 2
    • generate negation cue words from tagged candidates
    • Antonym.GenNegCueWordsFromTagCand.java
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ${LEX_DIR}/input/LRSPL
    • ./output/negCueWords.data
    • This is the negation cue word file (unique).
    • manually copy negCueWords.data to negCueWords.data.${YEAR}
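The manual copy steps in Options 1 and 2 could be wrapped in a small helper. This is a minimal sketch, not part of the Antonym scripts: the function name `snapshot_with_year` is hypothetical, and it assumes an empty or missing file should never be copied. It overwrites an existing snapshot on purpose, because the steps may be re-run to update the yearly file.

```shell
# Hypothetical helper (not part of ./bin): copy FILE to FILE.YEAR.
# Assumption: an empty or missing file indicates a failed run and is rejected.
snapshot_with_year() (
  file="$1"
  year="$2"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    exit 1
  fi
  # -p preserves the modification time of the generated file.
  cp -p "$file" "$file.$year"
)
```

For example, `snapshot_with_year ./output/aPairs.data "${YEAR}"` would produce `./output/aPairs.data.${YEAR}`.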
    Option 3
    • Generate the antonym release file from the results of Step 1 (DB table for Lexical Tools)
    • Antonym.GenAntFromAPairs.java
    • ./output/aPairs.data.${YEAR}
    • ./output/antonyms.data

    • ./output/antonyms.data.tagConflict
      => Must be 0. If not:
      • send ./output/antonyms.data.tagConflict to a linguist to tag the conflicting aPairs.
      • manually fix the tags in ./input/antCand.data.tag
      • re-run steps 1, 2, and 3 (updating aPairs.data.${YEAR}) until the tag conflict count is 0
      • then fix the tag duplicates.
    • ./output/antonyms.data.tagDuplicate
      => Must be 0. If not:
      • manually review and fix ./input/antCand.data.tag
      • delete the duplicates (keeping the smaller EUI as EUI-1) from the ./input/antonyms.data.tagDuplicate
      • re-run steps 1, 2, and 3 (updating aPairs.data.${YEAR}) until the tag duplicate count is 0
      • then fix the src duplicates.
    • ./output/antonyms.data.srcConflict
      • The program automatically fixes the src according to the priority order LEX > SD > PD > CC > SN when the same aPair is tagged from multiple sources.
      • The fixes are applied to the input (./output/aPairs.data.${YEAR}) and result in the output (./output/antonyms.data).
      • All src conflicts are recorded in the log file ./output/antonyms.data.srcConflict.
      • In general, no extra action is needed because the program resolves conflicts by reassigning the src and removing the entry that is not needed. However, the following can be spot-checked:
        • review conflicts in the log file ./output/antonyms.data.srcConflict
        • check src conflicts in the input file ./output/aPairs.data.${YEAR}
        • check that only 1 src is used in the output file ./output/antonyms.data
        • all fixed conflicts are kept in the known exceptions for reference (./output/antonyms.data.srcConflict.${YEAR}.${NO}.known).
        • known source conflicts history:
          Year | Exception No.
          2023 | 3
          2024 | 8
          2025 | 68
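The source priority rule (LEX > SD > PD > CC > SN) can be illustrated with a short sketch. This is not the logic of Antonym.GenAntFromAPairs.java, whose internals are not shown here: the function names `src_rank` and `pick_src` are hypothetical, and ranking unlisted sources last is an assumption.

```shell
# Hypothetical illustration of the src priority order LEX > SD > PD > CC > SN.
src_rank() {
  case "$1" in
    LEX) echo 1 ;;
    SD)  echo 2 ;;
    PD)  echo 3 ;;
    CC)  echo 4 ;;
    SN)  echo 5 ;;
    *)   echo 99 ;;   # assumption: sources not in the list rank last
  esac
}

# Print the highest-priority src among the arguments.
pick_src() (
  best=""
  best_rank=100
  for src in "$@"; do
    rank=$(src_rank "$src")
    if [ "$rank" -lt "$best_rank" ]; then
      best="$src"
      best_rank="$rank"
    fi
  done
  echo "$best"
)
```

For an aPair tagged from SN, PD, and CC, `pick_src SN PD CC` prints `PD`, the src that would be kept under the stated order.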
    • This is the antonym release (also used as the DB table for Lexical tools).
    • manually copy antonyms.data to antonyms.data.${YEAR}
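Before copying the release file, the two "Must be 0" checks in Option 3 could be scripted. The sketch below is hypothetical and not part of ./bin. It assumes that "0" means the log file contains no entries and that each entry occupies one non-blank line; neither assumption is confirmed by the file formats shown here.

```shell
# Count non-blank lines in a file (assumption: one entry per line).
count_entries() {
  # grep -c exits non-zero when the count is 0, so mask the status.
  grep -c '[^[:space:]]' "$1" || true
}

# Hypothetical gate: succeed only if both tag logs in OUT_DIR have 0 entries.
check_release_gates() (
  out_dir="$1"
  status=0
  for name in antonyms.data.tagConflict antonyms.data.tagDuplicate; do
    path="$out_dir/$name"
    if [ ! -f "$path" ]; then
      echo "$name: missing (has step 3 been run?)" >&2
      status=1
      continue
    fi
    n=$(count_entries "$path")
    echo "$name: $n"
    [ "$n" -eq 0 ] || status=1
  done
  exit "$status"
)
```

A possible use is `check_release_gates ./output && cp ./output/antonyms.data ./output/antonyms.data.${YEAR}`, so the copy only happens once both counts are 0.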
    Option 5
    • Get stats on tagged antonym candidate file
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ./output/analysis/antCand.data.tag.stats
    • ./output/analysis/domain.out.cand
    • If running for the first time, create the analysis directory: shell> mkdir ${OUTPUT}/analysis
    • Generate stats and domains from antonym candidate tagged file
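The first-run directory setup can be made safe to repeat with `mkdir -p`, which succeeds whether or not the directory already exists. This is a minimal sketch; the function name `ensure_analysis_dir` is hypothetical, and the output directory is passed as an argument in place of `${OUTPUT}`.

```shell
# Hypothetical helper: create <output dir>/analysis if it does not exist yet.
ensure_analysis_dir() {
  # -p makes the call idempotent, so it can run before every stats step.
  mkdir -p "${1:?output directory required}/analysis"
}
```

Calling `ensure_analysis_dir "${OUTPUT}"` before Options 5, 6, and 7 removes the need to remember whether this is the first run.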
    Option 6
    • Get stats on canonical antonyms from the tagged candidate file
    • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
    • ./output/analysis/antCand.data.tag.canon.stats
    • ./output/analysis/domain.out.cand.canon
    • Generate stats and domains from canonical antonym in tagged file
    Option 7
    • Get stats on antonym file
    • ./output/antonyms.data
    • ./output/antonyms.data.2-10

    • ./output/analysis/antonym.data.stats
    • ./output/analysis/domain.out.antonym
    • Generate stats and domains from antonym file
    • This file is used to update antonym growth.