The SPECIALIST Lexicon

Generate Multiwords from Verb Complements: Process

This section describes how multiwords are retrieved from the complement types of their associated verbs in the Lexicon.

I. Setup and Inputs

  • Directory: ${MULTIWORDS}/data/${YEAR}/outData/14.VerbComplements
  • Program: ${MULTIWORDS}/bin/14.VerbComplements ${YEAR}

II. Multiword Verification

Not all light verb constructions (LVCs) and verb-particle constructions (VPCs) are LMWs, so each candidate is verified by linguists as follows:

  • Files:
    • lightVerbs.cand
    • verbParticles.cand
  • Format (pipe-delimited, one candidate per line):
    1st field | 2nd field              | 3rd field        | 4th field
    EUI       | multiword (LVC or VPC) | tag from WordNet | tag
  • 3rd field: tag from WordNet
    This field is assigned from WordNet and is used for reference. There are three tags:
    • [Y]: in WordNet, with POS verb
    • [N]: not in WordNet
    • [P]: in WordNet, but the POS is not verb. Currently, there are two instances:
      • E0036598|knock on|P|y
      • E0053897|rub up|P|y
  • 4th field: tag assigned by linguists
    • [y]: a valid multiword
    • [n]: an invalid multiword
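
A minimal Python sketch of reading one line of a tagged candidate file, assuming exactly four pipe-delimited fields as in the two [P] examples above. The names Candidate and parse_candidate are hypothetical and are not part of the ${MULTIWORDS} tools.

```python
from dataclasses import dataclass

# Allowed values, taken from the field descriptions above.
WORDNET_TAGS = {"Y", "N", "P"}
LINGUIST_TAGS = {"y", "n"}


@dataclass(frozen=True)
class Candidate:
    eui: str          # 1st field: Lexicon entry ID, e.g. E0036598
    multiword: str    # 2nd field: the LVC or VPC
    wordnet_tag: str  # 3rd field: Y, N, or P (reference only)
    tag: str          # 4th field: y (valid) or n (invalid), set by linguists


def parse_candidate(line: str) -> Candidate:
    """Parse one pipe-delimited line (assumed layout: EUI|multiword|WordNet tag|tag)."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    eui, multiword, wordnet_tag, tag = fields
    if wordnet_tag not in WORDNET_TAGS:
        raise ValueError(f"unknown WordNet tag {wordnet_tag!r}: {line!r}")
    if tag not in LINGUIST_TAGS:
        raise ValueError(f"unknown tag {tag!r}: {line!r}")
    return Candidate(eui, multiword, wordnet_tag, tag)
```

For example, parse_candidate("E0036598|knock on|P|y") yields a Candidate whose wordnet_tag is "P" and whose tag is "y".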

III. Processes

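Each step below lists its inputs, outputs, and notes. Steps 1 to 4 process LVCs, steps 11 to 14 process VPCs, and step 20 combines the two.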
Step 1: Get raw LVCs
  • Inputs:
    • ./inData/LEXICON.release
  • Outputs:
    • lightVerbs.data.raw
    • lightVerbs.infl.raw
    • lightVerbs.form.raw
  • Notes:
    • Use the latest LEXICON.
Step 2: Add tags (from previous tags and WordNet) to raw LVCs and generate multiword candidates for tagging
  • Inputs:
    • lightVerbs.data.raw
    • ./inData/lightVerbs.data.tag
    • ./inData/WnIndexWords.data.3.0.mw
  • Outputs:
    • lightVerbs.cand
    • lightVerbs.tag
  • Notes:
    • Send lightVerbs.cand to linguists for tagging.
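
The tagging in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the actual script: previous tags are assumed to be loaded into a dictionary keyed by (EUI, multiword), ./inData/WnIndexWords.data.3.0.mw is assumed to be loaded into a hypothetical map from multiword to its set of WordNet POS, and candidate lines are assumed to leave the 4th field empty for linguists.

```python
from typing import Dict, Iterable, List, Set, Tuple


def wordnet_tag(multiword: str, wordnet_pos: Dict[str, Set[str]]) -> str:
    """Y: in WordNet as a verb; P: in WordNet with another POS; N: not in WordNet."""
    pos = wordnet_pos.get(multiword)
    if pos is None:
        return "N"
    return "Y" if "verb" in pos else "P"


def tag_raw(
    raw: Iterable[Tuple[str, str]],
    previous: Dict[Tuple[str, str], str],
    wordnet_pos: Dict[str, Set[str]],
) -> Tuple[List[str], List[str]]:
    """Split raw (EUI, multiword) pairs into already-tagged lines and new candidates.

    previous (assumed structure) maps (EUI, multiword) to a y/n tag from earlier runs.
    """
    tagged: List[str] = []
    candidates: List[str] = []
    for eui, multiword in raw:
        wn = wordnet_tag(multiword, wordnet_pos)
        tag = previous.get((eui, multiword))
        if tag is not None:
            tagged.append(f"{eui}|{multiword}|{wn}|{tag}")
        else:
            # Assumed layout: 4th field left empty for the linguists to fill in.
            candidates.append(f"{eui}|{multiword}|{wn}|")
    return tagged, candidates
```

Only pairs without a previous tag become candidates, which is why rerunning this step after new tags are appended shrinks the candidate list.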
Step 3: Verify linguist tags
  • Inputs:
    • lightVerbs.cand.tag.${YEAR}
  • Outputs: None
  • Notes:
    • Copy the tagged file to lightVerbs.cand.tag.${YEAR}.
    • Append lightVerbs.cand.tag.${YEAR} to ./inData/lightVerbs.data.tag.
    • Rerun step 2 until lightVerbs.cand is empty (0 candidates).
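
A sketch of the check-then-append part of step 3, assuming a linguist-tagged line is acceptable only when it has four fields and y or n in the 4th field. malformed_lines and append_verified are hypothetical helpers; rejecting the whole file when any line is bad is a design choice, so that ./inData/lightVerbs.data.tag never receives a partial update.

```python
from pathlib import Path
from typing import Iterable, List


def malformed_lines(lines: Iterable[str]) -> List[str]:
    """Return non-blank lines that lack four fields or a y/n tag in the 4th field."""
    bad: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # blank lines are ignored
        fields = line.split("|")
        if len(fields) != 4 or fields[3] not in ("y", "n"):
            bad.append(line)
    return bad


def append_verified(tagged_file: Path, tag_db: Path) -> int:
    """Append every line of tagged_file to tag_db, or none if any line is malformed.

    Assumes tag_db already ends with a newline. Returns the number of lines appended.
    """
    text = tagged_file.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    bad = malformed_lines(lines)
    if bad:
        raise ValueError(f"{len(bad)} malformed line(s), first: {bad[0]!r}")
    with tag_db.open("a", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in lines)
    return len(lines)
```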
Step 4: Get multiwords from tagged LVCs and get stats reports
  • Inputs:
    • lightVerbs.tag
    • lightVerbs.infl.raw
    • lightVerbs.form.raw
  • Outputs:
    • lightVerbs.data (used for LEXICON release)
    • lightVerbs.inflVars (used for LEXICON release)
    • lightVerbs.form
    • lightVerbs.stats
  • Notes:
    • Use the LVC type in the script.
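
Step 4 keeps only the multiwords tagged y. The sketch below does that and also cross-counts WordNet tags against linguist tags, which is one possible kind of stats report. The function name is hypothetical, and the real layouts of lightVerbs.data, lightVerbs.inflVars, and lightVerbs.stats are not described in this section, so they are not reproduced.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_valid(tag_lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], Counter]:
    """Keep (EUI, multiword) pairs tagged y; count (WordNet tag, linguist tag) pairs."""
    valid: List[Tuple[str, str]] = []
    crosstab: Counter = Counter()
    for line in tag_lines:
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: EUI|multiword|WordNet tag|linguist tag
        eui, multiword, wordnet_tag, tag = line.split("|")
        crosstab[(wordnet_tag, tag)] += 1
        if tag == "y":
            valid.append((eui, multiword))
    return valid, crosstab
```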
Step 11: Get raw VPCs
  • Inputs:
    • ./inData/LEXICON.release
  • Outputs:
    • verbParticles.data.raw
    • verbParticles.infl.raw
    • verbParticles.form.raw
  • Notes:
    • Use the latest LEXICON.
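
For VPCs, step 11 reads verb complements from LEXICON.release. Assuming particles appear in complement codes as part(...), for example intran;part(up), candidate VPCs could be collected as below. The function name and the regular expression are illustrative assumptions, not the behavior of the real script. LVC extraction (step 1) is not sketched because this section does not describe how LVCs appear in the complements.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed complement syntax: a particle is written as part(<particle>).
PARTICLE_RE = re.compile(r"part\(([^)]+)\)")


def raw_vpcs(base: str, eui: str, complements: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (EUI, "base particle") pairs in first-seen order."""
    seen: Dict[Tuple[str, str], None] = {}  # dict keeps insertion order
    for comp in complements:
        for match in PARTICLE_RE.finditer(comp):
            seen[(eui, f"{base} {match.group(1)}")] = None
    return list(seen)
```

For a hypothetical entry "look" with complements intran;part(up), tran=np;part(up), and intran;part(out), this returns "look up" once and "look out" once.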
Step 12: Add tags (from previous tags and WordNet) to raw VPCs and generate multiword candidates for tagging
  • Inputs:
    • verbParticles.data.raw
    • ./inData/verbParticles.data.tag
    • ./inData/WnIndexWords.data.3.0.mw
  • Outputs:
    • verbParticles.cand
    • verbParticles.tag
  • Notes:
    • Send verbParticles.cand to linguists for tagging.
Step 13: Verify linguist tags
  • Inputs:
    • verbParticles.cand.tag.${YEAR}
  • Outputs: None
  • Notes:
    • Copy the tagged file to verbParticles.cand.tag.${YEAR}.
    • Append verbParticles.cand.tag.${YEAR} to ./inData/verbParticles.data.tag.
    • Rerun step 12 until verbParticles.cand is empty (0 candidates).
Step 14: Get multiwords from tagged VPCs and get stats reports
  • Inputs:
    • verbParticles.tag
    • verbParticles.infl.raw
    • verbParticles.form.raw
  • Outputs:
    • verbParticles.data (used for LEXICON release)
    • verbParticles.inflVars (used for LEXICON release)
    • verbParticles.form
    • verbParticles.stats
  • Notes:
    • Use the VPC type in the script.
Step 20: Get stats for LVCs, VPCs, and combined VCs
  • Inputs:
    • lightVerbs.tag
    • verbParticles.tag
  • Outputs:
    • lightVerbs.tag.stats
    • verbParticles.tag.stats
    • verbComplements.tag
    • verbComplements.tag.stats
  • Notes:
    • Combine lightVerbs.tag and verbParticles.tag into verbComplements.tag.
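
Step 20 merges the two tag files. A minimal sketch, assuming the combined file is the sorted union of lines with exact duplicates removed; the actual merge rule and the layout of verbComplements.tag.stats are not specified here, and combine_tags is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def combine_tags(
    lvc_lines: Sequence[str], vpc_lines: Sequence[str]
) -> Tuple[List[str], Dict[str, int]]:
    """Merge LVC and VPC tag lines into one sorted, duplicate-free list with counts."""
    lvc = {line for line in lvc_lines if line.strip()}
    vpc = {line for line in vpc_lines if line.strip()}
    combined = sorted(lvc | vpc)  # assumed merge rule: set union, then sort
    stats = {
        "lvc": len(lvc),
        "vpc": len(vpc),
        "overlap": len(lvc & vpc),
        "combined": len(combined),
        # Assumed layout: the linguist tag is the last field, so valid lines end in |y
        "valid": sum(1 for line in combined if line.endswith("|y")),
    }
    return combined, stats
```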