Lexical Tools

Prefix Computer Programs

A set of computer programs was developed to retrieve prefix word|word pairs from the LEXICON and to validate them as derivations. These programs are run annually for each lvg release.

  1. Get all base forms from LEXICON (inflVars.data)
    • Program: GetBaseForms.java
    • Input: ./dataOrg/inflVars.data
    • Output: ./data/bases.data
    • Descriptions
      • go through all lines (inflectional variants) in the file "inflVars.data"
      • retrieve the base forms (infl = 1)
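
The filtering in step 1 can be sketched as follows. This is a minimal illustration, not the actual GetBaseForms.java. It assumes each line of inflVars.data is pipe-delimited. The column positions TERM_FIELD and INFL_FIELD are hypothetical placeholders that must be adjusted to the real file layout, and the sample lines are made up.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: not the actual GetBaseForms.java. */
final class BaseFormSketch {
    // Assumed (hypothetical) column layout of a pipe-delimited line.
    static final int TERM_FIELD = 0; // the inflectional variant itself
    static final int INFL_FIELD = 2; // inflection code; 1 marks a base form

    /** Returns the sorted, de-duplicated terms whose inflection code is 1. */
    static Set<String> baseForms(List<String> lines) {
        Set<String> bases = new TreeSet<>();
        for (String line : lines) {
            String[] fields = line.split("\\|", -1);
            if (fields.length <= INFL_FIELD) {
                continue; // skip empty or malformed lines
            }
            if ("1".equals(fields[INFL_FIELD].trim())) {
                bases.add(fields[TERM_FIELD]);
            }
        }
        return bases;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed layout.
        List<String> sample = List.of(
            "test|128|1|E0000001",
            "tests|128|8|E0000001",
            "pretest|128|1|E0000002");
        System.out.println(baseForms(sample));
    }
}
```

A sorted set is used here so that the result is already unique and ordered before it is written out. Whether bases.data is actually stored that way is an assumption.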

  2. Retrieve and validate prefix words|words
    • Program: GetPrefixWordsFromFile.java
    • Input:
      • ./dataOrg/prefix.data
      • ./data/bases.data
      • ./data/prefix.tag.data
    • Output: ./data/prefixWords.meta.data
    • Descriptions
      • get prefixes from a file (./dataOrg/prefix.data)
      • get base forms from a file (./data/bases.data)
      • get prefix tags from a file (./data/prefix.tag.data)

      • find all pairs of prefix words|words in LEXICON:
        • go through all prefixes from the sorted prefixes list
        • find all pairs of prefix word|word (prefixWordList) if:
          • prefix word is in base forms
          • word is in base forms
      • validate all pairs of prefix words|words in prefixWordList
        • go through all pairs of prefixWord|words in prefixWordList
          • print tag ("yes" or "no") to ./data/prefixWords.meta.data
          • print "tbd" if no tag found
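
The pairing and tagging logic of step 2 can be sketched as follows. This is an illustration under stated assumptions, not the actual GetPrefixWordsFromFile.java. It assumes a prefix word is the plain concatenation of a prefix and a word (no hyphen handling), that known tags are keyed by the string "prefixWord|word", and that each output line has the hypothetical form "prefixWord|word|tag".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: not the actual GetPrefixWordsFromFile.java. */
final class PrefixWordSketch {
    /** Finds "prefixWord|word" pairs where both members are base forms. */
    static List<String> findPairs(Set<String> prefixes, Set<String> bases) {
        List<String> pairs = new ArrayList<>();
        Set<String> sortedBases = new TreeSet<>(bases);
        for (String prefix : new TreeSet<>(prefixes)) { // sorted prefix list
            if (prefix.isEmpty()) {
                continue; // an empty prefix would pair every word with itself
            }
            for (String prefixWord : sortedBases) {
                if (!prefixWord.startsWith(prefix)
                        || prefixWord.length() == prefix.length()) {
                    continue;
                }
                // Assumption: the word is what remains after the prefix.
                String word = prefixWord.substring(prefix.length());
                if (bases.contains(word)) {
                    pairs.add(prefixWord + "|" + word);
                }
            }
        }
        return pairs;
    }

    /** Appends the known tag ("yes" or "no"), or "tbd" if none is found. */
    static List<String> tagPairs(List<String> pairs, Map<String, String> knownTags) {
        List<String> tagged = new ArrayList<>();
        for (String pair : pairs) {
            tagged.add(pair + "|" + knownTags.getOrDefault(pair, "tbd"));
        }
        return tagged;
    }
}
```

For example, with the prefixes "pre" and "un" and the base forms "test", "pretest", "do", "undo", and "unit", the sketch pairs pretest|test and undo|do, and skips "unit" because "it" is not among the base forms.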

  3. Generate various reports from ./data/prefixWords.meta.data by tag
    • Program: GeneratePrefixFiles.java
    • Input:
      • ./data/prefixWords.meta.data
    • Output:
      • ./data/prefix.tbd.data
      • ./data/prefixWords.data
      • ./data/prefix.newTag.data
    • Descriptions
      • go through all pairs of tagged prefixWord|words in prefixWords.meta.data
        • send all "tbd" tags to prefix.tbd.data
        • send all "yes" and "no" tags to prefix.newTag.data
        • send all "yes" tags to prefixWords.data
        • check for invalid tags
        • check all comment lines
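
The routing by tag in step 3 can be sketched as follows. This is not the actual GeneratePrefixFiles.java. It assumes the tag is the last pipe-delimited field of each line and that comment lines start with '#'. It keeps whole lines in every list, although the real output files may use a different record format. How the real program checks comment lines is not specified, so the sketch simply skips them.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: not the actual GeneratePrefixFiles.java. */
final class TagSplitSketch {
    final List<String> tbd = new ArrayList<>();         // for prefix.tbd.data
    final List<String> newTag = new ArrayList<>();      // for prefix.newTag.data
    final List<String> prefixWords = new ArrayList<>(); // for prefixWords.data
    final List<String> invalid = new ArrayList<>();     // lines with unknown tags

    /** Routes each tagged line to the lists that correspond to its tag. */
    static TagSplitSketch split(List<String> lines) {
        TagSplitSketch result = new TagSplitSketch();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // assumption: comment and blank lines are skipped
            }
            // Assumption: the tag is the text after the last '|'.
            String tag = line.substring(line.lastIndexOf('|') + 1).trim();
            switch (tag) {
                case "tbd":
                    result.tbd.add(line);
                    break;
                case "yes":
                    result.newTag.add(line);
                    result.prefixWords.add(line);
                    break;
                case "no":
                    result.newTag.add(line);
                    break;
                default:
                    result.invalid.add(line);
            }
        }
        return result;
    }
}
```

Collecting invalid lines in a separate list, rather than stopping at the first one, makes it possible to report every bad tag in a single run.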

  4. Validate results:
    • Program: 2.GetPrefixWords
    • Input:
      • ./data/prefix.tag.data
      • ./data/prefix.newTag.data
      • ./data/prefixWords.data.new
    • Output:
      • ./data/prefix.tag.data.noComment.sort
      • ./data/prefix.newTag.data.all.sort
    • Descriptions
      • Remove all comment lines from prefix.tag.data
        • fgrep -v '#' prefix.tag.data > prefix.tag.data.noComment
        • sort -u prefix.tag.data.noComment > prefix.tag.data.noComment.sort
      • Combine the results with the new prefixWords (will be added in the future)
        • cat prefix.newTag.data prefixWords.data.new > prefix.newTag.data.all
        • sort -u prefix.newTag.data.all > prefix.newTag.data.all.sort
      • Compare the input and result tag files
        • diff prefix.tag.data.noComment.sort prefix.newTag.data.all.sort > prefix.tag.diff
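
For readers who prefer to see the validation as a set comparison, the same check can be sketched in Java. This is an illustrative equivalent of the shell commands, not part of the actual scripts. The class and method names are hypothetical. Like fgrep -v '#', it drops every line that contains '#', and like sort -u it keeps unique lines only.

```java
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch only: a set-based equivalent of the fgrep / sort -u / diff steps. */
final class TagDiffSketch {
    /** Drops lines containing '#' and returns the rest sorted and unique. */
    static SortedSet<String> withoutComments(List<String> lines) {
        SortedSet<String> kept = new TreeSet<>();
        for (String line : lines) {
            if (line.indexOf('#') < 0) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Returns the lines that occur in the first collection but not the second. */
    static SortedSet<String> onlyIn(Collection<String> first, Collection<String> second) {
        SortedSet<String> difference = new TreeSet<>(first);
        difference.removeAll(second);
        return difference;
    }

    /** True when both tag sets match, i.e. when the diff would be empty. */
    static boolean sameTags(List<String> oldTags, List<String> newTags) {
        SortedSet<String> oldSet = withoutComments(oldTags);
        SortedSet<String> newSet = new TreeSet<>(newTags);
        return onlyIn(oldSet, newSet).isEmpty() && onlyIn(newSet, oldSet).isEmpty();
    }
}
```

Calling onlyIn in both directions shows which tagged pairs were lost and which were newly introduced, which is the information the diff output carries.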

  5. Usage for (future) releases:
    • update inflVars.data from new release of LEXICON
    • update prefix.data
    • update prefixWords.data.new (for new prefix words that are not in this release)

      	
    • ./bin/1.GetBaseForms ${YEAR}
    • ./bin/2.GetPrefixWords ${YEAR}
      • check the number of lines in prefix.tag.diff (should be 0)
      • prefixWords.data (to be added to derivations.data)
      • prefix.tbd.data (send to linguists for validations)