Exclusive Filter Rules: Derive invalid lead units and invalid end units from the Lexicon
I. Introduction
By definition, a known multiword (in the Lexicon) should not begin with an invalid lead unit. The only exception is when a child word of the invalid word is the lead unit of a multiword. For example, "above" is a lead word because 13 multiwords in the Lexicon begin with "above".
II. Source of invalid lead-end-units
Category | Examples | Lexicon.2014 | Lexicon.2015 |
---|---|---|---|
auxiliary | be, do, etc. | 3 (30) | 3 (30) |
complementizer | that | 1 (1) | 1 (1) |
conjunction | and, or, but, etc. | 71 (71) | 71 (71) |
determiner | a, the, some, etc. | 38 (38) | 38 (38) |
modal | may, must, can, etc. | 8 (27) | 8 (27) |
pronoun | it, he, they, etc. | 87 (87) | 87 (87) |
preposition | to, on, by, etc. | 233 (233) | 233 (233) |
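The category table above can be read as a filter over Lexicon records. The following is a minimal Python sketch of that idea, assuming each record has already been reduced to a hypothetical `(term, category)` pair; the real Lexicon record format is richer and is not reproduced here. The category names are taken from the table, while `select_candidates` and the sample records are illustrative only.

```python
# Categories listed in the table above as sources of invalid
# lead-end-unit candidates.
INVALID_UNIT_CATEGORIES = frozenset({
    "auxiliary", "complementizer", "conjunction", "determiner",
    "modal", "pronoun", "preposition",
})


def select_candidates(records):
    """Return the sorted, de-duplicated terms whose category is listed.

    `records` is an iterable of hypothetical (term, category) pairs.
    """
    return sorted({term for term, category in records
                   if category in INVALID_UNIT_CATEGORIES})


# Toy records for illustration only; not real Lexicon data.
sample_records = [
    ("above", "preposition"),
    ("across from", "preposition"),
    ("and", "conjunction"),
    ("board", "noun"),
    ("may", "modal"),
    ("may", "noun"),
]
candidates = select_candidates(sample_records)
print(candidates)
```

A term that appears under several categories, such as "may" in the toy data, becomes a candidate as soon as one of its categories is in the list.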
III. Algorithm
LeadEndUnit Candidate | Matches No. | LeadUnit No. | LT Examples | LeadUnit | EndUnit No. | ET Examples | EndUnit |
---|---|---|---|---|---|---|---|
across | 0 | 0 | | Invalid | 0 | | Invalid |
across from | 1 | 0 | | Invalid | 0 | | Invalid |
around | 0 | 0 | | Invalid | 1 | | Valid |
as | 0 | 1 | | Valid | 0 | | Invalid |
as far as | 1 | 0 | | Invalid | 0 | | Invalid |
as if | 1 | 2 | | Valid | 0 | | Invalid |
down | 0 | 35 | | Valid | 12 | | Valid |
above | 0 | 13 | | Valid | 0 | | Invalid |
on | 0 | 5 | | Valid | 3 | | Valid |
on board | 1 | 1 | | Valid | 0 | | Invalid |
out | 0 | 12 | | Valid | 43 | | |
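The Valid and Invalid columns in the table above follow from the counts: a candidate remains an invalid lead unit only when no Lexicon multiword begins with it, and likewise for end units. The following is a rough Python sketch of that rule, not the tool's actual implementation. It assumes whitespace tokenization, that a multiword does not count as its own child, and a toy multiword list standing in for the Lexicon; the function names are hypothetical.

```python
def count_children(candidate, multiwords):
    """Count multiwords that begin with / end with `candidate`.

    Matching is done on whole tokens, so "on" does not match "only one".
    """
    cand = candidate.split()
    n = len(cand)
    lead = end = 0
    if n == 0:
        return lead, end
    for multiword in multiwords:
        tokens = multiword.split()
        if len(tokens) <= n:
            continue  # assumption: a child must be longer than the candidate
        if tokens[:n] == cand:
            lead += 1
        if tokens[-n:] == cand:
            end += 1
    return lead, end


def classify(candidate, multiwords):
    """Label the candidate as a Valid or Invalid lead unit and end unit."""
    lead, end = count_children(candidate, multiwords)
    return {
        "lead_count": lead,
        "lead_unit": "Valid" if lead > 0 else "Invalid",
        "end_count": end,
        "end_unit": "Valid" if end > 0 else "Invalid",
    }


# Toy multiwords for illustration only; not taken from the Lexicon.
toy_multiwords = ["above board", "above mentioned", "on board",
                  "turn around", "as if"]

for term in ("above", "around", "across"):
    print(term, classify(term, toy_multiwords))
```

With real data, the counts would correspond to the LeadUnit No. and EndUnit No. columns, and a count of zero keeps the candidate on the invalid list.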
shell> ./3.InvalidLeadEndTerm ${YEAR}
Step | Description | IO | Notes - Examples |
---|---|---|---|
1 | Get all words and multiwords from Lexicon | Inputs: Outputs: | |
2 | Get invalid Lead-End-Unit candidates from Lexicon | Inputs: Outputs: | |
3 | Get child lead-units and end-units of invalid Lead-End-unit candidates | Inputs: Outputs: | |
4 | Get invalid LeadTerms and invalid EndTerms from LeadEndTerm candidates | | |