
The SPECIALIST Lexicon

Exclusive Filter: Lead-End-Units Model

I. Invalid lead-end-units

Multiwords do not start or end with words from seven closed-class parts of speech (POSes): auxiliaries (be, do, etc.), complementizers (that), conjunctions (and, or, but, etc.), determiners (a, the, some, etc.), modals (may, must, can, etc.), pronouns (it, he, they, etc.), and prepositions (to, on, by, etc.). These units are called invalid lead units or invalid end units. They are used in the exclusive filter to remove invalid multiwords from the n-Grams.
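The exclusive check described above can be sketched in Python as follows. This is a minimal illustration, assuming single-word units and a small hand-picked sample of the closed-class words listed above; the real lists are much larger, and the names `INVALID_UNITS` and `is_excluded` are hypothetical.

```python
# Hypothetical sample of invalid lead/end units drawn from the seven
# closed-class POS groups named above (not the full Lexicon lists).
INVALID_UNITS = {
    "be", "do",              # auxiliaries
    "that",                  # complementizer
    "and", "or", "but",      # conjunctions
    "a", "the", "some",      # determiners
    "may", "must", "can",    # modals
    "it", "he", "they",      # pronouns
    "to", "on", "by",        # prepositions
}


def is_excluded(ngram: str) -> bool:
    """Return True if the n-gram starts or ends with an invalid unit."""
    words = ngram.lower().split()
    if not words:
        return True  # treat empty input as invalid
    return words[0] in INVALID_UNITS or words[-1] in INVALID_UNITS


ngrams = ["the heart", "heart disease", "heart attack and", "liver function test"]
kept = [g for g in ngrams if not is_excluded(g)]
print(kept)  # ['heart disease', 'liver function test']
```

Only the first and last words are inspected, which mirrors the lead/end framing: an invalid word in the middle of an n-gram does not trigger exclusion.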

Invalid lead units can also be multiword units, such as "as if|conj", "as far as|prep", "across from|prep", etc. The unit "as|prep|conj" is the lead unit of "as if" and "as far as"; that is, "as" is the parent unit of "as if" and "as far as". The behavior of a child unit should be treated as an exception to its parent unit and should not be taken into consideration when deciding whether the parent unit is an invalid lead unit. For example, the valid term "as if personality" is in the Lexicon. Thus:

  • "as if" is a valid lead unit
  • "as" is still an invalid lead unit (in this case)
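One way to honor this parent/child rule is to check longer lead units before shorter ones, so that a valid child unit such as "as if" overrides its invalid parent "as" without changing the status of "as" itself. The sketch below assumes that design (a longest-prefix lookup); the unit tables and the function name `lead_status` are hypothetical examples built from the units mentioned above, not the Lexicon's actual data.

```python
# Hypothetical lead-unit tables; multiword units are stored as tuples of words.
INVALID_LEAD_UNITS = {("as",), ("as", "far", "as"), ("across", "from")}
VALID_LEAD_UNITS = {("as", "if")}  # child unit treated as an exception to "as"


def lead_status(ngram: str) -> str:
    """Classify the lead of an n-gram using the longest matching unit."""
    words = tuple(ngram.lower().split())
    # Try the longest possible prefix first, so child units win over parents.
    for length in range(len(words), 0, -1):
        prefix = words[:length]
        if prefix in VALID_LEAD_UNITS:
            return "valid"
        if prefix in INVALID_LEAD_UNITS:
            return "invalid"
    return "unlisted"


print(lead_status("as if personality"))  # valid
print(lead_status("as well"))            # invalid
```

Because "as" stays in the invalid table, every n-gram led by "as" that does not match a listed child unit is still rejected.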

II. Candidate lead-end-units

On the other hand, a multiword that ends with certain units, such as index, test, assay, protein, factor, disease, syndrome, procedure, etc., is likely to be a valid multiword. These are called candidate end units. Candidate lead units and candidate end units are used in the inclusive filter, which is applied to the n-Grams after the exclusive filters.
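The order of the two filters can be sketched as a small pipeline: the exclusive step drops n-grams that begin or end with an invalid unit, and the inclusive step then keeps only the survivors that begin with a candidate lead unit or end with a candidate end unit. Reading the inclusive filter as "keep only candidates" is an assumption made for this illustration. The word sets are tiny samples (the end units come from the list above; the candidate lead set is left empty because none are named here), and `filter_ngrams` is a hypothetical name.

```python
# Hypothetical samples; the real lists are maintained with the Lexicon.
INVALID_UNITS = {"a", "the", "and", "or", "to", "on", "by", "it", "may"}
CANDIDATE_END_UNITS = {"index", "test", "assay", "protein", "factor",
                       "disease", "syndrome", "procedure"}
CANDIDATE_LEAD_UNITS: set[str] = set()  # none are named in the text


def filter_ngrams(ngrams: list[str]) -> list[str]:
    """Apply the exclusive filter, then the (assumed) inclusive filter."""
    # Step 1 (exclusive): drop n-grams with an invalid lead or end unit.
    survivors = []
    for g in ngrams:
        words = g.lower().split()
        if words and words[0] not in INVALID_UNITS and words[-1] not in INVALID_UNITS:
            survivors.append(words)
    # Step 2 (inclusive): keep survivors with a candidate lead or end unit.
    return [
        " ".join(w) for w in survivors
        if w[0] in CANDIDATE_LEAD_UNITS or w[-1] in CANDIDATE_END_UNITS
    ]


print(filter_ngrams(["the growth factor", "growth factor",
                     "liver function test", "blood pressure", "disease and"]))
# ['growth factor', 'liver function test']
```

Running the exclusive step first keeps the inclusive step cheap: it only ever inspects n-grams that already have acceptable boundaries.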