The SPECIALIST Lexicon

Core-term

I. Introduction

Many n-grams have punctuation at the beginning and/or the end, such as:

Input Term            Core Term
- in details,         in details
- in details          in details
in details,           in details
in (5) details,       in (5) details
(in (5) details,      in (5) details
(in (5) details),     in (5) details

All of the above n-grams are normalized to "in details" or "in (5) details" by stripping the leading and/or ending punctuation. The normalized term is called the core-term, that is, the core of the term. This process is called core-term normalization.

A core-term may retain internal punctuation, such as "in (5) details". Leading and/or ending punctuation may also remain in a core-term when it belongs to the term, such as the closing parenthesis in "clean room(s)".
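To illustrate why ending punctuation sometimes has to stay, the following minimal sketch applies a naive strip of all ASCII punctuation to a term. It uses only Python's standard `str.strip` and `string.punctuation`; it is not part of the Lexicon tools and is shown only as a contrast to core-term normalization.

```python
import string

# Naive approach: remove every ASCII punctuation character and space
# from both ends, with no awareness of bracket balance.
term = "((clean room(s)))"
naive = term.strip(string.punctuation + " ")

# The closing parenthesis of "(s)" is lost along with the outer brackets.
print(naive)  # clean room(s
```

The naive result has an unbalanced bracket, whereas the expected core-term is "clean room(s)".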

II. Algorithm

Repeat the following process until the term no longer changes or its length is 0:

  • Strip leading characters that are punctuation, except for left (opening) brackets. The stripped characters include:
    ASCII -)}]_!@#%&*\\:;\"',.?/~+=|>$`^
    Unicode ¦§­»‐‑‒–—―’”•․‥…⁈​
  • Strip ending characters that are punctuation, except for right (closing) brackets. The stripped characters include:
    ASCII -({[_!@#%&*\\:;\"',.?/~+=|>$`^
    Unicode ¦§­«‐‑‒–—―‘“•․‥…⁈​
  • Strip enclosing brackets at both ends (leading and ending positions), including:
    ASCII (), [], {}, <>
    Unicode «», ‘’, “”
    • Strip both the leading and the ending bracket if they match and the net bracket no* is 0
    • Strip the leading left bracket if the net bracket no* is > 0
    • Strip the ending right bracket if the net bracket no* is < 0
  • Trim leading and trailing whitespace

* net bracket no = total number of left brackets - total number of right brackets

For example,

Term                  Net Bracket No
(in details:)         0
(in (5) details:)     0
(in (5) details       1
in (5) details)       -1
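The net bracket number can be sketched in a few lines of Python. The function name `net_bracket_no` and the two constants are hypothetical, and the sketch assumes that all bracket types listed in the algorithm (ASCII and Unicode) are pooled into a single count, which the page does not state explicitly.

```python
# Assumption: every bracket type from the algorithm section is counted together.
LEFT_BRACKETS = set("([{<\u00ab\u2018\u201c")   # ( [ { < and the Unicode left brackets
RIGHT_BRACKETS = set(")]}>\u00bb\u2019\u201d")  # ) ] } > and the Unicode right brackets


def net_bracket_no(term: str) -> int:
    """Return total left brackets minus total right brackets in term."""
    left = sum(ch in LEFT_BRACKETS for ch in term)
    right = sum(ch in RIGHT_BRACKETS for ch in term)
    return left - right


for example in ["(in details:)", "(in (5) details:)",
                "(in (5) details", "in (5) details)"]:
    print(example, net_bracket_no(example))
```

A positive result means an unmatched left bracket remains somewhere in the term, and a negative result means an unmatched right bracket remains.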

III. Examples

Input n-gram             Core-term

Strip punctuation
-in details              in details
In details:              In details
#$%IN DETAILS:%^(        IN DETAILS
(                        (empty)
()                       (empty)

Strip brackets
{in (5) details}         in (5) details
{{in (5) details}        in (5) details
{in (5) details}}        in (5) details
{in (5)} details}}       {in (5)} details

Strip brackets and punctuation
(in details:)            in details
(in details:))           in details
(-(in details)%^)        in details
{in (5) days},           in (5) days
in (5 days),             in (5 days)
in ((5) days),           in ((5) days)
((clean room(s)))        clean room(s)
((inch(es)))             inch(es)
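The algorithm in section II can be sketched in Python as follows. This is an illustrative reading of the steps, not the Lexicon tools' actual implementation: all names (`core_term`, `strip_brackets`, `net_bracket_no`, and the constants) are hypothetical, the visible characters of the punctuation lists are written as Unicode escapes, any invisible characters in those lists are omitted, and all bracket types are assumed to be pooled when computing the net bracket number.

```python
# Sketch only: names are hypothetical; character sets follow section II.
ASCII_COMMON = "-_!@#%&*\\:;\"',.?/~+=|>$`^"
UNICODE_COMMON = ("\u00a6\u00a7\u2010\u2011\u2012\u2013\u2014\u2015"
                  "\u2022\u2024\u2025\u2026\u2048")
# Leading strip set excludes left brackets; ending strip set excludes right ones.
LEAD_PUNCT = set(ASCII_COMMON + ")}]" + UNICODE_COMMON + "\u00bb\u2019\u201d")
END_PUNCT = set(ASCII_COMMON + "({[" + UNICODE_COMMON + "\u00ab\u2018\u201c")

# Bracket pairs: (), [], {}, <>, and the three Unicode pairs.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">",
         "\u00ab": "\u00bb", "\u2018": "\u2019", "\u201c": "\u201d"}
RIGHT_BRACKETS = set(PAIRS.values())


def net_bracket_no(term: str) -> int:
    """Total left brackets minus total right brackets (all types pooled)."""
    left = sum(ch in PAIRS for ch in term)
    right = sum(ch in RIGHT_BRACKETS for ch in term)
    return left - right


def strip_brackets(term: str) -> str:
    """Apply one round of the three bracket rules."""
    if not term:
        return term
    net = net_bracket_no(term)
    if net == 0 and len(term) >= 2 and PAIRS.get(term[0]) == term[-1]:
        return term[1:-1]          # matching pair at both ends
    if net > 0 and term[0] in PAIRS:
        return term[1:]            # surplus left bracket at the start
    if net < 0 and term[-1] in RIGHT_BRACKETS:
        return term[:-1]           # surplus right bracket at the end
    return term


def core_term(term: str) -> str:
    """Repeat strip and trim passes until the term is stable or empty."""
    current = term.strip()
    previous = None
    while current and current != previous:
        previous = current
        start = 0
        while start < len(current) and current[start] in LEAD_PUNCT:
            start += 1
        current = current[start:]
        end = len(current)
        while end > 0 and current[end - 1] in END_PUNCT:
            end -= 1
        current = current[:end]
        current = strip_brackets(current).strip()
    return current


for ngram in ["(-(in details)%^)", "{in (5)} details}}", "((clean room(s)))"]:
    print(repr(ngram), "->", repr(core_term(ngram)))
```

Because the sketch follows the bracket rules literally, a term such as "(a) and (b)" would also lose its first and last bracket (net bracket number 0 with a matching pair at the ends); whether the actual tool guards against that case is not stated here.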