Leading Punctuation Splitter
This splitter splits a token that contains leading punctuation by adding a space before the punctuation character. The leading punctuation characters are &, (, [, and {.
Split a token in front of leading punctuation.
File Name | Input | Output |
---|---|---|
12134.txt | doppler( | doppler ( |
12271.txt | 1-plug& | 1-plug & |
12353.txt | epilepsy( | epilepsy ( |
12353.txt | volunteers( | volunteers ( |
12706.txt | dr.[ | dr. [ |
18186.txt | test( | test ( |
18341.txt | vain( | vain ( |
2.txt | one( | one ( |
30.txt | folitrax( | folitrax ( |
50.txt | ,[ | , [ |
78.txt | genes[ | genes [ |
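A minimal Java sketch of the split shown in the table above. It assumes a space is inserted before every leading punctuation character that is not at the start of the token and is not already preceded by a space. The class and method names are hypothetical, and the exception filters listed further down are not applied here.

```java
// Sketch only (assumption): insert a space before each leading punctuation
// character (&, (, [, {) that has a non-space character in front of it.
// Exception filters are deliberately not applied in this example.
final class LeadingPunctuationSplit {
    private static final String LEADING_PUNCTUATION = "&([{";

    static String split(String token) {
        StringBuilder out = new StringBuilder(token.length() + 4);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Only separate the punctuation when something precedes it
            // and that preceding character is not already a space.
            if (LEADING_PUNCTUATION.indexOf(c) >= 0 && i > 0 && token.charAt(i - 1) != ' ') {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Inputs taken from the example table above.
        String[] inputs = {"doppler(", "1-plug&", "dr.[", ",[", "genes["};
        for (String input : inputs) {
            System.out.println(input + " -> " + split(input));
        }
    }
}
```

Run against the table inputs, this reproduces the listed outputs (for example, `1-plug&` becomes `1-plug &`).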
Broader Generic Matchers (Qualifiers)
Matcher | Regular Expression | Examples |
---|---|---|
Contains Leading Punctuation | ^.*[&\\(\\[\\{].*$ | |
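The doubled backslashes in the qualifier pattern suggest it is written as a Java string literal. Assuming the qualifier is applied to the whole token with `Matcher.matches()` (an assumption; this page does not say how the pattern is invoked), it can be sketched as follows. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: the "Contains Leading Punctuation" qualifier, using the pattern
// string exactly as listed in the table. Whole-token matching is assumed.
final class LeadingPunctuationQualifier {
    private static final Pattern CONTAINS_LEADING_PUNCTUATION =
            Pattern.compile("^.*[&\\(\\[\\{].*$");

    static boolean qualifies(String token) {
        return CONTAINS_LEADING_PUNCTUATION.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"doppler(", "AT&T", "plain", "x{y"}) {
            System.out.println(token + " qualifies: " + qualifies(token));
        }
    }
}
```

Because `.*` surrounds the character class on both sides, for tokens without line breaks this amounts to checking whether the token contains any of &, (, [, or {.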
Filters (Specific Exceptions for Each Leading Punctuation)
Leading Punctuation | Filter (Exception) | Regular Expression | Examples |
---|---|---|---|
Ampersand [&] | 1. Abbreviations: [A-Z]+&[A-Z]+ | ^[A-Z]+&[A-Z]+$ | |
Left Parenthesis [(] | 1. Digits, optionally followed by a plus sign, in parentheses: [non-space]*([digit]+\+?)[non-space]* | ((\\S)*\\([\\d]+(\\+)?\\)(\\S)*) | |
| | 2. max or min in parentheses: [non-space]*(max\|min)[non-space]* | ((\\S)*\\((max\|min)\\)) | |
| | 3. A single word character or plus sign in parentheses: [non-space]*(+char)[non-space]* | ((\\S)*\\([+\\w]\\)(\\S)*) | |
| | 4. Parenthetic plural forms: [word]+((s\|es)\|(y(ies))) | ([\\w]+((s\\(es\\))\|(y\\(ies\\)))) | |
| | 5. After a hyphen: [non-space]*-([non-space]*) | ((\\S)*-\\((\\S)*) | |
Left Square Bracket [[] | 1. A single lowercase letter in square brackets: [non-space]*[[lower]][non-space]* | (\\S*\\[[a-z]\\]\\S*) | |
| | 2. Preceded by a tilde or hyphen: (tilde\|hyphen)[ | ([~\\-]\\[\\S*) | |
Left Curly Brace [{] | 1. No exceptions found | $^ | None |
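The filters can be sketched as a lookup from each leading punctuation character to its exception patterns, with the pattern strings copied verbatim from the table. Two assumptions here are not stated on this page: each filter must match the whole token (`Matcher.matches()`), and a token is left unsplit when any filter for a punctuation character it contains matches. The class and method names are hypothetical, and the sample tokens in `main` are illustrative rather than taken from the test files above.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only. Assumptions (not from the source): filters match the whole token,
// and a token is exempt from splitting if any filter for a leading punctuation
// character present in the token matches.
final class LeadingPunctuationFilters {
    private static final Map<Character, List<Pattern>> FILTERS = Map.of(
            '&', List.of(Pattern.compile("^[A-Z]+&[A-Z]+$")),
            '(', List.of(
                    Pattern.compile("((\\S)*\\([\\d]+(\\+)?\\)(\\S)*)"),
                    Pattern.compile("((\\S)*\\((max|min)\\))"),
                    Pattern.compile("((\\S)*\\([+\\w]\\)(\\S)*)"),
                    Pattern.compile("([\\w]+((s\\(es\\))|(y\\(ies\\))))"),
                    Pattern.compile("((\\S)*-\\((\\S)*)")),
            '[', List.of(
                    Pattern.compile("(\\S*\\[[a-z]\\]\\S*)"),
                    Pattern.compile("([~\\-]\\[\\S*)")),
            '{', List.of(Pattern.compile("$^")));

    /** Returns true if the token matches an exception filter and should not be split. */
    static boolean isException(String token) {
        for (Map.Entry<Character, List<Pattern>> entry : FILTERS.entrySet()) {
            char punctuation = entry.getKey();
            if (token.indexOf(punctuation) < 0) {
                continue; // only check filters for punctuation that occurs in the token
            }
            for (Pattern filter : entry.getValue()) {
                if (filter.matcher(token).matches()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = {"AT&T", "ca(2+)", "vo2(max)", "gene(s)", "virus(es)",
                "alpha-(1", "benzo[a]pyrene", "~[3", "doppler(", "dr.["};
        for (String token : tokens) {
            System.out.println(token + (isException(token) ? " -> keep" : " -> split"));
        }
    }
}
```

`Map.of` and `List.of` need Java 9 or later. Under these assumptions, the tokens from the example table (such as `doppler(` and `dr.[`) match no filter and are split, while tokens like `AT&T` or `benzo[a]pyrene` are kept whole.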