Sorting order of base forms (for citation form and spelling variants)
I. Introduction
Base forms are the uninflected forms of a lexical term. They include the citation form (base=...) and spelling variants (spelling_variant=...). The citation form is not a preferred term. Before the 2013 release, it was an arbitrarily chosen base form of a lexical record. In 2014, an enhanced algorithm was implemented to choose the citation form uniquely, which supports the LexRecord cross-reference check task and improves other NLP tasks. All base forms (the citation form and all spelling variants) are sorted automatically during the LexBuilding process, in the order described below. The citation form is then assigned to the first entry in the sorted list of base forms.
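The selection step described above (sort all base forms, then take the first one as the citation form) can be sketched in Java as follows. This is a minimal illustration, not the LSG implementation: the class name, the method names, and the example record ("color"/"colour") are hypothetical, and the comparator is a placeholder ordering (case-insensitive alphabetical with a case-sensitive tie-breaker) standing in for the actual sorting rules. The point it shows is that a total ordering makes the choice deterministic, so the same set of base forms always yields the same citation form regardless of input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CitationFormSketch {

    // PLACEHOLDER ordering (an assumption, not the LSG rules):
    // case-insensitive alphabetical, with case-sensitive natural order
    // as a tie-breaker so two distinct strings never compare as equal.
    static final Comparator<String> PLACEHOLDER_ORDER =
        String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.<String>naturalOrder());

    // Returns a sorted copy of the base forms (citation form plus
    // spelling variants); the input list is left unmodified.
    public static List<String> sortBaseForms(List<String> baseForms) {
        List<String> sorted = new ArrayList<>(baseForms);
        sorted.sort(PLACEHOLDER_ORDER);
        return sorted;
    }

    // The citation form is the first entry of the sorted base forms.
    public static String chooseCitationForm(List<String> baseForms) {
        if (baseForms.isEmpty()) {
            throw new IllegalArgumentException(
                "a lexical record needs at least one base form");
        }
        return sortBaseForms(baseForms).get(0);
    }

    public static void main(String[] args) {
        // Hypothetical record with two base forms, listed in arbitrary order.
        List<String> record = Arrays.asList("colour", "color");
        System.out.println(sortBaseForms(record));
        System.out.println(chooseCitationForm(record));
    }
}
```

Because the ordering is total, reordering the base forms within a record does not change which one is selected, which is the property a cross-reference check between records relies on.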
II. Sorting Order Details
The sorting order applied in the 2014 release by the Lexical Systems Group (LSG) is detailed below:
III. Java API
The Java API for this base form sorting algorithm is available at:
LexCheck.gov.nih.nlm.nls.lexCheck.CheckCont.BaseComparator.java
IV. Impact Tests
Theoretically, the results of Lexical Tools flow components that are associated with citation forms might differ, because the new sorting order might assign different base forms as citation forms. These changes are not considered errors, because citation forms were chosen arbitrarily in previous releases. To confirm this inference, we conducted a number of tests on the 2013 release and compared the results using Lexicons with and without the new sorting order of base forms. From our observation:
The results of these tests are shown below: