TY - GEN
T1 - A generate-and-rank framework with semantic type regularization for biomedical concept normalization
AU - Xu, Dongfang
AU - Zhang, Zeyu
AU - Bethard, Steven
N1 - Funding Information:
We thank the anonymous reviewers for their insightful comments on an earlier draft of this paper. This work was supported in part by National Institutes of Health grant R01LM012918 from the National Library of Medicine (NLM) and grant R01GM114355 from the National Institute of General Medical Sciences (NIGMS). The computations were performed on systems supported by the National Science Foundation under Grant No. 1228509. This research was supported in part by an appointment to the Oak Ridge National Laboratory Advanced Short-Term Research Opportunity (ASTRO) Program, sponsored by the U.S. Department of Energy and administered by the Oak Ridge Institute for Science and Education. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, National Science Foundation, or Department of Energy.
Publisher Copyright:
© 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
AB - Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is challenging because ontologies are large. In most cases, annotated datasets cover only a small sample of the concepts, yet concept normalizers are expected to predict all concepts in the ontology. In this paper, we propose an architecture consisting of a candidate generator and a list-wise ranker based on BERT. The ranker considers pairings of concept mentions and candidate concepts, allowing it to make predictions for any concept, not just those seen during training. We further enhance this list-wise approach with a semantic type regularizer that allows the model to incorporate semantic type information from the ontology during training. Our proposed concept normalization framework achieves state-of-the-art performance on multiple datasets.
UR - http://www.scopus.com/inward/record.url?scp=85095698726&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095698726&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85095698726
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 8452
EP - 8464
BT - ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Y2 - 5 July 2020 through 10 July 2020
ER -