Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization

Dongfang Xu, Steven Bethard

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is critical for mining and analyzing biomedical texts. We propose a vector-space model for concept normalization, where mentions and concepts are encoded via transformer networks that are trained via a triplet objective with online hard triplet mining. The transformer networks refine existing pre-trained models, and the online triplet mining makes training efficient even with hundreds of thousands of concepts by sampling training triplets within each mini-batch. We introduce a variety of strategies for searching with the trained vector-space model, including approaches that incorporate domain-specific synonyms at search time with no model retraining. Across five datasets, our models that are trained only once on their corresponding ontologies are within 3 points of state-of-the-art models that are retrained for each new domain. Our models can also be trained for each domain, achieving new state-of-the-art results on multiple datasets.
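
The two ideas in the abstract, mining hard triplets inside each mini-batch and adding synonyms at search time without retraining, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the "batch-hard" variant of online mining, Euclidean distance, and toy 2-d vectors standing in for transformer encodings. The names `euclidean`, `batch_hard_triplet_loss`, and `nearest_concept` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean batch-hard triplet loss over one mini-batch (illustrative only).

    For each anchor, the hardest positive is the farthest item with the same
    concept label and the hardest negative is the closest item with a
    different label. Both are picked from the current mini-batch, so no
    triplets have to be enumerated over the whole ontology.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            euclidean(anchor, other)
            for j, (other, other_label) in enumerate(zip(embeddings, labels))
            if other_label == label and j != i
        ]
        negatives = [
            euclidean(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet within the batch
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def nearest_concept(mention_vector, concept_index):
    """Return the concept id whose indexed vector is closest to the mention.

    concept_index is a list of (concept_id, vector) pairs. Synonyms are
    supported by adding extra pairs for the same concept_id, which changes
    search results without touching the encoder.
    """
    return min(concept_index, key=lambda item: euclidean(mention_vector, item[1]))[0]
```

In this sketch, appending a pair such as `("C1", synonym_vector)` to `concept_index` is all it takes to make a domain-specific synonym searchable; the encoder that produced the vectors is left unchanged. A real system would batch these computations on a GPU and use an approximate nearest-neighbor index rather than a linear scan.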

Original language: English (US)
Title of host publication: Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021
Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Publisher: Association for Computational Linguistics (ACL)
Pages: 11-22
Number of pages: 12
ISBN (Electronic): 9781954085404
State: Published - 2021
Event: 20th Workshop on Biomedical Language Processing, BioNLP 2021 - Virtual, Online
Duration: Jun 11 2021 → …

Publication series

Name: Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021

Conference

Conference: 20th Workshop on Biomedical Language Processing, BioNLP 2021
City: Virtual, Online
Period: 6/11/21 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Information Systems
  • Software
  • Biomedical Engineering
  • Health Informatics
