TY - GEN
T1 - Pre-trained contextualized character embeddings lead to major improvements in time normalization
T2 - 8th Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2019
AU - Xu, Dongfang
AU - Laparra, Egoitz
AU - Bethard, Steven
N1 - Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
AB - Recent studies have shown that pre-trained contextual word embeddings, which assign the same word different vectors in different contexts, improve performance in many tasks. But while contextual embeddings can also be trained at the character level, the effectiveness of such embeddings has not been studied. We derive character-level contextual embeddings from Flair (Akbik et al., 2018), and apply them to a time normalization task, yielding major performance improvements over the previous state-of-the-art: 51% error reduction in news and 33% in clinical notes. We analyze the sources of these improvements, and find that pre-trained contextual character embeddings are more robust to term variations, infrequent terms, and cross-domain changes. We also quantify the size of context that pre-trained contextual character embeddings take advantage of, and show that such embeddings capture features like part-of-speech and capitalization.
UR - http://www.scopus.com/inward/record.url?scp=85094977107&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094977107&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85094977107
T3 - *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics
SP - 68
EP - 74
BT - *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics
PB - Association for Computational Linguistics (ACL)
Y2 - 6 June 2019 through 7 June 2019
ER -