Abstract
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously understudied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.
Original language | English (US) |
---|---|
Pages (from-to) | 146-150 |
Number of pages | 5 |
Journal | JAMIA Open |
Volume | 3 |
Issue number | 2 |
State | Published - Jul 1 2020 |
Keywords
- Domain adaptation
- Machine learning
- Natural language processing
- Shared resources
ASJC Scopus subject areas
- Health Informatics