Abstract
We analyze the problem of identifying biological context and associating it with the biochemical events described in biomedical texts. This is a non-trivial, inter-sentential relation extraction task. We focus on biological context in the form of descriptions of the species, tissue type, and cell type associated with biochemical events. We present a new corpus of open access biomedical texts, annotated by biology subject matter experts to highlight context-event relations. Using this corpus, we evaluate several classifiers for context-event association and analyze in detail how a variety of linguistic features affect classifier performance. We find that gradient tree boosting performs best by a wide margin, achieving an F1 of 0.865 in a cross-validation study.
Original language | English (US) |
---|---|
Article number | 8664185 |
Pages (from-to) | 1895-1906 |
Number of pages | 12 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 17 |
Issue number | 6 |
State | Published - Nov 1 2020 |
Keywords
- Context
- NLP
- bioinformatics
- data mining
- inter-sentence relation extraction
ASJC Scopus subject areas
- Biotechnology
- Genetics
- Applied Mathematics