We analyze the problem of identifying biological context and associating it with the biochemical events described in biomedical texts. This is a non-trivial, inter-sentential relation extraction task. We define biological context as the species, tissue type, and cell type associated with a biochemical event. We present a new corpus of open access biomedical texts annotated by biology subject matter experts to mark context-event relations. Using this corpus, we evaluate several classifiers for context-event association and analyze in detail how a variety of linguistic features affect classifier performance. We find that gradient tree boosting performs best by a wide margin, achieving an F1 of 0.865 in a cross-validation study.
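For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score can be computed when context-event association is treated as a binary decision per candidate (context, event) pair. The function name `f1_score` and the toy labels are assumptions made for illustration only; they are not taken from the paper or its corpus, and the paper's own evaluation setup may differ.

```python
def f1_score(gold, predicted):
    """F1 for the positive class of a binary labeling.

    gold and predicted are equal-length sequences of booleans, one entry
    per candidate (context, event) pair; True means "associated".
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up labels for six hypothetical candidate pairs (not real data).
gold = [True, True, False, True, False, False]
predicted = [True, False, False, True, True, False]
score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

In a cross-validation study, a score of this kind would typically be computed on each held-out fold and then aggregated across folds.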
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Published - Nov 1 2020
Keywords
- data mining
- inter-sentence relation extraction
ASJC Scopus subject areas
- Applied Mathematics