TY - GEN
T1 - An investigation of coreference phenomena in the biomedical domain
AU - Bell, Dane
AU - Hahn-Powell, Gus
AU - Valenzuela-Escárcega, Marco A.
AU - Surdeanu, Mihai
PY - 2016
Y1 - 2016
AB - We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents because the cues they rely on, such as gender, are largely absent in this domain, and because they do not encode domain-specific knowledge, such as the number and type of participants required in chemical reactions. Moreover, it is difficult to encode this knowledge directly into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied, hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system, with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.
KW - Biomedical text mining
KW - Coreference resolution
KW - Information extraction
UR - http://www.scopus.com/inward/record.url?scp=85021672691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021672691&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85021672691
T3 - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
SP - 177
EP - 183
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Grobelnik, Marko
A2 - Odijk, Jan
A2 - Piperidis, Stelios
A2 - Maegaard, Bente
A2 - Mariani, Joseph
PB - European Language Resources Association (ELRA)
T2 - 10th International Conference on Language Resources and Evaluation, LREC 2016
Y2 - 23 May 2016 through 28 May 2016
ER -