TY - GEN
T1 - Learning what to read
T2 - 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017
AU - Noriega-Atala, Enrique
AU - Morrison, Clayton T.
AU - Valenzuela-Escárcega, Marco A.
AU - Surdeanu, Mihai
N1 - Funding Information:
This work was partially funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.
Dr. Mihai Surdeanu discloses a financial interest in Lum.ai. This interest has been disclosed to the University of Arizona Institutional Review Committee and is being managed in accordance with its conflict of interest policies.
Publisher Copyright:
© 2017 Association for Computational Linguistics.
PY - 2017
Y1 - 2017
N2 - Recent efforts in bioinformatics have achieved tremendous progress in the machine reading of biomedical literature, and the assembly of the extracted biochemical interactions into large-scale models such as protein signaling pathways. However, batch machine reading of literature at today’s scale (PubMed alone indexes over 1 million papers per year) is unfeasible due to both cost and processing overhead. In this work, we introduce a focused reading approach to guide the machine reading of biomedical literature towards what literature should be read to answer a biomedical query as efficiently as possible. We introduce a family of algorithms for focused reading, including an intuitive, strong baseline, and a second approach which uses a reinforcement learning (RL) framework that learns when to explore (widen the search) or exploit (narrow it). We demonstrate that the RL approach is capable of answering more queries than the baseline, while being more efficient, i.e., reading fewer documents.
UR - http://www.scopus.com/inward/record.url?scp=85073163921&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073163921&partnerID=8YFLogxK
U2 - 10.18653/v1/d17-1313
DO - 10.18653/v1/d17-1313
M3 - Conference contribution
AN - SCOPUS:85073163921
T3 - EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 2905
EP - 2910
BT - EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 9 September 2017 through 11 September 2017
ER -