TY - GEN
T1 - Using the Hammer only on Nails
T2 - 43rd European Conference on Information Retrieval Research, ECIR 2021
AU - Liang, Zhengzhong
AU - Zhao, Yiyun
AU - Surdeanu, Mihai
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Evidence retrieval is a key component of explainable question answering (QA). We argue that, despite recent progress, transformer network-based approaches such as the Universal Sentence Encoder (USE-QA) do not always outperform traditional information retrieval (IR) methods such as BM25 for evidence retrieval for QA. We introduce a lexical probing task that validates this observation: we demonstrate that neural IR methods have the capacity to capture lexical differences between questions and answers, but miss obvious lexical overlap signals. Learning from this probing analysis, we introduce a hybrid approach for representation-based evidence retrieval that combines the advantages of both IR directions. Our approach uses a routing classifier that learns when to direct incoming questions to BM25 vs. USE-QA for evidence retrieval using very simple statistics, which can be efficiently extracted from the top candidate evidence sentences produced by a BM25 model. We demonstrate that this hybrid evidence retrieval generally performs better than either individual retrieval strategy on three QA datasets: OpenBookQA, ReQA SQuAD, and ReQA NQ. Furthermore, we show that the proposed routing strategy is considerably faster than neural methods, running up to 5 times faster than USE-QA.
KW - BM25
KW - Neural information retrieval
KW - Representation-based
UR - http://www.scopus.com/inward/record.url?scp=85107345483&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107345483&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72113-8_22
DO - 10.1007/978-3-030-72113-8_22
M3 - Conference contribution
AN - SCOPUS:85107345483
SN - 9783030721121
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 327
EP - 341
BT - Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Proceedings
A2 - Hiemstra, Djoerd
A2 - Moens, Marie-Francine
A2 - Mothe, Josiane
A2 - Perego, Raffaele
A2 - Potthast, Martin
A2 - Sebastiani, Fabrizio
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 28 March 2021 through 1 April 2021
ER -