TY - GEN
T1 - Automatic Generation of a Large Multiple-Choice Question-Answer Corpus
AU - Kauchak, David
AU - Song, Vivien
AU - Mishra, Prashant
AU - Leroy, Gondy
AU - Harber, Phil
AU - Rains, Stephen
AU - Hamre, John
AU - Morgenstein, Nick
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - Large corpora with fine-grained metrics for difficulty and understandability are a critical resource for developing algorithms and tools to create more informative content. We introduce a new approach for automatically generating a large corpus of health-related content with associated multiple-choice questions using Google’s related questions and ChatGPT, including two new algorithms for generating potential wrong answers. We compare both the question quality as well as the suggested wrong answers using automated metrics and user studies. Overall, we find both algorithms generate reasonable questions that are complementary. Google questions use more accessible language and are easier to answer while ChatGPT questions appear easier, but are more difficult to answer and have better coverage over the entire text. For wrong answer generation, we find ChatGPT produces higher quality wrong answers that are more likely to be good distractors and are more closely related to the text content than our corpus-based approaches. We recommend both questions as options for studies with wrong answers generated by ChatGPT.
KW - Corpus generation
KW - Large language model applications
KW - Text difficulty
UR - https://www.scopus.com/pages/publications/85201120106
UR - https://www.scopus.com/pages/publications/85201120106#tab=citedBy
U2 - 10.1007/978-3-031-66428-1_4
DO - 10.1007/978-3-031-66428-1_4
M3 - Conference contribution
AN - SCOPUS:85201120106
SN - 9783031664274
T3 - Lecture Notes in Networks and Systems
SP - 55
EP - 72
BT - Intelligent Systems and Applications - Proceedings of the 2024 Intelligent Systems Conference IntelliSys Volume 2
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
T2 - Intelligent Systems Conference, IntelliSys 2024
Y2 - 5 September 2024 through 6 September 2024
ER -