Automatic Generation of a Large Multiple-Choice Question-Answer Corpus

David Kauchak, Vivien Song, Prashant Mishra, Gondy Leroy, Philip I Harber, Stephen Rains, John Hamre, Nick Morgenstein

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Large corpora with fine-grained metrics for difficulty and understandability are a critical resource for developing algorithms and tools to create more informative content. We introduce a new approach for automatically generating a large corpus of health-related content with associated multiple-choice questions using Google’s related questions and ChatGPT, including two new algorithms for generating potential wrong answers. We compare both the question quality and the suggested wrong answers using automated metrics and user studies. Overall, we find that both question sources generate reasonable questions that are complementary. Google questions use more accessible language and are easier to answer, while ChatGPT questions appear easier but are more difficult to answer, and have better coverage of the entire text. For wrong answer generation, we find that ChatGPT produces higher-quality wrong answers that are more likely to be good distractors and are more closely related to the text content than those from our corpus-based approaches. We recommend both question types as options for studies, with wrong answers generated by ChatGPT.

Original language: English (US)
Title of host publication: Intelligent Systems and Applications - Proceedings of the 2024 Intelligent Systems Conference IntelliSys Volume 2
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 55-72
Number of pages: 18
ISBN (Print): 9783031664274
State: Published - 2024
Event: Intelligent Systems Conference, IntelliSys 2024 - Amsterdam, Netherlands
Duration: Sep 5 2024 to Sep 6 2024

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1066 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: Intelligent Systems Conference, IntelliSys 2024
Country/Territory: Netherlands
City: Amsterdam
Period: 9/5/24 to 9/6/24

Keywords

  • Corpus generation
  • Large language model applications
  • Text difficulty

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
