TY - JOUR
T1 - Evaluation of a Generative Language Model Tool for Writing Examination Questions
AU - Edwards, Christopher J.
AU - Erstad, Brian L.
N1 - Publisher Copyright:
© 2024 American Association of Colleges of Pharmacy
PY - 2024/4
Y1 - 2024/4
N2 - Objective: To describe an evaluation of a generative language model tool for writing examination questions for a new elective course on the interpretation of common clinical laboratory results, developed for students in a Bachelor of Science in Pharmaceutical Sciences program. Methods: A total of 100 multiple-choice questions were generated using a publicly available large language model for a course dealing with common laboratory values. Two independent evaluators with extensive training and experience in writing multiple-choice questions evaluated each question for appropriate formatting, clarity, correctness, relevance, and difficulty. For each question, each reviewer assigned a final dichotomous judgment: usable as written or not usable as written. Results: The major finding of this study was that a generative language model (ChatGPT 3.5) could generate multiple-choice questions for assessing common laboratory value information, but only about half of the questions (50% and 57% for the 2 evaluators) were deemed usable without modification. General agreement between evaluator comments was common (62% of comments), with more than 1 correct answer being the most common reason a question was judged not usable (N = 27). Conclusion: The generally positive findings of this study suggest that the use of a generative language model tool for developing examination questions warrants further investigation.
AB - Objective: To describe an evaluation of a generative language model tool for writing examination questions for a new elective course on the interpretation of common clinical laboratory results, developed for students in a Bachelor of Science in Pharmaceutical Sciences program. Methods: A total of 100 multiple-choice questions were generated using a publicly available large language model for a course dealing with common laboratory values. Two independent evaluators with extensive training and experience in writing multiple-choice questions evaluated each question for appropriate formatting, clarity, correctness, relevance, and difficulty. For each question, each reviewer assigned a final dichotomous judgment: usable as written or not usable as written. Results: The major finding of this study was that a generative language model (ChatGPT 3.5) could generate multiple-choice questions for assessing common laboratory value information, but only about half of the questions (50% and 57% for the 2 evaluators) were deemed usable without modification. General agreement between evaluator comments was common (62% of comments), with more than 1 correct answer being the most common reason a question was judged not usable (N = 27). Conclusion: The generally positive findings of this study suggest that the use of a generative language model tool for developing examination questions warrants further investigation.
KW - Artificial intelligence
KW - Chatbot
KW - Examination questions
KW - Laboratory tests
KW - Pharmacy
UR - http://www.scopus.com/inward/record.url?scp=85189699747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189699747&partnerID=8YFLogxK
U2 - 10.1016/j.ajpe.2024.100684
DO - 10.1016/j.ajpe.2024.100684
M3 - Article
C2 - 38479646
AN - SCOPUS:85189699747
SN - 0002-9459
VL - 88
JO - American Journal of Pharmaceutical Education
JF - American Journal of Pharmaceutical Education
IS - 4
M1 - 100684
ER -