TY - JOUR
T1 - Coherence and comprehensibility
T2 - Large language models predict lay understanding of health-related content
AU - Cohen, Trevor
AU - Xu, Weizhe
AU - Guo, Yue
AU - Pakhomov, Serguei
AU - Leroy, Gondy
N1 - Publisher Copyright: © 2024
PY - 2025/1
Y1 - 2025/1
AB - Health literacy is a prerequisite to informed health-related decision making. To facilitate understanding of information, text should be presented at an appropriate reading level for the reader. Cognitive studies suggest that the coherence of a text – the interconnectedness between the ideas it expresses – is especially important for low-knowledge readers, who lack the background knowledge to draw inferences from text that is only implicitly connected. Prior work in cognitive science has yielded automated methods to estimate coherence. These methods estimate the proximity between text representations in a semantic vector space, with the underlying idea that units of text that are poorly connected will be further apart in this space. In addition, recent work with large language models (LLMs) has produced probabilistic methodological analogues that have yet to be evaluated for this purpose. This work concerns the relationship between these automated measures and layperson comprehension of biomedical text. To characterize this relationship, we applied a range of automated measures of text coherence to a set of text snippets, some of which were deliberately modified to improve their accessibility in a series of reading comprehension experiments. Results indicate significant associations between reader comprehension – as estimated using multiple-choice questions – and LLM-derived coherence metrics. Interventions designed to improve the comprehensibility of passages also improved their coherence, as measured with the best-performing LLM-derived models and shown by improved reader understanding of the text. These findings support the utility of LLM-derived measures of text coherence as a means to identify gaps in connectedness that make biomedical text difficult for laypeople to understand, with the potential to inform both manual and automated methods to improve the accessibility of the biomedical literature.
KW - Large language models
KW - Text coherence
KW - Word embeddings
KW - Layperson comprehension
UR - http://www.scopus.com/inward/record.url?scp=85211974537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211974537&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2024.104758
DO - 10.1016/j.jbi.2024.104758
M3 - Article
C2 - 39662650
AN - SCOPUS:85211974537
SN - 1532-0464
VL - 161
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104758
ER -