TY - GEN
T1 - SuMe
T2 - 13th International Conference on Language Resources and Evaluation Conference, LREC 2022
AU - Bastan, Mohaddeseh
AU - Shankar, Nishant
AU - Surdeanu, Mihai
AU - Balasubramanian, Niranjan
N1 - Funding Information:
This work was supported in part by the National Science Foundation under grants IIS-1815358 and IIS-1815948.
Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
PY - 2022
Y1 - 2022
N2 - Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms behind how one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present relevant supporting statements regarding such relationships, associated experimental evidence, and a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small amount of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large biomedical abstract collection and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task help improves performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows the task presents significant challenges in biomedical language understanding and summarization.
AB - Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms behind how one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present relevant supporting statements regarding such relationships, associated experimental evidence, and a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small amount of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large biomedical abstract collection and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task help improves performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows the task presents significant challenges in biomedical language understanding and summarization.
KW - Biomedical NLP
KW - Explanation Generation
KW - Relation Extraction
KW - Summarization
KW - Text Generation
UR - http://www.scopus.com/inward/record.url?scp=85144419908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144419908&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85144419908
T3 - 2022 Language Resources and Evaluation Conference, LREC 2022
SP - 6922
EP - 6931
BT - 2022 Language Resources and Evaluation Conference, LREC 2022
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
Y2 - 20 June 2022 through 25 June 2022
ER -