TY - GEN
T1 - MARiA at SemEval 2024 Task-6
T2 - 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
AU - Sanayei, Reza
AU - Singh, Abhyuday
AU - Rezaei, Mohammad Hossein
AU - Bethard, Steven
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - The advent of large language models (LLMs) has revolutionized Natural Language Generation (NLG), offering unmatched text generation capabilities. However, this progress introduces significant challenges, notably hallucinations: semantically incorrect yet fluent outputs. This phenomenon undermines content reliability, as traditional detection systems focus more on fluency than accuracy, posing a risk of misinformation spread. Our study addresses these issues by proposing a unified strategy for detecting hallucinations in neural model-generated text, focusing on the SHROOM task at SemEval 2024. We employ diverse methodologies to identify output divergence from the source content: we used Sentence Transformers to measure cosine similarity between source-hypothesis and source-target embeddings, experimented with omitting the source content from the cosine similarity computations, and leveraged LLMs' in-context learning with detailed task prompts. The varying performance of our approaches across the subtasks underscores the complexity of Natural Language Understanding tasks and highlights the importance of addressing the nuances of semantic correctness in the era of advanced language models.
AB - The advent of large language models (LLMs) has revolutionized Natural Language Generation (NLG), offering unmatched text generation capabilities. However, this progress introduces significant challenges, notably hallucinations: semantically incorrect yet fluent outputs. This phenomenon undermines content reliability, as traditional detection systems focus more on fluency than accuracy, posing a risk of misinformation spread. Our study addresses these issues by proposing a unified strategy for detecting hallucinations in neural model-generated text, focusing on the SHROOM task at SemEval 2024. We employ diverse methodologies to identify output divergence from the source content: we used Sentence Transformers to measure cosine similarity between source-hypothesis and source-target embeddings, experimented with omitting the source content from the cosine similarity computations, and leveraged LLMs' in-context learning with detailed task prompts. The varying performance of our approaches across the subtasks underscores the complexity of Natural Language Understanding tasks and highlights the importance of addressing the nuances of semantic correctness in the era of advanced language models.
UR - http://www.scopus.com/inward/record.url?scp=85215507588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215507588&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85215507588
T3 - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
SP - 1584
EP - 1588
BT - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
A2 - Ojha, Atul Kr.
A2 - Doğruöz, A. Seza
A2 - Madabushi, Harish Tayyar
A2 - Da San Martino, Giovanni
A2 - Rosenthal, Sara
A2 - Rosá, Aiala
PB - Association for Computational Linguistics (ACL)
Y2 - 20 June 2024 through 21 June 2024
ER -
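
Note: the abstract above describes scoring hypotheses by cosine similarity between source-hypothesis and source-target sentence embeddings. The Python sketch below illustrates that general idea using the sentence-transformers library; it is not the authors' released code. The model name, the averaging of the two similarities, and the 0.5 decision threshold are all assumptions for illustration, not details taken from the paper.

# Illustrative sketch of embedding-based hallucination scoring,
# assuming the sentence-transformers library. Model choice, score
# aggregation, and threshold are assumptions, not the paper's settings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model; paper may differ

def hallucination_score(source: str, hypothesis: str, target: str) -> float:
    """Return a dissimilarity score in [0, 1]; higher = more likely hallucinated."""
    src_emb, hyp_emb, tgt_emb = model.encode([source, hypothesis, target])
    # Compare the hypothesis against both the source and the reference target.
    sim_src = util.cos_sim(hyp_emb, src_emb).item()
    sim_tgt = util.cos_sim(hyp_emb, tgt_emb).item()
    # Aggregation is an assumption: average the two similarities and invert.
    return 1.0 - (sim_src + sim_tgt) / 2.0

# Usage with toy strings:
score = hallucination_score(
    source="Translate: Le chat dort sur le canapé.",
    hypothesis="The cat is sleeping on the sofa.",
    target="The cat sleeps on the couch.",
)
label = "Hallucination" if score > 0.5 else "Not Hallucination"  # assumed threshold
print(f"score={score:.3f} -> {label}")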