CLULab-UofA at SemEval-2024 Task 8: Detecting Machine-Generated Text Using Triplet-Loss-Trained Text Similarity and Text Classification

Mohammad Hossein Rezaei, Yeaeun Kwon, Reza Sanayei, Abhyuday Singh, Steven Bethard

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Detecting machine-generated text is a critical task in the era of large language models. In this paper, we present our systems for SemEval-2024 Task 8, which focuses on multi-class classification to distinguish human-written text from text generated by five state-of-the-art large language models. We propose three different systems: unsupervised text similarity, triplet-loss-trained text similarity, and text classification. We show that the triplet-loss-trained text similarity system outperforms the other systems, achieving 80% accuracy on the test set and surpassing the baseline model for this subtask. Additionally, our text classification system, which takes into account sentence paraphrases generated by the candidate models, also outperforms the unsupervised text similarity system, achieving 74% accuracy.
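For readers unfamiliar with triplet-loss training, the following is a minimal generic sketch, not the authors' implementation. It shows the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), which pulls an anchor embedding toward a positive example and pushes it away from a negative one, plus a simple similarity-based classifier that assigns the label whose reference embedding is most similar to the input. It assumes embeddings are already computed by some text encoder; the function names, the toy vectors, and the labels `"human"` and `"model_a"` are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_source(text_emb: Vector, references: Dict[str, Vector]) -> str:
    """Hypothetical similarity classifier: return the label whose reference
    embedding has the highest cosine similarity to the input embedding."""
    return max(references, key=lambda label: cosine(text_emb, references[label]))


if __name__ == "__main__":
    # Positive already closer than negative by more than the margin: loss is 0.
    print(triplet_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]))  # 0.0
    # Toy references with hypothetical labels.
    refs = {"human": [0.0, 1.0], "model_a": [0.9, 0.1]}
    print(predict_source([1.0, 0.0], refs))  # model_a
```

In a setup like this, the encoder would be fine-tuned so that the triplet loss drops, which makes the similarity comparison in `predict_source` more discriminative than using an untrained (unsupervised) encoder.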

Original language: English (US)
Title of host publication: SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
Editors: Atul Kr. Ojha, A. Seza Dohruoz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosa
Publisher: Association for Computational Linguistics (ACL)
Pages: 1498-1504
Number of pages: 7
ISBN (Electronic): 9798891761070
State: Published - 2024
Event: 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: Jun 20 2024 to Jun 21 2024

Publication series

Name: SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop

Conference

Conference: 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
Country/Territory: Mexico
City: Hybrid, Mexico City
Period: 6/20/24 to 6/21/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
