TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Nazia Tasnim, Istiak Shihab, Asif Shahriyar Sushmit, Steven Bethard, Farig Sadeque

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

Entity mentions in the biological and healthcare domains, in titles of artistic works, and in organization names can be nested, overlapping, or discontinuous, and may be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because they violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex named entities. We leveraged an ensemble of ELECTRA-based models pretrained exclusively on the Bangla language together with ELECTRA-based monolingual models pretrained on English to achieve competitive performance. Besides providing a system description, we also present the outcomes of our experiments on architectural decisions, dataset augmentations, and post-competition findings.
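
As a rough illustration of why nested mentions break standard tagging schemes (this sketch is not taken from the paper): a BIO scheme assigns exactly one tag to each token, so two entities that share a token cannot both be encoded. The helper names `spans_to_bio` and `can_encode`, the example tokens, and the labels below are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIO tags, one tag per token.

    Raises ValueError when two spans claim the same token, which is
    exactly the case for nested or overlapping mentions.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(
                    f"token {i} is already tagged {tags[i]}; "
                    "BIO allows only one tag per token"
                )
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


def can_encode(n_tokens: int, spans: List[Span]) -> bool:
    """Return True if the spans fit a flat BIO tagging."""
    try:
        spans_to_bio(n_tokens, spans)
        return True
    except ValueError:
        return False


# Hypothetical example sentence fragment, for illustration only.
tokens = ["University", "of", "Arizona", "Press"]

flat = [(0, 4, "ORG")]                    # one organization mention
nested = [(0, 4, "ORG"), (2, 3, "LOC")]   # a location inside the organization

print(spans_to_bio(len(tokens), flat))
print(can_encode(len(tokens), nested))
```

The flat case encodes cleanly, while the nested case is rejected because the token "Arizona" would need two tags at once. Discontinuous mentions fail for a related reason: a B/I run must be contiguous.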

Original language: English (US)
Title of host publication: SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop
Editors: Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Publisher: Association for Computational Linguistics (ACL)
Pages: 1524-1530
Number of pages: 7
ISBN (Electronic): 9781955917803
State: Published - 2022
Event: 16th International Workshop on Semantic Evaluation, SemEval 2022 - Seattle, United States
Duration: Jul 14 2022 to Jul 15 2022

Publication series

Name: SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop

Conference

Conference: 16th International Workshop on Semantic Evaluation, SemEval 2022
Country/Territory: United States
City: Seattle
Period: 7/14/22 to 7/15/22

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
