Overview for the First Shared Task on Language Identification in Code-Switched Data

Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

183 Scopus citations

Abstract

We present an overview of the first shared task on language identification in code-switched data. The shared task included code-switched data from four language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA), Mandarin-English (MAN-EN), Nepali-English (NEP-EN), and Spanish-English (SPA-EN). A total of seven teams participated in the task and submitted 42 system runs. The evaluation showed that language identification at the token level is more difficult when the languages present are closely related, as in the case of MSA-DA, where prediction performance was the lowest among all language pairs. In contrast, the language pairs with the highest F-measure were SPA-EN and NEP-EN. The task made evident that language identification in code-switched data is still far from solved and warrants further research.

Original language: English (US)
Title of host publication: 1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Proceedings
Editors: Mona Diab, Julia Hirschberg, Pascale Fung, Thamar Solorio
Publisher: Association for Computational Linguistics (ACL)
Pages: 62-72
Number of pages: 11
ISBN (Electronic): 9781937284961
State: Published - 2014
Externally published: Yes
Event: 1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Doha, Qatar
Duration: Oct 25 2014 → …

Publication series

Name: 1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Proceedings

Conference

Conference: 1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014
Country/Territory: Qatar
City: Doha
Period: 10/25/14 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language
