A multilingual learner corpus for less commonly taught languages

Bruna Sommer-Farias, Aleksey Novikov, Adriana Picoral, Mariana Centanin-Bertho, Shelley Staples

Research output: Contribution to journal › Article › peer-review


This article provides a detailed account of the framework and the pedagogical and research applications of the Multilingual Academic Corpus of Assignments – Writing and Speech (MACAWS). MACAWS is a monitor learner corpus of written and oral assignments produced by foreign language learners in the context of their language learning classrooms. Currently the corpus focuses on two less commonly taught languages rarely represented in learner corpora, Portuguese and Russian, and contains 124,054 words in Russian and 536,168 in Portuguese; it is updated each semester as new texts are added. The online interface is designed for ease of use by teachers and students. Our novel interactive data-driven learning (iDDL) tool allows embedding of concordance lines into websites and learning management systems (LMS), facilitating student interaction with concordance lines. Researchers can gain access to an offline corpus for greater flexibility.

Original language: English (US)
Pages (from-to): 261-282
Number of pages: 22
Journal: International Journal of Learner Corpus Research
Issue number: 2
State: Published - Dec 31 2022


Keywords

  • Less Commonly Taught Languages (LCTL)
  • interactive data-driven learning (iDDL)
  • multilingual

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language

