How specialized are specialized corpora? Behavioral evaluation of corpus representativeness for Maltese

Jerid Francom, Amy La Cross, Adam P Ussishkin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges that face researchers developing resources for languages with sparsely available data, and we identify a key empirical link between corpus and psycholinguistic research as a tool to evaluate corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then use three statistical methods to evaluate these comparisons. This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack wide access to diverse language data.
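The comparison of corpus frequency with rated familiarity can be illustrated with a small sketch. The abstract does not name the three statistical methods used, so the Spearman rank correlation below is an assumption chosen only as one common way to relate the two variables. The word IDs, counts, and ratings are hypothetical placeholders, not data from the paper, and the helper names (`average_ranks`, `pearson`, `spearman`) are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: placeholder word IDs, not real Maltese items.
corpus_counts = {"w1": 1200, "w2": 310, "w3": 45, "w4": 7, "w5": 2}
# Hypothetical mean familiarity ratings on an assumed 1-7 scale.
familiarity = {"w1": 6.8, "w2": 6.1, "w3": 4.9, "w4": 5.2, "w5": 2.3}

words = sorted(corpus_counts)
# Log-transform counts (add 1 to avoid log of zero for unseen words).
log_freq = [math.log10(corpus_counts[w] + 1) for w in words]
ratings = [familiarity[w] for w in words]

rho = spearman(log_freq, ratings)
print(f"Spearman rho between log frequency and familiarity: {rho:.3f}")
```

Under this framing, a strong positive correlation would suggest that the corpus frequencies track speakers' familiarity judgments, while a weak one would hint that the corpus is skewed toward a specialized register. The log transform does not affect the rank correlation itself; it is included because it is a conventional way to handle frequency counts when other methods, such as linear regression, are applied.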

Original language: English (US)
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Editors: Daniel Tapias, Irene Russo, Olivier Hamon, Stelios Piperidis, Nicoletta Calzolari, Khalid Choukri, Joseph Mariani, Helene Mazo, Bente Maegaard, Jan Odijk, Mike Rosner
Publisher: European Language Resources Association (ELRA)
Pages: 421-427
Number of pages: 7
ISBN (Electronic): 2951740867, 9782951740860
State: Published - 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: May 17 2010 to May 23 2010

Publication series

Name: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country/Territory: Malta
City: Valletta
Period: 5/17/10 to 5/23/10

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
