Computational feature-sensitive reconstruction of language relationships: Developing the ALINE distance for comparative historical linguistic reconstruction

Sean S. Downey, Brian Hallmark, Murray P. Cox, Peter Norquest, J. Stephen Lansing

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Historical relationships among languages are used as a proxy for social history in many non-linguistic settings, including the fields of cultural and molecular anthropology. Linguists have traditionally assembled this information using the standard comparative method. While providing extremely nuanced linguistic information, this approach is time-consuming and labor-intensive. Conversely, computational approaches are appreciably quicker, but can potentially introduce significant error. Furthermore, current methods often use cognate sets that were themselves coded by historical linguists, thus reducing the benefit of computational approaches. Here we develop a method, based on the ALINE distance, to extract feature-sensitive relationships from paired glosses, datasets that require minimal contribution from trained linguists beyond transcription from primary sources. We validate our results by comparison with data generated independently via the comparative method, and quantify error rates using consistency indices. To showcase our method's utility and to demonstrate its robustness at local and regional scales, we apply it to two language datasets from eastern Indonesia. As linguistic datasets proliferate, scalable computational methods that mimic historical linguistic reconstruction will become increasingly necessary. Although at present we cannot disentangle all the processes driving linguistic change (e.g. lexical borrowing), our method provides a robust and accurate alternative to manual linguistic analysis. The feature-sensitive method adopted here accurately and automatically identifies emergent patterns hidden in traditional word-lists by analysing critical phonetic information that is discarded (or required as prerequisite) by many current cognate-based computational methods. 
This approach is not intended to supplant manual linguistic analysis, but has an important role in quickly generating robust data for non-linguistic fields or interdisciplinary projects that require formal quantitative analysis of historical linguistic relationships. Our approach provides a workable approximate phylogeny in cases where a trained linguist is unavailable, or otherwise significantly reduces the time and effort required for manual classification.

Original language: English (US)
Pages (from-to): 340-369
Number of pages: 30
Journal: Journal of Quantitative Linguistics
Volume: 15
Issue number: 4
State: Published - 2008

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
