Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we explore a very simple non-neural approach to mapping orthography to phonetic transcription in a low-resource context, using transfer data from a related language. We start from a baseline system and focus our efforts on data augmentation. We make three principal moves. First, we start with an HMM-based system (Novak et al., 2012). Second, we augment our basic system by recombining legal substrings in a restricted fashion (Ryan and Hulden, 2020). Finally, we limit our transfer data by using only training pairs whose phonetic form shares all of its bigrams with the target language.
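The third move, filtering transfer data by phone bigrams, can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes that a transfer pair is kept when every phone bigram in its phonetic form is attested somewhere in the target-language training data, that phonetic forms are already segmented into lists of phones, and that no word-boundary symbols are used. The names `phone_bigrams`, `target_bigram_inventory`, and `filter_transfer_pairs` are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# A training pair: (orthographic form, phonetic form as a sequence of phones).
Pair = Tuple[str, Sequence[str]]


def phone_bigrams(phones: Sequence[str]) -> Set[Tuple[str, str]]:
    """Return the set of adjacent phone pairs in one phonetic form."""
    return set(zip(phones, phones[1:]))


def target_bigram_inventory(target_pairs: Iterable[Pair]) -> Set[Tuple[str, str]]:
    """Collect every phone bigram attested in the target-language data."""
    inventory: Set[Tuple[str, str]] = set()
    for _, phones in target_pairs:
        inventory |= phone_bigrams(phones)
    return inventory


def filter_transfer_pairs(
    transfer_pairs: Iterable[Pair], target_pairs: Iterable[Pair]
) -> List[Pair]:
    """Keep transfer pairs whose phone bigrams all occur in the target data.

    Assumption: "shares all bigrams" is read as a subset test. A form with
    fewer than two phones has no bigrams and is therefore kept.
    """
    allowed = target_bigram_inventory(target_pairs)
    return [
        (word, phones)
        for word, phones in transfer_pairs
        if phone_bigrams(phones) <= allowed
    ]
```

With toy data, a transfer pair such as `("kas", ["k", "a", "s"])` would be dropped if the target data never contains the bigram `("a", "s")`, while `("kat", ["k", "a", "t"])` would be kept if both `("k", "a")` and `("a", "t")` are attested.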

Original language: English (US)
Title of host publication: ACL 2023 - 20th SIGMORPHON Workshop on Computational Morphology, Phonology, and Phonetics, CMPP 2023
Editors: Garrett Nicolai, Eleanor Chodroff, Cagri Coltekin, Fred Mailhot
Publisher: Association for Computational Linguistics (ACL)
Pages: 245-248
Number of pages: 4
ISBN (Electronic): 9781959429937
State: Published - 2023
Event: 20th SIGMORPHON Workshop on Computational Morphology, Phonology, and Phonetics, CMPP 2023, as part of ACL 2023 - Toronto, Canada
Duration: Jul 14 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 20th SIGMORPHON Workshop on Computational Morphology, Phonology, and Phonetics, CMPP 2023, as part of ACL 2023
Country/Territory: Canada
City: Toronto
Period: 7/14/23 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
