Data Augmentation for End-to-end Silent Speech Recognition for Laryngectomees

Beiming Cao, Kristin Teplansky, Nordine Sebkhi, Arpan Bhavsar, Omer T. Inan, Robin Samlan, Ted Mau, Jun Wang

Research output: Contribution to journal, Conference article, peer-review

3 Scopus citations

Abstract

Silent speech recognition (SSR) predicts textual information from silent articulation and is the algorithmic component of silent speech interfaces (SSIs). SSIs have the potential to restore the speech ability of individuals who have lost their voice but can still articulate (e.g., laryngectomees). Because of the logistical difficulties of articulatory data collection, current SSR studies suffer from a limited amount of data. Data augmentation aims to increase the amount of training data by introducing variations into the existing dataset, but it has rarely been investigated in SSR for laryngectomees. In this study, we investigated the effectiveness of multiple data augmentation approaches for SSR, including consecutive and intermittent time masking, articulatory dimension masking, sinusoidal noise injection, and random scaling. Different experimental setups, including speaker-dependent, speaker-independent, and speaker-adaptive, were used. The SSR models were end-to-end speech recognition models trained with connectionist temporal classification (CTC). Electromagnetic articulography (EMA) datasets collected from multiple healthy speakers and laryngectomees were used. The experimental results demonstrated that the data augmentation approaches explored performed differently but generally improved SSR performance. In particular, consecutive time masking brought significant improvement in SSR for both healthy speakers and laryngectomees.
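The five augmentation families named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no parameter values, so every function name, argument, and default behaviour below is a hypothetical choice made for illustration. The sketch assumes an EMA utterance is stored as an array of shape (frames, articulatory dimensions) and that masked values are set to zero, which the abstract does not confirm.

```python
import numpy as np


def consecutive_time_mask(x, max_width, rng):
    """Zero one contiguous run of frames (assumed masking value: 0)."""
    out = x.copy()
    width = min(int(rng.integers(0, max_width + 1)), out.shape[0])
    start = int(rng.integers(0, out.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out


def intermittent_time_mask(x, drop_prob, rng):
    """Zero individual frames scattered across the utterance."""
    out = x.copy()
    drop = rng.random(out.shape[0]) < drop_prob
    out[drop, :] = 0.0
    return out


def dimension_mask(x, num_dims, rng):
    """Zero randomly chosen articulatory dimensions for the whole utterance."""
    out = x.copy()
    dims = rng.choice(out.shape[1], size=num_dims, replace=False)
    out[:, dims] = 0.0
    return out


def sinusoidal_noise(x, amplitude, max_cycles, rng):
    """Add a low-amplitude sinusoid with random frequency and phase per dimension."""
    t = np.linspace(0.0, 1.0, x.shape[0])[:, None]
    cycles = rng.uniform(0.5, max_cycles, size=(1, x.shape[1]))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(1, x.shape[1]))
    return x + amplitude * np.sin(2.0 * np.pi * cycles * t + phase)


def random_scale(x, low, high, rng):
    """Multiply the whole utterance by one random factor drawn from [low, high)."""
    return x * rng.uniform(low, high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.normal(size=(200, 12))  # synthetic stand-in for EMA data
    augmented = consecutive_time_mask(utterance, max_width=20, rng=rng)
    print(augmented.shape)
```

Each function returns a new array and leaves its input untouched, so several augmented copies can be derived from one recorded utterance and added to the training set.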

Original language: English (US)
Pages (from-to): 3653-3657
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
State: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: Sep 18 2022 to Sep 22 2022

Keywords

  • alaryngeal speech
  • data augmentation
  • silent speech interface
  • silent speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
