Paralinguistic classification of mask wearing by image classifiers and fusion

Jeno Szep, Salim Hariri

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Scopus citations


In this study, we address the ComParE 2020 Paralinguistics Mask sub-challenge, where the task is to detect whether a speaker is wearing a surgical mask from short speech segments. We propose a computer-vision-based pipeline that applies recently developed deep convolutional neural network image classifiers to a specific class of spectrograms. Several linear- and logarithmic-scale spectrograms were tested, and the best performance was achieved on linear-scale, 3-channel spectrograms created from the audio segments. A single-model image classifier provided a 6.1% better result than the best single-dataset baseline model. The ensemble of our models further improves accuracy: it achieves 73.0% UAR when trained on the 'train' dataset alone and reaches 80.1% UAR on the test set when training also includes the 'devel' dataset, which is 8.3% higher than the baseline. We also provide an activation-mapping analysis to identify frequency ranges that are critical in the 'mask' versus 'clear' classification.
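The results above are reported as UAR (unweighted average recall), the mean of the per-class recalls, so 'mask' and 'clear' count equally regardless of how many segments each class has. The following is a minimal sketch of that metric together with one simple way to fuse several classifiers. The abstract does not state how the ensemble is combined, so averaging the predicted 'mask' probabilities (soft voting) is an assumption, and the function names, probabilities, and labels are hypothetical illustration values, not data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def fuse_by_averaging(prob_lists):
    """Average per-segment 'mask' probabilities across models (soft voting).

    This fusion rule is an assumption for illustration only.
    """
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


# Hypothetical 'mask' probabilities from three models for four segments.
model_probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.5],
    [0.7, 0.7, 0.3, 0.7],
]
y_true = ["mask", "mask", "clear", "clear"]

fused = fuse_by_averaging(model_probs)
y_pred = ["mask" if p >= 0.5 else "clear" for p in fused]
uar = unweighted_average_recall(y_true, y_pred)
print(y_pred, uar)
```

In this toy example the fused model recovers both 'mask' segments but only one of the two 'clear' segments, so the per-class recalls are 1.0 and 0.5 and the UAR is their mean.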

Original language: English (US)
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Number of pages: 5
ISBN (Print): 9781713820697
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 → Oct 29 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772


Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020


Keywords

  • Computational paralinguistics
  • Convolutional neural networks (CNN)
  • Ensemble learning
  • Image-classification
  • Spectrogram

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

