TY - GEN
T1 - Naturalistic Head Motion Generation from Speech
AU - Mittal, Trisha
AU - Aldeneh, Zakaria
AU - Fedzechkina, Masha
AU - Ranjan, Anurag
AU - Theobald, Barry John
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience. Most prior works assess the quality of generated head motion by comparing them against a single ground-truth using an objective metric. Yet there are many plausible head motion sequences to accompany a speech utterance. In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. We show that, despite providing more diverse head motions, the generative model produces motions with varying degrees of perceptual quality. We finally show that objective metrics commonly used in previous research do not accurately reflect the perceptual quality of generated head motions. These results open an interesting avenue for future work to investigate better objective metrics that correlate with human perception of quality.
AB - Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience. Most prior works assess the quality of generated head motion by comparing them against a single ground-truth using an objective metric. Yet there are many plausible head motion sequences to accompany a speech utterance. In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. We show that, despite providing more diverse head motions, the generative model produces motions with varying degrees of perceptual quality. We finally show that objective metrics commonly used in previous research do not accurately reflect the perceptual quality of generated head motions. These results open an interesting avenue for future work to investigate better objective metrics that correlate with human perception of quality.
KW - audio-visual speech
KW - head motion synthesis
KW - human-computer interaction
KW - perceptual study
KW - speech animation
UR - https://www.scopus.com/pages/publications/85177564808
UR - https://www.scopus.com/pages/publications/85177564808#tab=citedBy
U2 - 10.1109/ICASSP49357.2023.10095684
DO - 10.1109/ICASSP49357.2023.10095684
M3 - Conference contribution
AN - SCOPUS:85177564808
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -