Michelle Cohn (University of California, Davis), Eran Raveh (Saarland University), Kristin Predeck (University of California, Davis), Iona Gessinger (Saarland University), Bernd Möbius (Saarland University), and Georgia Zellou (University of California, Davis)
The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers using identical emotional synthesis methods, with three levels of increasing 'happiness' (0%, 33%, and 66% 'happier'). On each trial, listeners (native speakers of American English, n = 99) rated a given sentence on two scales assessing dimensions of emotion: valence (negative-positive) and arousal (calm-excited). Participants also rated the Alexa voice on several parameters assessing anthropomorphism (e.g., naturalness and human-likeness). Results showed that the emotion manipulations increased perceived positive valence and excitement. Yet the effect differed by talker: the increasing 'happiness' manipulations led to larger changes for the human voice than for the Alexa voice. Additionally, we observed individual differences in perceived valence and arousal based on participants' anthropomorphism scores. Overall, this line of research can speak to theories of computer personification and elucidate our changing relationship with voice-AI technology.