Clear speech in the new digital era: Speaking and listening clearly to voice-AI systems
Millions of people now regularly communicate with AI-based devices, such as smartphones, smart speakers, and cars. Studying these interactions can improve AI's ability to understand human speech and reveal how talking with technology affects language.
In their talk, "Clear speech in the new digital era: Speaking and listening clearly to voice-AI systems," Georgia Zellou and Michelle Cohn of the University of California, Davis described experiments investigating how speech production and comprehension change when humans communicate with AI. The presentation took place as part of the 184th Meeting of the Acoustical Society of America, held May 8-12.
In their first line of questioning, Zellou and Cohn examined how people adjust their voices when communicating with an AI system compared with talking to another human. They found that participants produced louder, slower speech with less pitch variation when speaking to voice-AI (e.g., Siri, Alexa), even when the interactions were otherwise identical.
On the listening side, the researchers showed that how humanlike a device sounds affects how well listeners understand it. If listeners believe the voice belongs to a device, they understand it less accurately; if it sounds more humanlike, their comprehension improves. Clear speech, in the style of a newscaster, was better understood overall, even when it was machine-generated.
"We do see some differences in patterns across human- and machine-directed speech: People are louder and slower when talking to technology. These adjustments are similar to the changes speakers make when talking in background noise, such as in a crowded restaurant," said Zellou. "People also have expectations that the systems will misunderstand them and that they won't be able to understand the output."
Clarifying what makes a speaker intelligible will be useful for voice technology. For example, these results suggest that text-to-speech voices should adopt a "clear" style in noisy conditions.
Looking forward, the team aims to extend these studies to people of different age groups and social and linguistic backgrounds. They also want to investigate how people learn language from devices and how linguistic behavior adapts as technology changes.
"There are so many open questions," said Cohn. "For example, could voice-AI be a source of language change among some speakers? As technology advances, such as with large language models like ChatGPT, the boundary between human and machine is changing—how will our language change with it?"
More information: Conference: acousticalsociety.org/asa-meetings/