January 5, 2022

New method to make AI-generated voices more expressive

Researchers have found a way to make AI-generated voices, such as those of digital personal assistants, more expressive with a minimal amount of training. The method, which translates text to speech, can also be applied to voices that were never part of the system's training set.

The team of computer scientists and electrical engineers from the University of California San Diego presented their work at the ACML 2021 conference, which took place online recently.

In addition to personal assistants for smartphones, homes and cars, the method could help improve voice-overs in animated movies, automatic translation of speech in multiple languages, and more. The method could also help create personalized speech interfaces that empower individuals who have lost the ability to speak, similar to the computerized voice that Stephen Hawking used to communicate, but far more expressive.

"We have been working in this area for a fairly long period of time," said Shehzeen Hussain, a Ph.D. student at the UC San Diego Jacobs School of Engineering and one of the paper's lead authors. "We wanted to look at the challenge of not just synthesizing speech but of adding expressive meaning to that speech."

Existing methods fall short of this work in two ways. Some systems can synthesize expressive speech for a specific speaker, but only by using several hours of training data for that speaker. Others can synthesize speech from only a few minutes of speech data from a speaker never encountered before, but they only translate text to speech and cannot generate expressive speech. By contrast, the method developed by the UC San Diego team is the only one that can generate expressive speech, with minimal training, for a speaker who was not part of its training set.

The researchers flagged the pitch and rhythm of the speech in training samples as a proxy for emotion. This allowed their cloning system to generate expressive speech with minimal training, even for voices it had never encountered before.
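
To give a rough sense of what "pitch and rhythm" look like as data, here is a minimal Python sketch. It is not the researchers' code, and the article does not say how they extract these features. The sketch assumes a classic autocorrelation pitch estimator and uses per-frame energy as a crude stand-in for rhythm. The function names, the 80-400 Hz search range and the 400-sample frame length are all illustrative assumptions.

```python
import math


def make_tone(freq_hz, sample_rate, num_samples):
    """Synthesize a pure sine tone (a stand-in for a voiced speech frame)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate one frame's fundamental frequency by autocorrelation.

    Returns 0.0 for a silent frame. The search range is an assumed,
    rough range for speech, not a value taken from the paper.
    """
    if not any(frame):
        return 0.0
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the frame and a copy of itself shifted by `lag`.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping any partial tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def pitch_contour(samples, sample_rate, frame_len=400):
    """Pitch estimate per frame: how the voice rises and falls over time."""
    return [estimate_pitch(f, sample_rate) for f in split_frames(samples, frame_len)]


def energy_contour(samples, frame_len=400):
    """Mean squared amplitude per frame: a crude loudness envelope for rhythm."""
    return [sum(x * x for x in f) / len(f) for f in split_frames(samples, frame_len)]
```

Calling `pitch_contour(make_tone(200.0, 8000, 1600), 8000)` on a synthetic 200 Hz tone yields one pitch value per 50-millisecond frame. On real speech, the two contours together trace how a voice rises, falls, speeds up and pauses, which is the kind of signal a style-transfer system can condition on.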

"We demonstrate that our proposed model can make a new voice express, emote, sing or copy the style of a given reference speech," the researchers write.

Their method can learn speech directly from text; reconstruct a speech sample from a target speaker; and transfer the pitch and rhythm of speech from a different expressive speaker into cloned speech for the target speaker.

The team is aware that their work could be used to make deepfake videos and audio clips more accurate and persuasive. As a result, they plan to release their code with a watermark that will identify the speech created by their method as cloned.

"Expressive voice cloning would become a threat if you could make natural intonations," said Paarth Neekhara, the paper's other lead author and a Ph.D. student in computer science at the Jacobs School. "The more important challenge to address is detection of these media and we will be focusing on that next."

The method itself still needs to be improved. It is biased toward English speakers and struggles with speakers with a strong accent.

More information: Paarth Neekhara et al, Expressive Neural Voice Cloning. arXiv:2102.00151v1 [cs.SD], arxiv.org/abs/2102.00151

Audio examples: expressivecloning.github.io/

Provided by University of California - San Diego

Citation: New method to make AI-generated voices more expressive (2022, January 5) retrieved 17 July 2024 from https://techxplore.com/news/2022-01-method-ai-generated-voices.html
