June 13, 2019 weblog
Connecting the dots between voice and a human face
Once again, artificial intelligence teams tease the realm of the impossible and deliver surprising results. A team in the news has figured out what a person's face may look like based only on their voice. Welcome to Speech2Face. The research team found a way to reconstruct some people's very rough likeness from short audio clips.
The paper describing their work is up on arXiv, and is titled "Speech2Face: Learning the Face Behind a Voice." The authors are Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William Freeman, Michael Rubinstein and Wojciech Matusik. They write: "Our goal in this work is to study to what extent we can infer how a person looks from the way they talk."
They evaluated and numerically quantified how, and in what manner, their Speech2Face reconstructions from audio resemble the true face images of the speakers.
The authors apparently wanted to make their intent clear: the work is not an attempt to link voices with images of the specific people who actually spoke. As they put it, "our goal is not to predict a recognizable image of the exact face, but rather to capture dominant facial traits of the person that are correlated with the input speech."
On GitHub, the authors said they also felt it important to discuss ethical considerations in the paper "due to the potential sensitivity of facial information."
They said in their paper that their method "cannot recover the true identity of a person from their voice (i.e., an exact image of their face). This is because our model is trained to capture visual features (related to age, gender, etc.) that are common to many individuals, and only in cases where there is strong enough evidence to connect those visual features with vocal/speech attributes in the data."
They also said the model will produce only average-looking faces, with characteristic visual features correlated with the input speech.
Jackie Snow, writing in Fast Company, described their method. Snow said the dataset was made up of clips from YouTube: Speech2Face was trained on internet videos of people talking. The scientists created a neural network-based model that "learns vocal attributes associated with facial features from the videos."
Snow added, "Now, when the system hears a new sound bite, the AI can use what it's learned to guess what the face might look like."
Neurohive discussed their work: "From the videos, they extract speech-face pairs, which are fed into two branches of the architecture. The images are encoded into a latent vector using the pre-trained face recognition model, whilst the waveform is fed into a voice encoder in a form of a spectrogram, in order to utilize the power of convolutional architectures. The encoded vector from the voice encoder is fed into the face decoder to obtain the final face reconstruction."
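The two-branch pipeline Neurohive describes can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation: the face encoder, voice encoder and face decoder below are single random matrices standing in for deep networks, and every dimension (spectrogram size, feature size, image size) is invented for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration only; the real model uses much larger
# spectrograms, feature vectors and images.
SPEC_BINS, SPEC_FRAMES = 64, 50   # spectrogram: frequency bins x time frames
FEAT_DIM = 256                    # shared face-feature space
IMG_H = IMG_W = 16                # decoded face image (toy size)

def face_encoder(image, W):
    """Stand-in for the pre-trained face recognition model: encodes a
    face image into a latent feature vector."""
    return np.tanh(W @ image.ravel())

def voice_encoder(spectrogram, W):
    """Stand-in for the convolutional voice encoder: maps a spectrogram
    to a vector in the same face-feature space."""
    return np.tanh(W @ spectrogram.ravel())

def face_decoder(feature, W):
    """Stand-in for the face decoder: maps a face feature back to a
    canonical face image with pixel values in [0, 1]."""
    img = 1.0 / (1.0 + np.exp(-(W @ feature)))  # sigmoid keeps pixels in [0, 1]
    return img.reshape(IMG_H, IMG_W, 3)

# Random weights play the role of trained parameters.
W_face = rng.normal(scale=0.02, size=(FEAT_DIM, IMG_H * IMG_W * 3))
W_voice = rng.normal(scale=0.02, size=(FEAT_DIM, SPEC_BINS * SPEC_FRAMES))
W_dec = rng.normal(scale=0.02, size=(IMG_H * IMG_W * 3, FEAT_DIM))

# One speech-face training pair (random stand-ins for real video data).
true_face = rng.random((IMG_H, IMG_W, 3))
spectrogram = rng.random((SPEC_BINS, SPEC_FRAMES))

# Training signal: push the voice branch's feature toward the face
# branch's feature for the same speaker (a simple L2 loss here).
target_feat = face_encoder(true_face, W_face)
voice_feat = voice_encoder(spectrogram, W_voice)
loss = np.mean((voice_feat - target_feat) ** 2)

# Inference: audio -> face feature -> reconstructed face image.
reconstruction = face_decoder(voice_feat, W_dec)
print(voice_feat.shape, reconstruction.shape)
```

The key design choice the sketch preserves is that the voice branch is trained to land in the feature space of an already-trained face recognition model, so the face decoder never needs to see audio at all.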
A more detailed report on their method and testing can be found in an article on Packt:
The researchers further evaluated and numerically quantified how their Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. For this, they tested their model both qualitatively and quantitatively on the AVSpeech dataset and the VoxCeleb dataset.
How might their findings help real-world applications? They said, "we believe that predicting face images directly from voice may support useful applications, such as attaching a representative face to phone/video calls based on the speaker's voice."
Why their work matters: Think patterns. "Previous research has explored methods for predicting age and gender from speech," said Snow, "but in this case, the researchers claim they have also detected correlations with some facial patterns too."
© 2019 Science X Network