April 19, 2024 report

Microsoft's AI app VASA-1 makes photographs talk and sing with believable facial expressions

by Bob Yirka, Tech Xplore

A team of AI researchers at Microsoft Research Asia has developed an application that takes a still image of a person and an audio track and turns them into an animation of that person speaking or singing the audio, complete with appropriate facial expressions.

The team has published a paper on the arXiv preprint server describing how they created the app. Video samples are available on the research project page.

The researchers set out to make still images appear to talk or sing along to any supplied audio track while also displaying believable facial expressions. With VASA-1, they appear to have succeeded: the system turns static images, whether photographs, drawings, or paintings, into what the team describes as "exquisitely synchronized" animations.

The group has demonstrated the system by posting short video clips of their test results. In one, a cartoon version of the Mona Lisa performs a rap song; in another, a photograph of a woman has been transformed into a singing performance; and in yet another, a drawing of a man delivers a speech.

In each of the animations, the facial expressions change along with the words in a way that emphasizes what is being said. The researchers also note that, despite the lifelike quality of the videos, closer inspection can reveal flaws and other evidence that they were artificially generated.

Credit: Microsoft

The team trained the system on thousands of images showing a wide variety of facial expressions. They note that it currently produces 512-by-512-pixel video at 45 frames per second, and that generating the videos took an average of two minutes on a desktop-grade Nvidia RTX 4090 GPU.

The research team suggests that VASA-1 could be used to generate highly lifelike avatars for games or simulations. At the same time, they acknowledge its potential for abuse and are therefore not making the system available for general use.

More information: Sicheng Xu et al, VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time, arXiv (2024). DOI: 10.48550/arxiv.2404.10667

Project page: www.microsoft.com/en-us/research/project/vasa-1/

Journal information: arXiv

Citation: Microsoft's AI app VASA-1 makes photographs talk and sing with believable facial expressions (2024, April 19) retrieved 16 August 2024 from https://techxplore.com/news/2024-04-microsoft-ai-app-vasa-believable.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
