August 17, 2018 feature

A light-weight and accurate deep learning model for audiovisual emotion recognition

by Ingrid Fadelli, Tech Xplore

Researchers at Orange Labs and Normandie University have developed a novel deep neural model for audiovisual emotion recognition that performs well with small training sets. Their study, which was pre-published on arXiv, follows a philosophy of simplicity, substantially limiting the parameters that the model acquires from datasets and using simple learning techniques.

Neural networks for emotion recognition have a number of useful applications within the contexts of healthcare, customer analysis, surveillance, and even animation. While state-of-the-art deep learning algorithms have achieved remarkable results, most are still unable to reach the same understanding of emotions attained by humans.

"Our overall objective is to facilitate human-computer interaction by making computers able to perceive various subtle details expressed by humans," Frédéric Jurie, one of the researchers who carried out the study, told TechXplore. "Perceiving emotions contained in images, video, voice and sound fall within this context."

Recent studies have assembled multimodal and temporal datasets of annotated videos and audiovisual clips. Yet these datasets typically contain relatively few annotated samples, whereas most existing deep learning algorithms need larger datasets to perform well.

The researchers tried to address this issue by developing a new framework for audiovisual emotion recognition, which fuses the analysis of visual and audio footage, retaining a high level of accuracy even with relatively small training datasets. They trained their neural model on AFEW, a dataset of 773 audiovisual clips extracted from movies and annotated with discrete emotions.

"One can see this model as a black box processing the video and automatically inferring the emotional state of people," Jurie explained. "One big advantage of such deep neural models is that they learn by themselves how to process the video by analyzing examples, and do not require experts to provide specific processing units."

The model devised by the researchers follows Occam's razor, the philosophical principle that, between two competing approaches or explanations, the simpler one is preferable. Unlike other deep learning models for emotion recognition, their model is therefore kept relatively simple. The neural network learns a limited number of parameters from the dataset and employs basic learning strategies.

"The proposed network is made of cascaded processing layers abstracting the information, from the signal to its interpretation," Jurie said. "Audio and video are processed by two different channels of the network and are combined lately in the process, almost at the end."
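The late combination Jurie describes can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the label set, the example scores and the `video_weight` mixing parameter are all assumptions chosen for illustration. Each channel produces its own per-class scores, and the two predictions are only merged at the very end.

```python
import math

# Seven discrete emotion labels. The label set and its order are
# assumptions made for this illustration.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(video_scores, audio_scores, video_weight=0.5):
    """Combine the predictions of two independent channels.

    video_weight is a hypothetical mixing parameter, not a value
    taken from the paper.
    """
    p_video = softmax(video_scores)
    p_audio = softmax(audio_scores)
    fused = [
        video_weight * v + (1.0 - video_weight) * a
        for v, a in zip(p_video, p_audio)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


# Made-up scores: the video channel favours "happy", the audio channel "neutral".
video_scores = [0.2, 0.1, 0.0, 2.5, 1.0, 0.3, 0.4]
audio_scores = [0.1, 0.0, 0.2, 1.0, 1.8, 0.2, 0.1]
label, fused = late_fusion(video_scores, audio_scores, video_weight=0.6)
print(label)
```

Because each channel is trained and evaluated separately, fusing at prediction level adds very few parameters, which fits the small-dataset setting the article describes.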

When tested, their light model achieved a promising emotion recognition accuracy of 60.64 percent. It was also ranked fourth at the 2018 Emotion Recognition in the Wild (EmotiW) challenge, held at the ACM International Conference on Multimodal Interaction (ICMI), in Colorado.

"Our model is proof that following the Occam's razor principle, i.e., by always choosing the simplest alternatives for designing neural networks, it is possible to limit the size of the models and obtain very compact but state-of-the-art neural networks, which are easier to train," Jurie said. "This contrasts with the research trend of making neural networks bigger and bigger."

The researchers will now continue to explore ways of achieving high accuracy in emotion recognition by simultaneously analyzing visual and auditory data, using the limited annotated training datasets that are currently available.

"We are interested in several research directions, such as how to better fuse the different modalities, how to represent emotion by compact semantically meaningful descriptors (and not only class labels) or how to make our algorithms able to learn with less, or even without, annotated data," Jurie said.

More information: An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets, arXiv:1808.02668v1 [cs.AI]. arxiv.org/abs/1808.02668

Abstract
This paper presents a light-weight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets, always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding allows to reduce the dimensionality of the representations. ii) The visual temporal information is handled by a simple score-per-frame selection process, averaged across time. iii) A simple frame selection mechanism is also proposed to weight the images of a sequence. iv) The fusion of the different modalities is performed at prediction level (late fusion). We also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64 % on the test set of AFEW, and ranked 4th at the Emotion in the Wild 2018 challenge.
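
Points ii) and iii) of the abstract, per-frame scores averaged across time with a weight for each image, can be sketched roughly as follows. This is a hedged illustration only: the function name, the three-class scores and the frame weights are hypothetical, and the abstract does not say how the authors compute their weights.

```python
def weighted_temporal_average(frame_scores, frame_weights):
    """Average per-frame class scores over time, using one weight per frame.

    How the weights are obtained is left open here; in this sketch they
    are simply given as input (an assumption).
    """
    if len(frame_scores) != len(frame_weights):
        raise ValueError("need exactly one weight per frame")
    total = sum(frame_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(frame_scores[0])
    clip_scores = [0.0] * n_classes
    for scores, weight in zip(frame_scores, frame_weights):
        for c in range(n_classes):
            # Normalise the weights so they sum to 1 across the clip.
            clip_scores[c] += (weight / total) * scores[c]
    return clip_scores


# Three frames, three hypothetical classes. The middle frame gets
# weight 0, so it does not influence the clip-level scores.
frame_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
frame_weights = [1.0, 0.0, 1.0]
clip_scores = weighted_temporal_average(frame_scores, frame_weights)
print(clip_scores)
```

With all weights equal, the function reduces to a plain average across time; unequal weights let uninformative frames count less toward the clip-level prediction.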

Journal information: arXiv

Citation: A light-weight and accurate deep learning model for audiovisual emotion recognition (2018, August 17) retrieved 30 June 2024 from https://techxplore.com/news/2018-08-light-weight-accurate-deep-audiovisual-emotion.html

