Adaptive spatio-temporal attention neural network for cross-database micro-expression recognition

Spatial attention helps the network focus on pixel regions that contain useful spatial information about subtle facial movements, whereas temporal attention lets the network focus on the frames in a sequence whose features are most useful for recognition. Credit: Beijing Zhongke Journal Publishing Co. Ltd.

Many intelligent applications and systems, including biomedical hardware and devices, require human-computer interaction technology. This technology enables intelligent hardware to obtain physiological and behavioral information from humans in order to process and accomplish specific tasks, providing convenience in daily life and promoting societal efficiency. Human-computer interaction technology is also relevant to many important research fields.

Emotion recognition is a significant challenge in human-computer interaction, as understanding the emotional state of humans is difficult yet important for intelligent machines during the process of interaction. Recognizing emotions through facial micro-expressions has become increasingly popular in recent years. Micro-expressions are brief, involuntary facial expressions consisting of subtle facial muscle movements that occur when a person tries to hide emotions.

Thus, micro-expressions can usually reveal a person's true emotional state and convey more substantial information than ordinary facial expressions. The automatic recognition of micro-expressions therefore has potentially useful applications in many fields, such as clinical diagnosis, security work, and human-computer interaction.

For cross-database micro-expression recognition (CDMER), in which a model trained on one database is evaluated on another, the researchers propose an adaptive spatio-temporal attention neural network (ASTANN). Specifically, the databases are first preprocessed by extracting optical flow information. The optical flow information is then combined with the facial images to generate new representations. Three images of the new representation are chosen to serve as the dynamic expression sequence and are fed into the network for further spatio-temporal feature extraction.

Finally, a simple yet effective loss function is developed to optimize the network parameters and narrow the distribution gap between the source and target databases. The main advantage of this model is that it uses a deep neural network with a spatio-temporal attention mechanism to focus on the subtle and fleeting features of micro-expressions when solving CDMER problems.

By employing spatio-temporal attention, the architecture can automatically capture useful information that is sparse in the spatial and temporal domains in micro-expression samples for CDMER tasks.

The attention mechanism calculates attention weights for each sample in both the spatial and temporal domains, highlighting the information in the sample that is most useful to the backbone network.
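As a rough illustration of how attention weights can highlight useful regions and frames, the following minimal Python sketch applies softmax weighting first across the regions of each frame and then across the frames of a sequence. The article does not give ASTANN's exact formulation, so the softmax pooling, the function names (`softmax`, `attend`, `spatio_temporal_pool`), and the hand-supplied scores are all assumptions made for illustration; in a trained network the scores would come from learned layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Return the attention-weighted sum of feature vectors and the weights."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return pooled, weights


def spatio_temporal_pool(sequence, spatial_scores, temporal_scores):
    """Pool a sequence where sequence[t][r] is the feature vector of region r in frame t.

    Spatial attention is applied inside each frame, then temporal attention
    is applied across the resulting per-frame vectors.
    """
    frame_vectors = [
        attend(regions, scores)[0]
        for regions, scores in zip(sequence, spatial_scores)
    ]
    return attend(frame_vectors, temporal_scores)


# Toy input (hypothetical values): 3 frames, 2 regions per frame, 1-D features.
sequence = [[[1.0], [3.0]], [[2.0], [4.0]], [[0.0], [6.0]]]
spatial_scores = [[0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]  # favor the second region
temporal_scores = [0.0, 3.0, 0.0]                      # favor the middle frame
pooled, frame_weights = spatio_temporal_pool(sequence, spatial_scores, temporal_scores)
print(pooled, frame_weights)
```

With equal scores the weights become uniform and the result reduces to plain averaging; larger scores shift the pooled feature toward the corresponding region or frame.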

  • A simple yet effective domain adaptation method embeds the correlation alignment (CORAL) loss into the first fully connected (FC) layer of the neural network, which significantly enhances performance on cross-database tasks.
  • Experiments conducted on two benchmark tasks show that the authors' approach outperforms state-of-the-art (SOTA) methods.
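The CORAL loss named in the list above can be sketched as follows. This minimal, dependency-free Python version uses the commonly cited Deep CORAL definition: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d²) for d-dimensional features. The function names and toy features are illustrative assumptions, and the article does not state how ASTANN weights this term against its classification loss.

```python
def covariance(features):
    """Unbiased d x d covariance matrix of n feature vectors (n >= 2)."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Toy FC-layer activations (hypothetical values), two samples per database.
source_features = [[0.0, 0.0], [2.0, 2.0]]
target_features = [[0.0, 0.0], [2.0, 0.0]]
print(coral_loss(source_features, target_features))  # 0.75
print(coral_loss(source_features, source_features))  # 0.0
```

The loss is zero when the two batches share the same second-order statistics, so minimizing it alongside the classification loss pushes the source and target feature distributions toward each other.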

In the future, the researchers hope to investigate whether combining multimodal information, such as text and audio, can assist the recognition process, a significant open question whose answer could advance CDMER research.

The paper is published in the journal Virtual Reality & Intelligent Hardware.

More information: Yuhan Ran et al, Adaptive spatio-temporal attention neural network for cross-database micro-expression recognition, Virtual Reality & Intelligent Hardware (2023). DOI: 10.1016/j.vrih.2022.03.006

Provided by Beijing Zhongke Journal Publishing Co. Ltd.
Citation: Adaptive spatio-temporal attention neural network for cross-database micro-expression recognition (2023, June 16) retrieved 2 December 2023 from
