Adaptive spatio-temporal attention neural network for cross-database micro-expression recognition
Many intelligent applications and systems, including biomedical hardware and devices, require human-computer interaction technology. This technology enables intelligent hardware to obtain physiological and behavioral information from humans in order to process and accomplish specific tasks, providing convenience in daily life and promoting societal efficiency. Further, human-computer interaction technology is relevant to many important research fields.
Emotion recognition is a significant challenge in human-computer interaction: understanding a person's emotional state is difficult, yet important, for intelligent machines during interaction. Recognizing human emotions from facial micro-expressions has become increasingly popular in recent years. Micro-expressions are brief, involuntary facial expressions consisting of subtle facial muscle movements that occur when a person tries to hide emotions.
Thus, micro-expressions can usually reveal a person's true emotional state and convey more substantial information than ordinary facial expressions. Therefore, the automatic recognition of micro-expressions has potentially useful applications in many fields, such as clinical diagnosis, security work, and human-computer interaction.
Inspired by previous studies, an adaptive spatio-temporal attention neural network (ASTANN) for cross-database micro-expression recognition (CDMER) is proposed in this paper. Specifically, the databases are first preprocessed by extracting optical flow information. Then, the optical flow information is combined with the facial images to generate new representations. Next, three images of the new representation are chosen to serve as the dynamic expression sequence and are fed into the network for further spatio-temporal feature extraction.
Finally, a simple yet effective loss function is developed to optimize the network parameters and reduce the distribution gap between the source and target databases. The main advantage of this model is that it uses a deep neural network with a spatio-temporal attention mechanism to focus on the subtle and transient features of micro-expressions in order to solve CDMER problems.
By employing spatio-temporal attention, the architecture can automatically capture useful information that is sparse in the spatial and temporal domains in micro-expression samples for CDMER tasks.
The attention mechanism computes attention weights for each sample in both the spatial and temporal domains, which highlights the most useful information in the samples for the backbone network.
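To make the idea of spatial and temporal attention weights more concrete, the following is a minimal NumPy sketch. It is not the ASTANN architecture from the paper; it only illustrates one generic way to weight locations within each frame and then weight frames within a sequence. The function name `spatio_temporal_attention`, the scoring vectors `w_spatial` and `w_temporal`, and the feature shape `(T, H, W, C)` are all assumptions introduced for illustration. In a trained network the scoring vectors would be learned rather than random.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatio_temporal_attention(features, w_spatial, w_temporal):
    """Illustrative attention pooling over a short frame sequence.

    features:   array of shape (T, H, W, C), one feature map per frame
    w_spatial:  hypothetical scoring vector of shape (C,) for locations
    w_temporal: hypothetical scoring vector of shape (C,) for frames
    """
    T, H, W, C = features.shape
    # Spatial attention: score every location, normalize within each frame.
    spatial_scores = features @ w_spatial                       # (T, H, W)
    spatial_w = softmax(spatial_scores.reshape(T, -1), axis=1)  # (T, H*W)
    spatial_w = spatial_w.reshape(T, H, W)
    # Weighted sum over locations gives one descriptor per frame.
    frame_desc = (features * spatial_w[..., None]).sum(axis=(1, 2))  # (T, C)
    # Temporal attention: score every frame, normalize across the sequence.
    temporal_w = softmax(frame_desc @ w_temporal, axis=0)       # (T,)
    # Weighted sum over frames gives one descriptor per sequence.
    clip_desc = (frame_desc * temporal_w[:, None]).sum(axis=0)  # (C,)
    return clip_desc, spatial_w, temporal_w

rng = np.random.default_rng(0)
# Three frames, matching the three-image sequence described in the text;
# the 4x4 spatial grid and 16 channels are arbitrary illustrative sizes.
features = rng.normal(size=(3, 4, 4, 16))
w_spatial = rng.normal(size=16)   # random stand-in for learned parameters
w_temporal = rng.normal(size=16)  # random stand-in for learned parameters
clip_desc, spatial_w, temporal_w = spatio_temporal_attention(
    features, w_spatial, w_temporal
)
```

Because both sets of weights are normalized with a softmax, locations and frames with higher scores contribute more to the final descriptor, which is the general sense in which attention highlights sparse but useful information.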
- A simple yet effective domain adaptation method is used: the correlation alignment (CORAL) loss is embedded into the first fully connected (FC) layer of the neural network, which significantly enhances performance on cross-database tasks.
- Experiments are conducted on two benchmark tasks, and the results show that the authors' approach outperforms state-of-the-art (SOTA) methods.
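For readers unfamiliar with the CORAL loss mentioned above, here is a minimal NumPy sketch of its standard definition: the squared Frobenius norm of the difference between the source and target feature covariance matrices, scaled by 1 / (4 d^2), where d is the feature dimension. This shows only the loss itself, not the authors' training code. The variable names `fc_source` and `fc_target`, the batch size, and the feature dimension are hypothetical stand-ins for FC-layer activations from the two databases.

```python
import numpy as np

def coral_loss(source, target):
    """CORAL loss between two feature batches of shape (n_samples, d)."""
    d = source.shape[1]
    # Feature covariance of each batch (columns are feature dimensions).
    cov_s = np.cov(source, rowvar=False)
    cov_t = np.cov(target, rowvar=False)
    # Squared Frobenius norm of the covariance gap, scaled by 1 / (4 d^2).
    return np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d)

rng = np.random.default_rng(0)
# Hypothetical FC-layer features: 32 samples with 8 dimensions each.
fc_source = rng.normal(size=(32, 8))
# Target features drawn with a different spread to mimic a database gap.
fc_target = 2.0 * rng.normal(size=(32, 8))

loss_same = coral_loss(fc_source, fc_source)        # identical statistics
loss_shifted = coral_loss(fc_source, fc_target)     # mismatched statistics
```

In a typical setup, a term like this is added to the classification loss with a weighting factor, so that the network learns features that are both discriminative on the source database and statistically aligned with the target database. The weighting used in the paper is not stated here.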
In the future, the researchers hope to investigate whether combining multimodal information, such as text and audio, can assist the recognition process, an important open question for the CDMER research field.
The paper is published in the journal Virtual Reality & Intelligent Hardware.
More information: Yuhan Ran et al, Adaptive spatio-temporal attention neural network for cross-database micro-expression recognition, Virtual Reality & Intelligent Hardware (2023). DOI: 10.1016/j.vrih.2022.03.006