Skeleton-based Explainable Bodily Expressed Emotion Recognition through Graph Convolutional Networks
This is a pioneering study in the field of emotion recognition: it shifts the focus of explainable AI from the voice and face as expressive modalities to bodily expressions. In this work, I developed an explainable framework for bodily expressed emotion recognition using Graph Convolutional Networks (GCNs) that offers both accurate performance and explainable decisions. In comprehensive evaluations, the study's findings show that hand and arm movements are the most significant cues for automatic bodily expressed emotion recognition. These findings align with perceptual studies, which have shown that features related to arm movements correlate most strongly with human perception of emotions.
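To make the core idea concrete, here is a minimal sketch of skeleton-as-graph emotion classification. The joint count, number of emotion classes, single-layer architecture, and placeholder adjacency are illustrative assumptions only; they do not reproduce the published model, which is considerably deeper.

```python
# Minimal sketch: body joints as graph nodes, one graph convolution,
# and a linear emotion classifier. All sizes below are hypothetical.
import torch
import torch.nn as nn

NUM_JOINTS = 25   # hypothetical joint count
NUM_CLASSES = 7   # hypothetical number of emotion labels

class SimpleSkeletonGCN(nn.Module):
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        # Placeholder adjacency (self-loops only); a real model would use
        # the normalized adjacency encoding the skeleton's bone connections.
        self.register_buffer("A", torch.eye(NUM_JOINTS))
        self.proj = nn.Linear(in_channels, hidden)
        self.classify = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, x):
        # x: (batch, joints, 3) joint coordinates for one frame
        h = torch.relu(self.proj(self.A @ x))  # graph convolution: (A X) W
        h = h.mean(dim=1)                      # pool over joints
        return self.classify(h)               # (batch, NUM_CLASSES) logits

model = SimpleSkeletonGCN()
logits = model(torch.randn(8, NUM_JOINTS, 3))  # 8 skeletons -> 8 x 7 logits
```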
In addition, I supervised a Master's thesis on the same topic using a different approach: the student used CNNs to recognize emotions from body movements, focusing on the model's interpretability.
Transparent and explainable methods support scientific discovery by keeping models consistent with domain knowledge. Explainability is also an important research direction because it can help reduce biases and discrimination. More information can be found in the following publications.
The following video shows a demo of my work titled “Skeleton-Based Explainable Bodily Expressed Emotion Recognition Through Graph Convolutional Networks,” presented at the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) in Jodhpur, India, 2021:
Related Publications
Skeleton-Based Explainable Bodily Expressed Emotion Recognition Through Graph Convolutional Networks
Esam Ghaleb, André Mertens, Stylianos Asteriadis, and 1 more author
In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021
Much of the focus on emotion recognition has gone into the face and voice as expressive channels, whereas bodily expressions of emotions are understudied. Moreover, current studies lack the explainability of computational features of body movements related to emotional expressions. Perceptual research on body parts’ movements shows that features related to the arms’ movements are correlated the most with human perception of emotions. In this paper, our research aims to present an explainable approach for bodily expressed emotion recognition. It utilizes the body joints of the human skeleton, representing them as a graph, which is used in Graph Convolutional Networks (GCNs). We improve the modelling of the GCNs by using spatial attention mechanisms based on body parts, i.e., arms, legs, and torso. Our study presents a state-of-the-art explainable approach supported by experimental results on two challenging datasets. Evaluations show that the proposed methodology offers accurate performance and explainable decisions. The methodology demonstrates which body part contributes the most in its inference, showing the significance of arm movements in emotion recognition.
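As a rough illustration of the body-part spatial attention described in the abstract, the sketch below scores each part (arms, legs, torso) from its pooled features and rescales that part's joint features by the resulting weight; the per-part weights then double as the explanation of which part drove the decision. The joint groupings and the sigmoid scoring function are invented for illustration and are not the paper's implementation.

```python
# Hedged sketch of body-part spatial attention over GCN features.
import torch
import torch.nn as nn

BODY_PARTS = {                      # hypothetical joint indices per part
    "arms":  [4, 5, 6, 7, 8, 9, 10, 11],
    "legs":  [12, 13, 14, 15, 16, 17, 18, 19],
    "torso": [0, 1, 2, 3, 20],
}

class BodyPartAttention(nn.Module):
    """Scores each body part and rescales its joints' features accordingly."""
    def __init__(self, channels=64):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, h):
        # h: (batch, joints, channels) features from a GCN layer
        out = h.clone()
        weights = {}
        for part, idx in BODY_PARTS.items():
            pooled = h[:, idx].mean(dim=1)         # (batch, channels)
            w = torch.sigmoid(self.score(pooled))  # (batch, 1) part weight
            out[:, idx] = h[:, idx] * w[:, None]   # rescale the part's joints
            weights[part] = w.squeeze(-1)          # keep weights as explanation
        return out, weights

attn = BodyPartAttention()
features, part_weights = attn(torch.randn(8, 21, 64))
# part_weights["arms"] etc. indicate how much each part contributed
```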
Explainable and Interpretable Features of Emotion in Human Body Expressions
André Mertens, Esam Ghaleb, and Stylianos Asteriadis
The cooperation between machines and humans could be improved if machines could understand and respond to the emotions of the people around them. Furthermore, the features that machines use to classify emotions should be explainable to reduce the inhibition threshold for automatic emotion recognition. However, the explainability in bodily expressivity of emotions has hardly been explored yet. Therefore, this study aims to visualize and explain the features used by neural networks to classify emotions based on body movements and postures of human characters in videos. For this purpose, a state-of-the-art neural network was selected as the classification model. This network was used to classify the videos of two datasets for emotion classification. The activation of the classification features used by the model was visualized with heatmaps over the course of the videos. Furthermore, a combination of Class Activation Maps and body joint coordinates was used to compute the activation of body parts in order to investigate the existence of prototypical activation patterns in emotions. Similarities were found between the activation patterns of the two datasets. These patterns may provide new insights into the classification features used by neural networks and the emotion expression in body movements and postures.
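The combination of Class Activation Maps and joint coordinates can be pictured with a small sketch: sample the CAM at each joint's pixel location and average within each body part, yielding a per-part activation score per frame. The body-part groupings and the function below are hypothetical illustrations, not the thesis code.

```python
# Illustrative sketch: read a (H, W) class activation map at 2D joint
# locations and average per body part for a single frame.
import numpy as np

BODY_PARTS = {"arms": [4, 5, 6, 7], "legs": [8, 9, 10, 11], "torso": [0, 1, 2, 3]}

def body_part_activation(cam, joints_xy):
    """cam: (H, W) activation map; joints_xy: (num_joints, 2) pixel coords."""
    h, w = cam.shape
    scores = {}
    for part, idx in BODY_PARTS.items():
        vals = []
        for j in idx:
            x, y = joints_xy[j]
            # clamp to image bounds, then read the CAM value at the joint
            xi = int(np.clip(np.rint(x), 0, w - 1))
            yi = int(np.clip(np.rint(y), 0, h - 1))
            vals.append(cam[yi, xi])
        scores[part] = float(np.mean(vals))
    return scores

# Per-frame scores can be stacked over a video to look for
# prototypical activation patterns per emotion.
demo = body_part_activation(np.random.rand(224, 224), np.random.rand(12, 2) * 224)
```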