Self-Supervised Learning (SSL) for sensor-based human activity recognition
While working as a postdoctoral researcher at Maastricht University, I was fortunate to be the daily supervisor of Bulat Khaertdinov, a Ph.D. candidate at UM. I guided and collaborated with Bulat on Self-Supervised Learning (SSL) for sensor-based Human Activity Recognition (HAR). During my Ph.D. research on Deep Metric Learning, I recognized that SSL could improve the learning process, especially for tasks (e.g., sensor-based HAR) where data collection and annotation are time-consuming and expensive. We conducted two studies, introducing a novel contrastive SSL method. Our work also addressed the problem of negative pairs in contrastive learning by using dynamic temperature scaling within a contrastive loss function. Extensive evaluations on three widely used open-source datasets showed that the proposed method achieves state-of-the-art SSL results on the activity recognition task. Furthermore, it demonstrated strong potential in semi-supervised and transfer learning settings, outperforming many baseline methods.
Related Publications
Contrastive Self-supervised Learning for Sensor-based Human Activity Recognition
Bulat Khaertdinov, Esam Ghaleb, and Stylianos Asteriadis
In 2021 IEEE International Joint Conference on Biometrics (IJCB)
Deep Learning models applied to sensor-based Human Activity Recognition usually require vast amounts of annotated time-series data to extract robust features. However, annotating signals coming from wearable sensors can be a tedious and often unintuitive process that requires specialized tools and predefined scenarios, making it expensive and time-consuming. This paper combines one of the most recent advances in Self-Supervised Learning (SSL), namely the SimCLR framework, with a powerful transformer-based encoder to introduce a Contrastive Self-supervised learning approach to Sensor-based Human Activity Recognition (CSSHAR) that learns feature representations from unlabeled sensory data. Extensive experiments conducted on three widely used public datasets show that the proposed method outperforms recent SSL models. Moreover, CSSHAR extracts more robust features than an identical supervised transformer when transferring knowledge from one dataset to another, as well as when only very limited amounts of annotated data are available.
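To make the contrastive setup concrete, below is a minimal PyTorch sketch of the SimCLR-style NT-Xent objective that CSSHAR builds on. The toy encoder, the jitter and scaling augmentations, and all shapes and hyperparameters are illustrative placeholders, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss for two batches of projected features.

    z1, z2: (N, D) projections of two augmented views of the same N
    sensor windows; row i of z1 and row i of z2 form a positive pair.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine sims
    sim.fill_diagonal_(float('-inf'))                   # drop self-pairs
    # For row i, the positive sits N rows away (the other view).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage with a toy encoder and two stochastic augmentations of a batch.
x = torch.randn(32, 100, 6)                    # 32 windows, 100 steps, 6 channels
view1 = x + 0.05 * torch.randn_like(x)         # jitter (illustrative)
view2 = x * torch.empty(32, 1, 1).uniform_(0.8, 1.2)  # scaling (illustrative)
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(600, 64))
loss = nt_xent_loss(encoder(view1), encoder(view2))
```

In the actual model, the flattened linear encoder above would be replaced by the transformer-based encoder described in the paper.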
Deep Triplet Networks with Attention for Sensor-based Human Activity Recognition
Bulat Khaertdinov, Esam Ghaleb, and Stylianos Asteriadis
In 2021 IEEE International Conference on Pervasive Computing and Communications (PerCom) Mar 2021
Among the most significant challenges in Human Activity Recognition using wearable devices are inter-class similarities and subject heterogeneity. These problems lead to difficulties in constructing robust feature representations, which can negatively affect recognition quality. This study, for the first time, applies deep triplet networks with various triplet loss functions and mining methods to the Human Activity Recognition task. Moreover, we introduce a novel method for constructing hard triplets by exploiting similarities between subjects performing the same activities, using the concept of Hierarchical Triplet Loss. Our deep triplet models are based on recent state-of-the-art LSTM networks with two attention mechanisms. The extensive experiments conducted in this paper identify important hyperparameters and settings for training deep metric learning models on widely used open-source Human Activity Recognition datasets. A comparison of the proposed models against recent benchmark models shows that the deep metric learning approach has the potential to improve recognition quality. Specifically, at least one of the implemented triplet networks achieves state-of-the-art results on each dataset used in this study, namely PAMAP2, USC-HAD and MHEALTH. Another positive effect of applying deep triplet networks, and especially the proposed sampling algorithm, is that the resulting feature representations are less affected by inter-class similarity and subject heterogeneity.
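For illustration, here is a compact PyTorch sketch of a triplet loss with batch-hard mining, a common stand-in for the kind of mining strategies the paper explores. The margin, toy embeddings, and labels are assumptions, and the paper's subject-aware hierarchical sampling is not reproduced here.

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Triplet loss with batch-hard mining: for each anchor, use its
    farthest same-class sample (hardest positive) and its closest
    different-class sample (hardest negative) within the batch.
    """
    dist = torch.cdist(emb, emb)                        # (N, N) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    hardest_pos = dist.masked_fill(~same | eye, float('-inf')).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(16, 32)              # 16 windows, 32-d embeddings (toy)
labels = torch.randint(0, 4, (16,))    # 4 activity classes (toy labels)
loss = batch_hard_triplet_loss(emb, labels)
```

The paper's hierarchical, subject-based variant changes how positive and negative candidates are chosen, not the form of the loss itself.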
Temporal triplet mining for personality recognition
Dario Dotti, Esam Ghaleb, and Stylianos Asteriadis
In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) Mar 2020
One of the primary goals of personality computing is to enhance the automatic understanding of human behavior, making use of various sensing technologies. Recent studies have started to correlate personality patterns described by psychologists with data-driven findings; however, given the subtle delineations of human behaviors, results remain specific to predefined contexts. In this paper, we propose a framework for automatic personality recognition that is able to embed the different behavioral dynamics evoked by diverse real-world scenarios. Specifically, motion features are designed to encode local motion dynamics of the human body, and interpersonal distance (proxemics) features are designed to encode global dynamics in the scene. Using a Convolutional Neural Network (CNN) architecture trained with a triplet-loss deep metric learning objective, we learn temporal, as well as discriminative spatio-temporal, streams of embeddings to represent patterns of personality behaviors. We experimentally show that the proposed Temporal Triplet Mining strategy leverages the similarity between temporally related samples and therefore helps encode higher-level semantic movements or sub-movements that are easier to map onto personality labels. Our experiments show that the generated embeddings improve on the state-of-the-art results for personality recognition on two public datasets recorded in different scenarios.
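As a rough sketch of the temporal idea, the snippet below mines positives as temporally adjacent windows from the same recording and negatives from other recordings. The window threshold, sampling rules, and variable names are illustrative assumptions, not the paper's exact mining procedure.

```python
import torch

def temporal_triplets(emb, seq_id, t, window=2):
    """Toy temporal triplet sampling: for each anchor window, a positive
    is another window from the same recording within `window` time steps,
    and a negative is any window from a different recording.

    emb: (N, D) embeddings; seq_id: (N,) recording ids; t: (N,) time index.
    """
    triplets = []
    for a in range(len(emb)):
        same_seq = seq_id == seq_id[a]
        near = (t - t[a]).abs() <= window
        pos_idx = torch.nonzero(same_seq & near, as_tuple=True)[0]
        pos_idx = pos_idx[pos_idx != a]                 # exclude the anchor
        neg_idx = torch.nonzero(~same_seq, as_tuple=True)[0]
        if len(pos_idx) and len(neg_idx):
            triplets.append((a, pos_idx[0].item(), neg_idx[0].item()))
    return triplets

emb = torch.randn(12, 64)                      # 12 windows, 64-d features (toy)
seq_id = torch.tensor([0] * 6 + [1] * 6)       # two recordings (toy data)
t = torch.tensor(list(range(6)) + list(range(6)))
for a, p, n in temporal_triplets(emb, seq_id, t):
    d_ap = (emb[a] - emb[p]).norm()
    d_an = (emb[a] - emb[n]).norm()
    loss = torch.relu(d_ap - d_an + 0.3)       # standard triplet margin loss
```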
Dynamic Temperature Scaling in Contrastive Self-supervised Learning for Sensor-based Human Activity Recognition
Bulat Khaertdinov, Stylianos Asteriadis, and Esam Ghaleb
IEEE Transactions on Biometrics, Behavior, and Identity Science Mar 2022
The use of deep neural networks in sensor-based Human Activity Recognition has led to considerably improved recognition rates in comparison to more traditional techniques. Nonetheless, these improvements usually rely on collecting and annotating massive amounts of sensor data, a time-consuming and expensive task. In this paper, inspired by the impressive performance of Contrastive Learning approaches in Self-Supervised Learning settings, we introduce a novel method based on the SimCLR framework and a Transformer-like model. The proposed algorithm addresses the problem of negative pairs in SimCLR by using dynamic temperature scaling within a contrastive loss function. While the original SimCLR framework scales similarities between features of the augmented views by a constant temperature parameter, our method dynamically computes temperature values for scaling. The dynamic temperature is based on instance-level similarity values extracted by an additional model pre-trained on the initial instances beforehand. The proposed approach demonstrates state-of-the-art performance on three widely used datasets in sensor-based HAR, namely MobiAct, UCI-HAR and USC-HAD. Moreover, it is more robust than identical supervised models and models trained with a constant temperature in semi-supervised and transfer learning scenarios.
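Below is a minimal sketch of how a per-pair temperature can replace SimCLR's constant temperature in the NT-Xent loss. The way instance-level similarities are mapped to temperatures here is a placeholder, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nt_xent_dynamic_tau(z1, z2, tau):
    """NT-Xent with a per-pair temperature matrix instead of a scalar.

    z1, z2: (N, D) projections of two augmented views; tau: (2N, 2N)
    positive temperatures. In the paper these derive from instance-level
    similarities given by a pre-trained model; here they are an input.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    sim = (z @ z.t()) / tau                    # element-wise dynamic scaling
    sim.fill_diagonal_(float('-inf'))          # exclude self-similarities
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy temperatures: map a placeholder instance-similarity matrix into the
# range [0.1, 0.5]; the actual mapping used in the paper may differ.
z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
with torch.no_grad():
    feats = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # placeholder feats
    inst_sim = feats @ feats.t()
tau = 0.1 + 0.4 * (1.0 - inst_sim.clamp(0.0, 1.0))
loss = nt_xent_dynamic_tau(z1, z2, tau)
```

The intuition is that pairs the pre-trained model already considers similar can be scaled differently from clearly dissimilar ones, softening the effect of hard negative pairs in the contrastive objective.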