Multimodal Face-to-face Dialogue Modeling

Understanding the emergence and maintenance of cross-modal alignment in face-to-face dialogues

This project aims to model and understand the emergence and maintenance of cross-modal speaker alignment in face-to-face dialogues. It focuses on analyzing multimodal behavior in a referential communication task, in which two speakers play a referential game: one participant (the director) describes a novel object called a Fribble while the other participant (the matcher) tries to identify it, using any means of communication available, including speech and gestures. Overall, this project contributes to understanding how humans use multiple modalities to establish mutual understanding, and how AI methods can help analyze large-scale multimodal data without relying on rater-based analyses, which can be laborious and subjective. In this work, I collaborate with leading linguists and cognitive scientists specializing in dialogue and gesture from UvA, Radboud University, and TU Dresden.

In this project, I have worked on co-speech gesture segmentation and representation, as well as the automatic detection and analysis of linguistic and gestural alignment in referential communication. The following research outputs related to this project have been published or are under review at leading AI and cognitive science venues:

Related Publications

  1. Leveraging Speech for Gesture Detection in Multimodal Communication
    Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, and 7 more authors
    arXiv preprint arXiv:2404.14952, 2024
  2. Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction
    Sho Akamine, Esam Ghaleb, Marlou Rasenberg, and 3 more authors
    In Proceedings of the Annual Meeting of the Cognitive Science Society 2024
  3. Analysing Cross-Speaker Convergence in Face-to-Face Dialogue through the Lens of Automatically Detected Shared Linguistic Constructions
    Esam Ghaleb, Marlou Rasenberg, Wim Pouw, and 4 more authors
    In Proceedings of the Annual Meeting of the Cognitive Science Society 2024
  4. Co-Speech Gesture Detection through Multi-phase Sequence Labeling (to appear)
    Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, and 6 more authors
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024