Grand Challenges Program

Grand Challenges Part 1 (10:30-12:30)

15 min per paper, including Q&A; 30 min per organizers' presentation, including Q&A

10:30-11:00: ACM Multimedia Computational Paralinguistics Challenge (ComParE) Organizers' presentation, paper mmgc052i, The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests (Presenter: Shahin Amiriparian)

11:00-11:15: Paper mmgc011, Cascaded Cross-Modal Transformer for Request and Complaint Detection

11:15-11:30: Paper mmgc013, Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

11:30-11:45: Paper mmgc017, Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge

11:45-12:00: Paper mmgc021, Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks

12:00-12:30: Paper mmgc050i, SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge (Presenter: Bo Wu)

Grand Challenges Part 2 (13:30-15:00)

15 min per paper, including Q&A; 30 min per organizers' presentation, including Q&A

13:30-14:00: Multi-modal Behaviour Analysis for Artificial Mediation (MultiMediate) Organizers' presentation, paper mmgc043i, MultiMediate ’23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions (Presenters: Philip Mueller and Michal Balazia)

14:00-14:15: Paper mmgc025, MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings (Presenter: Surbhi Madan)

14:15-14:30: Paper mmgc024, DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation (Presenter: proxy)

14:30-14:45: Paper mmgc027, BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions (Presenter: proxy)

14:45-15:00: Paper mmgc008, Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts (Presenter: Chia-Ming Li)

Grand Challenges Part 3 (15:30-17:30)

15 min per paper, including Q&A; 30 min per organizers' presentation, including Q&A

15:30-16:00: Deep Video Understanding (DVU) 2023 Organizers' presentation, paper mmgc044i, The ACM Multimedia 2023 Deep Video Understanding Grand Challenge (Presenter: George Awad)

16:00-16:30: Facial Micro-Expression Grand Challenge Organizers' presentation, paper mmgc049i, MEGC2023: ACM Multimedia 2023 ME Grand Challenge (Presenter: Adrian Davison)

16:30-16:45: Paper mmgc028, Efficient Micro-Expression Spotting Based on Main Directional Mean Optical Flow Feature (Presenter: proxy)

16:45-17:00: Paper mmgc010, Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction (Presenter: Weilong Chen)

17:00-17:30: Time for discussion

Videos for all Grand Challenges papers are available on the Conflux system, as listed below:

mmgc002 Finetuning Language Models for Multimodal Question Answering
mmgc003 A Hierarchical Deep Video Understanding Method with Large Language Model
mmgc004 Enhanced CatBoost with Stacking Features for Social Media Prediction
mmgc005 Semi-Supervised Multimodal Emotion Recognition with Expression MAE
mmgc006 Towards Realistic Conversational Head Generation: A Comprehensive Framework for Lifelike Video Synthesis
mmgc007 Invisible Video Watermark Method Based on Maximum Voting and Probabilistic Superposition
mmgc008 Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts
mmgc009 VTQAGen: BART-based Generative Model For Visual Text Question Answering
mmgc010 Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction
mmgc011 Cascaded Cross-Modal Transformer for Request and Complaint Detection
mmgc012 Multi-scale Conformer Fusion Network for Multi-participant Behavior Analysis
mmgc013 Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference
mmgc014 Automatic Audio Augmentation for Requests Sub-Challenge
mmgc016 Answer-Based Entity Extraction and Alignment for Visual Text Question Answering
mmgc017 Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge
mmgc018 Sliding Window Seq2seq Modeling for Engagement Estimation
mmgc019 Micro-Expression Spotting with Face Alignment and Optical Flow
mmgc020 UniFaRN: Unified Transformer for Facial Reaction Generation
mmgc021 Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks
mmgc023 Data Augmentation for Human Behavior Analysis in Multi-Person Conversations
mmgc024 DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation
mmgc025 MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings
mmgc026 Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023
mmgc027 BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions
mmgc028 Efficient Micro-Expression Spotting Based on Main Directional Mean Optical Flow Feature
mmgc030 Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition
mmgc031 Deep Video Understanding with Video-Language Model
mmgc032 Semi-Supervised Multimodal Emotion Recognition with Class-Balanced Pseudo-labeling
mmgc033 Leveraging the Latent Diffusion Models for Offline Facial Multiple Appropriate Reactions Generation
mmgc034 Improvements on SadTalker-based Approach for ViCo Conversational Head Generation Challenge
mmgc035 Multimodal Emotion Recognition in Noisy Environment Based on Progressive Label Revision
mmgc036 Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting
mmgc037 Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline
mmgc038 Unveiling Subtle Cues: Backchannel Detection Using Temporal Multimodal Attention Networks
mmgc039 Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding
mmgc040 Building Robust Multimodal Sentiment Recognition via a Simple yet Effective Multimodal Transformer
mmgc041 MultiMediate 2023: Engagement Level Detection using Audio and Video Features
mmgc043i MultiMediate ’23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions
mmgc044i The ACM Multimedia 2023 Deep Video Understanding Grand Challenge
mmgc045i MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
mmgc046i Learning and Evaluating Human Preferences for Conversational Head Generation
mmgc047i REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge
mmgc048i VTQA2023: ACM Multimedia 2023 Visual Text Question Answering Challenge
mmgc049i MEGC2023: ACM Multimedia 2023 ME Grand Challenge
mmgc050i SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge
mmgc051i ACM Multimedia 2023 Grand Challenge Report: Invisible Video Watermark
mmgc052i The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests