Keynote Speakers

Transition and Adaptability:
The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond

Ralf Steinmetz
Technical University of Darmstadt, Germany


Let us define transition as the “exchange” between two mechanisms with comparable functionality but different algorithms and implementation concepts, each of which is optimal under particular conditions of the respective context. It is much more than adaptability; it covers more than just the smooth automatic control of, e.g., a MAPE loop or a control loop in charge of maximizing the quality of service of streamed media data while errors occur. Resilience describes the ability of a system to either absorb large changes (and crises) and recover from them in the short term or to overcome them by acquiring comparable or new basic functionality through overall system adjustments. In doing so, the system’s readiness increases continuously and sustainably by learning from past changes of the context (and crises).

Just one extreme example: In a situation of severe danger due to a natural disaster, a person located in the affected area transmits an on-the-fly generated 360-degree panoramic point cloud of the situation to the rescue team. The person still has sufficient energy supply and, for whatever reason, the communication facilities are still available. Due to energy shortage, a lot of other traffic, and some damage to the infrastructure, multimedia communication must be adjusted continuously to the environment and requirements. In an extreme situation, data is sent over high-latency, low-bandwidth satellite channels. Media might become a short textual description of the actual surroundings. Assume this happens without any manual interaction by the person sending the data. Multimedia and communication mechanisms must be exchanged; media must be “customized”. Transitions occur to support the user in, e.g., such an extreme stress situation. In the collaborative research project MAKI, as well as in emergenCITY, our center researching resilient infrastructures of digital cities that can withstand crises and disasters, we address some of these issues. However, in the coming years, beyond multimedia networks, many multimedia systems, interfaces, applications, contents, retrieval, theory, ... security and privacy will be affected.


KEYWORDS: Computer systems organization, Networks, Networks/Network algorithms, Information systems / Multimedia information systems



Ralf Steinmetz (Prof. Dr.-Ing. Dr.h.c.) is a full professor at the Department of Electrical Engineering and Information Technology as well as at the Dept. of Computer Science at the Techn. Univ. Darmstadt, Germany. Since 1996 he has been managing director of the “Multimedia Communications Lab” there; until the end of 2001, he directed a Fraunhofer Institute. He founded the Hessian Telemedia Technology Competence Center (httc) and served as its chair for approx. 20 years.

Over the course of his career, he has supervised over 105 successful PhD students. He has edited and co-authored a set of multimedia books which reflected the major issues; the initial version was the worldwide first in-depth technical book on multimedia technology. He has served as editor-in-chief of ACM TOMM and as editor of various IEEE, ACM, and other journals.

For more than 10 years he has served as Hesse’s advisor for information and communications technology. He is a member of the Scientific Council and president of the Board of Trustees of the international research institute IMDEA Networks, Madrid, Spain.

Ralf Steinmetz has been widely recognized for his contributions to the field, including the receipt of prestigious accolades such as an Honorary Doctorate from RWTH Aachen University. He was awarded a Chair of Excellence at the Univ. Carlos III de Madrid, Spain. He has been a member of the Academia Europaea since 2019. In 2008 he received the Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications, ACM SIGMM, “for pioneering work in multimedia communications and the fundamentals of multimedia synchronization”. In 2017 he was among the first three scientists to be awarded a fellowship of the VDE ITG, and he is a fellow of the GI. He was the first German scientist to be named Fellow of both the IEEE and the ACM.


Multimodal AI & LLMs for Peacekeeping and Emergency Response

Alejandro (Alex) Jaimes

Chief Scientist & SVP of AI at Dataminr &
Visiting Professor at Cornell Tech. 


When an emergency event, or an incident relevant for peacekeeping, first occurs, getting the right information as quickly as possible is critical to saving lives. When an event is ongoing, information on what is happening can be critical in making decisions to keep people safe and take control of the particular situation unfolding. In both cases, first responders and peacekeepers have to quickly make decisions that include what resources to deploy and where. Fortunately, in most emergencies, people use social media to publicly share information. At the same time, sensor data is increasingly becoming available. But a platform to detect emergency situations and deliver the right information has to deal with ingesting thousands of noisy data points per second: sifting through and identifying relevant information, from different sources, in different formats, with varying levels of detail, in real time, so that relevant individuals and teams can be alerted at the right level and at the right time. In this talk I will describe the technical challenges in processing vast amounts of heterogeneous, noisy data in real time, highlighting the importance of interdisciplinary research and a human-centered approach to address problems in peacekeeping and emergency response. I will give specific examples of how LLMs can be deployed at scale and discuss relevant future research directions in multimedia.

About Dataminr

Dataminr’s advanced AI platform detects the earliest signals of high-impact events and emerging risks, enabling enterprise and public sector clients around the globe to know critical information first, respond with confidence, and manage crises more effectively. Dataminr has partnered with the UN to accelerate humanitarian and crisis response efforts, equipping thousands of UN personnel with Dataminr’s First Alert product.


Alex is Chief Scientist & SVP of AI at Dataminr and Visiting Professor at Cornell Tech. He is a leader in AI and has built products that are used by millions of people (real-time event detection/emergency response, healthcare, self-driving cars, media, telecom, etc.) at companies such as Yahoo, Telefónica, IBM, Fuji Xerox, Siemens, AT&T Bell Labs, DigitalOcean, etc. An early voice in Human-Centered AI (Computing), he has over 100 patents and publications in top-tier conferences and journals in AI. He has been featured widely in the press (MIT Technology Review, CNBC, Vice, TechCrunch, Yahoo! Finance, etc.). He is a mentor at Endeavor and Techstars, and a member of the advisory board of Digital Divide Data (a not-for-profit that creates sustainable tech opportunities for underserved youth, their families, and their communities in Asia and Africa). He was an expert in the Colombian Government’s Artificial Intelligence Expert Mission, which advised the President on AI policies. Alex holds a Ph.D. from Columbia University.

Internet of Video Things: Technical Challenges and Emerging Applications

Professor Chang Wen Chen
The Hong Kong Polytechnic University


The worldwide flourishing of the Internet of Things (IoT) in the past decade has enabled numerous new applications through the internetworking of a wide variety of devices and sensors. In recent years, visual sensors have seen a considerable boom in IoT systems because they are capable of providing richer and more versatile information. Internetworking of large-scale visual sensors has been named the Internet of Video Things (IoVT). IoVT has a new array of unique characteristics in terms of sensing, transmission, storage, and analysis, all of which are fundamentally different from the conventional IoT. These new characteristics of IoVT are expected to impose significant challenges on existing technical infrastructures. In this talk, an overview of recent advances on various fronts of IoVT will be presented and a broad range of technological and systematic challenges will be addressed. Several emerging IoVT applications will be discussed to illustrate the great potential of IoVT in a broad range of practical scenarios.


Chang Wen Chen is currently Chair Professor of Visual Computing at The Hong Kong Polytechnic University. Before his current position, he served as Dean of the School of Science and Engineering at The Chinese University of Hong Kong, Shenzhen from 2017 to 2020, and concurrently as Deputy Director at Peng Cheng Laboratory from 2018 to 2021. Previously, he was an Empire Innovation Professor at the State University of New York at Buffalo (SUNY) from 2008 to 2021 and the Allan Henry Endowed Chair Professor at the Florida Institute of Technology from 2003 to 2007.

He has served as Editor-in-Chief for IEEE Trans. Multimedia (2014-2016) and IEEE Trans. Circuits and Systems for Video Technology (2006-2009). He has received many professional achievement awards, including ten (10) Best Paper Awards in premier publication venues, the prestigious Alexander von Humboldt Award in 2010, the SUNY Chancellor’s Award for Excellence in Scholarship and Creative Activities in 2016, and the UIUC ECE Distinguished Alumni Award in 2019. He is an IEEE Fellow, an SPIE Fellow, and a Member of the Academia Europaea.

Variational Audio-Visual Representation Learning

Xavier Alameda-Pineda 

INRIA, France


Learning robust and powerful representations is at the core of many problems in multimedia, including content representation, multi-modal fusion, social signals, etc. While the supervised and self-supervised learning paradigms have shown great progress in many applications, the learned representations are strongly tailored to one application or domain, and their adaptation to a different scenario or dataset might require large amounts of data, which are not always available. Deep probabilistic models provide an opportunity to exploit various unsupervised mechanisms that enable several interesting properties. First, they can be combined with other deep or shallow probabilistic models within the same methodological framework. Second, they can include unsupervised mixture mechanisms useful for modality and/or model selection on-the-fly. Third, they are naturally suitable not only for unsupervised learning, but also for unsupervised adaptation, thus overcoming a potential domain shift with few data. In this talk, we will discuss the methodology of deep probabilistic models, i.e., variational learning, and showcase their value for multi-modal applications with auditory and visual data of human activities (speech and motion).



Xavier Alameda-Pineda is a (tenured) Research Scientist at Inria and the Leader of the RobotLearn Team. He obtained the M.Sc. (equivalent) in Mathematics in 2008, in Telecommunications in 2009 from BarcelonaTech, and in Computer Science in 2010 from Univ. Grenoble-Alpes (UGA). He then worked towards his Ph.D. in Mathematics and Computer Science, and obtained it in 2013, from UGA. After a two-year post-doc at the Multimodal Human Understanding Group, at the University of Trento, he was appointed to his current position. Xavier is an active member of SIGMM, a senior member of IEEE, and a member of ELLIS. He is the Coordinator of the H2020 Project SPRING: Socially Pertinent Robots in Gerontological Healthcare and is co-leading the “Audio-visual machine perception and interaction for companion robots” chair of the Multidisciplinary Institute of Artificial Intelligence. Xavier’s research interests are at the crossroads of machine learning, computer vision, and audio processing for scene and behavior analysis and human-robot interaction.