Training and Research
PhD Programme Courses/classes
Non-monotonic reasoning
Credits: 3
Language: English
Teacher: Matteo Cristani
Sustainable Embodied Mechanical Intelligence
Credits: 3
Language: English
Teacher: Giovanni Gerardo Muscolo
Brain Computer Interfaces
Credits: 3
Language: English
Teacher: Silvia Francesca Storti
A practical interdisciplinary PhD course on exploratory data analysis
Credits: 4
Language: English
Teacher: Prof. Vincenzo Bonnici (Università di Parma)
Multimodal Learning and Applications
Credits: 5
Language: English
Teacher: Cigdem Beyan
Introduction to Blockchain
Credits: 3
Language: English
Teacher: Sara Migliorini
Autonomous Agents and Multi-Agent Systems
Credits: 5
Language: English
Teacher: Alessandro Farinelli
Cyber-physical systems security
Credits: 3
Language: English/Italian
Teacher: Massimo Merro
Foundations of quantum languages
Credits: 3
Language: English
Teacher: Margherita Zorzi
Advanced Data Structures for Textual Data
Credits: 3
Language: English
Teacher: Zsuzsanna Liptak
AI and explainable models
Credits: 5
Language: English
Teachers: Gloria Menegaz, Lorenza Brusini
Automated Software Testing
Credits: 4
Language: English
Teacher: Mariano Ceccato
Elements of Machine Teaching: Theory and Applications
Credits: 3
Language: English
Teacher: Ferdinando Cicalese
Introduction to Quantum Machine Learning
Credits: 4
Language: English
Teacher: Alessandra Di Pierro
Laboratory of quantum information in classical wave-optics analogy
Credits: 3
Language: English
Teacher: Claudia Daffara
Multimodal Learning and Applications (2023/2024)
Teacher: Cigdem Beyan
Credits: 5
Language: English
Class attendance: Free choice
Location: Verona
Learning objectives
Intelligent systems must be able to interpret, reason about, and fuse multimodal information. Multimodal learning, one of the most recent and promising trends in machine and deep learning research, is a multidisciplinary field concerned with integrating and modeling multiple modalities, such as acoustic, linguistic, and visual signals. This course covers the fundamental concepts of multimodal learning, including alignment, fusion, joint learning, temporal learning, and representation learning. Through an examination of recent state-of-the-art papers, the course emphasizes effective computational algorithms tailored to diverse applications. It also surveys datasets, sensing approaches, and computational methodologies, and discusses their current limitations and possible future directions. Students will be assessed through a small project carried out in groups.
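The difference between early and late fusion, both listed among the course topics, can be illustrated with a minimal sketch. It assumes that each modality has already been reduced either to a fixed-length feature vector (for early fusion) or to per-class scores from its own classifier (for late fusion). The function names, the weighted-average combination rule, and the example values are illustrative assumptions, not material taken from the course.

```python
from typing import List, Sequence


def early_fusion(*modality_features: Sequence[float]) -> List[float]:
    """Early (feature-level) fusion: concatenate the per-modality feature
    vectors into a single vector that one downstream model consumes."""
    fused: List[float] = []
    for features in modality_features:
        fused.extend(features)
    return fused


def late_fusion(scores_per_modality: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Late (decision-level) fusion: each modality has its own classifier,
    and their per-class scores are combined by a weighted average.
    The weighted-average rule is an illustrative assumption."""
    if not scores_per_modality:
        raise ValueError("at least one modality is required")
    if len(scores_per_modality) != len(weights):
        raise ValueError("exactly one weight per modality is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(scores_per_modality[0])
    fused = [0.0] * n_classes
    for scores, weight in zip(scores_per_modality, weights):
        if len(scores) != n_classes:
            raise ValueError("all modalities must score the same classes")
        for k, score in enumerate(scores):
            fused[k] += weight * score / total
    return fused


if __name__ == "__main__":
    # Hypothetical audio and video features for a single sample.
    audio_features = [0.2, 0.9]
    video_features = [0.4, 0.1, 0.7]
    print(early_fusion(audio_features, video_features))
    # Hypothetical class scores from separate audio and video classifiers.
    audio_scores = [0.75, 0.25]
    video_scores = [0.5, 0.5]
    print(late_fusion([audio_scores, video_scores], weights=[1.0, 1.0]))
```

With equal weights, the late-fusion call returns `[0.625, 0.375]`. Early fusion lets a single model learn cross-modal interactions but needs all modalities available together; late fusion keeps the per-modality models independent, which makes a missing modality easier to handle, but it can only combine their final decisions.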
Scheduled Lessons
| When | Classroom | Teacher | Topics |
|---|---|---|---|
| Monday 17 June 2024, 14:00 - 18:00 (4 hours) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Definition of multimodality; multimodality versus multimedia; heterogeneous and interconnected data; modalities; common sensors; definitions of multimodal machine learning and multimodal artificial intelligence. Research tasks: audio-visual speech recognition, affective computing, synthesis, human-human and human-robot interaction analysis, content understanding, ..., multimedia information retrieval. Multimodal technical challenges: a) representation (joint, coordinated), contrastive learning, CLIP; b) alignment (explicit, implicit), dynamic time warping, self-attention, cross-attention, transformers, why attention is important, semantic alignment, visual grounding, text grounding, referring expression segmentation. State-of-the-art examples for each challenge. |
| Tuesday 18 June 2024, 14:00 - 18:00 (4 hours) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Multimodal learning challenges: c) translation (example-based, generative), a GAN-based example, avatar creation, DALL-E, DALL-E 2, Stable Diffusion; d) fusion (late, early), multimodal kernel learning, graphical models, neural networks; e) co-learning: definition, co-learning via representation; f) generation for summarization and creation, multimodal summarization and example approaches, evaluation metrics for creation (IS, FID, SID) and their limitations, open challenges in generation; g) learning and optimization (overfitting-to-generalization ratio), gradient blending; h) modality bias; i) fairness, explainability, interpretability. |
| Wednesday 19 June 2024, 14:00 - 18:00 (4 hours) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Applications: introduction to human behavior understanding. Definition of Social Signal Processing (SSP); social signals; verbal and nonverbal communication; nonverbal cues (body activity, eye gaze, facial expressions, vocal behavior, physical appearance, proxemics); methodologies, toolboxes, and libraries used to extract these nonverbal cues. Types of interaction (joint-focused, common-focused, ...); F-formations; example applications with references; introduction to OpenFace, MediaPipe, OpenPose, and openSMILE. Human-human interaction datasets. |
| Thursday 20 June 2024, 14:00 - 18:00 (4 hours) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | SSP examples: a) emergent leader detection in meetings: dataset creation, annotation, nonverbal cues used, results, future work; b) gaze target detection: unimodal SOTA, multimodal SOTA with depth maps, multimodal SOTA with skeletons and depth maps, privacy-preserving gaze target detection, transformer-based gaze target detection, multi-task gaze target detection; c) predicting gaze from egocentric social interactions (dataset creation, methodology, evaluation, future work); d) social group detection (methodology, evaluation). SSP challenges and future directions (privacy preservation, domain adaptation, unsupervised learning, ...). |
| Friday 21 June 2024, 14:00 - 18:00 (4 hours) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Multimodal human activity recognition (HAR): definition, possible sensors, importance, challenges. Approaches and datasets: HAR using RGB cameras, HAR using RGB + depth, point-cloud-based HAR, egocentric action recognition datasets. Introduction to the Ego4D dataset: challenges; methodology for short-term object interaction anticipation. Introduction to the Ego-Exo4D dataset: benchmarks, sensors, tasks. Multimodal emotion recognition: definition of emotions, discrete emotions, Russell's theory, cues for representing and automatically predicting emotions, datasets from unimodal to multimodal, open questions, rare applications, open research problems. Methodology: zero-shot multimodal emotion recognition, disentanglement-based multimodal emotion recognition. |
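
The alignment challenge covered in the first lecture includes dynamic time warping (DTW). Below is a minimal textbook sketch, assuming two 1-D sequences and an absolute-difference local cost. In a multimodal setting the elements would more likely be per-frame feature vectors (for example, audio and video frame embeddings) compared with another distance, which could be supplied through the hypothetical `dist` parameter. This is an illustration, not the implementation used in the course.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float],
        dist: Callable[[float, float], float] = lambda a, b: abs(a - b),
        ) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping between sequences x and y.

    Returns the minimal cumulative alignment cost and the warping path
    as a list of (index in x, index in y) pairs."""
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of x[:i] with y[:j];
    # row 0 and column 0 are infinite padding except cost[0][0].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(x[i - 1], y[j - 1])
            # Diagonal: advance both sequences; vertical: advance only x
            # (y[j-1] is reused); horizontal: advance only y (x[i-1] is reused).
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Walk back from the end, always following the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
```

The example call prints `0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]`: the sequences differ only by a repeated value, so the element 2 in the first sequence is aligned with both 2s in the second at zero cost. The full quadratic table is fine for short sequences; long recordings usually call for a windowed or approximate variant.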