Training and Research

PhD Programme Courses/classes

Non-monotonic reasoning

Credits: 3

Language: English

Teacher:  Matteo Cristani

Sustainable Embodied Mechanical Intelligence

Credits: 3

Language: English

Teacher:  Giovanni Gerardo Muscolo

Brain Computer Interfaces

Credits: 3

Language: English

Teacher:  Silvia Francesca Storti

A practical interdisciplinary PhD course on exploratory data analysis

Credits: 4

Language: English

Teacher:  Prof. Vincenzo Bonnici (Università di Parma)

Multimodal Learning and Applications

Credits: 5

Language: English

Teacher:  Cigdem Beyan

Introduction to Blockchain

Credits: 3

Language: English

Teacher:  Sara Migliorini

Autonomous Agents and Multi-Agent Systems

Credits: 5

Language: English

Teacher:  Alessandro Farinelli

Cyber-physical systems security

Credits: 3

Language: English/Italian

Teacher:  Massimo Merro

Foundations of quantum languages

Credits: 3

Language: English

Teacher:  Margherita Zorzi

Advanced Data Structures for Textual Data

Credits: 3

Language: English

Teacher:  Zsuzsanna Liptak

AI and explainable models

Credits: 5

Language: English

Teacher:  Gloria Menegaz, Lorenza Brusini

Automated Software Testing

Credits: 4

Language: English

Teacher:  Mariano Ceccato

Elements of Machine Teaching: Theory and Applications

Credits: 3

Language: English

Teacher:  Ferdinando Cicalese

Introduction to Quantum Machine Learning

Credits: 4

Language: English

Teacher:  Alessandra Di Pierro

Laboratory of quantum information in classical wave-optics analogy

Credits: 3

Language: English

Teacher:  Claudia Daffara

Multimodal Learning and Applications

Credits: 5

Language: English

Class attendance: Free choice

Location: Verona

Learning objectives

Intelligent systems must be able to interpret, reason about, and fuse information coming from multiple modalities. Multimodal learning, one of the most recent and promising trends in machine and deep learning research, is a multidisciplinary field concerned with integrating and modeling multiple modalities such as acoustic, linguistic and visual signals. This course covers the fundamental concepts of multimodal learning, including alignment, fusion, joint learning, temporal learning, and representation learning. Through a study of recent state-of-the-art papers, it emphasizes effective computational algorithms tailored to diverse applications. The course surveys datasets, sensing approaches, and computational methods, and discusses their current limitations and possible future directions. Students are evaluated on a small project carried out in groups.
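
Fusion is one of the core concepts named above. As a minimal illustrative sketch (not material from the course), the difference between early (feature-level) and late (decision-level) fusion can be shown with toy linear scorers in plain Python. The function names, the hypothetical "audio" and "video" feature lists, and the averaging weight `alpha` are assumptions standing in for trained per-modality models.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a raw score into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights: Sequence[float], features: Sequence[float], bias: float = 0.0) -> float:
    """Toy stand-in for a trained model: dot product plus bias."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def early_fusion(audio: Sequence[float], video: Sequence[float],
                 weights: Sequence[float], bias: float = 0.0) -> float:
    """Early (feature-level) fusion: concatenate both modalities, then score once."""
    joint = list(audio) + list(video)  # a single joint feature vector
    return sigmoid(linear_score(weights, joint, bias))


def late_fusion(audio: Sequence[float], video: Sequence[float],
                w_audio: Sequence[float], w_video: Sequence[float],
                alpha: float = 0.5) -> float:
    """Late (decision-level) fusion: score each modality separately, then
    combine the two outputs with a weighted average controlled by alpha."""
    p_audio = sigmoid(linear_score(w_audio, audio))
    p_video = sigmoid(linear_score(w_video, video))
    return alpha * p_audio + (1.0 - alpha) * p_video


if __name__ == "__main__":
    audio = [0.8, -0.2]        # hypothetical per-clip audio features
    video = [0.1, 0.4, 0.3]    # hypothetical per-clip visual features
    print(early_fusion(audio, video, weights=[1.0, 0.5, -0.3, 0.7, 0.2]))
    print(late_fusion(audio, video, w_audio=[1.0, 0.5], w_video=[-0.3, 0.7, 0.2]))
```

The trade-off this illustrates is the usual one: early fusion lets a single model exploit interactions between features of different modalities but needs them available and synchronized together, while late fusion keeps the modality-specific models independent and only combines their decisions.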

Scheduled Lessons

When / Classroom / Teacher / Topics
Monday 17 June 2024
14:00 - 18:00
Duration: 4 hours
Classroom: Ca' Vignal 2 - L [67 - 1°]
Teacher: Cigdem Beyan
Topics: definition of multimodality; multimodality versus multimedia; heterogeneous and interconnected data; modalities and common sensors; definitions of multimodal machine learning and multimodal artificial intelligence; research tasks (audio-visual speech recognition, affective computing, synthesis, human-human and human-robot interaction analysis, content understanding, ..., multimedia information retrieval). Multimodal technical challenges: a) representation (joint, coordinated), contrastive learning, CLIP; b) alignment (explicit, implicit), dynamic time warping, self-attention, cross-attention, transformers and why attention matters, semantic alignment, visual grounding, text grounding, referring expression segmentation. State-of-the-art examples for each challenge.
Tuesday 18 June 2024
14:00 - 18:00
Duration: 4 hours
Classroom: Ca' Vignal 2 - L [67 - 1°]
Teacher: Cigdem Beyan
Topics: multimodal learning challenges: c) translation (example-based, generative), a GAN-based example, avatar creation, DALL-E, DALL-E 2, Stable Diffusion; d) fusion (early, late), multimodal kernel learning, graphical models, neural networks; e) co-learning: definition, co-learning via representation; f) generation for summarization and creation, multimodal summarization and example approaches, evaluation metrics for creation (IS, FID, SID) and their limitations, open challenges in generation; g) learning and optimization (overfitting-to-generalization ratio), gradient blending; h) modality bias; i) fairness, explainability, interpretability.
Wednesday 19 June 2024
14:00 - 18:00
Duration: 4 hours
Classroom: Ca' Vignal 2 - L [67 - 1°]
Teacher: Cigdem Beyan
Topics: applications, starting with an introduction to human behavior understanding. Definition of Social Signal Processing (SSP), social signals, verbal and nonverbal communication, and nonverbal cues (body activity, eye gaze, facial expressions, vocal behavior, physical appearance, proxemics), together with the methodologies, toolboxes and libraries used to extract these cues. Types of interaction (joint-focused, common-focused, ...), F-formations, example applications with references, introduction to OpenFace, MediaPipe, OpenPose and openSMILE. Human-human interaction datasets.
Thursday 20 June 2024
14:00 - 18:00
Duration: 4 hours
Classroom: Ca' Vignal 2 - L [67 - 1°]
Teacher: Cigdem Beyan
Topics: SSP examples: a) emergent leader detection in meetings (dataset creation, annotation, nonverbal cues used, results, future work); b) gaze target detection: unimodal SOTA, multimodal SOTA with depth maps, multimodal SOTA with skeletons and depth maps, privacy-preserving gaze target detection, transformer-based gaze target detection, multi-task gaze target detection; c) predicting gaze from egocentric social interactions (dataset creation, methodology, evaluation, future work); d) social group detection (methodology, evaluation). SSP challenges and future directions (privacy preservation, domain adaptation, unsupervised learning, ...).
Friday 21 June 2024
14:00 - 18:00
Duration: 4 hours
Classroom: Ca' Vignal 2 - L [67 - 1°]
Teacher: Cigdem Beyan
Topics: multimodal human activity recognition (HAR): definition, possible sensors, importance, challenges. Approaches and datasets: HAR using RGB cameras, HAR using RGB plus depth, point-cloud-based HAR, egocentric action recognition datasets. Introduction to the Ego4D dataset: challenges, and a methodology for short-term object interaction anticipation. Introduction to the Ego-Exo4D dataset: benchmarks, sensors, tasks. Multimodal emotion recognition: definition of emotions, discrete emotions, Russell's theory, cues for representing and automatically predicting emotions, datasets from unimodal to multimodal, open questions, rare applications, open research problems. Methodology: zero-shot multimodal emotion recognition, disentanglement-based multimodal emotion recognition.
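
The first lesson lists dynamic time warping among the explicit alignment methods. The following is a minimal textbook-style sketch of DTW on one-dimensional sequences, for example two hypothetical per-frame feature streams sampled at different rates. The absolute-difference distance and the tie-breaking order used when recovering the path are illustrative assumptions, not details taken from the course.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float],
        dist: Callable[[float, float], float] = lambda x, y: abs(x - y)
        ) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic O(len(a) * len(b)) dynamic time warping.

    Returns the minimal cumulative alignment cost and the warping path
    as a list of (i, j) index pairs that match a[i] to b[j].
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]; row/column 0 are padding
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                 cost[i][j - 1],      # b advances, a[i-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from (n, m) to (0, 0), always moving to the cheapest predecessor
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, warp = dtw([0, 1, 2], [0, 1, 1, 2])
    print(total, warp)
```

For instance, `dtw([0, 1, 2], [0, 1, 1, 2])` gives a cost of 0 with the path `[(0, 0), (1, 1), (1, 2), (2, 3)]`: the single `1` in the first sequence is matched to both `1`s in the second. For real multimodal features, each element would typically be a vector and `dist` a vector distance such as the Euclidean one.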