Training Activities of the PhD Programme
This page lists the courses and classes of the PhD programme for the academic year 2023/2024. Additional courses and classes will be added during the year; please check back regularly for updates.
Non-monotonic reasoning
Credits: 3
Language of instruction: English
Lecturer: Matteo Cristani
Sustainable Embodied Mechanical Intelligence
Credits: 3
Language of instruction: English
Lecturer: Giovanni Gerardo Muscolo
Brain-Computer Interfaces
Credits: 3
Language of instruction: English
Lecturer: Silvia Francesca Storti
A practical interdisciplinary PhD course on exploratory data analysis
Credits: 4
Language of instruction: English
Lecturer: Prof. Vincenzo Bonnici (Università di Parma)
Multimodal Learning and Applications
Credits: 5
Language of instruction: English
Lecturer: Cigdem Beyan
Introduction to Blockchain
Credits: 3
Language of instruction: English
Lecturer: Sara Migliorini
Autonomous Agents and Multi-Agent Systems
Credits: 5
Language of instruction: English
Lecturer: Alessandro Farinelli
Cyber-Physical System Security
Credits: 3
Language of instruction: English/Italian
Lecturer: Massimo Merro
Foundations of quantum languages
Credits: 3
Language of instruction: English
Lecturer: Margherita Zorzi
Advanced Data Structures for Textual Data
Credits: 3
Language of instruction: English
Lecturer: Zsuzsanna Liptak
AI and explainable models
Credits: 5
Language of instruction: English
Lecturers: Gloria Menegaz, Lorenza Brusini
Automated Software Testing
Credits: 4
Language of instruction: English
Lecturer: Mariano Ceccato
Elements of Machine Teaching: Theory and Appl.
Credits: 3
Language of instruction: English
Lecturer: Ferdinando Cicalese
Introduction to Quantum Machine Learning
Credits: 4
Language of instruction: English
Lecturer: Alessandra Di Pierro
Laboratory of quantum information in classical wave-optics analogy
Credits: 3
Language of instruction: English
Lecturer: Claudia Daffara
Multimodal Learning and Applications (2023/2024)
Lecturer: Cigdem Beyan
Credits: 5
Language of instruction: English
Lesson attendance: free choice
Location: VERONA
Learning objectives
For intelligent systems, adeptly interpreting, reasoning over, and fusing multimodal information is essential. One of the latest and most promising trends in machine/deep learning research is Multimodal Learning, a multi-disciplinary field focused on integrating and modeling multiple modalities, such as acoustics, linguistics, and vision. This course explores fundamental concepts in multimodal learning, including alignment, fusion, joint learning, temporal learning, and representation learning. Through an examination of recent state-of-the-art papers, the course emphasizes effective computational algorithms tailored for diverse applications. Various datasets, sensing approaches, and computational methodologies will be explored, with discussions of existing limitations and potential future directions. Course evaluation will involve a small project assigned to student groups.
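The fusion and representation concepts named above can be illustrated with a toy sketch (the function names, weights, and score values are illustrative assumptions, not course material): late fusion combines per-modality class scores after each modality is classified independently, while the simplest joint representation just concatenates modality features.

```python
import numpy as np

def late_fusion(scores_a, scores_b, w=0.5):
    """Late fusion: each modality produces class scores independently,
    and the score vectors are combined by a weighted average."""
    return w * np.asarray(scores_a, dtype=float) + (1 - w) * np.asarray(scores_b, dtype=float)

def joint_representation(feat_a, feat_b):
    """Simplest joint representation: concatenate the feature vectors
    of the two modalities into one vector."""
    return np.concatenate([np.asarray(feat_a, dtype=float),
                           np.asarray(feat_b, dtype=float)])

# Toy example: audio and visual classifiers vote over three classes.
fused = late_fusion([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
predicted = int(fused.argmax())  # index of the winning class
```

Real systems typically learn the fusion weights (or a fusion network) jointly with the per-modality encoders; the fixed average here only shows where the modalities meet in the pipeline.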
Teaching methods
June 2024
Scheduled lessons
When | Room | Lecturer | Topics |
---|---|---|---|
Monday 17 June 2024, 14:00 - 18:00 (duration: 04:00) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | The definition of multimodality; multimodality versus multimedia; heterogeneous and interconnected data; modalities; common sensors; definitions of multimodal machine learning and multimodal artificial intelligence; research tasks: audio-visual speech recognition, affective computing, synthesis, human-human-robot interaction analysis, content understanding, ..., multimedia information retrieval. Multimodal technical challenges: a) representation (joint, coordinated), contrastive learning, CLIP; b) alignment (explicit, implicit), dynamic time warping, self-attention, cross-attention, transformers, why attention is important, semantic alignment, visual grounding, text grounding, Referring Expression Segmentation. State-of-the-art examples for each challenge. |
Tuesday 18 June 2024, 14:00 - 18:00 (duration: 04:00) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Multimodal learning challenges: c) translation (example-based, generative-based), a GAN-based example, avatar creation, DALL-E, DALL-E 2, Stable Diffusion; d) fusion (late, early), multimodal kernel learning, graphical models, neural networks; e) co-learning: definition, co-learning via representation; f) generation for summarization and creation, multimodal summarization and example approaches, generation evaluation metrics (IS, FID, SID) and their limitations, open challenges in generation; g) learning and optimization (overfitting-to-generalization ratio), gradient blending; h) modality bias; i) fairness, explainability, interpretability. |
Wednesday 19 June 2024, 14:00 - 18:00 (duration: 04:00) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Applications: introduction to human behavior understanding. The definition of Social Signal Processing (SSP), social signals, verbal and nonverbal communication, and nonverbal cues (body activity, eye gaze, facial expressions, vocal behavior, physical appearance, proxemics); methodologies, toolboxes, and libraries used to extract these nonverbal cues. Types of interactions (joint focused, common focused, ...), F-formations, example applications with references; introduction to OpenFace, MediaPipe, OpenPose, openSMILE. Human-human interaction datasets. |
Thursday 20 June 2024, 14:00 - 18:00 (duration: 04:00) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | SSP examples: a) emergent leader detection in meeting environments: dataset creation, annotation, nonverbal cues used, results, future work; b) gaze target detection: unimodal SOTA, multimodal SOTA with depth maps, multimodal SOTA with skeletons and depth maps, privacy-preserving gaze target detection, transformer-based gaze target detection, multi-task gaze target detection; c) predicting gaze from egocentric social interactions (dataset creation, methodology, evaluation, future work); d) social group detection (methodology, evaluation). SSP challenges and future directions (privacy preservation, domain adaptation, unsupervised learning, ...). |
Friday 21 June 2024, 14:00 - 18:00 (duration: 04:00) | Ca' Vignal 2 - L [67 - 1°] | Cigdem Beyan | Multimodal human activity recognition (HAR): definition, possible sensors, importance, challenges. Approaches and datasets: HAR using RGB cameras, HAR using RGB+depth, point-cloud-based HAR, egocentric action recognition datasets. Introducing the Ego4D dataset, its challenges, and a methodology: short-term object interaction anticipation. Introducing the Ego-Exo4D dataset, benchmarks, sensors, tasks. Multimodal emotion recognition: definition of emotions, discrete emotions, Russell's theory, cues to represent and predict emotions automatically, datasets from unimodal to multimodal, open questions, rare applications, open research problems. Methodology: zero-shot multimodal emotion recognition, disentanglement-based multimodal emotion recognition. |