Training and Research

PhD Programme Courses/classes

Non monotonic reasoning

Credits: 3

Language: English

Teacher:  Matteo Cristani

Sustainable Embodied Mechanical Intelligence

Credits: 3

Language: English

Teacher:  Giovanni Gerardo Muscolo

Brain Computer Interfaces

Credits: 3

Language: English

Teacher:  Silvia Francesca Storti

A practical interdisciplinary PhD course on exploratory data analysis

Credits: 4

Language: English

Teacher:  Prof. Vincenzo Bonnici (Università di Parma)

Multimodal Learning and Applications

Credits: 5

Language: English

Teacher:  Cigdem Beyan

Introduction to Blockchain

Credits: 3

Language: English

Teacher:  Sara Migliorini

Autonomous Agents and Multi-Agent Systems

Credits: 5

Language: English

Teacher:  Alessandro Farinelli

Cyber-physical systems security

Credits: 3

Language: English/Italian

Teacher:  Massimo Merro

Foundations of quantum languages

Credits: 3

Language: English

Teacher:  Margherita Zorzi

Advanced Data Structures for Textual Data

Credits: 3

Language: English

Teacher:  Zsuzsanna Liptak

AI and explainable models

Credits: 5

Language: English

Teacher:  Gloria Menegaz, Lorenza Brusini

Automated Software Testing

Credits: 4

Language: English

Teacher:  Mariano Ceccato

Elements of Machine Teaching: Theory and Appl.

Credits: 3

Language: English

Teacher:  Ferdinando Cicalese

Introduction to Quantum Machine Learning

Credits: 4

Language: English

Teacher:  Alessandra Di Pierro

Laboratory of quantum information in classical wave-optics analogy

Credits: 3

Language: English

Teacher:  Claudia Daffara

Credits

3

Language

English

Class attendance

Free Choice

Location

VERONA

Learning objectives

Introduction to text indexes (data structures for very large textual data)

Prerequisites and basic notions

Algorithms and Data Structures

Program

Genome-scale textual data, i.e. strings of many giga- or even terabytes, are everywhere in today's world. This includes biological sequences (genomic data, protein sequences), digital books, web crawl data, emails, musical data, and many others. The main challenge nowadays is not so much how to store this data, but how to store it in such a way that it can be processed and queried efficiently. Text indexes are dedicated data structures for handling very large amounts of textual data. Propelled forward by the need arising from computational biology on the one hand, and from web search on the other, enormous progress has been made in this area in recent decades.
In this course, we will study some of these text indexes. We will start with a brief introduction to the suffix array, a classic data structure for strings, study its properties, some of its uses in string processing, and its efficient construction. We then introduce two supporting data structures, the LCP-array and the Burrows- Wheeler-Transform (BWT). The so-called ‘clustering property’ of the BWT allows compressed indexes to be built on it. We will close with several BWT-based text indexes: the FM-index, the RLFM-index, and the r- index.
Depending on the background of the students, some of the above topics may be replaced by others, such as a more thorough introduction to wavelet trees (a versatile data structure for efficient rank/select queries), or the extended Burrows-Wheeler-Transform (eBWT, a generalization of the BWT to string collections). String collections are of fundamental interest in many of the most common applications today, such as pangenomes, version control data, or web crawl data, where many different copies of highly similar strings are given in input.
Day 1: Introduction to problems on textual data, pattern matching, suffix arrays (SA)
Day 2: efficient SA construction, LCP-array
Day 3: BWT, backward search, wavelet trees
Day 4: FM-index, RLFM-index, r-index

Bibliography

Visualizza la bibliografia con Leganto, strumento che il Sistema Bibliotecario mette a disposizione per recuperare i testi in programma d'esame in modo semplice e innovativo.

Didactic methods

lectures, using slides and/or blackboard

Learning assessment procedures

takehome exam

Students with disabilities or specific learning disorders (SLD), who intend to request the adaptation of the exam, must follow the instructions given HERE

Assessment

ability to apply the algorithms and data structures studied on concrete examples; ability to develop new algorithms using the text indexes studied

Criteria for the composition of the final grade

n.a.

Sustainable Development Goals - SDGs

This initiative contributes to the achievement of the Sustainable Development Goals of the UN Agenda 2030. More information on sustainability