Studying at the University of Verona
Here you can find information on the organisational aspects of the Programme, lecture timetables, learning activities and useful contact details for your time at the University, from enrolment to graduation.
Study Plan
The Study Plan includes all modules, teaching and learning activities that each student will need to undertake during their time at the University.
Please select your Study Plan based on your enrolment year.
1st Year
Modules | Credits | TAF | SSD |
---|---|---|---|
One module to be chosen from
Two modules to be chosen from
2nd Year, activated in A.Y. 2017/2018
Modules | Credits | TAF | SSD |
---|---|---|---|
Three modules to be chosen from
Modules | Credits | TAF | SSD |
---|---|---|---|
One module to be chosen from
Two modules to be chosen from
Modules | Credits | TAF | SSD |
---|---|---|---|
Three modules to be chosen from
Legend: TAF (Type of Educational Activity). All courses and activities are classified into different types of educational activities, each indicated by a letter.
Computational analysis of genomic sequences (2017/2018)
Teaching code: 4S004556
Teacher
Coordinator
Credits: 6
Language: English
Scientific Disciplinary Sector (SSD): INF/01 - INFORMATICS
Period: 1st semester, from Oct 2, 2017 to Jan 31, 2018
Location: VERONA
Learning outcomes
In this course we study data structures and algorithms for textual data (strings, sequences). The recent explosion in the amount of available data ("big data") is one of the major challenges for computer science today. Much of this data is in the form of text (or can easily be rendered in textual form): genomic and other biological sequences, web pages, emails, scanned books, musical data, and many others. To efficiently store, process, and extract information from this data, we need dedicated data structures and algorithms, i.e., ones developed specifically for strings; such data structures are also referred to as text indices.
These data structures have played a decisive role in recent research progress in computational biology, and the same methods can be, and are being, applied to other kinds of textual data.
The course will provide:
- an understanding of the fundamental challenges and issues in processing textual data,
- knowledge of the most common computational problems on strings arising in applications (pattern matching, repeat finding, string statistics, etc.),
- familiarity with the most important text indices.
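Two of the problems named above, pattern matching and repeat finding, can be illustrated with a suffix array and its LCP array (one of the tables that make up the enhanced suffix arrays listed in the Program below). The following Python sketch uses hypothetical function names and is not taken from the course material. It builds the suffix array naively by sorting suffixes (for brevity, not efficiency), computes the LCP array with Kasai et al.'s linear-time algorithm, finds all occurrences of a pattern by binary search, and reads a longest repeated substring off a maximal LCP entry.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    # Naive construction: sort start positions by the suffix they start.
    # Simple but O(n^2 log n) in the worst case; not a linear-time method.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    # Kasai et al.: lcp[r] = length of the longest common prefix of the
    # suffixes at ranks r-1 and r (lcp[0] = 0). O(n) total time.
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, not rank order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h-1 characters
    return lcp


def occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    # Suffixes starting with `pattern` form one contiguous block of `sa`;
    # find both ends by binary search on the length-m prefixes.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first rank whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def longest_repeat(text: str, sa: List[int], lcp: List[int]) -> str:
    # A longest substring occurring at least twice is given by a maximal LCP entry.
    if not lcp:
        return ""
    r = max(range(len(lcp)), key=lambda k: lcp[k])
    return text[sa[r]:sa[r] + lcp[r]]


text = "banana"
sa = suffix_array(text)
print(sa)                                             # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))                            # [0, 1, 3, 0, 0, 2]
print(occurrences(text, sa, "ana"))                   # [1, 3]
print(longest_repeat(text, sa, lcp_array(text, sa)))  # ana
```

Both binary searches rely on the fact that truncating sorted suffixes to their first m characters keeps them in non-decreasing order, so each query costs O(m log n) character comparisons.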
Program
Following an introduction to strings (sequences), their basic properties and fundamental issues (alphabet size, character comparison, string sorting), the course covers the basics of the following text indices:
- tries
- suffix trees
- suffix arrays, enhanced suffix arrays
- Burrows-Wheeler Transform (BWT)
For each of these, we will study their properties, efficient construction, and applications to specific string problems.
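As an example of the first index in the list, here is a minimal Python sketch of a trie (Python is used only for illustration; the course does not prescribe a language). It assumes one node per distinct prefix with a dictionary of children; the class and method names (`Trie`, `insert`, `contains`, `with_prefix`) are hypothetical.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node per distinct prefix; children are keyed by character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a stored string ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        # Walk down from the root, creating nodes for missing characters.
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        # Follow the path spelled by `prefix`; None if it leaves the trie.
        node = self.root
        for ch in prefix:
            child = node.children.get(ch)
            if child is None:
                return None
            node = child
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_end

    def with_prefix(self, prefix: str) -> List[str]:
        # Depth-first traversal with sorted children yields the stored
        # strings that start with `prefix` in lexicographic order.
        result: List[str] = []

        def collect(node: TrieNode, path: str) -> None:
            if node.is_end:
                result.append(path)
            for ch in sorted(node.children):
                collect(node.children[ch], path + ch)

        start = self._find(prefix)
        if start is not None:
            collect(start, prefix)
        return result


trie = Trie()
for w in ["ACGT", "ACG", "AGT", "CAT"]:
    trie.insert(w)
print(trie.contains("ACG"))    # True
print(trie.contains("AC"))     # False (only a prefix of stored strings)
print(trie.with_prefix("AC"))  # ['ACG', 'ACGT']
```

With dictionary children, a lookup costs time proportional to the query length rather than to the number of stored strings; how the children are represented (arrays, lists, hash maps) is one of the alphabet-size trade-offs mentioned in the introduction.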
We will also cover (or recall, as appropriate) some classical exact pattern matching algorithms that are not index-based.
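The Knuth-Morris-Pratt (KMP) algorithm is one well-known example of a non-index-based exact matcher; the course description does not say which algorithms are covered, so the following Python sketch (hypothetical function names) is only an illustration. It precomputes, for each prefix of the pattern, the length of its longest proper border (a prefix that is also a suffix), and uses this table so that the text pointer never moves backwards.

```python
from typing import List


def failure_function(pattern: str) -> List[int]:
    # fail[q] = length of the longest proper border of pattern[:q+1].
    fail = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[q] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[q] == pattern[k]:
            k += 1
        fail[q] = k
    return fail


def kmp_search(text: str, pattern: str) -> List[int]:
    # Start positions of all (possibly overlapping) occurrences of
    # `pattern` in `text`, in O(n + m) time.
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_function(pattern)
    hits: List[int] = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return hits


print(failure_function("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```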
Main textbooks:
1) Enno Ohlebusch, Bioinformatics Algorithms, 2013
2) Dan Gusfield, Algorithms on Strings, Trees, and Sequences, 1997
Author | Title | Publishing house | Year | ISBN | Notes |
---|---|---|---|---|---|
Dan Gusfield | Algorithms on Strings, Trees, and Sequences | Cambridge University Press | 1997 | 0-521-58519-8 | |
Enno Ohlebusch | Bioinformatics Algorithms | | 2013 | 978-3-00-041316-2 | |
Examination Methods
Final exam: written and oral. The written exam contains both theoretical questions (running time and storage space of algorithms, properties of the data structures studied) and concrete exercises (e.g., computing the suffix tree, suffix array, or BWT of a given string, or applying specific algorithms). In the oral exam, students have the opportunity to explain their solutions in detail and to show to what extent they have understood the topics studied.
The exam will verify that the student
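For exercises such as computing the BWT of a given string by hand, a direct Python sketch appends an end marker, sorts all rotations, and reads off the last column; it also inverts the transform with the LF-mapping as a check. The sketch assumes `$` as the end marker and that it sorts before every other character (true for letters and digits in ASCII); the function names are hypothetical.

```python
from typing import Dict, List


def bwt(text: str, sentinel: str = "$") -> str:
    # Append a unique end marker, sort all rotations, read off the last column.
    # Assumes the sentinel sorts before every character of `text`.
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    # Rebuild the text right to left with the LF-mapping: the k-th occurrence
    # of character c in the last column L is the k-th occurrence of c in the
    # first column F (F is simply L sorted).
    first = sorted(last)
    start: Dict[str, int] = {}  # start[c] = row where c's block begins in F
    for row, c in enumerate(first):
        start.setdefault(c, row)
    seen: Dict[str, int] = {}
    rank: List[int] = []  # rank[i] = occurrences of last[i] in last[:i]
    for c in last:
        rank.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    out: List[str] = []
    row = 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(last) - 1):
        c = last[row]  # character preceding the current rotation in the text
        out.append(c)
        row = start[c] + rank[row]  # LF-mapping
    return "".join(reversed(out))


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

Sorting all rotations is only practical for hand-sized examples; in practice the BWT is usually obtained from a suffix array instead.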
- has acquired sufficient understanding of the most important issues with respect to handling large textual data (alphabet type, comparison of strings, string sorting, size of textual data)
- can apply, explain, and analyze the algorithms studied for string sorting
- can apply, explain, and analyze the data structures studied (inverted index, trie, suffix tree, suffix array, BWT), in particular their construction algorithms and the storage space they require
- can apply, explain, and analyze some applications of these data structures to problems on strings, such as pattern matching, matching statistics, palindromes, etc.
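The course material does not state which string sorting algorithms are studied. As one classical example, the following Python sketch implements most-significant-digit (MSD) radix sort: strings are bucketed by the character at the current position and each bucket is sorted recursively on the next position. The function names are hypothetical, and Python's `sorted` is used only to order the distinct characters present at each level.

```python
from typing import Dict, List


def msd_radix_sort(strings: List[str]) -> List[str]:
    # Bucket strings by their character at position `depth`, then sort each
    # bucket recursively on depth + 1. Strings that end at `depth` come first,
    # since they are prefixes of every other string in the group.
    def sort_group(group: List[str], depth: int) -> List[str]:
        if len(group) <= 1:
            return group  # nothing left to separate
        finished = [s for s in group if len(s) == depth]
        buckets: Dict[str, List[str]] = {}
        for s in group:
            if len(s) > depth:
                buckets.setdefault(s[depth], []).append(s)
        result = finished
        for ch in sorted(buckets):  # visit buckets in alphabet order
            result += sort_group(buckets[ch], depth + 1)
        return result

    return sort_group(list(strings), 0)


print(msd_radix_sort(["banana", "band", "ban", "apple", "b", "bandana"]))
# ['apple', 'b', 'ban', 'banana', 'band', 'bandana']
```

Because recursion on a group stops as soon as it holds a single string, strings are typically examined only up to the prefix that distinguishes them from the others.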
The exam is the same for all students (whether or not they followed the lectures).