Studying at the University of Verona
Here you can find information on the organisational aspects of the Programme, lecture timetables, learning activities and useful contact details for your time at the University, from enrolment to graduation.
Study Plan
The Study Plan includes all modules, teaching and learning activities that each student will need to undertake during their time at the University.
Please select your Study Plan based on your enrolment year.
1° Year
Modules | Credits | TAF | SSD |
---|
2° Year activated in the A.Y. 2023/2024
Modules | Credits | TAF | SSD |
---|
1 module among the following (A.Y. 2023/24: Data protection in business organizations not activated)
2 modules among the following (A.Y. 2023/24: Statistical methods for business intelligence not activated)
2 modules among the following (A.Y. 2023/24: Complex systems and social physics not activated)
2 modules among the following
Legend | Type of training activity (TTA)
TAF (Type of Educational Activity) All courses and activities are classified into different types of educational activities, indicated by a letter.
Statistical models for Data Science (2022/2023)
Teaching code
4S009079
Academic staff
Coordinator
Credits
6
Also offered in courses:
- Statistical Models of the course Master's degree in Artificial intelligence
Language
English
Scientific Disciplinary Sector (SSD)
MAT/06 - PROBABILITY AND STATISTICS
Period
Semester 1, from Oct 3, 2022 to Jan 27, 2023.
Learning objectives
The course is devoted to the mathematical background needed to describe, analyze and derive value from datasets, including Big Data and unstructured data, and to master the main probabilistic models used in data science. Starting from basic models (e.g. regressions, PCA-based predictors, Bayesian statistics, filters), particular emphasis is placed on mathematically rigorous quantitative approaches aimed at optimizing the data collection, cleaning and organization phases (e.g. historical time series, unstructured data generated on social media, semantic elements). The mathematical tools needed to describe, analyze and forecast time series are also introduced. Throughout the course, the contents are developed in connection with the study of real problems from heterogeneous sectors (industrial, economic, social, etc.), using software oriented to probabilistic modeling such as Knime, ElasticSearch, Kibana, R AnalyticFlow and Orange.
Prerequisites and basic notions
For both modules of the course: basic notions of Probability Theory; knowledge of the main discrete and continuous random variables (e.g. binomial, Poisson, Gaussian) and their main statistical properties; convergence theorems (e.g. law of large numbers, central limit theorem); basic notions of discrete and continuous time stochastic processes (e.g. Markov chains, birth and death processes); rudiments of statistical data analysis (e.g. frequency, mean, mode, standard deviation). Basics of programming in Python, in particular general syntax, data structures, import / export, and the main plots for data visualization. Rudiments of the main libraries such as NumPy, Pandas and Matplotlib.
Program
The course program is divided into the following macro-topics:
Part 1 [module 1]
1. Time domain analysis
2. Frequency domain analysis
3. Tools for data analysis and cleaning (e.g. identification of outliers)
4. Maximum likelihood methods, likelihood metrics, probability density fitting
5. Principal Component Analysis (PCA) [PCA-based regressors / predictors]
6. AR, MA, ARMA, ARIMA, Box-Jenkins, ARCH, GARCH models and their generalizations
7. Time series decomposition, ACF / PACF and related visualizations
8. Hypothesis tests; Gaussian and jump processes / compound processes
9. Decomposition of white noise type processes
10. Bayesian statistics and applications
11. Forecast evaluation using inferential statistical models based, e.g., on autocovariance and partial autocorrelation, seasonality (SARIMA), analysis of variance (ANOVA, MANOVA), etc.
12. Smoothing techniques, spectral decomposition, polynomial fitting, etc.
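As an illustration of two of the topics listed above (AR models and the ACF), here is a minimal sketch, not taken from the course material. It simulates an AR(1) process X_t = phi * X_{t-1} + eps_t, computes its sample autocorrelation function, and uses the lag-1 autocorrelation as the Yule-Walker estimate of phi. The function names `simulate_ar1` and `sample_acf`, the value phi = 0.7 and the sample size are assumptions chosen for illustration. Only the Python standard library is used so that the example stays self-contained.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate X_t = phi * X_{t-1} + eps_t with Gaussian white noise eps_t."""
    rng = random.Random(seed)
    x = [0.0]  # start the recursion at zero (illustrative choice)
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    c0 = sum(v * v for v in centered) / n  # sample variance
    acf = []
    for k in range(max_lag + 1):
        ck = sum(centered[t] * centered[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Hypothetical parameters, chosen only for this illustration.
series = simulate_ar1(phi=0.7, n=5000, seed=42)
acf = sample_acf(series, max_lag=3)

# For an AR(1) model the Yule-Walker equations reduce to phi = r_1.
phi_hat = acf[1]
print("sample ACF:", [round(r, 3) for r in acf])
print("Yule-Walker estimate of phi:", round(phi_hat, 3))
```

For a stationary AR(1) process the theoretical autocorrelation at lag k is phi^k, so the sample ACF should decay roughly geometrically, and `phi_hat` should be close to the value used in the simulation.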
Part 2 [module 2]
1. Review of programming in Python
2. Managing and visualizing time series
3. Descriptive statistics
4. Analysis in the frequency domain
5. Linear regression for time series
6. Analyzing and decomposing the main components of a time series (trend, cycle, seasonality)
7. Forecasting methods: Exponential Smoothing (simple, double, triple)
8. Forecasting methods: AR, MA, ARMA, ARIMA, SARIMA
9. Forecasting methods: ARCH, GARCH and generalizations
10. How to evaluate the different forecasting models
All the above topics will be explored in depth through practical exercises that require implementing them in Python.
Moreover, the main forecasting methods will be investigated further by working through and solving real case studies of various types.
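To give a flavour of the kind of exercise this module describes, the following is a minimal sketch, not taken from the course material, of simple exponential smoothing with one-step-ahead forecasts, evaluated against a naive "repeat the last value" forecast using MAE and RMSE. The helper names (`ses_one_step_forecasts`, `mae`, `rmse`), the smoothing parameter alpha = 0.2 and the synthetic data are assumptions made for illustration. Course exercises may well rely on libraries such as NumPy or Pandas instead of the plain standard library used here.

```python
import math
import random


def ses_one_step_forecasts(y, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction of y[t] built from y[0..t-1];
    the level is initialised with the first observation.
    """
    level = y[0]
    forecasts = [level]  # the forecast for y[0] is y[0] by construction
    for obs in y[:-1]:
        level = alpha * obs + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Synthetic data (an assumption for illustration): constant level plus noise.
rng = random.Random(1)
y = [10.0 + rng.gauss(0.0, 1.0) for _ in range(500)]

ses = ses_one_step_forecasts(y, alpha=0.2)
naive = [y[0]] + y[:-1]  # naive forecast: repeat the previous observation

# Skip index 0, where both forecasts equal y[0] by construction.
actual = y[1:]
scores = {
    "ses": (mae(actual, ses[1:]), rmse(actual, ses[1:])),
    "naive": (mae(actual, naive[1:]), rmse(actual, naive[1:])),
}
for name, (m, r) in scores.items():
    print(f"{name}: MAE={m:.3f} RMSE={r:.3f}")
```

On a series with a stable level and independent noise, smoothing averages out part of the noise, so it is expected to score better than the naive forecast. On strongly trending or seasonal data the comparison can go the other way, which is why double and triple exponential smoothing are listed separately.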
Bibliography
Didactic methods
The course is divided into lectures, supported by slides and shared notes, and computer simulations / exercises.
Learning assessment procedures
The final exam consists of two parts: a theoretical part followed by a practical, implementation-oriented part. The first part verifies the student's understanding of the theoretical concepts behind the statistical methods and the related models and algorithms, which underlie the computational implementations used to conduct a project that the student agrees on with the course teachers.
This "case study", together with a discussion of the code written to complete it, is the subject of the second and final part of the exam.
Evaluation criteria
The exam is evaluated by combining the results obtained in the two modules of the course, giving equal weight to the correctness and effectiveness of the solutions adopted when solving concrete problems through computer implementations and to the understanding of the underlying probabilistic / statistical models.
Criteria for the composition of the final grade
The final grade results from the joint evaluation of the two theoretical tests and the solution of the "case study" agreed between the student and the teachers, in accordance with the sections "Learning assessment procedures" and "Evaluation criteria".
Exam language
English