Studying at the University of Verona
Here you can find information on the organisational aspects of the Programme, lecture timetables, learning activities and useful contact details for your time at the University, from enrolment to graduation.
Study Plan
The Study Plan includes all modules, teaching and learning activities that each student will need to undertake during their time at the University.
Please select your Study Plan based on your enrolment year.
1st Year
Modules | Credits | TAF | SSD
---|---|---|---
2nd Year (activated in A.Y. 2021/2022)
1 module among the following (1st year: Big Data epistemology and Social research; 2nd year: Cybercrime, Data protection in business organizations, Comparative and Transnational Law & Technology)
2 courses among the following (1st year: Business analytics, Digital Marketing and market research; 2nd year: Logistics, Operations & Supply Chain, Digital transformation and IT change, Statistical methods for Business intelligence)
2 courses among the following (1st year: Complex systems and social physics, Discrete Optimization and Decision Making; 2nd year: Statistical models for Data Science, Continuous Optimization for Data Science, Network science and econophysics, Marketing research for agrifood and natural resources)
2 courses among the following (1st year: Data Visualisation, Data Security & Privacy, Statistical learning, Mining Massive Dataset; 2nd year: Machine Learning for Data Science)
Legend
TAF (Type of Educational Activity): all courses and activities are classified into different types of educational activities, indicated by a letter.
Statistical models for Data Science (2021/2022)
Teaching code
4S009079
Academic staff
Coordinator
Credits
6
Language
English
Scientific Disciplinary Sector (SSD)
MAT/06 - PROBABILITY AND STATISTICS
Period
First semester, from Oct 4, 2021 to Jan 28, 2022.
Learning outcomes
The course covers the mathematical background necessary to describe, analyse and derive value from datasets (possibly Big Data and unstructured) and to master the main probabilistic models used in data science. Starting from basic models, for example regressions, PCA-based predictors, Bayesian statistics and filters, particular emphasis will be placed on mathematically rigorous quantitative approaches aimed at optimising the data collection, cleaning and organisation phases (e.g. historical time series, unstructured data generated on social media, semantic elements). The mathematical tools needed to describe, analyse and forecast time series will also be introduced. The contents of the entire course will be developed through the study of real problems from heterogeneous sectors (industrial, economic, social, etc.), using software oriented to probabilistic modelling, for example Knime, ElasticSearch, Kibana, R AnalyticFlow, Orange, etc.
By the end of the course, students must demonstrate that they have acquired the following skills:
● know and be able to use the basic tools for the treatment of time series and their indicators, e.g.,
● know and be able to develop forecasting solutions based on statistical inferential models, e.g., AR, MA, ARMA, ARIMA, ARIMAX; Box-Jenkins; autocovariance and partial autocorrelation; seasonality (SARIMA); analysis of variance (ANOVA, MANOVA); etc.
● know how to identify the parameters that characterise a given population via methods such as error minimisation, maximum likelihood, etc.
● know how to estimate/identify/reconstruct characteristics related to first-order analysis, smoothing techniques, spectral decomposition, polynomial fitting, etc.
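As a minimal illustration of the parameter-estimation skills listed above, the sketch below fits an AR(1) model by conditional least squares, which coincides with conditional maximum likelihood under Gaussian noise, and produces a one-step-ahead forecast. The data, seed and variable names are illustrative assumptions, not course material; it uses only NumPy.

```python
import numpy as np

# Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t with known phi,
# then recover phi from the data (illustrative example, not course code).
rng = np.random.default_rng(0)
phi_true = 0.7
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Conditional least squares: phi_hat = sum(x_t * x_{t-1}) / sum(x_{t-1}^2),
# i.e. a regression of the series on its own first lag.
phi_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])

# One-step-ahead forecast from the last observation.
forecast = phi_hat * x[-1]
```

The same lag-regression idea extends to higher-order AR(p) models by solving a small linear system (the Yule-Walker equations); in practice a library such as statsmodels would handle estimation and seasonal extensions.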
Program
The course program is divided into the following macro-topics:
Time domain analysis
Frequency domain analysis
Data analysis and cleaning tools (e.g. identification of outliers)
Maximum likelihood methods, likelihood metrics, probability density fitting
Principal Component Analysis (PCA) [PCA-based regressors / predictors]
AR, MA, ARMA, ARIMA, Box-Jenkins, ARCH, GARCH models and generalizations
Time series decomposition
ACF / PACF and related "views"
Hypothesis testing
Gaussian / jump / compound processes
Decomposition of "white noise" processes
Bayesian statistics and applications
Forecast evaluation via inferential statistical models based, e.g., on autocovariance and partial autocorrelation, seasonality (SARIMA), analysis of variance (ANOVA, MANOVA), etc.
Smoothing techniques, spectral decomposition, polynomial fitting, etc.
Application of the models listed in the previous points to the resolution of concrete case studies.
The latter aspect will mainly, but not exclusively, involve Python coding, together with statistical/probabilistic libraries and software such as Knime, ElasticSearch, Kibana, R, TensorFlow, Prophet, AnalyticFlow, Orange, etc.
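To make the "maximum likelihood methods, likelihood metrics, probability density fitting" item above concrete, here is a short sketch, on illustrative simulated data (an assumption, not course material), of the closed-form Gaussian maximum-likelihood fit and the resulting log-likelihood, from which a likelihood metric such as AIC follows.

```python
import numpy as np

# Illustrative data only: 10,000 draws from N(mu=5, sigma=2).
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# The Gaussian MLE has a closed form: the sample mean and the 1/n
# variance (the MLE divides by n, not n - 1).
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()

# Maximised log-likelihood of the fitted density; it feeds likelihood
# metrics for model comparison, e.g. AIC = 2k - 2 * loglik with k = 2.
n = data.size
loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2_hat) + 1.0)
aic = 2 * 2 - 2 * loglik
```

For densities without a closed-form MLE, the same log-likelihood would be maximised numerically, e.g. with a generic optimiser over the parameter vector.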
Bibliography
Examination Methods
The final exam consists of two parts: theoretical and practical/coding.
The first part verifies the learning of the theoretical concepts underlying the statistical methods and their associated models/algorithms, which are then applied in a project chosen by the student in agreement with the course lecturers.
This case study, together with a discussion of the code written to complete it, is the subject of the second and final part of the exam.