Statistical models for data science
Scientific Disciplinary Sector (SSD)
MAT/06 - PROBABILITY AND STATISTICS
Language of instruction
Second semester, from Mar 7, 2022 to Jun 10, 2022.
The course covers the mathematical background needed to describe, analyze and derive value from datasets, including Big Data and unstructured data, and to master the main probabilistic models used in data science. Starting from basic models (for example, regression, PCA-based predictors, Bayesian statistics and filters), particular emphasis is placed on mathematically rigorous quantitative approaches aimed at optimizing the data collection, cleaning and organization phases (e.g., historical data series, unstructured data generated on social media, semantic elements, etc.). The mathematical tools needed to describe, analyze and forecast time series are also introduced. The course content is developed alongside the study of real problems from heterogeneous sectors (industrial, economic, social, etc.), using software oriented to probabilistic modeling, for example Knime, ElasticSearch, Kibana, R AnalyticFlow, Orange, etc.
At the end of the course, students must demonstrate that they have acquired the following skills:
● know and be able to use the basic tools for handling time series and their indicators, e.g.,
● know and be able to develop forecasting solutions based on inferential statistical models, e.g., AR, MA, ARMA, ARIMA, ARIMAX; Box-Jenkins methodology, autocovariance and partial autocorrelation, seasonality (SARIMA), analysis of variance (ANOVA, MANOVA), etc.
● know how to identify the parameters that characterize a given population via methods such as error minimization, maximum likelihood, etc.
● know how to estimate, identify or reconstruct characteristics related to first-order analysis, smoothing techniques, spectral decomposition, polynomial fitting, etc.
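
As an illustration of the kind of exercise these skills involve, the following is a minimal sketch, not course material: it simulates an AR(1) series, estimates its coefficient by error minimization (least squares, which for Gaussian noise matches the conditional maximum-likelihood estimate), and computes a sample autocorrelation. The function names, the coefficient 0.7, the sample size and the seed are all assumptions chosen for the example.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with Gaussian noise eps_t."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Least-squares estimate of phi (model without intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Hypothetical settings: true coefficient 0.7, 5000 observations.
series = simulate_ar1(phi=0.7, n=5000, seed=42)
phi_hat = fit_ar1(series)

# One-step-ahead forecast from the fitted model.
forecast = phi_hat * series[-1]

print(f"estimated phi: {phi_hat:.3f}")
print(f"lag-1 autocorrelation: {autocorr(series, 1):.3f}")
print(f"one-step forecast: {forecast:.3f}")
```

For an AR(1) process the lag-1 autocorrelation and the fitted coefficient should be close to each other, which gives a quick consistency check on the estimate.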