STK-IN4300 Statistical Learning Methods in Data Science
Riccardo De Bin
debin@math.uio.no
STK4030: lecture 1 1/ 51
Outline of the lecture: Introduction; Overview of supervised learning
[Figure: scatterplot panels; recoverable variable names: lpsa, age]
§ quantitative (e.g., stock price, amount of glucose, ...);
§ categorical (e.g., heart attack/no heart attack).
§ examples: age, gender, income, ...
§ p ≫ n problem;
§ classify patients with similar
§ predict the chance of getting
§ remove variables having low correlation with the response;
§ more formal subset selection;
§ select a few "best" linear combinations of variables;
§ ridge regression;
§ lasso (least absolute shrinkage and selection operator);
§ elastic net.
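To make the effect of shrinkage concrete, here is a minimal sketch for the special case of a single predictor without intercept, where ridge and lasso both have closed-form solutions. The function names, the toy data and the penalty scaling (a factor 1/2 in front of the squared-error term) are illustrative assumptions, not taken from the slides.

```python
def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for two equal-length sequences."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy, sxx


def ols_coef(x, y):
    """Least squares slope for the model y = b * x (no intercept)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_coef(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| (soft thresholding)."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam, 0.0)   # exactly zero once lam >= |sxy|
    sign = 1.0 if sxy > 0 else -1.0
    return sign * shrunk / sxx


# Toy data (assumed for illustration): y is exactly 2 * x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 28.0):
    print(lam, ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

In this sketch the ridge estimate shrinks towards zero but never reaches it, while the lasso estimate becomes exactly zero once the penalty is large enough, which is why the lasso also performs variable selection.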
§ suggestion: use R Studio (www.rstudio.com), available at all
§ encouragement: follow good R programming practices, for
§ linear model with least squares estimator;
§ k nearest neighbors;
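As a small illustration of the first method, the sketch below fits a linear model with one predictor and an intercept by least squares. The function names and the toy data are hypothetical, chosen only to keep the example self-contained.

```python
def fit_least_squares(x, y):
    """Return (intercept, slope) minimizing sum((y - a - b*x)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_linear(intercept, slope, x0):
    """Prediction of the fitted linear model at a new point x0."""
    return intercept + slope * x0


# Toy data (assumed for illustration): y = 1 + 2 * x without noise.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]

intercept, slope = fit_least_squares(x_train, y_train)
print(intercept, slope, predict_linear(intercept, slope, 4.0))
```

The fit is summarized by two numbers only, which is what makes the linear model rigid but stable compared with local methods such as k nearest neighbors.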
§ fewer training observations are
§ is this a good criterion?
§ the learner works greatly on the
§ It would be preferable to evaluate
§ example: squared error loss, $L(Y, f(X)) = (Y - f(X))^2$.
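The following worked equation sketches why the squared error loss leads to the conditional mean. It assumes that the criterion is the expected prediction error, written here as EPE (the symbol is an assumption if the slides use a different name), and that the minimization can be done pointwise in x.

```latex
\begin{align*}
\mathrm{EPE}(f) &= E\big[(Y - f(X))^2\big]
                 = E_X\, E_{Y \mid X}\big[(Y - f(X))^2 \mid X\big], \\
f(x) &= \arg\min_{c}\, E_{Y \mid X}\big[(Y - c)^2 \mid X = x\big]
      = E[Y \mid X = x].
\end{align*}
```

The last equality holds because, for any random variable with finite variance, the constant minimizing the expected squared deviation is its mean.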
§ no conditioning on X;
§ we have used our knowledge on the functional relationship to
§ less rigid functional relationship may be considered, e.g. $\sum_{j=1}^{p} \cdots$
§ expectation is approximated by averaging over sample data;
§ conditioning on a point is relaxed to conditioning on a
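The two approximations above can be sketched for a k nearest neighbors regression with one input: the conditional expectation at x0 is replaced by an average, and conditioning on x0 is replaced by conditioning on the k training points closest to it. The function name and the toy data are assumptions made for illustration.

```python
def knn_regress(x_train, y_train, x0, k):
    """Average of the responses of the k training points nearest to x0."""
    # Indices of the training points, ordered by distance to x0.
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))
    neighbours = order[:k]
    return sum(y_train[i] for i in neighbours) / k


# Toy data (assumed for illustration).
x_train = [0.0, 1.0, 2.0, 3.0, 10.0]
y_train = [0.0, 1.0, 2.0, 3.0, 10.0]

print(knn_regress(x_train, y_train, 1.2, 1))  # uses only the point x = 1
print(knn_regress(x_train, y_train, 1.2, 3))  # averages over x = 1, 2, 0
print(knn_regress(x_train, y_train, 1.2, 5))  # averages over all points
```

With k equal to the sample size the prediction is the overall mean for every x0, so the choice of k controls how local the estimate is.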
§ small sample size;
§ curse of dimensionality (see later).
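One standard way to see the curse of dimensionality, sketched here under the assumption that the inputs are uniformly distributed in the p-dimensional unit cube: a sub-cube that captures a fraction r of the data must have edge length r^(1/p). The function name is hypothetical, and this may not be the exact example used later in the course.

```python
def edge_length(fraction, p):
    """Edge length of a sub-cube holding `fraction` of the unit cube's volume in p dimensions."""
    return fraction ** (1.0 / p)


for p in (1, 2, 10, 100):
    print(p, round(edge_length(0.01, p), 3))
```

Already for p = 10, a neighborhood containing 1% of the data spans well over half of the range of each input, so it is no longer "local".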
§ the solution is the conditional median
§ more robust estimates than those obtained with the
§ the L1 loss function has discontinuities in its derivatives →
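A minimal numerical check of the first two points, in the simpler unconditional setting (a single sample of responses, no inputs): the median gives a smaller total absolute loss than the mean, and it is much less affected by one extreme observation. The data and the function names are assumptions made for illustration.

```python
import statistics


def total_abs_loss(c, ys):
    """Sum of absolute deviations of ys from the constant c (L1 loss)."""
    return sum(abs(y - c) for y in ys)


def total_sq_loss(c, ys):
    """Sum of squared deviations of ys from the constant c (L2 loss)."""
    return sum((y - c) ** 2 for y in ys)


# Toy sample with one outlier (assumed for illustration).
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
mean = statistics.mean(ys)
median = statistics.median(ys)

print("mean:", mean, "median:", median)
print("L1 loss at median / mean:", total_abs_loss(median, ys), total_abs_loss(mean, ys))
print("L2 loss at median / mean:", total_sq_loss(median, ys), total_sq_loss(mean, ys))
```

The mean is pulled far towards the outlier while the median stays with the bulk of the data, which is the sense in which the L1 solution is more robust.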
§ all elements on the diagonal are 0;
§ often non-diagonal elements are 1 (zero-one loss function).
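The role of the loss matrix can be sketched as follows: given the class probabilities at a point, predict the class with the smallest expected loss. The convention that `loss[k][g]` is the price paid for predicting class g when the true class is k, the function name and the numbers are all assumptions made for illustration.

```python
def bayes_decision(posterior, loss):
    """Return (best class index, list of expected losses per predicted class)."""
    n_classes = len(posterior)
    expected = [
        sum(loss[k][g] * posterior[k] for k in range(n_classes))
        for g in range(n_classes)
    ]
    best = min(range(n_classes), key=lambda g: expected[g])
    return best, expected


# Class probabilities at some point x (assumed for illustration).
posterior = [0.2, 0.5, 0.3]

# Zero-one loss: 0 on the diagonal, 1 everywhere else.
zero_one = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

# Asymmetric loss: predicting class 1 when the truth is class 2 costs 5.
asymmetric = [[0, 1, 1],
              [1, 0, 1],
              [1, 5, 0]]

print(bayes_decision(posterior, zero_one)[0])    # most probable class
print(bayes_decision(posterior, asymmetric)[0])  # decision shifts to class 2
```

Under the zero-one loss the rule reduces to choosing the most probable class; with unequal off-diagonal entries the decision can move away from it.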
[Equation residue: sums over the K classes, $\sum_{k=1}^{K}$]
§ ˆ
§ approximation of this solution.
§ $E[Y_k \mid X] = \Pr(G = g_k \mid X)$;
§ also approximates the Bayes