STK-IN4300 Statistical Learning Methods in Data Science
Riccardo De Bin
debin@math.uio.no
STK-IN4300: lecture 9 1/ 46
Outline of the lecture: Random Forests, Definition of Random ...
STK-IN4300 - Statistical Learning Methods in Data Science
§ classification:
§ regression: $\lfloor p/3 \rfloor$.
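The two bullets above give a default value per task, but only the regression value, floor(p/3), survives in this text. A minimal sketch, assuming the quantity is m, the number of candidate variables drawn at each split, and assuming the commonly recommended floor(sqrt(p)) for classification (that value cannot be recovered from the slide itself). The helper name `default_m` is hypothetical.

```python
import math

def default_m(p: int, task: str) -> int:
    """Default number of candidate variables per split (hypothetical helper)."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    if task == "regression":
        # floor(p / 3), as on the slide; at least one variable must be tried
        return max(1, p // 3)
    if task == "classification":
        # ASSUMPTION: floor(sqrt(p)), the usual recommendation;
        # this value is not stated in the surviving slide text
        return max(1, math.isqrt(p))
    raise ValueError(f"unknown task: {task!r}")
```

In practice m is a tuning parameter, so these values are best read as starting points.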
§ where $\Theta_b = \{R_b, c_b\}$ characterizes the tree in terms of split
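To make the notation concrete, the sketch below stores each tree as a set of regions with one constant per region, and averages the predictions of B such trees, as a regression forest usually does. Assumptions: the input is one-dimensional and the regions are half-open intervals. The names `Tree`, `predict_tree` and `predict_forest` are hypothetical, and the two trees are made up.

```python
from typing import List, Tuple

# A tree Theta_b = {R_b, c_b}: each region is a half-open interval [lo, hi)
# paired with the constant c predicted inside it (1-D input for simplicity).
Tree = List[Tuple[float, float, float]]  # (lo, hi, c)

def predict_tree(tree: Tree, x: float) -> float:
    """Return the constant of the region that contains x."""
    for lo, hi, c in tree:
        if lo <= x < hi:
            return c
    raise ValueError(f"x={x} is not covered by any region")

def predict_forest(forest: List[Tree], x: float) -> float:
    """Average the B tree predictions (regression forest)."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

inf = float("inf")
forest = [
    [(-inf, 0.0, 1.0), (0.0, inf, 3.0)],  # tree 1: split at 0
    [(-inf, 1.0, 2.0), (1.0, inf, 4.0)],  # tree 2: split at 1
]
print(predict_forest(forest, 0.5))  # average of 3.0 (tree 1) and 2.0 (tree 2)
```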
§ where ˆ
§ MedInc: median income of the people living in the neighbourhood;
§ House: house density (number of houses);
§ AveOccup: average occupancy of the house;
§ longitude: longitude of the house;
§ latitude: latitude of the house;
§ AveRooms: average number of rooms per house;
§ AveBedrms: average number of bedrooms per house.
total variance = $\mathrm{Var}_Z \hat{f}_{rf}(x)$ + within-$Z$ variance
§ decreases with m decreasing;
§ increases with m decreasing;
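The split into a between-Z term and a within-Z term is an instance of the law of total variance. A small numeric check of that identity, using made-up tree predictions at a fixed x (illustrative values, not from the lecture): three training sets Z, four random draws of the tree per training set. Population variances are used so that the identity holds exactly for equal group sizes.

```python
from statistics import fmean, pvariance

# Hypothetical predictions T(x; Theta(Z)) at a fixed x:
# one row per training set Z, one column per random draw of Theta.
preds = [
    [2.0, 2.4, 1.6, 2.0],
    [3.1, 2.9, 3.5, 2.5],
    [1.0, 1.2, 0.8, 1.0],
]

# Total variance over all (Z, Theta) pairs.
total = pvariance([t for row in preds for t in row])

# Var_Z of the forest prediction: the forest averages over Theta for each Z.
forest_by_z = [fmean(row) for row in preds]
between = pvariance(forest_by_z)

# Within-Z variance: average over Z of the variance across Theta given Z.
within = fmean([pvariance(row) for row in preds])

print(total, between + within)  # the two numbers agree up to rounding
```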
§ the value for the variable with highest importance is set to 100;
§ the other values are rescaled.
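The rescaling described above can be sketched in a few lines. The function name `relative_importance` and the raw importance values are hypothetical. The sketch assumes the other values are rescaled proportionally.

```python
def relative_importance(raw):
    """Rescale raw importances so that the largest one equals 100."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("the largest importance must be positive")
    return {name: 100.0 * value / top for name, value in raw.items()}

# Made-up raw importances for three of the variables.
raw = {"MedInc": 4.0, "AveOccup": 1.0, "latitude": 2.0}
scaled = relative_importance(raw)
print(scaled)
```

Only the ratios between variables carry information after this step. The absolute scale is arbitrary.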
§ the prediction error is computed on the OOB sample;
§ same procedure on randomly permuted values of the OOB
§ the decrease of accuracy is registered;
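The steps above can be sketched for one tree and one variable. Assumptions: `predict` stands for the prediction function of a fitted tree, `X_oob` and `y_oob` are the OOB observations of that tree, and accuracy is the fraction of correct classifications. All names and the toy data are hypothetical. In a forest, the decrease would additionally be averaged over the trees.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows of X whose prediction equals the label in y."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X_oob, y_oob, j, rng):
    """Decrease in OOB accuracy after permuting the values of variable j."""
    baseline = accuracy(predict, X_oob, y_oob)
    column = [row[j] for row in X_oob]
    rng.shuffle(column)  # break the link between variable j and the response
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_oob, column)]
    return baseline - accuracy(predict, X_perm, y_oob)

# Toy "tree" that only looks at variable 0 (stand-in for a fitted tree).
predict = lambda row: int(row[0] > 0.5)
X_oob = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
y_oob = [0, 1, 0, 1]
rng = random.Random(0)

imp_unused = permutation_importance(predict, X_oob, y_oob, 1, rng)
imp_used = permutation_importance(predict, X_oob, y_oob, 0, rng)
print(imp_unused, imp_used)
```

Permuting a variable the tree never uses leaves the accuracy unchanged, so its importance is zero. Permuting the variable it does use can only lower the accuracy here, because the baseline accuracy is already perfect.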
§ it does not measure the effect on prediction were this variable
§ i.e., how to adapt the metric;
§ i.e., directions in which the observed class means do not differ;
§ to avoid using points far away;
§ empirically, $\epsilon = 1$ works generally well;
§ $B = 0 \Rightarrow \Sigma = I$;
§ remember the X have to be scaled.
§ if the tree is grown to maximal size;
§ i.e., one observation per leaf.