STK-IN4300 Statistical Learning Methods in Data Science
Riccardo De Bin
debin@math.uio.no
STK-IN4300: lecture 8 1/ 39
Outline of the lecture: Generalized Additive Models, Definition
§ $g(\mu) = \mu$ ↔ identity link → Gaussian models;
§ $g(\mu) = \log(\mu / (1 - \mu))$ ↔ logit link → Binomial models;
§ $g(\mu) = \Phi^{-1}(\mu)$ ↔ probit link → Binomial models;
§ $g(\mu) = \log(\mu)$ ↔ logarithmic link → Poisson models;
§ . . .
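As a rough illustration of the link functions listed above, the following sketch evaluates each $g(\mu)$ together with its inverse, using only the Python standard library. The function names and the `LINKS` dictionary are illustrative choices and do not come from the lecture.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link


def identity(mu):
    return mu


def logit(mu):
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def probit(mu):
    return _STD_NORMAL.inv_cdf(mu)  # Phi^{-1}(mu)


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)  # Phi(eta)


# name -> (link g, inverse link g^{-1}); the names are illustrative
LINKS = {
    "identity": (identity, identity),
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "log": (math.log, math.exp),
}

if __name__ == "__main__":
    mu = 0.3
    for name, (g, g_inv) in LINKS.items():
        eta = g(mu)
        print(f"{name:8s} g(mu)={eta:+.4f}  g_inv(g(mu))={g_inv(eta):.4f}")
```

Each inverse maps the linear predictor back to the mean scale, which is why the round trip returns the original $\mu$.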
§ each $f_j(X_j)$ is a cubic spline with knots at the (unique) $x_{ij}$'s.
§ the functions average 0 over the data;
§ α is therefore identifiable;
§ in particular, $\hat{\alpha} = \bar{y}$.
§ the degrees of freedom for the j-th term are $\mathrm{trace}(S)$;
§ not feasible when $p \gg N$.
§ until a stopping criterion applies;
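The iteration described above can be sketched as a small backfitting loop. This is a minimal sketch under stated assumptions, not the lecture's implementation: a least-squares line (`linear_smoother`) stands in for the cubic smoothing spline so that the example needs only the standard library, and the names `backfit`, `tol` and `max_iter` are hypothetical.

```python
from statistics import fmean


def linear_smoother(x, r):
    """Least-squares line fitted to (x, r), evaluated at x.

    Stand-in for a cubic smoothing spline; any smoother with the same
    signature can be swapped in. Assumes x is not constant.
    """
    x_bar, r_bar = fmean(x), fmean(r)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxr = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, r))
    slope = sxr / sxx
    return [r_bar + slope * (xi - x_bar) for xi in x]


def backfit(columns, y, smoother=linear_smoother, tol=1e-10, max_iter=500):
    """Backfitting for y = alpha + sum_j f_j(x_j) + error.

    columns: list of p lists, each holding the N values of one predictor.
    Returns (alpha, f) where f[j][i] is the fitted f_j(x_ij).
    """
    n, p = len(y), len(columns)
    alpha = fmean(y)                   # intercept estimate: mean of y
    f = [[0.0] * n for _ in range(p)]  # initialise every f_j at 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # partial residuals: remove alpha and all other fitted functions
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            new_fj = smoother(columns[j], partial)
            centre = fmean(new_fj)     # re-centre so f_j averages 0
            new_fj = [v - centre for v in new_fj]
            max_change = max(
                max_change, max(abs(a - b) for a, b in zip(new_fj, f[j]))
            )
            f[j] = new_fj
        if max_change < tol:           # stopping criterion
            break
    return alpha, f
```

With linear smoothers the loop reduces to a Gauss-Seidel solution of ordinary least squares, so it is easy to check on noise-free additive data; replacing `linear_smoother` with a spline smoother gives the nonparametric version.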
§ define the two half-spaces,
§ $R_1(j, s) = \{X \mid X_j \le s\}$;
§ $R_2(j, s) = \{X \mid X_j > s\}$;
§ the choice of $s$ can be done really quickly;
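§ $\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s))$;
§ $\hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s))$.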
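One way to see why the choice of $s$ is fast: after sorting on $X_j$, running sums give the within-region sum of squares of every candidate split in a single pass. The sketch below assumes squared-error loss and places $s$ at the midpoint between adjacent distinct values, which is a common convention rather than something stated in the lecture. The function names are hypothetical.

```python
def best_split(x, y):
    """Best split point s for one predictor under squared-error loss.

    Returns (s, c1, c2, sse): threshold, the two region means and the
    total within-region sum of squares, or None if no split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_sum = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best = None
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if xi == pairs[i + 1][0]:
            continue  # cannot split between tied values
        n1, n2 = i + 1, n - i - 1
        right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
        # sum of squares around the mean: sum(y^2) - (sum y)^2 / n
        sse = (left_sq - left_sum ** 2 / n1) + (right_sq - right_sum ** 2 / n2)
        if best is None or sse < best[3]:
            s = (xi + pairs[i + 1][0]) / 2.0  # midpoint convention (assumed)
            best = (s, left_sum / n1, right_sum / n2, sse)
    return best


def best_split_all(columns, y):
    """Scan all predictors j; return (j, s, c1, c2, sse) with smallest sse."""
    best = None
    for j, x in enumerate(columns):
        cand = best_split(x, y)
        if cand is not None and (best is None or cand[3] < best[4]):
            best = (j,) + cand
    return best
```

The returned `c1` and `c2` are the averages of the responses in $R_1(j, s)$ and $R_2(j, s)$, the constants that minimise squared error within each region.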
§ intuitive; § short-sighted (a split can be preparatory for a split below).
§ successively collapse the internal node that produces the smallest per-node increase in $\sum_{m=1}^{|T|} N_m Q_m(T)$;
§ until the single node tree;
§ find $T_\alpha$ within the sequence;
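The last step, finding $T_\alpha$ within the sequence, can be sketched as follows. The sketch assumes the usual cost-complexity criterion $C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|$ and represents each subtree only by the list of its leaf costs $N_m Q_m(T)$. Both the representation and the function names are hypothetical simplifications; building the pruning sequence itself is not shown.

```python
def cost_complexity(leaf_costs, alpha):
    """C_alpha(T) = sum_m N_m Q_m(T) + alpha * |T|.

    leaf_costs: one value N_m * Q_m(T) per terminal node of T.
    """
    return sum(leaf_costs) + alpha * len(leaf_costs)


def select_subtree(sequence, alpha):
    """Pick the subtree minimising C_alpha from a pruning sequence.

    sequence: list of subtrees, each given as its list of leaf costs,
    ordered from the full tree down to the single-node tree.
    Returns the index of the minimiser (ties go to the smaller tree).
    """
    best_idx = None
    best_val = None
    for idx, leaf_costs in enumerate(sequence):
        val = cost_complexity(leaf_costs, alpha)
        if best_val is None or val <= best_val:
            best_idx, best_val = idx, val
    return best_idx
```

Larger values of `alpha` penalise the number of terminal nodes more heavily, so the selected index moves towards the single-node tree as `alpha` grows.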
§ 0-1 loss: $N_m^{-1} \sum_{x_i \in R_m} 1(y_i \neq k_m)$;
§ Gini index: $\sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})$;
§ deviance: $-\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$;
§ all three can be extended to consider different error weights.
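The three node impurity measures above can be computed from the class labels falling in a node. The sketch below is unweighted (it ignores the error-weight extension) and takes $k_m$ to be the majority class of the node. The function names are illustrative.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return {class k: proportion of class k} within one node."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


def misclassification_error(labels):
    """0-1 loss: share of observations not in the majority class k_m."""
    return 1.0 - max(class_proportions(labels).values())


def gini_index(labels):
    """Sum over classes of p * (1 - p)."""
    return sum(p * (1.0 - p) for p in class_proportions(labels).values())


def deviance(labels):
    """Minus the sum over classes of p * log(p).

    Classes absent from the node never appear in the proportions,
    so log(0) is not evaluated.
    """
    return -sum(p * math.log(p) for p in class_proportions(labels).values())
```

All three are 0 for a pure node and largest when the classes are equally represented, but the Gini index and the deviance are differentiable in the proportions, unlike the 0-1 loss.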
§ “individuals” are supposed to be independent; § we have only one dataset . . .
§ where $q_k(x)$ is the proportion of trees voting for the category $k$;
§ where $p_k^{[b]}(x)$ is the probability assigned by the $b$-th tree to category $k$.
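The two quantities defined above suggest two ways of combining an ensemble of $B$ classification trees at a point $x$: a majority vote based on $q_k(x)$, or an average of the per-tree probabilities $p_k^{[b]}(x)$. The sketch below assumes each tree's output is already available as a list of class probabilities. The function names and the input layout are hypothetical.

```python
def vote_proportions(tree_probs):
    """q_k(x): share of trees whose most probable category is k.

    tree_probs: list over trees b = 1..B of lists [p_1(x), ..., p_K(x)].
    """
    n_trees, n_classes = len(tree_probs), len(tree_probs[0])
    votes = [0] * n_classes
    for probs in tree_probs:
        votes[probs.index(max(probs))] += 1  # ties go to the lowest index
    return [v / n_trees for v in votes]


def average_probabilities(tree_probs):
    """Average of the per-tree probabilities for each category k."""
    n_trees = len(tree_probs)
    return [sum(col) / n_trees for col in zip(*tree_probs)]


def predict(scores):
    """Category with the largest score (0-based index)."""
    return scores.index(max(scores))
```

The two rules can disagree: with per-tree probabilities `[[0.9, 0.1], [0.45, 0.55], [0.45, 0.55]]`, two of the three trees vote for the second category, while the averaged probabilities favour the first.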