STK-IN4300 Statistical Learning Methods in Data Science
Riccardo De Bin
debin@math.uio.no
Outline of the lecture:
- Kernel Smoothing Methods
  - one-dimensional kernel smoothers
  - ...
The smoothing parameter λ:
- for the Epanechnikov, biquadratic or tricubic kernel, λ is the radius of the support region;
- for the Gaussian kernel, λ is the standard deviation;
- λ small → the fit at x0 is based on few points: smaller bias, larger variance;
- λ large → more points → stronger effect of averaging: smaller variance, larger bias (see the sketch below).

Adaptive window widths h_λ(x0):
- adapt to the local density (fix k as in kNN);
- expressed by substituting λ with h_λ(x0) in (1);
- metric (constant) window widths keep the bias constant, while the variance is inversely proportional to the local density.
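A minimal sketch of such a kernel smoother in Python (the function names are mine; an Epanechnikov kernel is assumed, so λ is exactly the radius of the support region):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel D(t) = 3/4 (1 - t^2) for |t| <= 1, 0 otherwise."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average of the y's at the target point x0:
    a small lam uses only nearby points, a large lam averages widely."""
    w = epanechnikov((x - x0) / lam)
    return np.sum(w * y) / np.sum(w)
```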
The role of λ in the bias-variance trade-off:
- λ → 0 reduces the bias;
- λ → ∞ reduces the variance;
- in practice, λ can be chosen by cross-validation (sketch below).
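A leave-one-out cross-validation sketch for choosing λ (the Gaussian kernel and the function names are my assumptions, not the lecture's code):

```python
import numpy as np

def nw_fit(x0, x, y, lam):
    """Nadaraya-Watson fit at x0, Gaussian kernel with sd lam."""
    w = np.exp(-0.5 * ((x - x0) / lam) ** 2)
    return np.sum(w * y) / np.sum(w)

def loocv_error(x, y, lam):
    """Leave-one-out CV error of the smoother for a given lam:
    minimise this over a grid of lam values to balance bias and variance."""
    preds = [nw_fit(x[i], np.delete(x, i), np.delete(y, i), lam)
             for i in range(len(x))]
    return np.mean((y - np.array(preds)) ** 2)
```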
Issues at the boundaries:
- estimates are less accurate close to the boundaries:
  - fewer observations;
  - asymmetry in the kernel (numerical sketch below);
- possibly more weight on a single x_i;
- there can be different y_i for the same x_i.
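A quick check of the kernel-asymmetry point (hypothetical design points on [0, 1] and Epanechnikov weights, my choices): at a boundary target the kernel mass lies entirely on one side of x0.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)      # design points on [0, 1]
lam = 0.2
for x0 in (0.0, 0.5):               # boundary vs interior target point
    w = np.maximum(0.0, 0.75 * (1.0 - ((x - x0) / lam) ** 2))
    # mean offset of the kernel mass: ~0 in the interior, > 0 at x0 = 0
    print(x0, np.sum(w * (x - x0)) / np.sum(w))
```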
Locally weighted linear regression:
- combine the weighting kernel K_λ(x0, ·) and the least squares operator (sketch below).
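A minimal sketch of that combination (the function name and the Epanechnikov weights are my choices): solve a weighted least squares problem at each target point x0 and keep the fitted value there.

```python
import numpy as np

def local_poly_fit(x0, x, y, lam, degree=1):
    """Locally weighted polynomial regression at x0: the kernel
    K_lambda(x0, .) supplies the weights, least squares does the fit;
    degree=1 gives local linear regression."""
    t = (x - x0) / lam
    w = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    B = np.vander(x - x0, degree + 1, increasing=True)  # 1, (x-x0), ...
    WB = w[:, None] * B
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)          # weighted LS
    return beta[0]                                      # fitted value at x0
```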
Local polynomial regression:
- no "trimming the hills and filling the gaps" effect (usage sketch below).
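Local quadratic regression is the same computation with one extra column in the local design matrix; e.g., reusing the hypothetical local_poly_fit from the sketch above:

```python
# The second-order term lets the fit follow curvature near peaks and
# valleys, at the price of extra variance (x, y assumed defined).
f_hat = local_poly_fit(x0=0.5, x=x, y=y, lam=0.3, degree=2)
```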
- the MSE is asymptotically dominated by boundary effects (see the expansion below).
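To see where the boundary effects enter, one can write the smoother as a linear combination with equivalent-kernel weights l_i(x0) and Taylor-expand (a standard sketch of the argument, not taken from the slides):

```latex
\mathbb{E}\,\hat f(x_0) \;=\; \sum_{i=1}^n l_i(x_0)\, f(x_i)
 \;=\; f(x_0) \sum_{i=1}^n l_i(x_0)
 \;+\; f'(x_0) \sum_{i=1}^n (x_i - x_0)\, l_i(x_0)
 \;+\; \frac{f''(x_0)}{2} \sum_{i=1}^n (x_i - x_0)^2\, l_i(x_0) \;+\; R.
```

For local linear regression one can show that Σ_i l_i(x0) = 1 and Σ_i (x_i − x0) l_i(x0) = 0, so the first-order term, which is exactly what blows up at the boundary, vanishes: the remaining bias is driven by f''(x0) only.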
Local regression in higher dimensions:
- the fraction of points at the boundary increases to 1 as the dimension grows: curse of dimensionality;
- it is impossible to maintain localness (small bias) and a sizeable sample in the neighbourhood (small variance) at the same time: again, the curse of dimensionality (numerical sketch below).
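The boundary claim is easy to check numerically: for the unit cube [0, 1]^p, the fraction of volume within ε of the boundary is 1 − (1 − 2ε)^p (a small arithmetic sketch):

```python
eps = 0.05
for p in (1, 2, 5, 10, 50):
    # fraction of [0, 1]^p lying within eps of the boundary
    print(p, 1.0 - (1.0 - 2.0 * eps) ** p)
# p = 1: 0.10;  p = 10: ~0.65;  p = 50: ~0.99 -> essentially all points
```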
Structured kernels, weighting the coordinates through a matrix A:
- A diagonal: increase or decrease the importance of the single predictors;
- low-rank versions of A → projection pursuit (sketch below).
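A sketch of such a kernel in the standard form K_{λ,A}(x0, x) = D(√((x − x0)ᵀ A (x − x0)) / λ); the Gaussian profile D and the names are my choices:

```python
import numpy as np

def structured_kernel(x0, x, A, lam):
    """Mahalanobis-type kernel weight D(sqrt((x - x0)' A (x - x0)) / lam)
    with the Gaussian profile D(t) = exp(-t^2 / 2)."""
    d = x - x0
    t = np.sqrt(d @ A @ d) / lam
    return np.exp(-0.5 * t ** 2)

# A diagonal: per-coordinate importance; a small A_jj downgrades
# predictor j, and A_jj = 0 removes it from the distance entirely.
A = np.diag([1.0, 0.1])
w = structured_kernel(np.zeros(2), np.array([0.5, 0.5]), A, lam=1.0)
```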
Structured regression functions, eliminating some of the interaction terms:
- remove all interaction terms, f(X) = α + Σ_{j=1}^p g_j(X_j);
- keep only the first-order interactions, f(X) = α + Σ_{j=1}^p g_j(X_j) + Σ_{k<ℓ} g_{kℓ}(X_k, X_ℓ);
- ...
(a backfitting sketch follows this list)
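Additive structures of this kind are typically fitted by backfitting with one-dimensional smoothers; a minimal sketch (the smoother interface smooth(x, r), returning fitted values of the partial residuals r at the points x, is an assumption of mine, e.g. a kernel smoother as above):

```python
import numpy as np

def backfit_additive(X, y, smooth, n_iter=20):
    """Backfitting for f(X) = alpha + sum_j g_j(X_j): cycle over the
    predictors, smoothing the partial residuals against each X_j."""
    n, p = X.shape
    alpha = y.mean()
    g = np.zeros((n, p))                # g_j evaluated at the data points
    for _ in range(n_iter):
        for j in range(p):
            r = y - alpha - g.sum(axis=1) + g[:, j]  # partial residuals
            g[:, j] = smooth(X[:, j], r)
            g[:, j] -= g[:, j].mean()   # centre each g_j (identifiability)
    return alpha, g
```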
- solution via the least squares estimator.
Kernel density estimation with a parametric start:
- the method often works well even with "bad" parametric starts;
- the better the approximation, the better the result, though;
- with a uniform (constant) start, we are back to the classic kernel estimator.
- when f0(x) is a good guess, better performance! (sketch below)
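A sketch of the parametric-start estimator, f̂(x) = f0(x; θ̂) · n⁻¹ Σ_i K_λ(x − x_i) / f0(x_i; θ̂); the Gaussian kernel, the maximum-likelihood fit of the start and the function name are my choices, not the lecture's code:

```python
import numpy as np
from scipy import stats

def kde_parametric_start(x_grid, xs, lam, start=stats.gamma):
    """Fit the start density f0 by maximum likelihood, then correct it
    by a kernel estimate of the ratio f / f0 at each grid point."""
    params = start.fit(xs)               # MLE for the start family
    f0_obs = start.pdf(xs, *params)      # f0 at the observations
    est = np.empty_like(x_grid, dtype=float)
    for i, x in enumerate(x_grid):
        k = stats.norm.pdf((x - xs) / lam) / lam   # K_lambda(x - x_i)
        est[i] = start.pdf(x, *params) * np.mean(k / f0_obs)
    return est
```

With a gamma start on gamma-shaped data, as in the theophylline figure below, the correction factor stays close to one and the estimate improves on the plain kernel estimator.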
[Figure: kernel density estimates of the concentration of theophylline; N = 132, bandwidth = 0.1849 (unbiased cross-validation, 'ucv'). Four estimates compared: start gamma / kernel gamma; start gamma / kernel Gaussian; start Gaussian / kernel Gaussian; start Gaussian / kernel gamma.]