STK-IN4300 Statistical Learning Methods in Data Science
Riccardo De Bin
debin@math.uio.no
STK-IN4300: lecture 13 1/ 30
Outline of the lecture
§ Feature Assessment when $p \gg N$
§ in this lecture M is the number of variables (as in the book);
§ selection by a procedure with variable selection property;
§ absolute value of a regression coefficient in lasso;
§ variable importance plots (boosting, random forests, ...);
§ univariate tests;
§ 44 patients with normal reaction;
§ 14 patients who had a severe reaction.
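§ $\bar{x}_{kj} = \sum_{i \in C_k} x_{ij} / N_k$;
§ $C_k$ are the indexes of the $N_k$ observations of group $k$;
§ $se_j = \hat{\sigma}_j \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$;
§ $\hat{\sigma}_j^2 = \frac{1}{N_1 + N_2 - 2} \left( \sum_{i \in C_1} (x_{ij} - \bar{x}_{1j})^2 + \sum_{i \in C_2} (x_{ij} - \bar{x}_{2j})^2 \right)$.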
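The quantities above can be put together in a few lines. The following is a minimal sketch, assuming they feed the usual two-sample statistic $t_j = (\bar{x}_{2j} - \bar{x}_{1j}) / se_j$ as in the book; the function name `two_sample_t` and the toy data are hypothetical and only serve as illustration.

```python
import math


def two_sample_t(x1, x2):
    """Pooled-variance two-sample t-statistic for a single variable j.

    x1, x2: values of variable j for the observations in groups 1 and 2.
    """
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1  # group mean, group 1
    mean2 = sum(x2) / n2  # group mean, group 2
    # Within-group sums of squares around the group means.
    ss1 = sum((v - mean1) ** 2 for v in x1)
    ss2 = sum((v - mean2) ** 2 for v in x2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)  # pooled variance estimate
    se = math.sqrt(pooled_var) * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean2 - mean1) / se


# Toy data (made up, not from the lecture's data set).
group1 = [1.0, 2.0, 3.0]
group2 = [2.0, 4.0, 6.0]
t_j = two_sample_t(group1, group2)
print(round(t_j, 4))
```

In the setting of the lecture, this function would be applied once per variable, giving one statistic for each of the M variables.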
§ expected falsely significant genes, $12625 \cdot 0.05 = 631.25$;
§ standard deviation,
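The arithmetic can be checked directly. This is a sketch under two assumptions that are not stated on the slide: all null hypotheses are true and the tests are independent, so that the number of falsely significant genes is Binomial(M, α). Under dependence between genes the standard deviation would differ.

```python
import math

M = 12625     # number of genes tested
alpha = 0.05  # significance level of each univariate test

# Expected number of false positives if every null hypothesis is true.
expected_false = M * alpha

# Binomial standard deviation; ASSUMES independent tests.
sd_false = math.sqrt(M * alpha * (1 - alpha))

print(expected_false, round(sd_false, 2))
```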
§ positive dependence is typical in genomic studies.
§ it is easy to show that FWER $\le \alpha$;
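A possible version of the argument, assuming the statement refers to the Bonferroni method (reject $H_{0j}$ when $p_j \le \alpha / M$) and that p-values are valid under the null. The symbols $I_0$ (the set of true null hypotheses) and $M_0 = |I_0|$ are notation introduced here for the sketch.

```latex
\begin{align*}
\mathrm{FWER}
  &= \Pr\Big( \bigcup_{j \in I_0} \{ p_j \le \alpha / M \} \Big) \\
  &\le \sum_{j \in I_0} \Pr( p_j \le \alpha / M )
     && \text{(union bound)} \\
  &\le M_0 \, \frac{\alpha}{M}
     && \text{(valid p-values under } H_{0j}\text{)} \\
  &\le \alpha
     && \text{(since } M_0 \le M\text{)}.
\end{align*}
```

The union bound needs no independence assumption, so the inequality holds under any dependence between the tests.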
§ univariate response $Y$;
§ $N \times p$ covariate matrix $X$.
§ sensible values are in $(0.6, 0.9)$;
§ $\Lambda : q_\Lambda = \sqrt{0.8\,p}$
§ $\Lambda : q_\Lambda = \sqrt{0.8\,\alpha\,p}$
§ $q$ is given by the number of variables which enter in the model;
§ for lasso, find $\lambda_{\min} : \left| \bigcup_{\lambda_{\max} \ge \lambda \ge \lambda_{\min}} \hat{S}^{\lambda} \right| \le q$.
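The two choices of $q_\Lambda$ above can be reproduced numerically. This sketch assumes they come from the stability selection bound of Meinshausen and Bühlmann, $E(V) \le q_\Lambda^2 / ((2\pi_{thr} - 1)\,p)$, evaluated at threshold $\pi_{thr} = 0.9$, where $V$ is the number of falsely selected variables. The function names and the value of `p` are hypothetical.

```python
import math


def ev_upper_bound(q, p, pi_thr):
    """Upper bound on E(V) for q selected variables, p variables in total."""
    return q ** 2 / ((2.0 * pi_thr - 1.0) * p)


def max_q(p, pi_thr=0.9, ev_target=1.0):
    """Largest q for which the bound on E(V) does not exceed ev_target."""
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("pi_thr must lie in (0.5, 1]")
    return math.sqrt(ev_target * (2.0 * pi_thr - 1.0) * p)


p = 1000      # illustrative number of variables
alpha = 0.05  # illustrative error level

q_ev1 = max_q(p)                    # equals sqrt(0.8 * p): E(V) <= 1
q_fwer = max_q(p, ev_target=alpha)  # equals sqrt(0.8 * alpha * p): E(V) <= alpha

print(round(q_ev1, 2), round(q_fwer, 2))
```

Since $P(V > 0) \le E(V)$ by Markov's inequality, bounding $E(V)$ by $\alpha$ also bounds the family-wise error rate by $\alpha$, which is one way to read the second choice of $q_\Lambda$.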
§ exact error control is possible;
§ the method works fine even though the noise level is unknown;