
STK-IN4300 - Statistical Learning Methods in Data Science
Lecture 10: Boosting
Riccardo De Bin (debin@math.uio.no)

Outline of the lecture
• AdaBoost
  - introduction;
  - algorithm.
• Statistical Boosting
  - boosting as forward stagewise additive modelling;
  - why exponential loss?
  - steepest descent;
  - gradient boosting.

AdaBoost: introduction
L. Breiman: "[Boosting is] the best off-the-shelf classifier in the world."
• originally developed for classification;
• as a pure machine-learning black box;
• translated into the statistical world (Friedman et al., 2000);
• extended to every statistical problem (Mayr et al., 2014),
  - regression;
  - survival analysis;
  - . . .
• interpretable models, thanks to the statistical view;
• extended to work in high-dimensional settings (Bühlmann, 2006).

Starting challenge: "Can [a committee of blockheads] somehow arrive at highly reasoned decisions, despite the weak judgement of the individual members?" (Schapire & Freund, 2014)

Goal: create a good classifier by combining several weak classifiers;
• in classification, a "weak classifier" is a classifier able to produce results only slightly better than a random guess.

Idea: apply a weak classifier repeatedly (iteratively) to modifications of the data;
• at each iteration, give more weight to the misclassified observations.

AdaBoost: algorithm
Consider a two-class classification problem, $y_i \in \{-1, 1\}$, $x_i \in \mathbb{R}^p$.

AdaBoost algorithm:
1. initialize the weights, $w^{[0]} = (1/N, \dots, 1/N)$;
2. for $m$ from 1 to $m_{\text{stop}}$:
   (a) fit the weak estimator $G(\cdot)$ to the weighted data;
   (b) compute the weighted in-sample misclassification rate,
       $\mathrm{err}^{[m]} = \sum_{i=1}^{N} w_i^{[m-1]} \mathbb{1}(y_i \neq \hat{G}^{[m]}(x_i))$;
   (c) compute the voting weight, $\alpha^{[m]} = \log\big((1 - \mathrm{err}^{[m]}) / \mathrm{err}^{[m]}\big)$;
   (d) update the weights,
       $\tilde{w}_i = w_i^{[m-1]} \exp\{\alpha^{[m]} \mathbb{1}(y_i \neq \hat{G}^{[m]}(x_i))\}$,
       $w_i^{[m]} = \tilde{w}_i / \sum_{i=1}^{N} \tilde{w}_i$;
3. compute the final result,
   $\hat{G}_{\mathrm{AdaBoost}}(x) = \mathrm{sign}\Big(\sum_{m=1}^{m_{\text{stop}}} \alpha^{[m]} \hat{G}^{[m]}(x)\Big)$.

AdaBoost: example
First iteration:
• apply the classifier $G(\cdot)$ on observations with weights:

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10

• observations 1, 2 and 3 are misclassified $\Rightarrow \mathrm{err}^{[1]} = 0.3$;
• compute $\alpha^{[1]} = 0.5 \log\big((1 - \mathrm{err}^{[1]}) / \mathrm{err}^{[1]}\big) \approx 0.42$;
• set $\tilde{w}_i = w_i \exp\{\alpha^{[1]} \mathbb{1}(y_i \neq \hat{G}^{[1]}(x_i))\}$:

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.15  0.15  0.15  0.07  0.07  0.07  0.07  0.07  0.07  0.07

[figure from Schapire & Freund (2014)]
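To make the algorithm concrete, here is a minimal NumPy sketch (not part of the slides): it assumes decision stumps as the weak classifier $G(\cdot)$ and runs on synthetic data; all function and variable names are illustrative. Step (c) is implemented as on the algorithm slide; the worked example instead uses the Schapire & Freund (2014) convention with an extra factor $1/2$ in $\alpha^{[m]}$.

    # A minimal AdaBoost sketch following the algorithm above.
    # Assumptions (not from the slides): decision stumps as the weak
    # classifier G, and a small synthetic dataset for illustration.
    import numpy as np

    def fit_stump(X, y, w):
        """Fit a one-split stump to weighted data: return the
        (feature, threshold, sign) minimizing the weighted error."""
        N, p = X.shape
        best = (0, 0.0, 1, np.inf)                 # (j, thr, s, err)
        for j in range(p):
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= thr, 1, -1)
                    err = np.sum(w * (pred != y))
                    if err < best[3]:
                        best = (j, thr, s, err)
        return best[:3]

    def stump_predict(stump, X):
        j, thr, s = stump
        return s * np.where(X[:, j] <= thr, 1, -1)

    def adaboost(X, y, m_stop=30):
        N = len(y)
        w = np.full(N, 1.0 / N)            # step 1: w^[0] = (1/N, ..., 1/N)
        stumps, alphas = [], []
        for m in range(m_stop):            # step 2
            stump = fit_stump(X, y, w)     # (a) fit G to the weighted data
            miss = stump_predict(stump, X) != y
            err = np.sum(w * miss)         # (b) weighted misclassification rate
            if err <= 0 or err >= 0.5:
                break
            alpha = np.log((1 - err) / err)   # (c) voting weight
            w = w * np.exp(alpha * miss)      # (d) upweight misclassified points
            w = w / w.sum()                   #     and renormalize
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def predict(stumps, alphas, X):
        # step 3: sign of the alpha-weighted vote of the weak classifiers
        votes = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
        return np.sign(votes)

    # toy usage on synthetic data
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] ** 2 > 0.5, 1, -1)
    stumps, alphas = adaboost(X, y)
    print("training error:", np.mean(predict(stumps, alphas, X) != y))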

Second iteration:
• apply the classifier $G(\cdot)$ on the re-weighted observations ($w_i / \sum_i w_i$):

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.17  0.17  0.17  0.07  0.07  0.07  0.07  0.07  0.07  0.07

• observations 6, 7 and 9 are misclassified $\Rightarrow \mathrm{err}^{[2]} \approx 0.21$;
• compute $\alpha^{[2]} = 0.5 \log\big((1 - \mathrm{err}^{[2]}) / \mathrm{err}^{[2]}\big) \approx 0.65$;
• set $\tilde{w}_i = w_i \exp\{\alpha^{[2]} \mathbb{1}(y_i \neq \hat{G}^{[2]}(x_i))\}$:

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.09  0.09  0.09  0.04  0.04  0.14  0.14  0.04  0.14  0.04

[figure from Schapire & Freund (2014)]

Third iteration:
• apply the classifier $G(\cdot)$ on the re-weighted observations ($w_i / \sum_i w_i$):

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.11  0.11  0.11  0.05  0.05  0.17  0.17  0.05  0.17  0.05

• observations 4, 5 and 8 are misclassified $\Rightarrow \mathrm{err}^{[3]} \approx 0.14$;
• compute $\alpha^{[3]} = 0.5 \log\big((1 - \mathrm{err}^{[3]}) / \mathrm{err}^{[3]}\big) \approx 0.92$;
• set $\tilde{w}_i = w_i \exp\{\alpha^{[3]} \mathbb{1}(y_i \neq \hat{G}^{[3]}(x_i))\}$:

      i     1     2     3     4     5     6     7     8     9     10
      w_i  0.04  0.04  0.04  0.11  0.11  0.07  0.07  0.11  0.07  0.02

[figure from Schapire & Freund (2014)]
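The weight arithmetic of the three iterations can be checked directly. A short sketch, with two assumptions made explicit: the misclassified index sets are taken from the slides, and the update used is the symmetric one of Schapire & Freund (2014), $w_i \exp\{-\alpha^{[m]} y_i \hat{G}^{[m]}(x_i)\}$, which also downweights correctly classified observations; this is what the tabulated values actually follow, whereas the formula printed on the slides upweights the misclassified points only.

    # Reproduce the weight arithmetic of the three example iterations.
    # Assumption (see lead-in): the symmetric Schapire & Freund update,
    # which matches the tabulated values on the slides.
    import numpy as np

    w = np.full(10, 0.10)                          # initial weights
    miss_sets = [{0, 1, 2}, {5, 6, 8}, {3, 4, 7}]  # misclassified (0-based)

    for m, miss in enumerate(miss_sets, start=1):
        is_miss = np.array([i in miss for i in range(10)])
        err = w[is_miss].sum()                     # weighted error rate
        alpha = 0.5 * np.log((1 - err) / err)      # example's 1/2-log convention
        print(f"iteration {m}: err = {err:.2f}, alpha = {alpha:.2f}")
        w = w * np.exp(np.where(is_miss, alpha, -alpha))  # up/down-weight
        print("  unnormalized weights:", np.round(w, 2))
        w = w / w.sum()                            # renormalize for next step

Running this prints err and alpha matching the slides (0.30/0.42, 0.21/0.65, 0.14/0.92) together with the unnormalized weight rows of the three tables.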

AdaBoost: example
[figure from Schapire & Freund (2014)]

Statistical Boosting: boosting as forward stagewise additive modelling
The statistical view of boosting is based on the concept of forward stagewise additive modelling (see notes), which:
• minimizes a loss function $L(y_i, f(x_i))$;
• uses an additive model, $f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$, where
  - $b(x; \gamma_m)$ is the basis, or weak learner;
• at each step solves
  $(\beta_m, \gamma_m) = \operatorname{argmin}_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\big)$;
• updates the estimate as $f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$;
• e.g., in AdaBoost, $\beta_m = \alpha_m / 2$ and $b(x; \gamma_m) = G(x)$.
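A minimal sketch of the generic scheme, under assumptions not fixed by the slide: squared-error loss and two-leaf regression stumps as the base learner $b(x; \gamma)$. With this loss, the stagewise minimization reduces to fitting the base learner to the current residuals, and $\beta_m$ is absorbed into the fitted leaf constants.

    # Forward stagewise additive modelling:
    # f_m(x) = f_{m-1}(x) + beta_m * b(x; gamma_m).
    # Assumptions (illustrative): squared-error loss, regression stumps.
    import numpy as np

    def fit_stump_ls(x, r):
        """Least-squares stump: split point and two leaf means fitted to r."""
        best = None
        for thr in np.unique(x):
            left, right = r[x <= thr], r[x > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            c1, c2 = left.mean(), right.mean()
            sse = ((left - c1) ** 2).sum() + ((right - c2) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, thr, c1, c2)
        _, thr, c1, c2 = best
        return lambda z: np.where(z <= thr, c1, c2)

    def stagewise(x, y, M=100):
        f = np.zeros_like(y)      # f_0 = 0
        basis = []
        for m in range(M):
            # with squared-error loss, the argmin over (beta, gamma)
            # amounts to fitting the base learner to the residuals y - f
            b = fit_stump_ls(x, y - f)
            basis.append(b)
            f = f + b(x)          # f_m = f_{m-1} + beta_m * b(x; gamma_m)
        return basis

    # toy usage
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 4, 200)
    y = np.sin(2 * x) + 0.3 * rng.normal(size=200)
    basis = stagewise(x, y)
    fit = sum(b(x) for b in basis)
    print("training MSE:", np.mean((y - fit) ** 2))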

Statistical Boosting: why exponential loss?
The statistical view of boosting
• allows us to interpret the results,
• by studying the properties of the exponential loss.

It is easy to show that
$$ f^*(x) = \operatorname{argmin}_{f(x)} E_{Y|X=x}\big[e^{-Y f(x)}\big] = \frac{1}{2} \log \frac{\Pr(Y = 1 \mid x)}{\Pr(Y = -1 \mid x)} $$
(minimize $\pi_x e^{-f(x)} + (1 - \pi_x) e^{f(x)}$ pointwise in $f(x)$, with $\pi_x = \Pr(Y = 1 \mid x)$), i.e.,
$$ \Pr(Y = 1 \mid x) = \frac{1}{1 + e^{-2 f^*(x)}}; $$
therefore AdaBoost estimates one half of the log-odds of $\Pr(Y = 1 \mid x)$.

Note:
• the exponential loss is not the only possible loss function;
• the deviance (cross-entropy), i.e. the binomial negative log-likelihood,
  $-\ell(\pi_x) = -y' \log(\pi_x) - (1 - y') \log(1 - \pi_x)$,
  where
  - $y' = (y + 1)/2$, i.e., $y' \in \{0, 1\}$,
  - $\pi_x = \Pr(Y = 1 \mid X = x) = \frac{e^{f(x)}}{e^{-f(x)} + e^{f(x)}} = \frac{1}{1 + e^{-2 f(x)}}$,
  can equivalently be written as $-\ell(\pi_x) = \log(1 + e^{-2 y f(x)})$;
• $E[-\ell(\pi_x)]$ and $E[e^{-Y f(x)}]$ have the same population minimizer.

Statistical Boosting: steepest descent
We saw that AdaBoost iteratively minimizes a loss function. In general, consider:
• $L(f) = \sum_{i=1}^{N} L(y_i, f(x_i))$;
• $\hat{f} = \operatorname{argmin}_f L(f)$;
• the minimization problem can be solved by considering
  $$ f_{m_{\text{stop}}} = \sum_{m=0}^{m_{\text{stop}}} h_m, $$
  where
  - $f_0 = h_0$ is the initial guess,
  - each $f_m$ improves on the previous $f_{m-1}$ through $h_m$,
  - $h_m$ is called the "step".
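A toy sketch of this steepest descent in function space, assuming squared-error loss so that the negative gradient evaluated at the observations is simply the residual vector; the step length rho and all names are illustrative.

    # Steepest descent in function space: f_{m_stop} = sum_m h_m.
    # Assumption (illustrative, not from the slide): squared-error loss,
    # L(f) = 0.5 * sum_i (y_i - f(x_i))^2, whose gradient with respect to
    # the vector (f(x_1), ..., f(x_N)) is -(y - f).
    import numpy as np

    def loss_grad(y, f):
        return -(y - f)

    def steepest_descent(y, m_stop=100, rho=0.1):
        f = np.zeros_like(y)              # f_0 = h_0 = 0, the initial guess
        for m in range(m_stop):
            h = -rho * loss_grad(y, f)    # step h_m along the negative gradient
            f = f + h                     # f_m = f_{m-1} + h_m
        return f

    y = np.array([1.2, -0.7, 0.4, 2.1])
    print(np.round(steepest_descent(y), 3))   # converges to y itself

Left unconstrained like this, the steps drive each $f(x_i)$ all the way to $y_i$; gradient boosting, the next item on the outline, instead approximates the negative gradient with a weak learner, which is what lets the fit generalize beyond the training points.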
