

  1. Lecture 20: AdaBoost. Aykut Erdem, December 2017, Hacettepe University

  2. Last time… Bias/Variance Tradeoff. Graphical illustration of bias and variance: http://scott.fortmann-roe.com/docs/BiasVariance.html (slide by David Sontag)

  3. Last time… Bagging • Leo Breiman (1994) • Take repeated bootstrap samples from training set D. • Bootstrap sampling: given a set D containing N training examples, create D′ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D_1, ..., D_k. - Train a distinct classifier on each D_i. - Classify a new instance by majority vote / average. (slide by David Sontag)
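To make the recipe concrete, here is a minimal bagging sketch (not from the slides) using NumPy and scikit-learn decision trees as the base classifiers, assuming binary labels in {-1, +1}:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, k=25, seed=0):
        """Train k classifiers, one per bootstrap sample D_i drawn with replacement from D."""
        rng = np.random.default_rng(seed)
        n = len(y)
        models = []
        for _ in range(k):
            idx = rng.integers(0, n, size=n)          # N draws with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Classify new instances by majority vote (ties broken towards +1)."""
        votes = np.stack([m.predict(X) for m in models])
        return np.where(votes.sum(axis=0) >= 0, 1, -1)

Usage is simply models = bagging_fit(X_train, y_train) followed by bagging_predict(models, X_test); averaging the k predictions instead of voting gives the regression variant.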

  4. Last time… Random Forests. [Figure: an ensemble of decision trees, t = 1, 2, 3; from the book of Hastie, Tibshirani and Friedman] (slide by Nando de Freitas)

  5. Last time… Boosting • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote. • On each iteration t: - weight each training example by how incorrectly it was classified - learn a hypothesis h_t - and a strength α_t for this hypothesis. • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strength. • Practically useful and theoretically interesting. (slide by Aarti Singh & Barnabas Poczos)

  6. The AdaBoost Algorithm

  7. Voted combination of classifiers • The general problem here is to try to combine many simple “weak” classifiers into a single “strong” classifier. • We consider voted combinations of simple binary ±1 component classifiers, where the (non-negative) votes α_i can be used to emphasize component classifiers that are more reliable than others. (slide by Tommi S. Jaakkola)
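The combination itself appears only as an equation image in the original slides; a reconstruction in the notation of these notes (with h(x; θ_j) denoting the j-th component classifier) is:

\[ h_m(x) = \alpha_1 h(x; \theta_1) + \cdots + \alpha_m h(x; \theta_m), \qquad \alpha_j \ge 0. \]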

  8. Components: Decision stumps • Consider the following simple family of component classifiers generating ±1 labels (reconstructed below); these are called decision stumps. • Each decision stump pays attention to only a single component of the input vector. (slide by Tommi S. Jaakkola)
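The stump definition is again an image in the original deck; a reconstruction consistent with Jaakkola's notation, where k indexes a single input coordinate and θ = {k, w_1, w_0} collects the parameters, is:

\[ h(x; \theta) = \operatorname{sign}\!\big(w_1 x_k - w_0\big), \qquad \theta = \{\,k,\, w_1,\, w_0\,\}. \]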

  9. Voted combinations (cont’d.) • We need to define a loss function for the combination so we can determine which new component h(x; θ) to add and how many votes it should receive. • While there are many options for the loss function, we consider here only a simple exponential loss (written out below). (slide by Tommi S. Jaakkola)
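The loss appears only as an image on the slide; the standard exponential loss of the (unnormalized) margin y h(x), and the resulting empirical objective for a combination h_m, are:

\[ \operatorname{Loss}\big(y, h(x)\big) = \exp\!\big(-y\, h(x)\big), \qquad \hat{J}(h_m) = \sum_{i=1}^{n} \exp\!\big(-y_i\, h_m(x_i)\big). \]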

  10. Modularity, errors, and loss • Consider adding the m-th component (derivation continued on the next slides). (slide by Tommi S. Jaakkola)

  11. Modularity, errors, and loss • Consider adding the m-th component (cont’d.). (slide by Tommi S. Jaakkola)

  12. Modularity, errors, and loss • Consider adding the m-th component. • So at the m-th iteration the new component (and the votes) should optimize a weighted loss, weighted towards mistakes (see the reconstruction below). (slide by Tommi S. Jaakkola)
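The derivation that slides 10-12 display as images can be reconstructed as follows (assuming the exponential loss and the notation introduced above): adding the m-th component to the current combination h_{m-1} gives

\[ \hat{J}(\alpha_m, \theta_m) = \sum_{i=1}^{n} \exp\!\big(-y_i\, h_{m-1}(x_i) - y_i\, \alpha_m\, h(x_i; \theta_m)\big) = \sum_{i=1}^{n} W_i^{(m-1)} \exp\!\big(-y_i\, \alpha_m\, h(x_i; \theta_m)\big), \]

where $W_i^{(m-1)} = \exp(-y_i\, h_{m-1}(x_i))$ is large precisely for the examples the current combination gets wrong, so the new component faces a loss weighted towards mistakes.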

  13. Empirical exponential loss (cont’d.) • To increase modularity we’d like to further decouple the optimization of h(x; θ_m) from the associated votes α_m. • To this end we select the h(x; θ_m) that optimizes the rate at which the loss would decrease as a function of α_m (see below). (slide by Tommi S. Jaakkola)
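A reconstruction of this criterion (again shown only as an image on the slide): the rate of decrease of the loss at α_m = 0 is

\[ \left.\frac{\partial}{\partial \alpha_m}\, \hat{J}(\alpha_m, \theta_m)\right|_{\alpha_m = 0} = -\sum_{i=1}^{n} W_i^{(m-1)}\, y_i\, h(x_i; \theta_m), \]

so the new component is chosen to make this derivative as negative as possible, i.e. to maximize the weighted agreement $\sum_i W_i^{(m-1)} y_i\, h(x_i; \theta_m)$ with the labels.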

  14. Empirical exponential loss (cont’d.) • We find the h(x; θ_m) that minimizes the derivative above. • We can also normalize the weights so that they sum to one (see below). (slide by Tommi S. Jaakkola)
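A reconstruction of the normalization step, under the same notation:

\[ \tilde{W}_i^{(m-1)} = \frac{W_i^{(m-1)}}{\sum_{j=1}^{n} W_j^{(m-1)}}, \qquad \text{so that} \quad \sum_{i=1}^{n} \tilde{W}_i^{(m-1)} = 1. \]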

  15. Empirical exponential loss (cont’d.) • We find the h(x; θ_m) that minimizes the normalized weighted training error, and the vote α_m is subsequently chosen to minimize the resulting exponential loss (both written out below). (slide by Tommi S. Jaakkola)
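Reconstructed, the weighted error and the closed-form vote (which matches the α_t used in the algorithm on the following slides) are:

\[ \hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{W}_i^{(m-1)}\, [\![\, y_i \neq h(x_i; \theta_m)\,]\!], \qquad \hat{\alpha}_m = \frac{1}{2}\log\frac{1 - \hat{\epsilon}_m}{\hat{\epsilon}_m}. \]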

  16. The AdaBoost Algorithm (slide by Jiri Matas and Jan Šochman)

  17. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$ (slide by Jiri Matas and Jan Šochman)

  18. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$ (slide by Jiri Matas and Jan Šochman)

  19. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   (slide by Jiri Matas and Jan Šochman)

  20. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   (slide by Jiri Matas and Jan Šochman)

  21. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   - Update $D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$, where $Z_t$ is a normalisation factor
   (slide by Jiri Matas and Jan Šochman)

  22.-28. The AdaBoost Algorithm (animation stepping through iterations t = 1, ..., 7). Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   - Update $D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$, where $Z_t$ is a normalisation factor
   Output the final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
   [Figure: training error versus boosting step, decreasing from roughly 0.35 towards 0 over 40 steps]
   (slides by Jiri Matas and Jan Šochman)
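To tie the pseudocode together, here is a minimal NumPy sketch of the algorithm above with axis-aligned decision stumps as the weak learners; the function names and the exhaustive threshold search are my own choices, not part of the slides.

    import numpy as np

    def fit_stump(X, y, w):
        """Exhaustive search over one-feature threshold stumps, minimising the weighted error."""
        best = (np.inf, 0, 0.0, 1)                      # (error, feature, threshold, polarity)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for pol in (+1, -1):
                    pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[0]:
                        best = (err, j, thr, pol)
        return best

    def adaboost(X, y, T=40):
        """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n = len(y)
        D = np.full(n, 1.0 / n)                         # D_1(i) = 1/m
        stumps, alphas = [], []
        for t in range(T):
            eps, j, thr, pol = fit_stump(X, y, D)
            if eps >= 0.5:                              # weak learner no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
            pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
            D *= np.exp(-alpha * y * pred)              # up-weight mistakes, down-weight correct examples
            D /= D.sum()                                # divide by the normalisation factor Z_t
            stumps.append((j, thr, pol))
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(X, stumps, alphas):
        """Final classifier H(x) = sign( sum_t alpha_t h_t(x) )."""
        X = np.asarray(X, dtype=float)
        votes = np.zeros(len(X))
        for (j, thr, pol), a in zip(stumps, alphas):
            votes += a * np.where(pol * (X[:, j] - thr) > 0, 1, -1)
        return np.sign(votes)

Training is stumps, alphas = adaboost(X_train, y_train, T=40), and adaboost_predict(X_test, stumps, alphas) gives the voted prediction; plotting the training error of H after each iteration reproduces the kind of decreasing curve shown on the slide.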
