

  1. Lecture 20: AdaBoost. Aykut Erdem, December 2017, Hacettepe University

  2. Last time… Bias/Variance Tradeoff. Graphical illustration of bias and variance: http://scott.fortmann-roe.com/docs/BiasVariance.html (slide by David Sontag)

  3. Last time… Bagging • Leo Breiman (1994) • Take repeated bootstrap samples from training set D. • Bootstrap sampling: given a set D containing N training examples, create D′ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D_1, ..., D_k. - Train a distinct classifier on each D_i. - Classify a new instance by majority vote / average. (slide by David Sontag)
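To make the recipe concrete, here is a minimal bagging sketch (not from the slides) using NumPy and scikit-learn decision trees as the base classifiers, assuming binary labels in {-1, +1}:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, k=25, seed=0):
        """Train k classifiers, one per bootstrap sample D_i drawn with replacement from D."""
        rng = np.random.default_rng(seed)
        n = len(y)
        models = []
        for _ in range(k):
            idx = rng.integers(0, n, size=n)          # N draws with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Classify new instances by majority vote (ties broken towards +1)."""
        votes = np.stack([m.predict(X) for m in models])
        return np.where(votes.sum(axis=0) >= 0, 1, -1)

Usage is simply models = bagging_fit(X_train, y_train) followed by bagging_predict(models, X_test); averaging the k predictions instead of voting gives the regression variant.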

  4. Last time… Random Forests. [Figure: an ensemble of decision trees, t = 1, 2, 3; from the book of Hastie, Tibshirani and Friedman] (slide by Nando de Freitas)

  5. Last time… Boosting • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote. • On each iteration t: - weight each training example by how incorrectly it was classified - learn a hypothesis h_t - and a strength α_t for this hypothesis. • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strength. • Practically useful and theoretically interesting. (slide by Aarti Singh & Barnabas Poczos)

  6. The AdaBoost Algorithm

  7. Voted combination of classifiers • The general problem here is to try to combine many simple “weak” classifiers into a single “strong” classifier. • We consider voted combinations of simple binary ±1 component classifiers, where the (non-negative) votes α_i can be used to emphasize component classifiers that are more reliable than others. (slide by Tommi S. Jaakkola)
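The combination itself appears only as an equation image in the original slides; a reconstruction in the notation of these notes (with h(x; θ_j) denoting the j-th component classifier) is:

\[ h_m(x) = \alpha_1 h(x; \theta_1) + \cdots + \alpha_m h(x; \theta_m), \qquad \alpha_j \ge 0. \]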

  8. Components: Decision stumps • Consider the following simple family of component classifiers generating ±1 labels (reconstructed below); these are called decision stumps. • Each decision stump pays attention to only a single component of the input vector. (slide by Tommi S. Jaakkola)
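The stump definition is again an image in the original deck; a reconstruction consistent with Jaakkola's notation, where k indexes a single input coordinate and θ = {k, w_1, w_0} collects the parameters, is:

\[ h(x; \theta) = \operatorname{sign}\!\big(w_1 x_k - w_0\big), \qquad \theta = \{\,k,\, w_1,\, w_0\,\}. \]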

  9. Voted combinations (cont’d.) • We need to define a loss function for the combination so we can determine which new component h(x; θ) to add and how many votes it should receive. • While there are many options for the loss function, we consider here only a simple exponential loss (written out below). (slide by Tommi S. Jaakkola)
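The loss appears only as an image on the slide; the standard exponential loss of the (unnormalized) margin y h(x), and the resulting empirical objective for a combination h_m, are:

\[ \operatorname{Loss}\big(y, h(x)\big) = \exp\!\big(-y\, h(x)\big), \qquad \hat{J}(h_m) = \sum_{i=1}^{n} \exp\!\big(-y_i\, h_m(x_i)\big). \]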

  10. Modularity, errors, and loss • Consider adding the m-th component (derivation continued on the next slides). (slide by Tommi S. Jaakkola)

  11. Modularity, errors, and loss • Consider adding the m-th component (cont’d.). (slide by Tommi S. Jaakkola)

  12. Modularity, errors, and loss • Consider adding the m-th component. • So at the m-th iteration the new component (and the votes) should optimize a weighted loss, weighted towards mistakes (see the reconstruction below). (slide by Tommi S. Jaakkola)
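The derivation that slides 10-12 display as images can be reconstructed as follows (assuming the exponential loss and the notation introduced above): adding the m-th component to the current combination h_{m-1} gives

\[ \hat{J}(\alpha_m, \theta_m) = \sum_{i=1}^{n} \exp\!\big(-y_i\, h_{m-1}(x_i) - y_i\, \alpha_m\, h(x_i; \theta_m)\big) = \sum_{i=1}^{n} W_i^{(m-1)} \exp\!\big(-y_i\, \alpha_m\, h(x_i; \theta_m)\big), \]

where $W_i^{(m-1)} = \exp(-y_i\, h_{m-1}(x_i))$ is large precisely for the examples the current combination gets wrong, so the new component faces a loss weighted towards mistakes.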

  13. Empirical exponential loss (cont’d.) • To increase modularity we’d like to further decouple the optimization of h(x; θ_m) from the associated votes α_m. • To this end we select the h(x; θ_m) that optimizes the rate at which the loss would decrease as a function of α_m (see below). (slide by Tommi S. Jaakkola)
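A reconstruction of this criterion (again shown only as an image on the slide): the rate of decrease of the loss at α_m = 0 is

\[ \left.\frac{\partial}{\partial \alpha_m}\, \hat{J}(\alpha_m, \theta_m)\right|_{\alpha_m = 0} = -\sum_{i=1}^{n} W_i^{(m-1)}\, y_i\, h(x_i; \theta_m), \]

so the new component is chosen to make this derivative as negative as possible, i.e. to maximize the weighted agreement $\sum_i W_i^{(m-1)} y_i\, h(x_i; \theta_m)$ with the labels.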

  14. Empirical exponential loss (cont’d.) • We find the h(x; θ_m) that minimizes the derivative above. • We can also normalize the weights so that they sum to one (see below). (slide by Tommi S. Jaakkola)
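A reconstruction of the normalization step, under the same notation:

\[ \tilde{W}_i^{(m-1)} = \frac{W_i^{(m-1)}}{\sum_{j=1}^{n} W_j^{(m-1)}}, \qquad \text{so that} \quad \sum_{i=1}^{n} \tilde{W}_i^{(m-1)} = 1. \]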

  15. Empirical exponential loss (cont’d.) • We find the h(x; θ_m) that minimizes the normalized weighted training error, and the vote α_m is subsequently chosen to minimize the resulting exponential loss (both written out below). (slide by Tommi S. Jaakkola)
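Reconstructed, the weighted error and the closed-form vote (which matches the α_t used in the algorithm on the following slides) are:

\[ \hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{W}_i^{(m-1)}\, [\![\, y_i \neq h(x_i; \theta_m)\,]\!], \qquad \hat{\alpha}_m = \frac{1}{2}\log\frac{1 - \hat{\epsilon}_m}{\hat{\epsilon}_m}. \]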

  16. The AdaBoost Algorithm (slide by Jiri Matas and Jan Šochman)

  17. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$ (slide by Jiri Matas and Jan Šochman)

  18. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$ (slide by Jiri Matas and Jan Šochman)

  19. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   (slide by Jiri Matas and Jan Šochman)

  20. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   (slide by Jiri Matas and Jan Šochman)

  21. The AdaBoost Algorithm. Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. [t = 1] For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   - Update $D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$, where $Z_t$ is a normalisation factor
   (slide by Jiri Matas and Jan Šochman)

  22.-28. The AdaBoost Algorithm (animation stepping through iterations t = 1, ..., 7). Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, +1\}$. Initialise weights $D_1(i) = 1/m$. For $t = 1, \ldots, T$:
   - Find $h_t = \arg\min_{h_j \in H} \epsilon_j = \sum_{i=1}^{m} D_t(i)\,[\![\, y_i \neq h_j(x_i)\,]\!]$
   - If $\epsilon_t \geq 1/2$ then stop
   - Set $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$
   - Update $D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}$, where $Z_t$ is a normalisation factor
   Output the final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
   [Figure: training error versus boosting step, decreasing from roughly 0.35 towards 0 over 40 steps]
   (slides by Jiri Matas and Jan Šochman)
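To tie the pseudocode together, here is a minimal NumPy sketch of the algorithm above with axis-aligned decision stumps as the weak learners; the function names and the exhaustive threshold search are my own choices, not part of the slides.

    import numpy as np

    def fit_stump(X, y, w):
        """Exhaustive search over one-feature threshold stumps, minimising the weighted error."""
        best = (np.inf, 0, 0.0, 1)                      # (error, feature, threshold, polarity)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for pol in (+1, -1):
                    pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[0]:
                        best = (err, j, thr, pol)
        return best

    def adaboost(X, y, T=40):
        """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n = len(y)
        D = np.full(n, 1.0 / n)                         # D_1(i) = 1/m
        stumps, alphas = [], []
        for t in range(T):
            eps, j, thr, pol = fit_stump(X, y, D)
            if eps >= 0.5:                              # weak learner no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
            pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
            D *= np.exp(-alpha * y * pred)              # up-weight mistakes, down-weight correct examples
            D /= D.sum()                                # divide by the normalisation factor Z_t
            stumps.append((j, thr, pol))
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(X, stumps, alphas):
        """Final classifier H(x) = sign( sum_t alpha_t h_t(x) )."""
        X = np.asarray(X, dtype=float)
        votes = np.zeros(len(X))
        for (j, thr, pol), a in zip(stumps, alphas):
            votes += a * np.where(pol * (X[:, j] - thr) > 0, 1, -1)
        return np.sign(votes)

Training is stumps, alphas = adaboost(X_train, y_train, T=40), and adaboost_predict(X_test, stumps, alphas) gives the voted prediction; plotting the training error of H after each iteration reproduces the kind of decreasing curve shown on the slide.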
