Lab #10: Demonstration of AdaBoost (CS109A Introduction to Data Science)


SLIDE 1

CS109A Introduction to Data Science

Pavlos Protopapas, Kevin Rader, and Chris Tanner

Lab #10: Demonstration of AdaBoost

SLIDE 2

Our Data

Chest Pain   Blocked Arteries   Patient Weight   Heart Disease
Yes          Yes                205              Yes
No           Yes                180              Yes
Yes          No                 210              Yes
Yes          Yes                167              Yes
No           Yes                156              No
No           Yes                125              No
Yes          No                 168              No
Yes          Yes                172              No

Training Data
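For the sketches that follow, it helps to have this table in code. A minimal pandas encoding (the variable and column names are our own; Yes/No is mapped to 1/0):

    import pandas as pd

    # The eight training patients from the slide (Yes = 1, No = 0).
    train = pd.DataFrame({
        "chest_pain":       [1, 0, 1, 1, 0, 0, 1, 1],
        "blocked_arteries": [1, 1, 0, 1, 1, 1, 0, 1],
        "patient_weight":   [205, 180, 210, 167, 156, 125, 168, 172],
        "heart_disease":    [1, 1, 1, 1, 0, 0, 0, 0],
    })
    X_train = train.drop(columns="heart_disease")
    y_train = train["heart_disease"]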

SLIDE 3

Bagging

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Train a new decision tree Ti on the bootstrap sample

Training Data (same table as on Slide 2)

SLIDE 4

Bagging

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Train a new decision tree Ti on the bootstrap sample

Do N times

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

SLIDE 5

Bagging

Training Data (same table as on Slide 2)

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Train a new decision tree Ti on the bootstrap sample

Do N times


We have {T1, T2, T3, …, TN}

Chest Pain   Blocked Arteries   Patient Weight   Heart Disease
No           Yes                158              ?

Testing Data

SLIDE 6

Bagging

Training Data (same table as on Slide 2)

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Train a new decision tree Ti on the bootstrap sample

Do N times

Chest Pain   Blocked Arteries   Patient Weight   Heart Disease
No           Yes                158              ?

Testing Data

We have {T1, T2, T3, …, TN}

Take a majority vote from {T1, T2, T3, …, TN}
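A minimal sketch of this bagging loop with scikit-learn, reusing the X_train/y_train encoding from Slide 2 (the number of trees and the use of unpruned trees are our illustrative choices, not the slide's):

    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    N = 100  # number of bagged trees

    trees = []
    for _ in range(N):
        # 1. Bootstrap: sample the rows with replacement.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        # 2. Train a new decision tree Ti on the bootstrap sample.
        trees.append(DecisionTreeClassifier().fit(X_train.iloc[idx], y_train.iloc[idx]))

    # Take a majority vote from {T1, ..., TN} for the test patient.
    test = pd.DataFrame([[0, 1, 158]], columns=X_train.columns)
    votes = np.stack([t.predict(test) for t in trees])
    print("Predicted heart disease:", int(votes.mean(axis=0).round()[0]))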

SLIDE 7

Random Forest

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Select a random subset of Pi features
  • 3. Train a new decision tree Ti

Training Data (same table as on Slide 2)

SLIDE 8

Random Forest

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Select a random subset of Pi features
  • 3. Train a new decision tree Ti

Training Data (same table as on Slide 2)

Do N times

We have {T1, T2, T3, …, TN}

SLIDE 9

Random Forest

  • 1. Bootstrap the data (i.e., resample the rows with replacement)
  • 2. Select a random subset of Pi features
  • 3. Train a new decision tree Ti

Training Data (same table as on Slide 2)

Do N times

We have {T1, T2, T3, …, TN}

Chest Pain   Blocked Arteries   Patient Weight   Heart Disease
No           Yes                158              ?

Testing Data

Take a majority vote from {T1, T2, T3, …, TN}
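scikit-learn packages this recipe as RandomForestClassifier. One caveat versus the slide's wording: sklearn draws a random feature subset at each split rather than once per tree. A minimal usage sketch, again reusing the Slide 2 encoding:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Bootstrap each tree; consider a random subset of features at every split.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X_train, y_train)

    # Majority vote for the test patient (Chest Pain = No, Blocked Arteries = Yes, Weight = 158).
    test = pd.DataFrame([[0, 1, 158]], columns=X_train.columns)
    print(forest.predict(test))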

SLIDE 10

Ideas

“Fool me once, shame on … shame on you. Fool me – you can’t get fooled again” –George W. Bush

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

“Fool me once, shame on you; fool me twice, shame on me.” –Proverb

SLIDE 11

Ideas

“Fool me once, shame on … shame on you. Fool me – you can’t get fooled again” –George W. Bush

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

“Fool me once, shame on you; fool me twice, shame on me.” –Proverb

Let’s learn from our mistakes!

SLIDE 12

Gradient Boosting

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

SLIDE 13

Gradient Boosting

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

Each Th is:

  • a “weak”/simple decision tree
  • built after the previous tree
  • one that tries to learn the shortcomings (the errors/residuals) of the previous tree’s predictions
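A minimal sketch of this residual-chasing loop, treating the 0/1 heart-disease label as a numeric target for illustration (the learning rate, stump depth, and round count are our choices; the slide gives no code):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    lam, n_rounds = 0.1, 100                      # illustrative learning rate and rounds
    pred = np.full(len(y_train), y_train.mean())  # start from the mean prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y_train - pred                # the previous model's shortcomings
        t = DecisionTreeRegressor(max_depth=1)    # a "weak"/simple tree (a stump)
        t.fit(X_train, residuals)
        pred = pred + lam * t.predict(X_train)    # T <- T + lam * Th
        trees.append(t)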

SLIDES 14–21

Gradient Boosting: illustration (an eight-slide figure sequence; the images are not captured in this transcript)

SLIDE 22

Gradient Boosting

Training Data (same table as on Slide 2)

We have {T1, T2, T3, …, TN}

We can determine each λh by using gradient descent.
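Spelled out (the loss $\mathcal{L}$ and the argmin form are our additions; the slide states this in words): with the current ensemble $T$ held fixed, $\lambda_h$ is the coefficient on the new tree that most reduces the training loss,

    $\lambda_h = \arg\min_{\lambda} \sum_{n=1}^{N} \mathcal{L}\big(y_n,\; T(x_n) + \lambda\, T_h(x_n)\big), \qquad T \leftarrow T + \lambda_h T_h$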

SLIDE 23

Idea

Training Data (same table as on Slide 2)

If we have categorical data (i.e., a classification task, not a regression task), we can use AdaBoost.

SLIDE 24

Idea

Training Data (same table as on Slide 2)

If we have categorical data (i.e., a classification task, not a regression task), we can use AdaBoost:

  • 1. Train a single weak (stump) decision tree Ti
  • 2. Calculate the total error of your predictions
  • 3. Use this error (ε) to determine how much stock to place in that tree
  • 4. Update the weights of each observation
  • 5. Update our running model: $T \leftarrow T + \lambda^{(i)} T^{(i)}$

SLIDE 25

AdaBoost

With a minor adjustment to the exponential loss function, we have the AdaBoost algorithm:

  • 1. Choose an initial distribution over the training data: $w_n = 1/N$.
  • 2. At the i-th step, fit a simple classifier $T^{(i)}$ on the weighted training data $\{(x_1, w_1 y_1), \ldots, (x_N, w_N y_N)\}$.
  • 3. Update the weights:

$w_n \leftarrow \dfrac{w_n \exp\!\left(-\lambda\, y_n\, T^{(i)}(x_n)\right)}{Z}$

where $Z$ is the normalizing constant for the collection of updated weights.

  • 4. Update $T$: $T \leftarrow T + \lambda^{(i)} T^{(i)}$, where $\lambda$ is the learning rate.
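A compact NumPy/scikit-learn sketch of these four steps on the Slide 2 encoding (labels recoded to ±1 so the exponential-loss update applies; the stump depth, the 25 rounds, and the clipping of the error are our illustrative choices):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    y = np.where(y_train.values == 1, 1, -1)  # labels in {-1, +1}
    w = np.full(len(y), 1 / len(y))           # 1. initial distribution, w_n = 1/N
    F = np.zeros(len(y))                      # running ensemble score T
    for _ in range(25):
        # 2. Fit a simple classifier T^(i) on the weighted training data.
        stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y, sample_weight=w)
        pred = stump.predict(X_train)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error
        lam = 0.5 * np.log((1 - err) / err)   # learning rate (see Slide 34)
        # 3. Update the weights and renormalize (Z is the normalizing constant).
        w = w * np.exp(-lam * y * pred)
        w = w / w.sum()
        # 4. Update the ensemble: T <- T + lam * T^(i).
        F = F + lam * pred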

SLIDE 26

AdaBoost: start with equal weights

SLIDE 27

AdaBoost: fit a simple decision tree

SLIDE 28

AdaBoost: update the weights

SLIDE 29

AdaBoost: fit another simple decision tree on re-weighted data

SLIDE 30

AdaBoost: add the new model to the ensemble

$T \leftarrow T + \lambda^{(i)} T^{(i)}$

SLIDE 31

AdaBoost: update the weights

SLIDE 32

AdaBoost: fit a third simple decision tree on re-weighted data

SLIDE 33

AdaBoost: add the new model to the ensemble, repeat…

$T \leftarrow T + \lambda^{(i)} T^{(i)}$

SLIDE 34

Choosing the Learning Rate

Unlike in the case of gradient boosting for regression, we can analytically solve for the optimal learning rate for AdaBoost, by optimizing:

$\lambda^{(i)} = \arg\min_{\lambda}\; \mathrm{ExpLoss}\!\left(T + \lambda\, T^{(i)}\right)$

Doing so, we get that

$\lambda^{(i)} = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon^{(i)}}{\epsilon^{(i)}}\right)$

where $\epsilon^{(i)}$ is the weighted classification error of $T^{(i)}$.
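For intuition, plugging a few error rates into this formula (our numbers, not the slide's): a stump that misclassifies one of the eight equally weighted patients has ε = 1/8 and gets a large positive say; a coin-flip stump gets none; a worse-than-chance stump gets a negative one.

    import numpy as np

    for eps in (0.125, 0.5, 0.875):
        lam = 0.5 * np.log((1 - eps) / eps)
        print(f"eps = {eps:.3f}  ->  lambda = {lam:+.3f}")
    # eps = 0.125  ->  lambda = +0.973
    # eps = 0.500  ->  lambda = +0.000
    # eps = 0.875  ->  lambda = -0.973  (its votes are effectively flipped)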