SLIDE 1

CS570 Data Mining Classification: Ensemble Methods

Cengiz Günay

  • Dept. Math & CS, Emory University

Fall 2013

Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong

Günay (Emory) Classification: Ensemble Methods Fall 2013 1 / 6

SLIDE 2

Today

Due today at midnight: Homework #2 – Frequent itemsets

Given today: Homework #3 – Classification

Today’s menu: Classification: Ensemble Methods

SLIDE 3

Ensemble Methods

  • Given a data set, generate multiple models and combine the results
  • Bagging
  • Random Forests
  • Boosting
    – PAC learning significance

SLIDE 4

General Idea

SLIDE 5

Why does it work?

 Suppose there are 25 base classifiers

 Each classifier has error rate ε = 0.35

 Assume the classifiers are independent

 Probability that the ensemble (majority-vote) classifier makes a wrong prediction, i.e. that at least 13 of the 25 classifiers err:

\sum_{i=13}^{25} \binom{25}{i}\, \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06
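The quoted 0.06 can be checked directly by summing the binomial tail; a minimal sketch in Python (function name is illustrative):

```python
from math import comb

def ensemble_error(n=25, eps=0.35):
    """Probability that a majority of n independent base classifiers,
    each with error rate eps, are wrong at the same time."""
    k = n // 2 + 1  # 13 of 25 classifiers must err for a wrong majority vote
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(k, n + 1))

print(round(ensemble_error(), 2))  # 0.06
```

Note that the ensemble error (0.06) is far below the individual error rate (0.35), which is the whole point of combining independent classifiers.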



SLIDE 9

Types of Ensemble Methods

Can be obtained by manipulating:

1 Training set: Bagging, Boosting
2 Input features: Random forests, multi-objective evolutionary algorithms, forward/backward elimination?
3 Class labels: Multi-classes, active learning
4 Learning algorithm: ANNs, Decision trees

SLIDE 10

Bagging

  • Create a data set by sampling data points with replacement
  • Create a model based on that data set
  • Generate more data sets and models
  • Predict by combining votes
    – Classification: majority vote
    – Prediction (numeric): average
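The steps above can be sketched as follows; the decision-stump base learner and the toy 1-D data are hypothetical stand-ins chosen to keep the example short, not anything from the slides:

```python
import random
from collections import Counter

def bootstrap(data):
    """One bagging round: sample len(data) points with replacement."""
    return [random.choice(data) for _ in data]

def fit_stump(sample):
    """Toy base learner: threshold a 1-D input at the sample's mean x,
    predicting class 1 above the threshold (illustrative only)."""
    t = sum(x for x, _ in sample) / len(sample)
    return lambda x, t=t: 1 if x >= t else 0

def bagging_predict(models, x):
    """Classification: combine the base models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

random.seed(0)
# toy 1-D data: class 0 for x < 5, class 1 for x >= 5
data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
models = [fit_stump(bootstrap(data)) for _ in range(25)]
print(bagging_predict(models, 8), bagging_predict(models, 1))
```

Each of the 25 stumps sees a slightly different bootstrap sample, so their thresholds differ; the majority vote smooths over that variance.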

SLIDE 11

Bagging

 Sampling with replacement

 Build a classifier on each bootstrap sample

 Each record has probability 1 − (1 − 1/n)^n of being selected for a given bootstrap sample (≈ 1 − 1/e ≈ 0.632 for large n)

Original Data     | 1 2 3  4  5 6 7  8  9 10
Bagging (Round 1) | 7 8 10 8  2 5 10 10 5 9
Bagging (Round 2) | 1 4 9  1  2 3 2  7  3 2
Bagging (Round 3) | 1 8 5  10 5 5 9  6  3 7
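The inclusion probability is easy to verify numerically; a quick sketch:

```python
# Chance that a given record appears in a bootstrap sample of size n:
# 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows.
for n in (10, 100, 1000):
    print(n, round(1 - (1 - 1/n)**n, 3))
```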

SLIDE 12

Bagging

Advantages:
  – Less overfitting
  – Helps when the classifier is unstable (has high variance)
Disadvantages:
  – Not useful when the classifier is stable and has a large bias

SLIDE 13

PAC learning

  • A model that defines learning with given accuracy and confidence using polynomial sample complexity

  • References:

– L. Valiant. A theory of the learnable.

  • http://web.mit.edu/6.435/www/Valiant84.pdf

– D. Haussler. Overview of the Probably Approximately Correct (PAC) Learning Framework

  • http://www.cs.iastate.edu/~honavar/pac.pdf
SLIDE 14

Boosting

  • Use weak learners and combine them to form a strong learner in the PAC learning sense
  • Learn using a weak learner
  • Boost the accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on the “hard” examples
  • Predict by using a weighted combination of the weak learners
    – Each weight is determined by that learner’s accuracy

SLIDE 15

Boosting

 An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records

Initially, all N records are assigned equal weights

Unlike bagging, weights may change at the end of each boosting round

SLIDE 16

Boosting

 Records that are wrongly classified will have their weights increased

 Records that are classified correctly will have their weights decreased

Original Data      | 1 2 3 4  5 6 7 8  9 10
Boosting (Round 1) | 7 3 2 8  7 9 4 10 6 3
Boosting (Round 2) | 5 4 9 4  2 5 1 7  4 2
Boosting (Round 3) | 4 4 8 10 4 5 4 6  3 4

  • Example 4 is hard to classify
  • Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
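The effect on resampling can be illustrated with weighted sampling; the weight value below is made up for illustration:

```python
import random
from collections import Counter

random.seed(0)
records = list(range(1, 11))  # record ids 1..10
weights = [1.0] * 10
weights[3] = 4.0  # assumed: record 4 was misclassified, so its weight grew

# Sampling probability proportional to weight: record 4 should now be
# drawn about four times as often as any other record.
draws = random.choices(records, weights=weights, k=13000)
counts = Counter(draws)
print(counts[4], counts[5])
```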

SLIDE 17

Boosting

Advantages:
  – Focuses on samples that are hard to classify
  – Sample weights can be used as:
    1 Sampling probabilities
    2 Values the classifier uses to weight those samples more heavily
AdaBoost:
  – Calculates classifier importance instead of equal voting
  – Exponential weight-update rules
  – But susceptible to overfitting

SLIDE 18

Example: AdaBoost

 Base classifiers: C1, C2, …, CT

 Error rate of classifier Ci:

\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)

 Importance of a classifier:

\alpha_i = \frac{1}{2} \ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)
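The importance formula is easy to evaluate; a small sketch (function name is illustrative):

```python
from math import log

def classifier_importance(eps):
    """AdaBoost importance: alpha = 0.5 * ln((1 - eps) / eps)."""
    return 0.5 * log((1 - eps) / eps)

print(round(classifier_importance(0.1), 3))  # accurate classifier: large positive weight
print(round(classifier_importance(0.5), 3))  # coin-flip classifier: zero weight
```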

SLIDE 19

Example: AdaBoost

 Weight update:

w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times
\begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}

where Z_j is the normalization factor (chosen so the weights sum to 1)

 If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated

 Classification:

C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)
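One round of the weight update can be sketched like this; the 8-correct/2-wrong outcome is an assumed example, not from the slides:

```python
from math import exp, log

def update_weights(w, correct, alpha):
    """One AdaBoost round: multiply by exp(-alpha) when C_j(x_i) = y_i,
    by exp(+alpha) otherwise, then divide by Z_j to renormalize."""
    raw = [wi * exp(-alpha if ok else alpha) for wi, ok in zip(w, correct)]
    z = sum(raw)  # normalization factor Z_j
    return [r / z for r in raw]

w = [0.1] * 10                      # N = 10 records, initially uniform
correct = [True] * 8 + [False] * 2  # assumed: round j misclassifies two records
eps = sum(wi for wi, ok in zip(w, correct) if not ok)  # weighted error = 0.2
alpha = 0.5 * log((1 - eps) / eps)
w = update_weights(w, correct, alpha)
print(round(w[0], 4), round(w[-1], 4))  # 0.0625 0.25
```

After one round the two misclassified records each carry weight 0.25, four times the weight of a correctly classified record, which is what forces the next learner to focus on them.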

SLIDE 20

(C) Vipin Kumar, Parallel Issues in Data Mining, V 11

Illustrating AdaBoost

[Figure: data points for training, each starting with equal initial weight]

SLIDE 21


Illustrating AdaBoost

SLIDE 22

Random Forests

  • Sample a data set with replacement
  • Select m variables at random from p variables
  • Create a tree
  • Similarly create more trees
  • Combine the results
  • Reference:

– Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Chapter 15
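A minimal sketch of the loop above; the stump stand-in for tree induction and the toy data are hypothetical, not the CART-style trees used in practice:

```python
import random
from collections import Counter

def fit_tree(sample, feature_idx):
    """Stand-in for tree induction restricted to the chosen features:
    a stump that thresholds one allowed feature at its sample mean."""
    f = random.choice(feature_idx)
    t = sum(row[f] for row, _ in sample) / len(sample)
    return lambda row, f=f, t=t: 1 if row[f] >= t else 0

def random_forest(data, p, m, n_trees):
    trees = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]  # bootstrap sample
        feats = random.sample(range(p), m)            # m of the p variables
        trees.append(fit_tree(sample, feats))
    return trees

def predict(trees, row):
    """Combine the trees' results by majority vote."""
    return Counter(t(row) for t in trees).most_common(1)[0][0]

random.seed(0)
# toy data with p = 3 features; class 1 when the feature values are large
data = [([x, x + 1, x + 2], 0) for x in range(5)] + \
       [([x, x + 1, x + 2], 1) for x in range(5, 10)]
forest = random_forest(data, p=3, m=2, n_trees=15)
print(predict(forest, [9, 10, 11]), predict(forest, [0, 1, 2]))
```

Restricting each tree to a random feature subset decorrelates the trees, which is what lets averaging reduce variance further than bagging alone.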

SLIDE 23

Random Forests

Characteristics:
  – Applies only to decision trees
  – Lowers the generalization error
  – Uses randomization in tree construction: #features = log2(d) + 1
  – Accuracy equivalent to AdaBoost, but faster
See the table in Tan et al., p. 294 for a comparison of ensemble methods.
