Stacking for supervised learning - PowerPoint PPT Presentation



Slide 1

Stacking for supervised learning

Niall Rooney, NIKEL, University of Ulster

Slide 2

Ensemble learning

• Postulate multiple hypotheses to explain the data

• Shortcomings of single-model learning algorithms (Dietterich, 2002):

  – Statistical problem
  – Computational problem
  – Representational problem

Slide 3

Ensemble learning

• Generalization Error: Bias + Variance

  – Bias: how close the algorithm’s average prediction is to the target
  – Variance: how much the algorithm’s predictions “bounce around” for different training sets
  – A model that is too simple, or too inflexible, will have a large bias
  – A model with too much flexibility will have high variance
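The trade-off above can be simulated directly. A minimal sketch, assuming a toy sine target with Gaussian noise (the target, settings, and helper name are illustrative, not from the talk): fitting a too-simple (degree-1) and a too-flexible (degree-9) polynomial over many resampled training sets shows the first dominated by squared bias and the second by variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = 0.25                          # query point for the decomposition
f_true = np.sin(2 * np.pi * x_test)    # noise-free target value

def bias_variance(degree, n_points=30, noise=0.3, trials=500):
    """Fit a polynomial of the given degree on many noisy training sets
    and measure squared bias and variance of its prediction at x_test."""
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n_points)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    return (preds.mean() - f_true) ** 2, preds.var()

b_simple, v_simple = bias_variance(degree=1)    # too inflexible
b_flex, v_flex = bias_variance(degree=9)        # too flexible
```

Here the degree-1 model's error is bias-dominated and the degree-9 model's is variance-dominated, matching the two bullets above.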

Slide 4

Ensemble learning

• Generalization Error: Ensembles

  – Ensembles reduce bias and/or variance
  – To be effective, ensembles need diverse and accurate base models
  – Diversity is measured by the level of variability in the base members’ predictions (for regression)
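For regression, that diversity measure can be written down in a few lines: the variance of the base members' predictions around the ensemble mean, averaged over instances. The prediction matrix below is purely illustrative.

```python
import numpy as np

# Predictions of three base regressors on five test instances
# (rows = models, columns = instances); values are made up.
preds = np.array([
    [2.0, 3.1, 4.0, 5.2, 6.1],
    [2.2, 2.9, 4.3, 5.0, 5.8],
    [1.8, 3.3, 3.9, 5.4, 6.3],
])

# Diversity: per-instance variance of the base predictions around the
# ensemble mean, averaged over instances. Identical models give 0.
diversity = preds.var(axis=0).mean()
```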

Slide 5

Ensemble learning

§ Homogeneous learning

  • Data sampling, feature sampling, randomization, parameter settings

§ Heterogeneous learning

  • Same data, different learning algorithms
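
A minimal sketch of the two styles on a toy linear target (the learners, data, and helper names are all illustrative): homogeneous means one algorithm varied via bootstrap data sampling; heterogeneous means different algorithms on the same data.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
y = 3 * x + rng.normal(0, 0.1, 40)

def linear_model(xs, ys):
    """Fit y = a*x + b by least squares; return a prediction function."""
    a, b = np.polyfit(xs, ys, 1)
    return lambda q: a * q + b

def mean_model(xs, ys):
    """A deliberately different learner: always predict the training mean."""
    m = ys.mean()
    return lambda q: m

# Homogeneous ensemble: same algorithm, different bootstrap samples.
homogeneous = []
for _ in range(10):
    idx = rng.integers(0, len(x), len(x))   # bootstrap resample
    homogeneous.append(linear_model(x[idx], y[idx]))

# Heterogeneous ensemble: same data, different learning algorithms.
heterogeneous = [linear_model(x, y), mean_model(x, y)]

avg = lambda models, q: np.mean([m(q) for m in models])
```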

Slide 6

Ensemble Learning

[Diagram: input features feed Classifier 1, Classifier 2, ..., Classifier N; a combiner merges their class predictions into a single class prediction.]

Slide 7

Ensemble learning

Methods of combination:

– Voting, weighting, selection
– Mixture of experts
– Error-correcting output codes
– Bagging
– Boosting
– Stacking
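The first three combination schemes are simple enough to show in a few lines (the votes and weights below are made-up numbers):

```python
import numpy as np

# Class predictions from five base classifiers for one instance.
votes = np.array([1, 0, 1, 1, 2])

# Voting: the most frequent class label wins.
vote_winner = np.bincount(votes).argmax()

# Weighting: weight each classifier, e.g. by validation accuracy
# (these weights are illustrative), and take the heaviest class.
weights = np.array([0.9, 0.6, 0.8, 0.7, 0.5])
weighted_winner = np.bincount(votes, weights=weights).argmax()

# Selection: simply use the single best classifier's prediction.
selected = votes[weights.argmax()]
```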

Slide 8

Ensemble Learning: Stacking

[Diagram: a prediction instance is passed to Base Model 1, Base Model 2, ..., Base Model n; their outputs feed a Meta Model that produces the final prediction.]

Slide 9

Meta Technique: SR

CV meta-training set: {(f1(xj), ..., fm(xj), yj)}

[Diagram: an instance x* is passed to each base model fi (M1, M2, ..., Mm), giving base predictions f1(x*), f2(x*), ..., fm(x*); these feed the combining (meta-level) model Meta-M.]

Final prediction: Meta-M(f1(x*), ..., fm(x*))
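A sketch of this pipeline, assuming simple polynomial base learners and a plain linear meta-model (all learner and variable choices are illustrative): k-fold cross-validation supplies the out-of-fold base predictions fi(xj) that form the meta-training set, and the meta-model is fit on those.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 60)

def poly_learner(degree):
    """Return a train(X, y) -> predict function for a 1-D polynomial."""
    def train(Xtr, ytr):
        c = np.polyfit(Xtr[:, 0], ytr, degree)
        return lambda Xq: np.polyval(c, Xq[:, 0])
    return train

learners = [poly_learner(1), poly_learner(3), poly_learner(5)]  # M1..Mm

# Build the CV meta-training set {(f1(xj), ..., fm(xj), yj)}: each
# fi(xj) comes from a model that never saw xj (k-fold cross-validation).
k = 5
folds = np.array_split(rng.permutation(len(X)), k)
meta_X = np.zeros((len(X), len(learners)))
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    for i, learn in enumerate(learners):
        model = learn(X[train_idx], y[train_idx])
        meta_X[test_idx, i] = model(X[test_idx])

# Meta-M: linear regression on the base predictions (with intercept).
A = np.column_stack([meta_X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Final prediction for new instances: Meta-M(f1(x*), ..., fm(x*)),
# with the base models refit on all of the data.
full_models = [learn(X, y) for learn in learners]
def predict(Xq):
    base = np.column_stack([m(Xq) for m in full_models])
    return np.column_stack([base, np.ones(len(Xq))]) @ w
```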

Slide 10

Stacking for classification

§ Use class distributions from base classifiers rather than class predictions:

{(P1(C1|x), ..., P1(Ck|x), ..., Pm(C1|x), ..., Pm(Ck|x), y)}

§ Choice of meta-classifier: multi-response linear regression

  • For a classification problem with k class values, build k regression problems

  • Only use the probabilities related to class Cj to predict class Cj
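A sketch of multi-response linear regression stacking under those two rules, on simulated meta-level probabilities (the data generator and all names are illustrative): one least-squares problem per class Cj, whose inputs are only P1(Cj|x), ..., Pm(Cj|x) and whose target is the 0/1 indicator of class Cj.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m_models, k_classes = 200, 3, 2

# Simulated meta-level data: Pi(Cj|x) for base model i and class j,
# shaped (n, m_models, k_classes); the generator is made up.
true_y = rng.integers(0, k_classes, n)
probs = rng.dirichlet(np.ones(k_classes), (n, m_models))
for i in range(m_models):                 # make the base models informative
    probs[np.arange(n), i, true_y] += 1.0
probs /= probs.sum(axis=2, keepdims=True)

# One regression problem per class Cj, using only the m probabilities
# P1(Cj|x), ..., Pm(Cj|x) as inputs and the class-Cj indicator as target.
W = np.zeros((k_classes, m_models))
for j in range(k_classes):
    Aj = probs[:, :, j]                   # (n, m_models)
    tj = (true_y == j).astype(float)      # 1 iff the instance has class Cj
    W[j], *_ = np.linalg.lstsq(Aj, tj, rcond=None)

# Classify by evaluating each class's regression and taking the largest.
scores = np.einsum('nij,ji->nj', probs, W)
pred = scores.argmax(axis=1)
```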

Slide 11

Stacking for classification

§ Different “types” of base classifiers

§ Multi-response model trees are used to guarantee better performance than selecting the best classifier

Slide 12

Stacking for regression

§ Linear regression requires non-negative weights

§ Model trees meta-learner

§ Homogeneous stacking using random feature sub-sets

§ Feature sub-sets can be improved upon using hill-climbing or GA techniques
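The non-negativity constraint (Breiman's stacked regressions) can be sketched as follows. The solver here is a simple projected-gradient routine written purely for illustration (a real implementation would use a dedicated NNLS solver), and the simulated base models are made up: model 2 is anti-correlated with the target, so unconstrained least squares would give it a negative weight, while the constraint zeroes it out.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
y = rng.normal(0, 1, n)

# Base-model predictions on held-out data (columns = models).
F = np.column_stack([
    y + rng.normal(0, 0.3, n),    # decent model
    -y + rng.normal(0, 0.3, n),   # anti-correlated model
    y + rng.normal(0, 0.6, n),    # noisier model
])

def nnls_pg(F, y, lr=0.01, steps=5000):
    """Least squares with non-negative weights via projected gradient."""
    w = np.full(F.shape[1], 1.0 / F.shape[1])
    for _ in range(steps):
        grad = F.T @ (F @ w - y) / len(y)
        w = np.maximum(0.0, w - lr * grad)   # project onto w >= 0
    return w

w = nnls_pg(F, y)
```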

Slide 13

Related techniques: Multiple meta-levels

[Diagram: Cascade Generalization — Classifier 1 receives <x>; Classifier 2 receives <x, P1(C1), ..., P1(Ck)>; Classifier 3 receives <x, P1(C1), ..., P1(Ck), P2(C1), ..., P2(Ck)>.]

Slide 14

Related techniques: Multiple meta-levels

[Diagram: Combiner Trees — base classifiers (Classifier 1 ... Classifier 4) are trained on disjoint training sets (T1, T2, ...); their outputs are merged hierarchically by Combiner 1 and Combiner 2, whose outputs feed Combiner 3.]

Slide 15

Related Techniques: Dynamic Integration

Meta-level training set: {(xj, Err1(xj), ..., Errm(xj), yj)}, where Erri(xj) = |fi(xj) - yj|

[Diagram: an instance x* is passed to each base model fi (M1, M2, ..., Mm), giving f1(x*), f2(x*), ..., fm(x*); the base errors drive the combining (meta-level) model Meta-M.]

Final prediction: Meta-M(f1(x*), ..., fm(x*))

Slide 16

Dynamic Integration

Meta model Meta-M: distance-weighted k-NN

• NN — the set of k nearest meta-instances

• For each member of NN, find the cumulative error of each model

Slide 17

Dynamic Integration

• Dynamic Selection (DS)

  – choose the model with the lowest cumulative error

• Dynamic Weighting (DW)

  – combine the models with weights based on their cumulative error

• Dynamic Weighting with Selection (DWS)

  – combine the models as in DW, but exclude models whose cumulative error is larger than the median
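The three schemes differ only in how they use the cumulative errors, so they fit in a few lines. The cumulative errors and base predictions below are made-up inputs, and the inverse-error weighting is one common choice, not necessarily the exact scheme from the talk.

```python
import numpy as np

# Cumulative errors of m = 4 base models over the k nearest
# meta-instances (smaller is better); values are illustrative.
cum_err = np.array([0.8, 0.3, 1.5, 0.6])
base_preds = np.array([2.1, 2.4, 3.0, 2.2])   # f1(x*), ..., fm(x*)

# Dynamic Selection (DS): the model with the lowest cumulative error.
ds = base_preds[cum_err.argmin()]

# Dynamic Weighting (DW): weights inversely related to cumulative error.
w = 1.0 / cum_err
dw = np.average(base_preds, weights=w)

# Dynamic Weighting with Selection (DWS): as DW, but drop models whose
# cumulative error exceeds the median.
keep = cum_err <= np.median(cum_err)
dws = np.average(base_preds[keep], weights=w[keep])
```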

Slide 18

Applications

• Distributed data mining
• Intrusion detection
• Concept drift

Slide 19

Key papers

• Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259

• Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64

• Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15

• Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273

• Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289