
Stacking for supervised learning (PowerPoint presentation)



  1. Stacking for supervised learning
     Niall Rooney, NIKEL, University of Ulster

  2. Ensemble learning
     - Postulate multiple hypotheses to explain the data
     - Shortcomings of single-model learning algorithms (Dietterich, 2002):
       - statistical problem
       - computational problem
       - representational problem

  3. Ensemble learning
     - Generalization error: bias + variance (for squared loss, expected error = bias² + variance + irreducible noise)
       - Bias: how close the algorithm's average prediction is to the target
       - Variance: how much the algorithm's predictions "bounce around" for different training sets
       - A model that is too simple or too inflexible will have a large bias
       - A model that has too much flexibility will have a high variance
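The bias/variance split can be estimated by simulation. The sketch below makes several illustrative assumptions that are not from the slides: a synthetic target sin(2πx), Gaussian noise, and NumPy polynomial least-squares fits as the learning algorithm. It refits the model on many fresh training sets. It then measures how far the average prediction lies from the target (squared bias) and how much individual predictions spread around that average (variance).

```python
import numpy as np

def bias_variance(degree, n_trials=200, n_train=30, noise=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance) for a polynomial
    least-squares fit of the given degree, averaged over a test grid."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)
    f_true = np.sin(2 * np.pi * x_test)            # assumed target function
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # A fresh noisy training set for every trial
        x = rng.uniform(0.0, 1.0, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)                 # the algorithm's average prediction
    bias_sq = np.mean((mean_pred - f_true) ** 2)   # distance of the average from the target
    variance = np.mean(preds.var(axis=0))          # spread of predictions across training sets
    return bias_sq, variance

for deg in (1, 5):
    b, v = bias_variance(deg)
    print(f"degree {deg}: bias^2={b:.3f}, variance={v:.3f}")
```

With these settings the straight line (too inflexible) should show the larger bias, and the degree-5 polynomial (more flexible) the larger variance.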

  4. Ensemble learning
     - Generalization error: ensembles
       - Ensembles reduce bias and/or variance
       - To be effective, ensembles need base models that are both accurate and diverse
       - For regression, diversity is measured by the variability of the base members' predictions
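For a simple averaging ensemble under squared error, the trade-off between accuracy and diversity is exact. It is given by the well-known ambiguity decomposition:

ensemble error = average member error − average spread of the members around the ensemble prediction (the "ambiguity").

The NumPy sketch below computes all three terms; the function name is illustrative. The spread term is one concrete way to quantify the variability in base members' predictions mentioned above.

```python
import numpy as np

def ambiguity_decomposition(preds, y):
    """preds: (m, n) predictions of m base models on n points; y: (n,) targets.
    Returns (ensemble_error, mean_member_error, mean_ambiguity)."""
    ens = preds.mean(axis=0)                        # simple averaging ensemble
    ensemble_error = np.mean((ens - y) ** 2)
    mean_member_error = np.mean((preds - y) ** 2)   # accuracy of the individual members
    mean_ambiguity = np.mean((preds - ens) ** 2)    # diversity: spread around the ensemble
    return ensemble_error, mean_member_error, mean_ambiguity
```

Because the ambiguity term is never negative, the averaged ensemble is never worse than its average member. It gains the most when members are accurate but disagree.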

  5. Ensemble learning
     - Homogeneous learning: one learning algorithm, with diversity from data sampling, feature sampling, randomization or parameter settings
     - Heterogeneous learning: the same data, different learning algorithms
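Two of the homogeneous sources of diversity, data sampling and feature sampling, can be sketched as a generator of per-member training views. This is a NumPy sketch and the names are hypothetical. A heterogeneous ensemble would instead pass the full `X, y` to different learning algorithms.

```python
import numpy as np

def homogeneous_views(X, y, n_members, feature_frac=0.5, seed=0):
    """Yield one training view per ensemble member: a bootstrap sample of the
    rows (data sampling) restricted to a random subset of columns (feature sampling)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(round(feature_frac * d)))
    for _ in range(n_members):
        rows = rng.integers(0, n, size=n)                      # sample rows with replacement
        cols = np.sort(rng.choice(d, size=k, replace=False))   # random feature subset
        yield X[np.ix_(rows, cols)], y[rows], cols
```

Each view would be given to the same learning algorithm. The returned `cols` must be kept so that each member sees the same features at prediction time.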

  6. Ensemble learning
     - Diagram: the input features are fed to Classifier 1, Classifier 2, ..., Classifier N; their class predictions go to a combiner, which outputs the final class prediction

  7. Ensemble learning
     - Methods of combination:
       - voting, weighting, selection
       - mixture of experts
       - error-correcting output codes
       - bagging
       - boosting
       - stacking
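The simplest combiners (voting, weighting, selection) need no training beyond the base classifiers. The plain-Python sketch below applies each to one instance. The weights and accuracies stand in for hypothetical validation-set scores, and the function names are illustrative.

```python
from collections import Counter

def majority_vote(labels):
    """Voting: the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Weighting: each classifier adds its weight to the label it predicts."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

def select_best(labels, accuracies):
    """Selection: use only the prediction of the most accurate classifier."""
    best = max(range(len(labels)), key=lambda i: accuracies[i])
    return labels[best]

# Predictions of three classifiers for one instance, with hypothetical accuracies
labels = ["spam", "ham", "ham"]
acc = [0.95, 0.40, 0.45]
print(majority_vote(labels))        # ham
print(weighted_vote(labels, acc))   # spam (0.95 outweighs 0.40 + 0.45)
print(select_best(labels, acc))     # spam
```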

  8. Ensemble learning: stacking
     - Diagram: an instance is passed to Base Model 1, Base Model 2, ..., Base Model n; their outputs feed a meta model, which produces the prediction

  9. Meta technique: SR
     - Base level: base models $M_1, \dots, M_m$ learn functions $f_1, \dots, f_m$
     - Meta level: a cross-validated (CV) meta-training set $\{(f_1(x_j), \dots, f_m(x_j), y_j)\}$ is built from the base predictions and used to train the meta-model Meta-M
     - Combining: for a new instance $x^*$, the base predictions $f_1(x^*), \dots, f_m(x^*)$ are passed to Meta-M, giving the final prediction $\text{Meta-M}(f_1(x^*), \dots, f_m(x^*))$
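Below is a minimal sketch of this scheme for regression. It assumes NumPy polynomial fits of different degrees as hypothetical base learners and an unconstrained least-squares linear meta-model with an intercept; `fit_poly` and `stack_fit` are illustrative names. The key step is that each meta-feature $f_i(x_j)$ comes from a model that did not see $x_j$ during training.

```python
import numpy as np

def fit_poly(degree):
    """Hypothetical base learner: fit(x, y) returns a fitted polynomial predictor."""
    def fit(x, y):
        coefs = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

def stack_fit(x, y, base_fitters, n_folds=5, seed=0):
    n = len(x)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    # Meta-training set: row j is (f_1(x_j), ..., f_m(x_j)), each f_i trained
    # on the other folds so that it never saw x_j (cross-validated predictions)
    meta_X = np.empty((n, len(base_fitters)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for i, fit in enumerate(base_fitters):
            meta_X[test_idx, i] = fit(x[train_idx], y[train_idx])(x[test_idx])
    # Meta-model Meta-M: least-squares linear combination with an intercept
    A = np.column_stack([np.ones(n), meta_X])
    meta_w = np.linalg.lstsq(A, y, rcond=None)[0]
    # For prediction, the base models are refit on all of the training data
    base_models = [fit(x, y) for fit in base_fitters]

    def predict(x_new):
        base_preds = np.column_stack([f(x_new) for f in base_models])
        return meta_w[0] + base_preds @ meta_w[1:]   # Meta-M(f_1(x*), ..., f_m(x*))
    return predict
```

For example, `stack_fit(x, y, [fit_poly(1), fit_poly(3), fit_poly(5)])` returns a predictor that combines three base models of increasing flexibility.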

  10. Stacking for classification
     - Use class distributions from the base classifiers rather than class predictions; the meta-training set is
       $\{(P_1(C_1 \mid x), \dots, P_1(C_k \mid x), \dots, P_m(C_1 \mid x), \dots, P_m(C_k \mid x), y)\}$
     - Choice of meta-classifier: multi-response linear regression (MLR)
       - For a classification problem with $k$ class values, solve $k$ regression problems
       - Use only the probabilities related to class $C_j$ to predict class $C_j$
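Here is a sketch of the MLR meta-classifier. It assumes the out-of-fold class probabilities are already stored in a NumPy array `probs` of shape (n, m, k). Each per-class regression is plain least squares without an intercept; this is an illustrative choice, and published variants may differ in such details.

```python
import numpy as np

def mlr_fit(probs, y):
    """Multi-response linear regression meta-classifier.

    probs: (n, m, k) out-of-fold estimates P_i(C_j | x) from m base classifiers
           over k classes; y: (n,) integer class labels in 0..k-1.
    Returns W with shape (k, m): one linear model per class C_j.
    """
    n, m, k = probs.shape
    W = np.empty((k, m))
    for j in range(k):
        A = probs[:, :, j]                    # only the probabilities for class C_j
        target = (y == j).astype(float)       # 1 if the instance belongs to C_j, else 0
        W[j] = np.linalg.lstsq(A, target, rcond=None)[0]
    return W

def mlr_predict(probs, W):
    # Score of class C_j: sum_i W[j, i] * P_i(C_j | x); predict the best-scoring class
    scores = np.einsum("nmk,km->nk", probs, W)
    return scores.argmax(axis=1)
```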

  11. Stacking for classification
     - Use base classifiers of different types
     - With multi-response model trees as the meta-classifier, stacking was found to perform better than selecting the best base classifier (Dzeroski & Zenko, 2004)

  12. Stacking for regression
     - A linear-regression meta-learner needs its weights constrained to be non-negative (Breiman, 1996)
     - Model trees as the meta-learner
     - Homogeneous stacking using random feature subsets
     - Feature subsets can be improved using hill-climbing or genetic-algorithm (GA) search
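The non-negativity constraint turns the linear meta-learner into a non-negative least-squares (NNLS) problem. Below is a self-contained NumPy sketch that uses cyclic coordinate descent. This is a simple stand-in; a library NNLS solver would normally be used instead. Here `A` would hold the cross-validated base predictions $f_i(x_j)$ (one column per base model) and `b` the targets $y_j$.

```python
import numpy as np

def nnls_cd(A, b, n_sweeps=500):
    """Non-negative least squares, min ||A w - b||^2 subject to w >= 0,
    solved by cyclic coordinate descent."""
    n_models = A.shape[1]
    w = np.zeros(n_models)
    col_sq = (A ** 2).sum(axis=0)
    r = np.asarray(b, dtype=float).copy()       # residual b - A w (w starts at 0)
    for _ in range(n_sweeps):
        for j in range(n_models):
            if col_sq[j] == 0.0:
                continue
            # Exact minimiser along coordinate j, clipped at zero
            new_wj = max(0.0, w[j] + A[:, j] @ r / col_sq[j])
            r -= A[:, j] * (new_wj - w[j])      # keep the residual consistent with w
            w[j] = new_wj
    return w
```

Each step minimises the squared error exactly along one weight and then clips it at zero, so every iterate satisfies the constraint.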

  13. Related techniques: multiple meta-levels
     - Cascade generalization: Classifier 1 receives $\langle x \rangle$; Classifier 2 receives $\langle x, P_1(C_1), \dots, P_1(C_k) \rangle$; Classifier 3 receives $\langle x, P_1(C_1), \dots, P_1(C_k), P_2(C_1), \dots, P_2(C_k) \rangle$
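Below is a compact sketch of cascade generalization. Every level uses a hypothetical nearest-centroid classifier with softmax outputs. The slide's Classifiers 1 to 3 may well be different algorithms; one is reused here for brevity. Each level is trained on the original attributes extended with the class distributions of all previous levels.

```python
import numpy as np

def centroid_classifier(X, y, n_classes):
    """Hypothetical probabilistic classifier: softmax over negative squared
    distances to the class centroids. Returns a predict_proba function."""
    centroids = np.array([X[y == c].mean(axis=0) for c in range(n_classes)])

    def predict_proba(X_new):
        d2 = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 + d2.min(axis=1, keepdims=True)   # shift so the max logit is 0
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)
    return predict_proba

def cascade_fit(X, y, n_classes, n_levels=3):
    """Level t is trained on <x, P_1(C_1..C_k), ..., P_{t-1}(C_1..C_k)>."""
    models, Z = [], X
    for _ in range(n_levels):
        proba = centroid_classifier(Z, y, n_classes)
        models.append(proba)
        Z = np.hstack([Z, proba(Z)])   # extend each instance with this level's class distribution
    return models

def cascade_predict(models, X):
    Z = X
    for proba in models[:-1]:
        Z = np.hstack([Z, proba(Z)])   # rebuild the same extended representation
    return models[-1](Z).argmax(axis=1)
```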

  14. Related techniques: multiple meta-levels
     - Combiner trees: Classifiers 1 to 4 are each trained on a disjoint training set; Combiners 1 and 2 each combine a pair of classifiers, and Combiner 3 combines Combiners 1 and 2 at the root

  15. Related techniques: dynamic integration
     - Base level: base models $M_1, \dots, M_m$ learn $f_1, \dots, f_m$; the base errors are $Err_i(x_j) = |f_i(x_j) - y_j|$
     - Meta level: the meta-training set $\{(x_j, Err_1(x_j), \dots, Err_m(x_j), y_j)\}$ is used by the combining model Meta-M
     - For a new instance $x^*$, the final prediction is $\text{Meta-M}(f_1(x^*), \dots, f_m(x^*))$

  16. Dynamic integration
     - Meta-model Meta-M: distance-weighted k-NN
       - NN: the set of the k nearest meta-instances
       - Over the members of NN, compute the cumulative error of each base model

  17. Dynamic integration
     - Dynamic Selection (DS): choose the model with the lowest cumulative error
     - Dynamic Weighting (DW): combine the models with weights based on their cumulative errors
     - Dynamic Weighting with Selection (DWS): combine the models as in DW, but exclude models whose cumulative error is larger than the median
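Below is a sketch of the three dynamic integration rules for one regression instance. It assumes the meta-training set is stored as NumPy arrays: instances `meta_x` and base errors `meta_err`. The 1/distance neighbour weights and the 1/error model weights used for DW and DWS are illustrative assumptions. The slides say only that weights are "based on" the cumulative error.

```python
import numpy as np

def dynamic_integration(meta_x, meta_err, base_preds, x_new, k=5, mode="DWS"):
    """Combine base predictions for one new instance x* using k-NN over meta-instances.

    meta_x:     (n, d) training instances x_j
    meta_err:   (n, m) base errors Err_i(x_j) = |f_i(x_j) - y_j|
    base_preds: (m,)   base predictions f_i(x*) for the new instance
    x_new:      (d,)   the new instance x*
    """
    dist = np.linalg.norm(meta_x - x_new, axis=1)
    nn = np.argsort(dist)[:k]                       # NN: the k nearest meta-instances
    w_nn = 1.0 / (dist[nn] + 1e-12)                 # assumed 1/distance neighbour weighting
    # Distance-weighted cumulative (here: averaged) error of each base model over NN
    cum_err = (w_nn[:, None] * meta_err[nn]).sum(axis=0) / w_nn.sum()
    if mode == "DS":
        return float(base_preds[np.argmin(cum_err)])   # select the locally best model
    weights = 1.0 / (cum_err + 1e-12)               # assumed: lower error -> larger weight
    if mode == "DWS":
        # Exclude models whose cumulative error is above the median
        weights = np.where(cum_err > np.median(cum_err), 0.0, weights)
    return float(weights @ base_preds / weights.sum())
```

DWS keeps every model whose error is at or below the median, so at least one weight always stays positive.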

  18. Applications
     - Distributed data mining
     - Intrusion detection
     - Concept drift

  19. Key papers
     - Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
     - Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
     - Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
     - Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
     - Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289
