Stacking for supervised learning
Niall Rooney, NIKEL, University of Ulster
Ensemble learning
• Postulate multiple hypotheses to explain the data
• Shortcomings of single-model learning algorithms (Dietterich, 2002):
  – the statistical problem
  – the computational problem
  – the representational problem
Ensemble learning
• Generalization error = bias + variance (the decomposition is written out below)
  – Bias: how close the algorithm's average prediction is to the target
  – Variance: how much the algorithm's predictions "bounce around" across different training sets
  – A model that is too simple, or too inflexible, will have a large bias
  – A model that has too much flexibility will have high variance
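For reference, the decomposition behind this bullet stated explicitly (a standard identity not spelled out on the slide; σ² denotes irreducible noise):

```latex
% Bias-variance decomposition of expected squared error at a point x,
% for a learned model \hat{f} and target y = f(x) + \varepsilon with noise variance \sigma^2:
\mathbb{E}\!\left[(\hat{f}(x) - y)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
  + \sigma^{2}
```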
Ensemble learning
• Generalization error: ensembles
  – Ensembles reduce bias and/or variance
  – To be effective, ensembles need diverse and accurate base models
  – Diversity is measured by the level of variability in the base members' predictions (for regression)
Ensemble learning
• Homogeneous learning: one algorithm, varied via data sampling, feature sampling, randomization, or parameter settings
• Heterogeneous learning: same data, different learning algorithms (both regimes are sketched below)
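A minimal sketch of the two regimes; scikit-learn is an assumption here (the slides name no library, and the estimators shown are illustrative):

```python
# Two ways to build a diverse ensemble.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Homogeneous: one algorithm, diversity from bootstrap data sampling
# (feature sampling, randomization, or parameter settings work similarly).
homogeneous = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10)

# Heterogeneous: same data, different learning algorithms.
heterogeneous = [DecisionTreeRegressor(), LinearRegression(), KNeighborsRegressor()]
```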
Ensemble learning
[Figure: input features feed Classifier 1, Classifier 2, ..., Classifier N; their class predictions feed a combiner, which outputs a single class prediction.]
Ensemble learning
Methods of combination:
• Voting, weighting, selection (a minimal sketch follows this list)
• Mixture of experts
• Error-correcting output codes
• Bagging
• Boosting
• Stacking
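The simplest of these combiners is a plain (weighted) vote or average; a minimal sketch for regression, with illustrative names:

```python
import numpy as np

def weighted_average(predictions, weights):
    """Combine m base-model regression predictions by a weighted average."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.dot(weights, predictions) / weights.sum()

# e.g. three base-model predictions for one instance
print(weighted_average([2.0, 2.4, 1.9], weights=[1.0, 2.0, 1.0]))  # -> 2.175
```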
Ensemble learning: Stacking
[Figure: a prediction instance is passed to Base Model 1, Base Model 2, ..., Base Model n; their outputs feed a meta model.]
Meta technique: SR (stacked regression)
• Cross-validation produces the meta-training set {(f1(xj), ..., fm(xj), yj)} from the base models M1, ..., Mm (the procedure is sketched below)
[Figure: an instance x* is passed to each base model fi; the base predictions f1(x*), ..., fm(x*) feed the combining (meta-level) model Meta-M, whose output is the final prediction Meta-M(f1(x*), ..., fm(x*)).]
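A sketch of this procedure, assuming scikit-learn-style estimators (the slide prescribes neither a library nor particular base/meta learners):

```python
# SR-style stacking for regression.
import numpy as np
from sklearn.model_selection import cross_val_predict

def fit_stacked_regressor(base_models, meta_model, X, y, cv=5):
    # Meta-training set: out-of-fold base predictions (f1(xj), ..., fm(xj))
    # paired with the true targets yj.
    meta_X = np.column_stack(
        [cross_val_predict(m, X, y, cv=cv) for m in base_models]
    )
    meta_model.fit(meta_X, y)
    # Refit each base model on all the data for use at prediction time.
    for m in base_models:
        m.fit(X, y)
    return base_models, meta_model

def stacked_predict(base_models, meta_model, X_new):
    # Final prediction: Meta-M(f1(x*), ..., fm(x*)).
    meta_X = np.column_stack([m.predict(X_new) for m in base_models])
    return meta_model.predict(meta_X)

# Usage: pass any scikit-learn regressors as base_models and meta_model.
```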
Stacking for classification
• Use class probability distributions from the base classifiers rather than class predictions; the meta-training set becomes {(P1(C1|x), ..., P1(Ck|x), ..., Pm(C1|x), ..., Pm(Ck|x), y)}
• Choice of meta-classifier: multi-response linear regression (MLR)
  – For a classification problem with k class values, solve k regression problems
  – Only use the probabilities related to class Cj to predict class Cj (see the sketch below)
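A sketch of MLR stacking under the same scikit-learn assumption; it follows the slide's restriction of using only the P(Cj|x) columns when predicting class Cj:

```python
# Stacking with multi-response linear regression (MLR) as the meta-learner.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression

def fit_mlr_stacking(base_models, X, y, cv=5):
    classes = np.unique(y)
    # Out-of-fold class-probability distributions, one (n, k) block per model.
    probas = [cross_val_predict(m, X, y, cv=cv, method="predict_proba")
              for m in base_models]
    regressors = []
    for j, c in enumerate(classes):
        # Per the slide: to predict class Cj, use only the P(Cj|x) columns.
        meta_X_j = np.column_stack([p[:, j] for p in probas])
        regressors.append(LinearRegression().fit(meta_X_j, (y == c).astype(float)))
    for m in base_models:
        m.fit(X, y)
    return base_models, regressors, classes

def predict_mlr_stacking(base_models, regressors, classes, X_new):
    probas = [m.predict_proba(X_new) for m in base_models]
    scores = np.column_stack([
        reg.predict(np.column_stack([p[:, j] for p in probas]))
        for j, reg in enumerate(regressors)
    ])
    return classes[np.argmax(scores, axis=1)]
```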
Stacking for classification
• Use base classifiers of different "types"
• Multi-response model trees as the meta-learner were shown to perform better than selecting the best classifier (Dzeroski & Zenko, 2004)
Stacking for regression
• Linear regression as the meta-learner requires non-negative weights (Breiman, 1996; sketched below)
• Model trees can also serve as the meta-learner
• Homogeneous stacking uses random feature sub-sets
• Feature sub-sets can be improved upon using hill-climbing or GA techniques
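A minimal sketch of the non-negative-weights constraint, assuming SciPy's nnls solver (the slide names no solver); meta_X would be the out-of-fold base predictions from the SR slide:

```python
# Learn non-negative stacking weights via non-negative least squares.
import numpy as np
from scipy.optimize import nnls

def fit_nonnegative_weights(meta_X, y):
    """meta_X: (n_instances, m) out-of-fold base predictions; y: targets."""
    weights, _residual = nnls(meta_X, y)
    return weights

def combine(weights, base_predictions):
    # Final prediction: non-negatively weighted sum of base predictions.
    return np.dot(np.asarray(base_predictions), weights)
```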
Related techniques: Multiple meta-levels
Cascade generalization (sketched below)
[Figure: Classifier 1 sees the original features <x>; Classifier 2 sees <x, P1(C1), ..., P1(Ck)>; Classifier 3 sees <x, P1(C1), ..., P1(Ck), P2(C1), ..., P2(Ck)>; each level appends the previous classifiers' class-probability estimates to the feature vector.]
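A sketch of the cascade, assuming scikit-learn-style classifiers with predict_proba; for brevity it extends features with in-sample probabilities, where a careful implementation would use cross-validated estimates as in stacking:

```python
# Cascade generalization: each classifier is trained on the original features
# extended with every earlier classifier's class-probability estimates.
import numpy as np

def fit_cascade(classifiers, X, y):
    X_ext = X
    for clf in classifiers:
        clf.fit(X_ext, y)
        # Append this level's probabilities to the next level's input.
        X_ext = np.hstack([X_ext, clf.predict_proba(X_ext)])
    return classifiers

def predict_cascade(classifiers, X_new):
    X_ext = X_new
    for clf in classifiers[:-1]:
        X_ext = np.hstack([X_ext, clf.predict_proba(X_ext)])
    return classifiers[-1].predict(X_ext)
```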
Related techniques: Multiple meta-levels
Combiner trees
[Figure: Classifiers 1-4 are trained on disjoint training sets T1, T2, ...; their predictions are combined pairwise by Combiner 1 and Combiner 2, whose outputs feed Combiner 3, forming a tree of combiners.]
Related techniques: Dynamic integration
• Meta-level training set: {(xj, Err1(xj), ..., Errm(xj), yj)}, where Erri(xj) = |fi(xj) − yj|
[Figure: an instance x* is passed to each base model fi of M1, ..., Mm; the base predictions f1(x*), ..., fm(x*), together with the stored base errors, feed the combining (meta-level) model Meta-M, which outputs the final prediction Meta-M(f1(x*), ..., fm(x*)).]
Dynamic integration
• Meta model Meta-M: distance-weighted k-NN
• NN: the set of k nearest meta-instances to the query
• Over NN, find the cumulative error of each base model
Dynamic integration
• Dynamic Selection (DS)
  – choose the model with the lowest cumulative error
• Dynamic Weighting (DW)
  – combine the models with weights based on their cumulative errors
• Dynamic Weighting with Selection (DWS)
  – combine the models as in DW, but exclude models whose cumulative error is larger than the median (all three are sketched below)
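A sketch of all three schemes over the k-NN meta-level from the previous slide (Euclidean distance and inverse-error weights are assumptions; the slides fix neither):

```python
import numpy as np

def dynamic_integrate(x_star, meta_X, meta_err, base_preds, k=5, mode="DS"):
    """
    meta_X:     (n, d) stored training instances xj
    meta_err:   (n, m) stored errors Erri(xj) = |fi(xj) - yj|
    base_preds: (m,)   base predictions f1(x*), ..., fm(x*) for the query
    """
    # Find the k nearest meta-instances to x* (Euclidean distance assumed).
    dist = np.linalg.norm(meta_X - x_star, axis=1)
    nn = np.argsort(dist)[:k]
    # Cumulative error of each model over the neighbourhood (an unweighted sum;
    # the slides' distance-weighted k-NN would weight each term by proximity).
    cum_err = meta_err[nn].sum(axis=0)

    base_preds = np.asarray(base_preds, dtype=float)
    if mode == "DS":   # Dynamic Selection: model with lowest cumulative error
        return base_preds[np.argmin(cum_err)]
    weights = 1.0 / (cum_err + 1e-12)    # DW: weights from cumulative error
    if mode == "DWS":  # exclude models above the median cumulative error
        weights[cum_err > np.median(cum_err)] = 0.0
    return np.dot(weights, base_preds) / weights.sum()
```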
Applications
• Distributed data mining
• Intrusion detection
• Concept drift
Key papers
• Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
• Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
• Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
• Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
• Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289