SLIDE 1
Lecture 21: Stacking
CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader
SLIDE 2
Outline
- General Review of Methods
- Stacking
SLIDE 3
Module 1: Regression Methods
When is it appropriate to use a regression method? What regression models have we learned?
- 1. Linear Regression (simple, multiple, polynomial, interactions, model selection, Ridge & Lasso, etc.)
- 2. k-NN
- 3. Regression Trees
What are the main differences between these types of models (advantages and disadvantages)? When should you use each method?
SLIDE 4
Module 2: Classification Methods
When is it appropriate to use a classification method? What classification models have we learned?
- 1. Logistic Regression: same details as linear regression apply
- 2. k-NN
- 3. Discriminant Analysis: LDA/QDA
- 4. Classification Trees
- 5. SVM
What are the main differences between these types of models (advantages and disadvantages)? When should you use each method?
SLIDE 5
Module 3: Ensemble Methods
What does it mean for a model to be an ensemble method?
- 1. Bagging Trees
- 2. Random Forests
- 3. Boosting Models
- 4. Neural Networks
- 5. Stacking Models (coming today)
What approach does each model take to improve prediction accuracy?
SLIDE 6
Bags and Forests of Trees (cont.)
Bagging:
- create an ensemble of full trees, each trained on a bootstrap sample of the training set;
- “average” the predictions
Random forest:
- create an ensemble of full trees, each trained on a bootstrap sample of the training set;
- in each tree and each split, randomly select a subset of predictors and choose a predictor from this subset for splitting;
- average the predictions
Note that the ensemble-building aspects of both methods are embarrassingly parallel!
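A minimal sketch of the two recipes above in scikit-learn (the synthetic data and hyperparameters are illustrative, not from the lecture):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# Bagging: full trees (the default base estimator), each fit on a bootstrap
# sample of the training set; predictions are averaged.
bagging = BaggingRegressor(n_estimators=100, bootstrap=True,
                           n_jobs=-1, random_state=0)

# Random forest: same idea, but each split considers only a random subset of
# the predictors (here sqrt(p) of them).
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               n_jobs=-1, random_state=0)

# n_jobs=-1 exploits the embarrassingly parallel ensemble building.
for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```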
SLIDE 7
Bags and Forests of Trees (cont.)
Boosting:
- Iteratively build a model from lots of little models.
- Each subsequent model predicts the residuals from the previous model, overweighting the large residuals.
Neural Networks:
- Build layers of models from overly simple “neuron” models.
- Use back-propagation to efficiently pass information from the output back to update the earlier models.
These methods are not as easily parallelizable.
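A minimal sketch (not lecture code) of the boosting idea: each small tree is fit to the residuals of the current ensemble and its correction is shrunk by a learning rate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate = 0.1          # illustrative value
n_rounds = 100
prediction = np.zeros_like(y, dtype=float)
trees = []

for _ in range(n_rounds):
    residuals = y - prediction                     # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)      # a "little" model
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # shrink each correction
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```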
SLIDE 8
Stacking
SLIDE 9
Motivation for Stacking
For each of our ensemble methods so far (besides Neural Networks), we have:
- Fit base models of the same type (regression trees, for example).
- Combined the predictions in a naïve way.
Stacking is a way to generalize the ensembling approach to combine the outputs of various types of models, and it improves on how the combination is done as well.
SLIDE 10
Motivation for Stacking
Recall that in boosting, the final model T we learn is a weighted sum of simple models T_h, where the weights λ_h play the role of learning rates. In AdaBoost, for example, we can analytically determine the optimal value of λ_h for each simple model T_h.
On the other hand, we can also determine the final model T implicitly by learning any model, called a meta-learner, that transforms the outputs of the T_h into a prediction.
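In symbols, one standard way to write this contrast (using λ_h for the weights as above, and H for the number of simple models, an assumed label):

```latex
% Boosting: the final model is a weighted sum of the simple models T_h,
% with weights (learning rates) \lambda_h.
T(x) \;=\; \sum_{h=1}^{H} \lambda_h \, T_h(x)

% Meta-learning: instead learn an arbitrary combiner \tilde{T} of the
% simple models' outputs.
T(x) \;=\; \tilde{T}\big(T_1(x), \ldots, T_H(x)\big)
```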
SLIDE 11
Stacked Generalization
The framework for stacked generalization or stacking (Wolpert 1992) is:
- train L models, T_1, ..., T_L, on the training data
- train a meta-learner, T̃, on the predictions of the ensemble of models, i.e. train on the base-model predictions paired with the original responses (written out below)
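Written out (assuming N training points and L base models), the meta-learner's training data is the set of base-model predictions paired with the original responses:

```latex
% Meta-learner training data: the inputs are the L base-model predictions for
% each training point x_n, and the target is the original response y_n.
\Big\{ \big( \left(T_1(x_n), \ldots, T_L(x_n)\right),\; y_n \big) \Big\}_{n=1}^{N}
```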
SLIDE 12
Stacking: an Illustration
SLIDE 13
Stacked Generalization
Stacking is a very general method:
- the models, T_l, in the ensemble can come from different classes. The ensemble can contain a mixture of logistic regression models, trees, random forests, etc.
- the meta-learner, T̃, can be of any type.
Note: we want to train T̃ on the out-of-sample predictions of the ensemble. For example, we train T̃ on predictions T_l(x_n), where each T_l(x_n) is generated by training T_l on data that does not include the point (x_n, y_n).
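As a hedged sketch (not lecture code) of what "out of sample" means in practice, here are one base model's out-of-fold predictions built by hand with K-fold splits:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

oof_pred = np.zeros(len(y))   # out-of-fold predictions for one base model T_l
for train_idx, holdout_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = DecisionTreeClassifier(max_depth=4, random_state=0)
    model.fit(X[train_idx], y[train_idx])       # T_l never sees the held-out fold
    oof_pred[holdout_idx] = model.predict_proba(X[holdout_idx])[:, 1]

# oof_pred would form one column of the meta-learner's training inputs.
print(oof_pred[:5])
```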
SLIDE 14
Stacking: General Guidelines
The flexibility of stacking makes it widely applicable but difficult to analyze theoretically. Some general rules have been found through empirical studies:
- models in the ensemble should be diverse, i.e. their errors should not be correlated.
- for binary classification, each model in the ensemble should have error rate < 1/2.
- if the models in the ensemble output probabilities, it’s better to train the meta-learner on the probabilities rather than on the hard predictions.
- apply regularization to the meta-learner to avoid overfitting.
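A small diagnostic sketch (assumed, not from the lecture) for the first two guidelines: estimate each model's out-of-fold error rate and how correlated the models' errors are.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=15),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# 1 where a model's out-of-fold prediction is wrong, 0 where it is right.
errors = np.column_stack([
    (cross_val_predict(m, X, y, cv=5) != y).astype(float) for m in models.values()
])
print(dict(zip(models, errors.mean(axis=0))))  # each error rate should be < 1/2
print(np.corrcoef(errors, rowvar=False))       # off-diagonal entries should be small
```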
SLIDE 15
Stacking: Subsemble Approach
We can extend the stacking framework to include ensembles of models that specialize on small subsets of the data (Sapp et al. 2014), for de-correlation or improved computational efficiency:
- divide the data into J subsets
- train models, T_j, on each subset
- train a meta-learner, T̃, on the predictions of the ensemble of models, i.e. train on the subset models' predictions paired with the original responses (see the sketch below)
Again, we want to make sure that each T_j(x_i) is an out-of-sample prediction.
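A minimal sketch of the subsemble idea under simplifying assumptions (a single held-out set supplies the out-of-sample predictions rather than the cross-validated scheme in Sapp et al.; all names and settings are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_train, X_meta, y_train, y_meta = train_test_split(X, y, test_size=0.3, random_state=0)

J = 4
subset_ids = np.arange(len(X_train)) % J        # a simple partition into J subsets
subset_models = []
for j in range(J):
    mask = subset_ids == j
    model = DecisionTreeClassifier(max_depth=5, random_state=j)
    model.fit(X_train[mask], y_train[mask])     # T_j sees only its own subset
    subset_models.append(model)

# The held-out set gives out-of-sample predictions for every T_j.
Z_meta = np.column_stack([m.predict_proba(X_meta)[:, 1] for m in subset_models])
meta = LogisticRegression().fit(Z_meta, y_meta)
print("meta-learner trained on", Z_meta.shape[1], "subset models")
```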
SLIDE 16
Stacking in sklearn
Unfortunately, Python does not have stacking algorithms implemented for you. So how can we do it? We can set it up by ‘manually’ fitting several base models, taking the outputs of those models, and fitting the meta-model on those outputs.
It’s a model on models!
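A minimal sketch of the manual recipe just described; the model choices and parameters are illustrative, not prescribed by the lecture. (Note that scikit-learn 0.22+ also ships StackingClassifier/StackingRegressor, but the manual version makes the mechanics explicit.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=15),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Out-of-fold probabilities on the training set (so the meta-model never sees a
# base model's in-sample fit), then refit each base model on all training data.
Z_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models.values()
])
for m in base_models.values():
    m.fit(X_train, y_train)

# Regularized meta-model trained on probabilities, per the guidelines above:
# it's a model on models.
meta_model = LogisticRegression(C=1.0)
meta_model.fit(Z_train, y_train)

Z_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models.values()])
print("stacked accuracy:", accuracy_score(y_test, meta_model.predict(Z_test)))
```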