SLIDE 1

Ensemble Methods

CSE 6242/CX 4242

Or, Model Combination

Based on lecture by Parikshit Ram

SLIDE 2

Numerous Possible Classifiers!

Classifier             | Training time | Cross validation | Testing time | Accuracy
kNN classifier         | None          | Can be slow      | Slow         | ??
Decision trees         | Slow          | Very slow        | Very fast    | ??
Naive Bayes classifier | Fast          | None             | Fast         | ??
…                      | …             | …                | …            | …

SLIDE 3

Which Classifier/Model to Choose?

Possible strategies:

  • Go from the simplest model to more complex models until you obtain the desired accuracy
  • Discover a new model if the existing ones do not work for you
  • Combine all (simple) models
SLIDE 4

Common Strategy: Bagging (Bootstrap Aggregating)

Consider the data set S = {(xi, yi)}, i = 1,...,n

  • Pick a sample S* of size n, with replacement, from S
  • Train on this set S* to get a classifier f*
  • Repeat the above steps B times to get f1, f2,...,fB
  • Final classifier: f(x) = majority{fb(x)}, b = 1,...,B
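
A minimal sketch of this recipe in Python, using a decision tree as the base classifier (the slide does not fix one); the helper names bagging_fit/bagging_predict and the parameter values are illustrative, not from the lecture:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=50, seed=0):
    """Train B classifiers, each on a bootstrap sample S* of size n drawn from S."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True)       # sample S* with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final classifier f(x): majority vote over f_1(x), ..., f_B(x).
    Assumes integer class labels 0, 1, ... (needed for np.bincount)."""
    votes = np.stack([m.predict(X) for m in models])    # shape (B, n_points)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```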

SLIDE 5

Common Strategy: Bagging

Why would bagging work?

  • Combining multiple classifiers reduces the variance of the final classifier

When would this be useful?

  • We have a classifier with high variance (any examples?)
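
One way to see the variance reduction (a standard argument, spelled out in ESL Ch. 15 rather than on the slide): if a single classifier's prediction has variance σ² and the B bootstrapped classifiers have pairwise correlation ρ, the averaged prediction has

\[
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 ,
\]

which shrinks toward ρσ² as B grows (and to σ²/B in the idealized independent case, ρ = 0). Bootstrap samples overlap, so ρ > 0 and the reduction is smaller in practice, but the direction is the same.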

SLIDE 6

Bagging decision trees

Consider the data set S

  • Pick a sample S* of size n, with replacement, from S
  • Grow a decision tree Tb greedily on S*
  • Repeat B times to get T1,...,TB
  • The final classifier is the majority vote over T1,...,TB
SLIDE 7

Random Forests

Almost identical to bagging decision trees, except we introduce some randomness:

  • Randomly pick m of the d available attributes
  • Grow the tree using only those m attributes (standard random forests redraw the m attributes at every split)

That is, bagged random decision trees = random forests.
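
For reference, the same idea via scikit-learn's implementation, which re-samples max_features attributes at each split; the parameter values below are illustrative, not prescribed by the slides:

```python
from sklearn.ensemble import RandomForestClassifier

# B = n_estimators bagged trees; m = max_features attributes considered per split.
# "sqrt" picks m ≈ √d, a common default for classification.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
# Usage, given training/test arrays:
# rf.fit(X_train, y_train); y_pred = rf.predict(X_test)
```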

SLIDE 8

Points about random forests

Algorithm parameters

  • Usual values for m: about √d for classification, d/3 for regression (see ESL Ch. 15)
  • Usual value for B: keep increasing B until the training error stabilizes

SLIDE 9

Bagging / Random forests

Consider the data set S = {(xi, yi)}, i = 1,...,n

  • Pick a sample S* of size n, with replacement, from S
  • Do the training on this set S* to get a classifier (e.g. a random decision tree) f*
  • Repeat the above step B times to get f1, f2,...,fB
  • Final classifier: f(x) = majority{fb(x)}, b = 1,...,B
SLIDE 10

Final words

Advantages:

  • Efficient and simple training
  • Allows you to work with simple classifiers
  • Random forests are generally useful and accurate in practice (one of the best classifiers)
  • Embarrassingly parallelizable

Caveats:

  • Needs low-bias classifiers
  • Can make a not-good-enough classifier worse
SLIDE 11

Final words

Reading material

  • Bagging: ESL Chapter 8.7
  • Random forests: ESL Chapter 15

http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

SLIDE 12

Strategy 2: Boosting

Consider the data set S = {(xi, yi)}, i = 1,...,n

  • Assign a weight w(i,0) = 1/n to each point i
  • Repeat for t = 1,...,T:
    • Train a classifier ft on S that minimizes the weighted loss
    • Obtain a weight at for the classifier ft
    • Update the weight of every point i to w(i, t+1): increase the weights of the points ft misclassifies, decrease the weights of the points it classifies correctly
  • Final classifier: a weighted combination of f1,...,fT (see the sketch below)
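
The slide's exact formulas (the weighted loss, the classifier weight at, and the point-weight update) did not survive extraction. A standard instantiation of this recipe is AdaBoost (ESL Ch. 10); the sketch below follows that variant and is illustrative rather than a reproduction of the slide:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost with decision stumps. Assumes labels y are in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                              # w(i, 0) = 1/n
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y))                    # weighted loss of f_t
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # classifier weight a_t
        w *= np.exp(-alpha * y * pred)                   # up-weight mistakes, down-weight correct
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """Final classifier: sign of the weighted vote of f_1,...,f_T."""
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))
```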
SLIDE 13

Final words on boosting

Advantages:

  • Extremely useful in practice and has great theory as well
  • Can work with very simple classifiers

Caveats:

  • Training is inherently sequential
  • Hard to parallelize

Reading material:

  • ESL book, Chapter 10
  • Le Song's slides: http://www.cc.gatech.edu/~lsong/teaching/CSE6704/lecture9.pdf

SLIDE 14

Visualizing Classification

Usual tools

  • ROC curve / cost curves: true-positive rate vs. false-positive rate
  • Confusion matrix
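
A small sketch of how these are typically computed with scikit-learn; the toy arrays y_true, y_score, and y_pred are placeholders for your own labels and classifier outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, confusion_matrix

y_true  = np.array([0, 0, 1, 1, 1, 0])                  # ground-truth labels (toy data)
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])     # classifier scores / probabilities
y_pred  = (y_score >= 0.5).astype(int)                  # hard predictions at one threshold

fpr, tpr, thresholds = roc_curve(y_true, y_score)       # points on the ROC curve
print("AUC:", auc(fpr, tpr))
print(confusion_matrix(y_true, y_pred))                 # rows: true class, cols: predicted class
```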
SLIDE 15

Visualizing Classification

Newer tool

  • Visualize the data and class boundary with a 2D projection (dimensionality reduction)
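
One common way to get such a 2D view (a sketch; the slide does not name a specific projection method) is a PCA projection, coloring points by true class and by predicted class to see where the boundary disagrees:

```python
from sklearn.decomposition import PCA

# X: feature matrix, y_pred: classifier predictions (assumed to exist).
X2 = PCA(n_components=2).fit_transform(X)   # project the data to 2D
# e.g. with matplotlib: plt.scatter(X2[:, 0], X2[:, 1], c=y_pred)
```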

SLIDE 16

Weights in combined models

Bagging / Random forests

  • Majority voting

Let people play with the weights?
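
A small sketch of what user-adjustable weights could look like on top of the majority vote (purely illustrative; EnsembleMatrix's actual weighting scheme is described in the paper linked on the following slides):

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """predictions: (B, n_points) class labels from B classifiers.
    weights: (B,) user-adjustable non-negative weights, one per classifier."""
    n_points = predictions.shape[1]
    scores = np.zeros((n_classes, n_points))
    for w, pred in zip(weights, predictions):
        scores[pred, np.arange(n_points)] += w   # add weight w to each voted class
    return scores.argmax(axis=0)                 # highest-weighted class per point
```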

SLIDE 17

EnsembleMatrix

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

SLIDE 18

Understanding performance

  • http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf
SLIDE 19

Improving performance

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

SLIDE 20

Improving performance

  • Adjust the weights of the individual classifiers
  • Partition the data to separate problem areas
  • Adjust the weights just for these individual parts
  • State-of-the-art performance on one dataset

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

SLIDE 21

ReGroup - Naive Bayes at work

http://www.cs.washington.edu/ai/pubs/amershiCHI2012_ReGroup.pdf

SLIDE 22

ReGroup

Y - In group?
X - Features of a friend

P(Y = true | X) = ?

  • Compute P(Xd | Y = true) for each feature d using the current group members (how?)

Features to represent each friend: gender, age group, family, home city/state/country, current city/state/country, high school/college/grad school, workplace, amount of correspondence, recency of correspondence, friendship duration, # of mutual friends, amount seen together

http://www.cs.washington.edu/ai/pubs/amershiCHI2012_ReGroup.pdf

SLIDE 23

ReGroup

Y - In group?
X - Features of a friend

P(Y|X) = P(X|Y) P(Y) / P(X)
P(X|Y) = P(X1|Y) * ... * P(Xd|Y)

Compute P(Xd | Y = true) for every feature d using the current group members

  • Use simple counting

Not exactly classification!

  • Reorder the remaining friends with respect to P(X | Y = true)
  • "Train" every time a new member is added to the group

http://www.cs.washington.edu/ai/pubs/amershiCHI2012_ReGroup.pdf
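
A toy sketch of the "simple counting" step under the naive Bayes factorization above, assuming binary features and Laplace smoothing; the ReGroup paper's exact estimator may differ in its details:

```python
import numpy as np

def fit_naive_bayes_counts(X_members, alpha=1.0):
    """X_members: (n_members, d) binary feature matrix of the current group members.
    Returns P(X_d = 1 | Y = true) for each feature d, with Laplace smoothing."""
    n, d = X_members.shape
    return (X_members.sum(axis=0) + alpha) / (n + 2 * alpha)

def likelihood(x, p_given_true):
    """P(X = x | Y = true) under the conditional-independence assumption;
    remaining friends can be reordered by this score."""
    return np.prod(np.where(x == 1, p_given_true, 1.0 - p_given_true))
```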

SLIDE 24

Some additional reading

  • Interactive machine learning
  • http://research.microsoft.com/en-us/um/redmond/groups/cue/iml/
  • http://research.microsoft.com/en-us/um/people/samershi/pubs.html
  • http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf
  • http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/AAAI2012-PnP.pdf
  • http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/AAAI2012-L2L.pdf