Chapter 7: Ensemble Learning and Random Forest

Dr. Xudong Liu, Assistant Professor, School of Computing, University of North Florida
Monday, 9/23/2019

Overview
1. Voting classifiers: hard and soft
2. Bagging and pasting; Random Forests
3. Boosting: AdaBoost, GradientBoost; Stacking

Hard Voting Classifiers
In binary classification, hard voting is a simple way for an ensemble of classifiers to make predictions: the ensemble outputs the majority winner between the two classes. With more than two classes, it outputs the plurality winner instead, or the winner according to another voting rule. Even if each classifier is a weak learner, the ensemble can be a strong learner under hard voting, provided there are sufficiently many weak learners and they are sufficiently diverse.
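A minimal sketch of hard voting in plain Python, assuming each classifier's predicted labels are already available as lists (the helper names `hard_vote` and `ensemble_predict` and the toy predictions are hypothetical). It aggregates one column of predictions per example and returns the plurality winner, which for two classes is the majority winner.

```python
from collections import Counter

def hard_vote(labels):
    """Plurality winner among the labels predicted by the individual classifiers.

    With two classes this is simply the majority winner. On a tie, the label
    encountered first wins, because Counter.most_common keeps first-seen order
    for equal counts.
    """
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(classifier_outputs):
    """classifier_outputs[c][i] is classifier c's predicted label for example i."""
    return [hard_vote(column) for column in zip(*classifier_outputs)]

outputs = [
    [1, 0, 1, 1],  # classifier A
    [0, 0, 1, 0],  # classifier B
    [1, 1, 0, 0],  # classifier C
]
print(ensemble_predict(outputs))                 # → [1, 0, 1, 0]
print(hard_vote(["cat", "dog", "bird", "dog"]))  # → dog
```

The second call shows the multi-class case: "dog" wins with a plurality of two votes out of four. Scikit-Learn's VotingClassifier performs this aggregation for fitted estimators when voting="hard".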

[Figure: Training Diverse Classifiers]

[Figure: Hard Voting Predictions]

Is an Ensemble of Weak Learners Strong?
Consider a slightly biased coin that lands heads with probability 51% and tails with probability 49%. By the law of large numbers, if every toss is independent of the others, the fraction of heads gets closer and closer to the probability of heads, 51%, as you keep tossing the coin.
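The coin experiment can be simulated with a short sketch, assuming independent tosses drawn from Python's `random` module (the function name `heads_ratio_series` and the seed are illustrative choices). It records the running fraction of heads in 10 series of 10,000 tosses each.

```python
import random

def heads_ratio_series(n_series=10, n_tosses=10_000, p_heads=0.51, seed=42):
    """Simulate independent series of biased-coin tosses.

    Returns, for each series, the running fraction of heads after every toss.
    """
    rng = random.Random(seed)
    all_series = []
    for _ in range(n_series):
        heads = 0
        ratios = []
        for toss in range(1, n_tosses + 1):
            heads += rng.random() < p_heads  # a True comparison counts as 1 head
            ratios.append(heads / toss)
        all_series.append(ratios)
    return all_series

series = heads_ratio_series()
final = [s[-1] for s in series]
print([round(r, 3) for r in final])
print(sum(r > 0.5 for r in final), "of", len(final), "series end above 50% heads")
```

The exact ratios depend on the seed, but each series drifts toward 0.51 as the number of tosses grows; plotting `series` reproduces the kind of curves described on the next slide.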

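Is an Ensemble of Weak Learners Strong?
In a simulation of 10 series of tosses, all 10 series eventually stay consistently above 50% heads. As a result, the probability that a majority of 10,000 tosses come up heads is about 97%, far higher than the 51% of a single toss. By the same argument, an ensemble of 1,000 classifiers, each correct only 51% of the time, can reach roughly 73% accuracy with hard voting, but only if the classifiers are perfectly independent and make uncorrelated errors. Classifiers trained on the same data tend to make correlated errors, so in practice the gain is smaller.
Scikit-Learn: from sklearn.ensemble import VotingClassifier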
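The majority probabilities quoted above come from the binomial distribution. A minimal sketch, assuming fully independent voters (the helper name `majority_probability` is hypothetical): it sums the binomial probabilities of every outcome with more than n/2 successes, working in log space so that n = 10,000 does not underflow.

```python
from math import exp, lgamma, log

def majority_probability(n, p):
    """P(more than half of n independent trials succeed), each with success prob p.

    Uses log-space binomial terms so large n (e.g. 10,000) does not underflow.
    Ties (exactly n/2 successes when n is even) do not count as a majority.
    """
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return sum(exp(log_pmf(k)) for k in range(n // 2 + 1, n + 1))

print(round(majority_probability(1_000, 0.51), 3))   # 1,000 voters, each 51% correct
print(round(majority_probability(10_000, 0.51), 3))  # 10,000 coin tosses
```

Setting p = 0.5 makes the advantage vanish, which is why the individual learners must be at least slightly better than random guessing.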

Soft Voting Predictions
If every classifier in the ensemble can estimate class probabilities, we may use soft voting to aggregate their predictions. Under soft voting, the ensemble predicts the class with the highest class probability, averaged over all the individual classifiers. Soft voting often performs better than hard voting because it gives more weight to highly confident votes.
Scikit-Learn: set voting="soft" (every member estimator then needs a predict_proba() method).
But how do we train individual classifiers that are diverse?
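A small sketch contrasting the two rules on a single example, assuming three hypothetical classifiers whose class-probability estimates are given directly (the function names are illustrative). Two classifiers lean weakly toward class 1 while one is very confident in class 0, so hard and soft voting disagree.

```python
from collections import Counter

def hard_vote_from_probas(probas):
    """Each classifier votes for its most probable class; return the plurality winner."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Average the class probabilities over classifiers; return (winning class, averages)."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical class-probability estimates for ONE example, classes 0 and 1.
probas = [
    [0.45, 0.55],  # classifier A: leans to class 1, barely
    [0.40, 0.60],  # classifier B: leans to class 1
    [0.95, 0.05],  # classifier C: very confident in class 0
]
print(hard_vote_from_probas(probas))       # → 1  (two votes to one)
winner, avg = soft_vote(probas)
print(winner, [round(a, 2) for a in avg])  # → 0 [0.6, 0.4]
```

The confident classifier C outweighs the two hesitant votes once probabilities are averaged, which is the sense in which soft voting gives more weight to highly confident votes.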

Bagging and Pasting
Both bagging and pasting use the same training algorithm for every predictor but train each predictor on a different random subset of the training set.
Bagging: sampling is performed with replacement.
Pasting: sampling is performed without replacement.
Scikit-Learn: BaggingClassifier and BaggingRegressor. Set bootstrap=False if you want pasting.
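The only difference between the two methods is how each predictor's training subset is drawn. A minimal sketch, assuming training examples are referred to by index (the function `draw_subsets` is hypothetical; its `bootstrap` flag simply mirrors the meaning of Scikit-Learn's parameter of the same name).

```python
import random

def draw_subsets(m, n_predictors, subset_size, bootstrap=True, seed=0):
    """Draw one subset of training-example indices per predictor.

    bootstrap=True  -> bagging: sample with replacement (duplicates possible).
    bootstrap=False -> pasting: sample without replacement (no duplicates),
                       so subset_size must not exceed m.
    """
    rng = random.Random(seed)
    indices = range(m)
    if bootstrap:
        return [rng.choices(indices, k=subset_size) for _ in range(n_predictors)]
    return [rng.sample(indices, subset_size) for _ in range(n_predictors)]

m = 10                      # size of a hypothetical training set
bags = draw_subsets(m, n_predictors=3, subset_size=10, bootstrap=True)
pastes = draw_subsets(m, n_predictors=3, subset_size=8, bootstrap=False)
for b in bags:
    print("bagging:", sorted(b), "distinct:", len(set(b)))
for p in pastes:
    print("pasting:", sorted(p), "distinct:", len(set(p)))
```

Each predictor would then be trained with the same algorithm on its own subset, for example on `[X[i] for i in bags[0]]` for a training set `X`.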

[Figure: Bagging and Pasting]

Random Forests
A random forest is an ensemble of decision trees trained using bagging. To increase diversity, when splitting a node in a member decision tree, the algorithm searches for the best feature among a random subset of the features instead of among all of them.
Scikit-Learn: RandomForestClassifier and RandomForestRegressor
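The randomized split search can be sketched for a single node, assuming Gini impurity as the split criterion (the default for Scikit-Learn classification trees) and numeric features compared against thresholds. The function names and toy data are hypothetical, and `max_features` only mirrors the name of the Scikit-Learn parameter; a full tree would apply this search recursively.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_random_split(X, y, max_features, rng):
    """Best (weighted impurity, feature, threshold) split, searching only a
    random subset of max_features features, as a random forest tree does at a node.

    Returns None if no candidate split leaves both sides non-empty.
    """
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), max_features)
    best = None
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

X = [[1, 5], [2, 3], [3, 5], [4, 3]]  # toy data: feature 0 separates the classes
y = [0, 0, 1, 1]
print(best_random_split(X, y, max_features=2, rng=random.Random(0)))  # → (0.0, 0, 2)
print(best_random_split(X, y, max_features=1, rng=random.Random(1)))
```

With `max_features=1`, a node sometimes sees only the weaker feature 1 and settles for a worse split; that deliberate handicap is what makes the trees in the forest differ from one another.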

Boosting
Boosting is a learning method for ensemble models in which individual predictors are trained sequentially, each trying to correct its predecessor. We will talk about AdaBoost (Adaptive Boosting) [1] and GradientBoost [2].
[1] A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Freund and Schapire, 1997.
[2] Arcing the Edge, Breiman, 1997.

AdaBoost
First, a base classifier is trained and used to make predictions on the training set, and the weights of the misclassified training examples are increased. A second classifier is then trained on the same training set but with the updated weights, and again the weights of the misclassified examples are increased. The algorithm repeats until either the desired number of predictors is reached or a perfect predictor is found.
Scikit-Learn: AdaBoostClassifier

[Figure: AdaBoost]

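AdaBoost Algorithm
Each example weight $w^{(i)}$ in the training set is initialized to $1/m$, where $m$ is the number of examples in the training set. Starting with $j = 1$, we train the $j$-th predictor and compute its weighted error rate $r_j$ and its predictor weight $\alpha_j$, where $\hat{y}_j^{(i)}$ is the $j$-th predictor's prediction for the $i$-th example and $\eta$ is the learning rate hyperparameter:

$$r_j = \frac{\sum_{i=1,\ \hat{y}_j^{(i)} \neq y^{(i)}}^{m} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}}, \qquad \alpha_j = \eta \log \frac{1 - r_j}{r_j}$$

Next, we increase the weights of the misclassified examples. For $i = 1$ to $m$, we update

$$w^{(i)} \leftarrow \begin{cases} w^{(i)} & \text{if } \hat{y}_j^{(i)} = y^{(i)} \\ w^{(i)} \exp(\alpha_j) & \text{if } \hat{y}_j^{(i)} \neq y^{(i)} \end{cases}$$

The next predictor ($j + 1$) is then trained with the updated weights, and so on. Finally, to make predictions with $N$ predictors, the ensemble outputs the class $k$ that receives the largest total predictor weight:

$$\hat{y}(x) = \arg\max_{k} \sum_{j=1,\ \hat{y}_j(x) = k}^{N} \alpha_j$$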
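The update rules above can be followed step by step in a from-scratch sketch. It assumes one-dimensional inputs, labels 0/1, and decision stumps (a single threshold) as the base classifier; the slides do not fix a base learner, and the helper names `train_stump`, `adaboost`, and `predict` are hypothetical. The guard that clips $r_j$ away from zero is an added assumption so that a perfect stump does not cause a division by zero.

```python
import math

def train_stump(xs, ys, w):
    """Fit a 1-D decision stump (one threshold) minimizing the weighted error.

    The stump predicts label `left` for x <= t and 1 - left for x > t (labels 0/1).
    """
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if (left if x <= t else 1 - left) != y)
            if best is None or err < best[0]:
                best = (err, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left

def adaboost(xs, ys, n_predictors, eta=1.0):
    m = len(xs)
    w = [1.0 / m] * m                              # w^(i) = 1/m
    ensemble = []                                  # pairs (predictor j, alpha_j)
    for _ in range(n_predictors):
        h = train_stump(xs, ys, w)
        wrong = [h(x) != y for x, y in zip(xs, ys)]
        r = sum(wi for wi, bad in zip(w, wrong) if bad) / sum(w)  # r_j
        perfect = r == 0
        r = max(r, 1e-10)                          # avoid log(1/0) for a perfect stump
        alpha = eta * math.log((1 - r) / r)        # alpha_j
        ensemble.append((h, alpha))
        if perfect:
            break                                  # stop: perfect predictor found
        # Multiply the weights of misclassified examples by exp(alpha_j).
        w = [wi * math.exp(alpha) if bad else wi for wi, bad in zip(w, wrong)]
    return ensemble

def predict(ensemble, x):
    """arg max over classes k of the summed alpha_j of predictors voting for k."""
    votes = {}
    for h, alpha in ensemble:
        k = h(x)
        votes[k] = votes.get(k, 0.0) + alpha
    return max(votes, key=votes.get)

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 0, 0]   # class 1 only in the middle: no single stump fits this
model = adaboost(xs, ys, n_predictors=3)
print([predict(model, x) for x in xs])  # → [0, 0, 1, 1, 0, 0]
print([round(a, 3) for _, a in model])  # → [0.693, 1.099, 1.609]
```

The best single stump gets only 4 of the 6 examples right, but after the misclassified examples are reweighted twice, the weighted vote of three stumps classifies the whole training set correctly. The predictor weights work out to log 2, log 3 and log 5 because the weighted errors are 1/3, 1/4 and 1/6.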

Gradient Boosting
Like AdaBoost, GradientBoost sequentially adds predictors to the ensemble, each one correcting its predecessor. Unlike AdaBoost, which reweights the training examples, GradientBoost fits each new predictor to the residual errors made by the previous predictor. In the simplest regression setting, the ensemble predicts by adding up the predictions of all its predictors.
Scikit-Learn: GradientBoostingClassifier and GradientBoostingRegressor
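A minimal sketch of residual fitting for regression, assuming squared-error loss, one-dimensional inputs, least-squares regression stumps as predictors, and no learning rate (shrinkage). The helper names and toy data are hypothetical. Each stage fits the residuals left by the previous stages, and the ensemble's prediction is the sum of the stage predictions.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump: one threshold t, a constant on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else left_mean
        sse = sum((r - (left_mean if x <= t else right_mean)) ** 2
                  for x, r in zip(xs, targets))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_predictors):
    predictors, residuals = [], list(ys)
    for _ in range(n_predictors):
        h = fit_stump(xs, residuals)        # fit the residual errors so far
        predictors.append(h)
        residuals = [r - h(x) for x, r in zip(xs, residuals)]
    return predictors

def predict(predictors, x):
    return sum(h(x) for h in predictors)    # ensemble = sum of all predictors

def mse(predictors, xs, ys):
    return sum((y - predict(predictors, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.1, 3.0, 3.2, 2.9, 5.0, 5.1]   # toy staircase-shaped targets
for n in (1, 2, 3):
    print(n, "predictors, training MSE:", round(mse(gradient_boost(xs, ys, n), xs, ys), 4))
```

For squared error, the residuals are exactly the negative gradient of the loss with respect to the current predictions, which is where the name comes from. Because a stump can always fall back to predicting the mean residual, adding a stage never increases the training error in this sketch.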

[Figure: Gradient Boosting]

Stacking
In the stacking method [3], instead of aggregating the member predictors' predictions with fixed functions such as hard or soft voting, we train a model to perform the aggregation. This final model is often called a blender or meta learner.
[3] Stacked Generalization, Wolpert, 1992.
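A toy sketch of the idea, assuming a common recipe that the slide does not spell out: the base predictors are already trained, their predictions on a separate hold-out set become the blender's input features, and the blender is a linear model fit by least squares. The two base predictors, the hold-out data, and all function names are hypothetical stand-ins.

```python
def base_linear(x):      # hypothetical already-trained predictor #1
    return x

def base_quadratic(x):   # hypothetical already-trained predictor #2
    return x * x

def fit_blender(holdout_x, holdout_y):
    """Fit y ~ w1*p1 + w2*p2 by least squares on the base predictions (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    p1 = [base_linear(x) for x in holdout_x]
    p2 = [base_quadratic(x) for x in holdout_x]
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * y for a, y in zip(p1, holdout_y))
    b2 = sum(b * y for b, y in zip(p2, holdout_y))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def stacked_predict(weights, x):
    """The blender aggregates the base predictions with its learned weights."""
    w1, w2 = weights
    return w1 * base_linear(x) + w2 * base_quadratic(x)

# Toy hold-out set whose targets happen to be 0.3*x + 0.7*x^2.
holdout_x = [1, 2, 3, 4, 5]
holdout_y = [0.3 * x + 0.7 * x * x for x in holdout_x]
weights = fit_blender(holdout_x, holdout_y)
print([round(w, 6) for w in weights])          # → [0.3, 0.7]
print(round(stacked_predict(weights, 10), 6))  # → 73.0
```

Unlike a fixed average, the blender learns how much to trust each member. Training it on data the base predictors did not see keeps it from simply rewarding whichever member overfits the training set. Recent Scikit-Learn versions also provide StackingClassifier and StackingRegressor.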

[Figure: Stacking]
