
CSCE 478/878 Lecture 7: Bagging and Boosting

Stephen Scott

(Adapted from Ethem Alpaydin and Rob Schapire and Yoav Freund)

sscott@cse.unl.edu


Introduction

Sometimes a single classifier (e.g., neural network, decision tree) won't perform well, but a weighted combination of them will.

When asked to predict the label for a new example, each classifier (inferred from a base learner) makes its own prediction, and then the master algorithm (or meta-learner) combines them, using the weights, to make its own prediction.

  • If the classifiers themselves cannot learn (e.g., heuristics), then the best we can do is to learn a good set of weights (e.g., Weighted Majority)
  • If we are using a learning algorithm (e.g., ANN, decision tree), then we can rerun the algorithm on different subsamples of the training set and set the classifiers' weights during training


Outline

  • Bagging
  • Boosting


Bagging

[Breiman, ML Journal, 1996]

Bagging = Bootstrap aggregating

Bootstrap sampling: given a set X containing N training examples:
  • Create X_j by drawing N examples uniformly at random with replacement from X
  • Expect X_j to omit ≈ 37% of the examples in X

Bagging:
  • Create L bootstrap samples X_1, ..., X_L
  • Train classifier d_j on X_j
  • Classify a new instance x by majority vote of the learned classifiers (equal weights)

Result: an ensemble of classifiers
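A minimal sketch of this procedure in Python with NumPy (my own illustration, not from the slides). Here `base_learner` stands for any routine that fits a classifier exposing a `predict` method, e.g. `lambda X, y: DecisionTreeClassifier().fit(X, y)` from scikit-learn, and labels are assumed to be in {−1, +1} so the majority vote can be taken with a sign.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw N examples uniformly at random with replacement from (X, y)."""
    N = len(X)
    idx = rng.integers(0, N, size=N)   # on average ≈37% of examples are left out
    return X[idx], y[idx]

def bag(X, y, base_learner, L=50, seed=0):
    """Train L classifiers, each on its own bootstrap sample X_j."""
    rng = np.random.default_rng(seed)
    return [base_learner(*bootstrap_sample(X, y, rng)) for _ in range(L)]

def predict_majority(ensemble, X_new):
    """Classify new instances by an equal-weight majority vote (labels in {-1, +1})."""
    votes = np.stack([clf.predict(X_new) for clf in ensemble])  # shape (L, n)
    return np.sign(votes.sum(axis=0))
```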


Bagging Experiment

[Breiman, ML Journal, 1996]

Given sample X of labeled data, Breiman did the following 100 times and reported avg:

1. Divide X randomly into test set T (10%) and training set D (90%)
2. Learn a decision tree from D and let e_S be its error rate on T
3. Do 50 times: create bootstrap set X_j and learn a decision tree (so ensemble size = 50); then let e_B be the error of a majority vote of the trees on T
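A hedged sketch of one trial of this protocol using scikit-learn decision trees (my choice of library; the slides don't name an implementation). It assumes X and y are NumPy arrays with integer class labels ≥ 0; Breiman repeated the trial 100 times and averaged e_S and e_B.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def breiman_trial(X, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Random split: 90% train (D), 10% test (T)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=seed)
    # 2. Single decision tree -> e_S
    single = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    e_S = np.mean(single.predict(X_te) != y_te)
    # 3. 50 bootstrap trees, then a majority vote on T -> e_B
    preds = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X_tr), size=len(X_tr))
        tree = DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx])
        preds.append(tree.predict(X_te))
    preds = np.stack(preds)                                   # shape (n_trees, |T|)
    vote = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
    e_B = np.mean(vote != y_te)
    return e_S, e_B
```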


Bagging Experiment

Results

Data Set        ē_S    ē_B    Decrease
waveform        29.0   19.4   33%
heart           10.0    5.3   47%
breast cancer    6.0    4.2   30%
ionosphere      11.2    8.6   23%
diabetes        23.4   18.8   20%
glass           32.0   24.9   27%
soybean         14.5   10.6   27%


Bagging Experiment

(cont’d)

Same experiment, but using a nearest-neighbor classifier, where the prediction for a new example x's label is that of x's nearest neighbor in the training set, under e.g. Euclidean distance.

Results:

Data Set        ē_S    ē_B    Decrease
waveform        26.1   26.1   0%
heart            6.3    6.3   0%
breast cancer    4.9    4.9   0%
ionosphere      35.7   35.7   0%
diabetes        16.4   16.4   0%
glass           16.4   16.4   0%

What happened?


When Does Bagging Help?

When the learner is unstable, i.e., when a small change in the training set causes a large change in the hypothesis produced
  • Unstable: decision trees, neural networks
  • Stable: nearest neighbor

Experimentally, bagging can help substantially for unstable learners; it can somewhat degrade results for stable learners (a rough way to measure this is sketched below)
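A rough way to see this empirically (my own sketch, not from the slides): refit the learner after removing a few random training examples and measure how often its predictions change on a fixed evaluation set. Decision trees typically show much larger changes than 1-nearest-neighbor.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

def instability(make_clf, X, y, X_eval, n_repeats=20, n_drop=5, seed=0):
    """Mean fraction of X_eval whose predicted label changes when n_drop
    random training examples are removed and the classifier is refit."""
    rng = np.random.default_rng(seed)
    base_pred = make_clf().fit(X, y).predict(X_eval)
    changed = []
    for _ in range(n_repeats):
        keep = rng.permutation(len(X))[:-n_drop]          # drop a few examples
        pred = make_clf().fit(X[keep], y[keep]).predict(X_eval)
        changed.append(np.mean(pred != base_pred))
    return float(np.mean(changed))

# Typical usage (X, y, X_eval are NumPy arrays):
#   instability(DecisionTreeClassifier, X, y, X_eval)                       # unstable: often large
#   instability(lambda: KNeighborsClassifier(n_neighbors=1), X, y, X_eval)  # stable: near 0
```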


Boosting

[Schapire & Freund Book]

Similar to bagging, but don't always sample uniformly; instead adjust the resampling distribution p_j over X to focus attention on previously misclassified examples.

The final classifier weights the learned classifiers, but not uniformly; instead, the weight of classifier d_j depends on its performance on the data it was trained on.

The final classifier is a weighted combination of d_1, ..., d_L, where d_j's weight depends on its error on X w.r.t. p_j.


Boosting

Algorithm Idea [pj ↔ Dj; dj ↔ hj]

Repeat for j = 1, ..., L:
1. Run the learning algorithm on examples randomly drawn from training set X according to distribution p_j (p_1 = uniform)
     • Can sample X according to p_j and train normally, or directly minimize error on X w.r.t. p_j
2. Output of the learner is a binary hypothesis d_j
3. Compute error_{p_j}(d_j) = error of d_j on examples from X drawn according to p_j (can be computed exactly)
4. Create p_{j+1} from p_j by decreasing the weight of instances that d_j predicts correctly


Boosting

Algorithm Pseudocode (Fig 17.2)

[Slide shows the textbook's pseudocode figure (Fig 17.2); not reproduced here — see the Schapire & Freund version below]


Boosting

Algorithm Pseudocode (Schapire & Freund)

Given: (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X, y_i ∈ {−1, +1}.
Initialize: D_1(i) = 1/m for i = 1, ..., m.
For t = 1, ..., T:
  • Train weak learner using distribution D_t.
  • Get weak hypothesis h_t : X → {−1, +1}.
  • Aim: select h_t to minimalize the weighted error
        ε_t = Pr_{i ∼ D_t}[ h_t(x_i) ≠ y_i ].
  • Choose α_t = (1/2) ln( (1 − ε_t) / ε_t ).
  • Update, for i = 1, ..., m:
        D_{t+1}(i) = (D_t(i) / Z_t) × e^{−α_t}   if h_t(x_i) = y_i
        D_{t+1}(i) = (D_t(i) / Z_t) × e^{α_t}    if h_t(x_i) ≠ y_i
        equivalently, D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t,
        where Z_t is a normalization factor (chosen so that D_{t+1} will be a distribution).
Output the final hypothesis:
        H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) ).
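A direct transcription of this pseudocode into Python/NumPy (a sketch; the function and variable names are mine). The weak learner is left abstract: `weak_learn(X, y, D)` should return a hypothesis `h` such that `h(X)` is a vector in {−1, +1}^m that tries to make the D-weighted error small, for example the best decision stump under D.

```python
import numpy as np

def adaboost(X, y, weak_learn, T):
    """AdaBoost (Schapire & Freund pseudocode). Assumes y[i] in {-1, +1}
    and 0 < eps_t < 1 in every round."""
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    hyps, alphas = [], []
    for t in range(T):
        h = weak_learn(X, y, D)                # train weak learner on distribution D_t
        pred = h(X)                            # h_t(x_i) for all i
        eps = D[pred != y].sum()               # weighted error, computed exactly on X
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t
        D = D * np.exp(-alpha * y * pred)      # down-weight correct, up-weight mistakes
        D /= D.sum()                           # divide by Z_t so D_{t+1} is a distribution
        hyps.append(h)
        alphas.append(alpha)

    def H(X_new):
        """Final hypothesis: sign of the alpha-weighted vote of h_1, ..., h_T."""
        return np.sign(sum(a * h(X_new) for a, h in zip(alphas, hyps)))
    return H
```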


Boosting

Schapire & Freund Example: Decision Stumps Dj = pj; hj = dj; αj = 1

2 ln(1/βj) = 1 2 ln

1−✏j ✏j

1 3 5 7 8 10 6 2 9 4

D1 h1 h2 D2


Boosting

Schapire & Freund Example: Decision Stumps Dj = pj; hj = dj; αj = 1

2 ln(1/βj) = 1 2 ln

1−✏j ✏j

1 2 3 4 5 6 7 8 9 10 D1(i) 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 1 = 0.30, α1 ≈ 0.42 e−α1yih1(xi) 1.53 1.53 1.53 0.65 0.65 0.65 0.65 0.65 0.65 0.65 D1(i) e−α1yih1(xi) 0.15 0.15 0.15 0.07 0.07 0.07 0.07 0.07 0.07 0.07 Z1 ≈ 0.92 D2(i) 0.17 0.17 0.17 0.07 0.07 0.07 0.07 0.07 0.07 0.07 2 ≈ 0.21, α2 ≈ 0.65 e−α2yih2(xi) 0.52 0.52 0.52 0.52 0.52 1.91 1.91 0.52 1.91 0.52 D2(i) e−α2yih2(xi) 0.09 0.09 0.09 0.04 0.04 0.14 0.14 0.04 0.14 0.04 Z2 ≈ 0.82 D3(i) 0.11 0.11 0.11 0.05 0.05 0.17 0.17 0.05 0.17 0.05 3 ≈ 0.14, α3 ≈ 0.92 e−α3yih3(xi) 0.40 0.40 0.40 2.52 2.52 0.40 0.40 2.52 0.40 0.40 D3(i) e−α3yih3(xi) 0.04 0.04 0.04 0.11 0.11 0.07 0.07 0.11 0.07 0.02 Z3 ≈ 0.69 Calculations are shown for the ten examples as numbered in the figure. Examples on which hypothesis ht makes a mistake are indicated by underlined figures in the rows marked Dt. 14 / 19 CSCE 478/878 Lecture 7: Bagging and Boosting Stephen Scott Introduction Outline Bagging Boosting
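The round-1 entries in the table can be reproduced directly from the update rule (a sanity check I added; it is not part of the book's table): h_1 misclassifies 3 of the 10 equally weighted examples, so ε_1 = 0.30.

```python
import numpy as np

eps1   = 0.30                                    # 3 mistakes × weight 0.10
alpha1 = 0.5 * np.log((1 - eps1) / eps1)         # ≈ 0.42
up, down = np.exp(alpha1), np.exp(-alpha1)       # ≈ 1.53 (mistakes), ≈ 0.65 (correct)
unnorm = np.array([0.10 * up] * 3 + [0.10 * down] * 7)  # ≈ [0.15]*3 + [0.07]*7
Z1 = unnorm.sum()                                # ≈ 0.92
D2 = unnorm / Z1                                 # ≈ [0.17]*3 + [0.07]*7, as in the table
```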


Boosting

Schapire & Freund Example: Decision Stumps Dj = pj; hj = dj; αj = 1

2 ln(1/βj) = 1 2 ln

1−✏j ✏j

h3 D3


Boosting

Example (cont’d)

H_final(x) = sign( 0.42 h_1(x) + 0.65 h_2(x) + 0.92 h_3(x) )

The combined classifier is not in the original hypothesis class (it is no longer a single decision stump)!

In this case, at least two of the three hypotheses need to predict +1 for the weighted sum to exceed 0: each individual weight is smaller than the sum of the other two (e.g., 0.92 < 0.42 + 0.65), so no single hypothesis can outvote the other two.
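A quick check of that claim (my own verification, not on the slide): enumerating all eight sign patterns of (h_1(x), h_2(x), h_3(x)) shows the weighted vote always agrees with the simple majority for these particular weights.

```python
import itertools
import numpy as np

alphas = np.array([0.42, 0.65, 0.92])
for votes in itertools.product([-1, +1], repeat=3):
    votes = np.array(votes)
    weighted = np.sign(alphas @ votes)   # sign of the weighted combination
    majority = np.sign(votes.sum())      # plain majority of the three stumps
    assert weighted == majority          # never differs for these weights
```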


Boosting

Experimental Results

Scatter plot: Percent classification error of non-boosted vs boosted on 27 learning tasks

[Figure: two scatter plots of test error — decision stumps vs. boosted stumps (errors roughly 20–80%) and C4.5 vs. boosted C4.5 (errors roughly 5–30%)]


Boosting

Experimental Results (cont’d)

[Figure: two more scatter plots of test error — C4.5 vs. boosted stumps, and boosted C4.5 vs. boosted stumps (errors roughly 5–30%)]


Boosting

Miscellany

  • If each ε_j < 1/2 − γ_j, the error of the ensemble on X drops exponentially in Σ_{j=1}^{L} γ_j² (the bound is spelled out below)
  • Can also bound the generalization error of the ensemble
  • Very successful empirically
  • Generalization sometimes improves if training continues after the ensemble's error on X drops to 0
      • Contrary to intuition: one would expect overfitting
      • Related to increasing the combined classifier's margin
  • Useful even with very simple base learners, e.g., decision stumps
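For reference, the standard AdaBoost training-error bound behind the first bullet, as it appears in Schapire & Freund's book (recalled here; it is not spelled out on the slide), with γ_t = 1/2 − ε_t and notation as in the pseudocode:

```latex
\frac{1}{m}\Bigl|\{\, i : H(x_i) \neq y_i \,\}\Bigr|
  \;\le\; \prod_{t=1}^{T} Z_t
  \;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
  \;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^2}
  \;\le\; \exp\!\Bigl(-2\sum_{t=1}^{T}\gamma_t^2\Bigr)
```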
