Santa Cruz Summer School 2012
Theory and Applications of Boosting
Yoav Freund, UCSD
Many slides from Rob Schapire
Monday, July 16, 2012
- boosting
- matrix games
- minimization
- By Majority
- with High Noise
- Pedestrian Detection
- studies
- tracking
[Gorin et al.]
- customer (Collect, CallingCard, PersonToPerson, etc.)
- please (Collect)
- it to my office (ThirdNumber)
- please (CallingCard)
- rang the wrong number because I got the wrong party and I would like to have that taken off of my bill (BillingCredit)
- THEN predict ‘CallingCard’
(those most often misclassified by previous rules of thumb)
boosting = general method of converting rough rules of thumb into a highly accurate prediction rule
- consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55% (in two-class setting) [ “weak learning assumption” ]
- construct single classifier with very high accuracy, say, 99%
[Valiant ’84]
with high probability, given polynomially many examples (and polynomial time), can find classifier with arbitrarily small generalization error
better than random guessing (error ≤ 1/2 − γ)
predictions better than random guessing
algorithms
and the margins theory
training set: (x1, y1), . . . , (xm, ym)
weak classifier ht : X → {−1, +1} with small error εt on Dt: εt = Pr_{i∼Dt}[ht(xi) ≠ yi]
[with Freund]
Dt+1(i) = (Dt(i)/Zt) × ( e^(−αt) if yi = ht(xi),  e^(αt) if yi ≠ ht(xi) )
        = (Dt(i)/Zt) · exp(−αt yi ht(xi))
where Zt = normalization factor, αt = (1/2) ln((1 − εt)/εt)
final classifier: Hfinal(x) = sign(Σt αt ht(x))
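The algorithm above can be sketched in code. The following is a minimal NumPy sketch, not a reference implementation: an exhaustive decision-stump search stands in for the weak learner, and all names (`stump_predict`, `best_stump`, `adaboost`) are made up for illustration.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Axis-parallel half-plane: predict +1 on one side of the threshold."""
    return polarity * np.where(X[:, feature] > threshold, 1, -1)

def best_stump(X, y, D):
    """Exhaustively pick the stump with smallest weighted error under D."""
    m, n = X.shape
    best, best_err = None, np.inf
    for feature in range(n):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = D[pred != y].sum()        # eps_t = Pr_{i~D_t}[h_t(x_i) != y_i]
                if err < best_err:
                    best_err, best = err, (feature, threshold, polarity)
    return best, best_err

def adaboost(X, y, T):
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                     # D_1 = uniform
    ensemble = []                               # list of (alpha_t, stump_t)
    for _ in range(T):
        stump, eps = best_stump(X, y, D)
        eps = max(eps, 1e-10)                   # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(X, *stump)
        D *= np.exp(-alpha * y * pred)          # D_{t+1}(i) ∝ D_t(i) exp(-α_t y_i h_t(x_i))
        D /= D.sum()                            # Z_t = normalization factor
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """H_final(x) = sign(Σ_t α_t h_t(x))."""
    F = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.sign(F)
```

On a small 1-D dataset whose labels form an interval, a few rounds of stumps are enough to drive the training error to zero.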
D1 (uniform initial distribution); weak classifiers = vertical or horizontal half-planes
round 1: h1, ε1 = 0.30, α1 = 0.42 → D2
round 2: h2, ε2 = 0.21, α2 = 0.65 → D3
round 3: h3, ε3 = 0.14, α3 = 0.92
Hfinal = sign(0.42 h1 + 0.65 h2 + 0.92 h3)
http://cseweb.ucsd.edu/~yfreund/adaboost/index.html
and the margins theory
[with Freund]
write εt = 1/2 − γt [ γt = “edge” ]
then: training error(Hfinal) ≤ ∏t 2√(εt(1 − εt)) = ∏t √(1 − 4γt²) ≤ exp(−2 Σt γt²)
so: if ∀t: γt ≥ γ > 0, then training error(Hfinal) ≤ e^(−2γ²T)
[Freund & Schapire ’96]
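The chain of inequalities is easy to sanity-check numerically. The sketch below (assuming NumPy) uses the εt values from the toy example and verifies both the identity 2√(εt(1 − εt)) = √(1 − 4γt²) and the exponential bound:

```python
import numpy as np

# Per-round bound factor: Z_t = 2*sqrt(eps_t*(1 - eps_t)) = sqrt(1 - 4*gamma_t^2),
# and sqrt(1 - x) <= exp(-x/2) yields the bound exp(-2 * sum_t gamma_t^2).
eps = np.array([0.30, 0.21, 0.14])      # the eps_t from the toy example
gamma = 0.5 - eps                        # the edges gamma_t
Z = 2 * np.sqrt(eps * (1 - eps))

assert np.allclose(Z, np.sqrt(1 - 4 * gamma**2))
assert np.prod(Z) <= np.exp(-2 * np.sum(gamma**2))
```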
let F(x) = Σt αt ht(x) (the “scoring function”), so Hfinal(x) = sign(F(x))
Step 1: unwrapping the recurrence gives
Dfinal(i) = (1/m) exp(−yi Σt αt ht(xi)) / ∏t Zt = (1/m) exp(−yi F(xi)) / ∏t Zt
Step 2: training error(Hfinal) ≤ ∏t Zt
proof:
training error(Hfinal) = (1/m) Σi [1 if yi ≠ Hfinal(xi), else 0]
= (1/m) Σi [1 if yi F(xi) ≤ 0, else 0]
≤ (1/m) Σi exp(−yi F(xi))
= Σi Dfinal(i) ∏t Zt
= ∏t Zt
Step 3: Zt = Σi Dt(i) exp(−αt yi ht(xi))
= Σ{i : yi ≠ ht(xi)} Dt(i) e^(αt) + Σ{i : yi = ht(xi)} Dt(i) e^(−αt)
= εt e^(αt) + (1 − εt) e^(−αt)
= 2√(εt(1 − εt))
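The last equality uses αt = (1/2) ln((1 − εt)/εt); substituting gives εt·√((1 − εt)/εt) + (1 − εt)·√(εt/(1 − εt)) = 2√(εt(1 − εt)). A tiny script (hypothetical helper name) confirms the identity numerically:

```python
import math

def Z(eps):
    """Z_t = eps*e^alpha + (1 - eps)*e^(-alpha) with alpha = 0.5*ln((1 - eps)/eps)."""
    alpha = 0.5 * math.log((1 - eps) / eps)
    return eps * math.exp(alpha) + (1 - eps) * math.exp(-alpha)

# the minimized Z_t equals 2*sqrt(eps*(1 - eps)) for any eps in (0, 1/2)
for e in (0.1, 0.3, 0.45):
    assert math.isclose(Z(e), 2 * math.sqrt(e * (1 - e)))
```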
and the margins theory
[plot: train and test error vs. # of rounds T]
expect:
generalization error ≤ training error + Õ(√(dT/m))
[plot: error vs. # of rounds (boosting “stumps” on heart-disease dataset)]
[plot: train and test error vs. # of rounds T (boosting C4.5 on “letter” dataset)]

# rounds     5     100    1000
train error  0.0   0.0    0.0
test error   8.4   3.3    3.1
[with Freund, Bartlett & Lee]
margin = (weighted fraction voting correctly) − (weighted fraction voting incorrectly)
[diagram: margin scale from −1 (high-confidence incorrect) through 0 (low confidence) to +1 (high-confidence correct)]
[Schapire, Freund, Bartlett & Lee ’97]
margin distribution = cumulative distribution of margins of training examples
[plots: train/test error vs. # of rounds T; cumulative margin distributions after 5, 100, 1000 rounds]

# rounds         5      100    1000
train error      0.0    0.0    0.0
test error       8.4    3.3    3.1
% margins ≤ 0.5  7.7    0.0    0.0
minimum margin   0.14   0.52   0.55
- large margins ⇒ better bound on generalization error (independent of number of rounds)
- large margins ⇒ can approximate final classifier by a much smaller classifier (just as polls can predict not-too-close election)
- boosting tends to increase margins of training examples (given weak learning assumption)
- so: although final classifier is getting larger, margins are likely to be increasing, so final classifier actually getting close to a simpler classifier, driving down the test error
with high probability, ∀θ > 0:
generalization error ≤ P̂r[margin ≤ θ] + Õ(√(d/m) / θ)
(P̂r[·] = empirical probability)
furthermore: P̂r[margin ≤ θ] → 0 exponentially fast (in T) if εt < 1/2 − θ (∀t)
so: under the weak learning assumption, all training examples will quickly have “large” margins
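The margin of a training example is computed directly from the weak-classifier votes: yi F(xi) normalized by Σt αt, so it lies in [−1, +1]. A small sketch (the α values come from the toy example; the vote matrix and function name are hypothetical):

```python
import numpy as np

def margins(alphas, preds, y):
    """Normalized margin y_i * F(x_i) / sum_t alpha_t for each example;
    preds[t, i] = h_t(x_i) in {-1, +1}."""
    F = alphas @ preds                 # F(x_i) = sum_t alpha_t * h_t(x_i)
    return y * F / np.sum(alphas)

alphas = np.array([0.42, 0.65, 0.92])  # the three rounds of the toy example
preds = np.array([[+1, +1, -1, -1],    # hypothetical votes of h1, h2, h3
                  [+1, -1, +1, -1],
                  [+1, +1, +1, -1]])
y = np.array([+1, +1, +1, -1])
m = margins(alphas, preds, y)          # each value lies in [-1, +1]
```

An example on which all weak classifiers vote correctly has margin exactly +1.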
training set
- maximizing the minimum margin [Breiman]
- (even using same weak learner); or
- but margin distributions that are lower overall
[Reyzin & Schapire]
(using different norms)
AdaBoost uses weak learner to search over space
example: h(x) = +1 if x above line L, “don’t know” otherwise
predictions
[with Singer]
weak classifiers can output real-valued predictions:
sign(ht(x)) = prediction, |ht(x)| = “confidence”
Dt+1(i) = (Dt(i)/Zt) · exp(−αt yi ht(xi)), with identical rule for combining weak classifiers
[Schapire & Singer]
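The confidence-rated update differs from the binary case only in that ht(xi) is real-valued, so a confident mistake is penalized more than a hesitant one. A minimal sketch (the function name is made up):

```python
import numpy as np

def confidence_rated_update(D, y, h, alpha):
    """D_{t+1}(i) = D_t(i)/Z_t * exp(-alpha * y_i * h_t(x_i)),
    where h may be real-valued: sign(h) = prediction, |h| = confidence."""
    w = D * np.exp(-alpha * y * h)
    Z = w.sum()                        # Z_t = normalization factor
    return w / Z, Z

D = np.full(4, 0.25)                              # uniform distribution
y = np.array([1.0, 1.0, -1.0, -1.0])
h = np.array([0.9, -0.2, -0.8, 0.1])              # real-valued votes
D2, Z = confidence_rated_update(D, y, h, alpha=1.0)
```

A mildly wrong example (i = 1) ends up with more weight than a confidently correct one (i = 0), just as in the binary case.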
training error(Hfinal) ≤ ∏t Zt = (1/m) Σi exp(−yi Σt αt ht(xi))
where Zt = Σi Dt(i) exp(−αt yi ht(xi))
weak classifier has simple form that can be found efficiently
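For general real-valued ht there is no closed-form αt, but Zt is convex in αt, so a one-dimensional search suffices. A sketch (hypothetical helper; a fine grid scan stands in for a proper convex solver):

```python
import numpy as np

def best_alpha(D, y, h):
    """Choose alpha minimizing Z(alpha) = sum_i D(i) * exp(-alpha * y_i * h(x_i));
    Z is convex in alpha, so a fine grid scan is enough for a sketch."""
    grid = np.linspace(-5, 5, 10001)
    Zs = np.array([np.sum(D * np.exp(-a * y * h)) for a in grid])
    return grid[np.argmin(Zs)]

# for binary h with weighted error eps, the minimizer is 0.5*ln((1 - eps)/eps)
D = np.full(4, 0.25)
y = np.array([1.0, 1.0, 1.0, 1.0])
h = np.array([1.0, 1.0, 1.0, -1.0])    # one mistake: eps = 0.25
a = best_alpha(D, y, h)
```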
[plot: % error vs. number of rounds, train/test, with and without confidence-rated predictions]

% error   round first reached (conf.)   round first reached (no conf.)   speedup
40        268                           16,938                           63.2
35        598                           65,292                           109.2
30        1,888                         >80,000                          –
[with Singer]
- simple patterns, namely, (sparse) n-grams
- minimize Zt
- categories: Collect, Competitor, DialForMe, Directory, HowToDial, PersonToPerson, Rate, ThirdNumber, Time, TimeCharge, Other
[Schapire & Singer]
[table: per-round terms and their weights over categories AC, AS, BC, CC, CO, CM, DM, DI, HO, PP, RA, 3N, TI, TC, OT]

rnd   term
1     collect
2     card
3     my home
4     person ? person
5     code
6     I
rnd   term
7     time
8     wrong number
9     how
10    call
11    seven
12    trying to
13    and
rnd   term
14    third
15    to
16    for
17    charges
18    dial
19    just
examples with most weight are often outliers (mislabeled and/or ambiguous)
- (Collect)
- (Rate)
- please (CallingCard)
- (Collect)
- (Collect)
- (CallingCard)
- and have the charges billed to another number (CallingCard DialForMe)
- call is so bad (BillingCredit)
- (AttService Rate)
- (PersonToPerson)
- a non dialable point in san miguel philippines (AttService Other)
and the margins theory
→ shift in mindset: goal now is merely to find classifiers barely better than random guessing
binary classification
→ overfitting
→ underfitting → low margins → overfitting
noise
[with Freund]
[diagram: example decision stumps, “height > 5 feet?” and “eye color = brown?”, each predicting +1 or −1 at the leaves]
[Freund & Schapire]
[scatter plots: test error of boosting stumps vs. C4.5, and boosting C4.5 vs. C4.5, across benchmark datasets]