Boosting: Foundations and Algorithms

Rob Schapire

Example: Spam Filtering
• problem: filter out spam (junk email)
• gather large collection of examples of spam and non-spam:
    From: yoav@ucsd.edu        Rob, can you review a paper...        non-spam
    From: xa412@hotmail.com    Earn money without working!!!! ...    spam
    ...
• goal: have computer learn from examples to distinguish spam from non-spam

Machine Learning
• studies how to automatically learn to make accurate predictions based on past observations
• classification problems:
  • classify examples into given set of categories
[diagram: labeled training examples → machine learning algorithm → classification rule; new example → classification rule → predicted classification]

Examples of Classification Problems
• text categorization (e.g., spam filtering)
• fraud detection
• machine vision (e.g., face detection)
• natural-language processing (e.g., spoken language understanding)
• market segmentation (e.g., predict if customer will respond to promotion)
• bioinformatics (e.g., classify proteins according to their function)
  ...

Back to Spam
• main observation:
  • easy to find “rules of thumb” that are “often” correct
    • if ‘viagra’ occurs in message, then predict ‘spam’
  • hard to find single rule that is very highly accurate

The Boosting Approach
• devise computer program for deriving rough rules of thumb
• apply procedure to subset of examples
  • obtain rule of thumb
• apply to 2nd subset of examples
  • obtain 2nd rule of thumb
• repeat T times

Key Details
• how to choose examples on each round?
  • concentrate on “hardest” examples (those most often misclassified by previous rules of thumb)
• how to combine rules of thumb into single prediction rule?
  • take (weighted) majority vote of rules of thumb
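The weighted majority vote can be illustrated with a minimal sketch on the spam example. The three rules of thumb and their weights below are hypothetical, chosen by hand for illustration; a boosting algorithm would derive both the rules and the weights from the training examples.

```python
# Hypothetical rules of thumb: each maps a message to +1 (spam) or -1 (non-spam).
def has_viagra(msg):
    return 1 if "viagra" in msg.lower() else -1

def has_earn_money(msg):
    return 1 if "earn money" in msg.lower() else -1

def many_exclamations(msg):
    return 1 if msg.count("!") >= 3 else -1

# Hand-picked (weight, rule) pairs; these weights are assumptions, not learned values.
RULES = [(1.0, has_viagra), (0.8, has_earn_money), (0.5, many_exclamations)]

def weighted_vote(msg, rules=RULES):
    """Combine the rules of thumb by a weighted majority vote."""
    score = sum(weight * rule(msg) for weight, rule in rules)
    return "spam" if score > 0 else "non-spam"

print(weighted_vote("Earn money without working!!!!"))
print(weighted_vote("Rob, can you review a paper..."))
```

No single rule decides the outcome: the first message is flagged because two weaker rules together outvote the stronger one.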

Boosting
• boosting = general method of converting rough rules of thumb into highly accurate prediction rule
• technically:
  • assume given “weak” learning algorithm that can consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55% (in two-class setting) [“weak learning assumption”]
  • given sufficient data, a boosting algorithm can provably construct single classifier with very high accuracy, say, 99%

Early History
• [Valiant ’84]:
  • introduced theoretical (“PAC”) model for studying machine learning
• [Kearns & Valiant ’88]:
  • open problem of finding a boosting algorithm
• if boosting possible, then...
  • can use (fairly) wild guesses to produce highly accurate predictions
  • if can learn “part way” then can learn “all the way”
  • should be able to improve any learning algorithm
  • for any learning problem:
    • either can always learn with nearly perfect accuracy
    • or there exist cases where cannot learn even slightly better than random guessing

First Boosting Algorithms
• [Schapire ’89]:
  • first provable boosting algorithm
• [Freund ’90]:
  • “optimal” algorithm that “boosts by majority”
• [Drucker, Schapire & Simard ’92]:
  • first experiments using boosting
  • limited by practical drawbacks
• [Freund & Schapire ’95]:
  • introduced “AdaBoost” algorithm
  • strong practical advantages over previous boosting algorithms

A Formal Description of Boosting
• given training set $(x_1, y_1), \ldots, (x_m, y_m)$
  • $y_i \in \{-1, +1\}$ correct label of instance $x_i \in X$
• for $t = 1, \ldots, T$:
  • construct distribution $D_t$ on $\{1, \ldots, m\}$
  • find weak classifier (“rule of thumb”) $h_t : X \to \{-1, +1\}$ with small error $\epsilon_t$ on $D_t$:
    $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
• output final classifier $H_{\text{final}}$
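The error of a weak classifier on a distribution is the total weight of the examples it misclassifies. A minimal sketch, assuming the distribution is stored as a list of weights that sum to 1; the helper name `weighted_error` and the tiny data set are hypothetical:

```python
def weighted_error(D, h, X, y):
    """Probability, under distribution D over indices, that h mislabels an example."""
    return sum(D[i] for i in range(len(X)) if h(X[i]) != y[i])

# Illustrative data: four points on the real line with labels in {-1, +1}.
X = [1, 2, 3, 4]
y = [+1, +1, -1, +1]

# Illustrative weak classifier: predict +1 for x <= 2, otherwise -1 (wrong only on x = 4).
h = lambda x: +1 if x <= 2 else -1

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.1, 0.1, 0.1, 0.7]       # most of the weight sits on the misclassified point

print(weighted_error(uniform, h, X, y))
print(weighted_error(skewed, h, X, y))
```

The same classifier has a different error under each distribution, which is how a boosting algorithm can force the weak learner to attend to the examples it wants.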

AdaBoost [with Freund]
• constructing $D_t$:
  • $D_1(i) = 1/m$
  • given $D_t$ and $h_t$:
    $$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \neq h_t(x_i) \end{cases} = \frac{D_t(i)}{Z_t} \exp(-\alpha_t \, y_i \, h_t(x_i))$$
    where $Z_t$ = normalization factor and
    $$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) > 0$$
• final classifier:
  • $H_{\text{final}}(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$
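The update rule and final vote can be put together in a short sketch. This is an illustration under stated assumptions, not a reference implementation: the weak learner is a hypothetical exhaustive search over threshold classifiers ("stumps") on one-dimensional data, the clamp on the error is only a guard against division by zero, ties in the final vote are resolved as +1, and all function names are made up for this example.

```python
import math

def stump(threshold, sign):
    """Weak classifier on real numbers: `sign` if x < threshold, else -sign."""
    return lambda x: sign if x < threshold else -sign

def best_stump(X, y, D):
    """Hypothetical weak learner: the stump with the smallest weighted error under D."""
    xs = sorted(set(X))
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for thr in thresholds:
        for sign in (+1, -1):
            h = stump(thr, sign)
            err = sum(d for d, xi, yi in zip(D, X, y) if h(xi) != yi)
            if best is None or err < best[0]:
                best = (err, h)
    return best

def adaboost(X, y, T):
    """Run up to T rounds and return a list of (alpha_t, h_t) pairs."""
    m = len(X)
    D = [1.0 / m] * m                          # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        eps, h = best_stump(X, y, D)
        if eps >= 0.5:                         # weak learning assumption fails: stop
            break
        eps = max(eps, 1e-12)                  # guard: avoid log of infinity when eps == 0
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # D_{t+1}(i) is proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i))
        D = [d * math.exp(-alpha * yi * h(xi)) for d, xi, yi in zip(D, X, y)]
        Z = sum(D)                             # normalization factor Z_t
        D = [d / Z for d in D]
    return ensemble

def predict(ensemble, x):
    """H_final(x) = sign of the alpha-weighted vote (a zero score is returned as +1)."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Labels +,+,-,-,+,+ cannot be fit by any single stump, but three rounds suffice.
X = [1, 2, 3, 4, 5, 6]
y = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(X, y, T=3)
print([predict(ensemble, x) for x in X])
```

On this data every individual stump misclassifies at least two points, yet the weighted vote of three stumps labels all six correctly.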

Toy Example
[figure: training points under the initial distribution $D_1$]
weak classifiers = vertical or horizontal half-planes

Round 1
[figure: weak classifier $h_1$ and the resulting distribution $D_2$]
$\epsilon_1 = 0.30$, $\alpha_1 = 0.42$
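The round-1 numbers can be checked directly from the AdaBoost formulas. A small sketch, using only the error 0.30 from the slide; the variable names are illustrative:

```python
import math

eps1 = 0.30                                  # weighted error of h_1 on D_1 (from the slide)
alpha1 = 0.5 * math.log((1 - eps1) / eps1)   # alpha_1 = (1/2) ln((1 - eps_1) / eps_1)

down = math.exp(-alpha1)   # factor applied to correctly classified examples
up = math.exp(alpha1)      # factor applied to misclassified examples

# Normalization factor Z_1: total weight after rescaling, before renormalizing.
Z1 = (1 - eps1) * down + eps1 * up

# Share of D_2 that falls on the examples h_1 got wrong.
mass_wrong = eps1 * up / Z1

print(round(alpha1, 2))   # 0.42
print(round(down, 3), round(up, 3), round(Z1, 3), round(mass_wrong, 3))
```

After the update, the examples misclassified by the first weak classifier carry half of the total weight, so that classifier is no better than random guessing on the new distribution and the next round has to find a different rule.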
