  1. AdaBoost and RankBoost
     Josiah Yoder, School of Electrical and Computer Engineering
     RVL Seminar, 4 Feb 2011

  2. Overview
     ROC & Precision-Recall curves
     AdaBoost
     RankBoost
     Not really any of my research

  3. ROC & Precision-Recall
     ROC (Receiver Operating Characteristic) curve:
       Good with skewed distribution
       Consistent for comparing performance
       Good for visualizing maximum performance with any given threshold
     Precision-Recall curve:
       Good for evaluating ranked search results
       Good with unequal costs

  4. Performance Metrics
     Confusion matrix (adapted from Fawcett, 2003): columns are the actual class, rows the estimated class.
                        Actual P           Actual N
       Estimated P      True Positive      False Positive
       Estimated N      False Negative     True Negative

  5. The Axes
     Both curves are built from the same confusion matrix (TP, FP, FN, TN; actual class P or N vs. estimated class).
     ROC curve:
       tp rate = TP / P
       fp rate = FP / N
     P-R (Precision-Recall) curve:
       precision = TP / (TP + FP)
       recall = TP / P
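
     A minimal Python sketch of these axis definitions (illustrative only; the function and argument names are mine, not from the slides):

       def curve_axes(tp, fp, fn, tn):
           # Totals of actual positives and negatives.
           p = tp + fn
           n = fp + tn
           tp_rate = tp / p               # ROC y-axis
           fp_rate = fp / n               # ROC x-axis
           precision = tp / (tp + fp)     # P-R y-axis
           recall = tp / p                # P-R x-axis (same as tp rate)
           return tp_rate, fp_rate, precision, recall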

  6. Discrete Classifiers
     [Figure adapted from Fawcett (2003): discrete classifiers A-E plotted as single ROC points (fp rate = FP / N vs. tp rate = TP / P) and as single P-R points (recall vs. precision), for a dataset with 50% P and 50% N.]

  7. Numeric Classifiers
     Output: a numeric value (score, probability, rank, ...)
     Thresholding the output gives a discrete classifier, i.e. a single point on the ROC or P-R curve.
     "Sweeping" the threshold from ∞ to −∞ traces out a curve. (There is a more efficient way to do it than re-thresholding at every value.)
     Note that ROC and P-R curves are parametric curves, with the threshold as the parameter.

  8. Discrete Classifiers
     [Figure adapted from Fawcett (2003): ROC curve (fp rate vs. tp rate) and P-R curve (recall vs. precision) obtained by sweeping a threshold over the scores below; dataset with 50% P, 50% N.]
     GT (ground truth):  1    0    1    0    1
     Feature value:      0.9  0.7  0.6  0.5  0.2

  9. Discrete Classifiers
     [Figure adapted from Fawcett (2003): the same construction for a skewed dataset with 20% P, 80% N.]
     GT (ground truth):  1    0    0    0    0    1    0    0    0    0    1
     Feature value:      0.9  0.7  0.7  0.7  0.7  0.6  0.5  0.5  0.5  0.5  0.2
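
     The "more efficient way" mentioned on slide 7 can be sketched as follows: sort the examples by score once and sweep the threshold through the sorted list, emitting one ROC and one P-R point per distinct score. This is an illustrative Python sketch with my own names; the example data are taken from slide 9:

       def curve_points(gt, scores):
           # gt: 0/1 ground-truth labels; scores: the classifier's numeric outputs.
           # Returns (roc, pr): roc as (fp_rate, tp_rate) points, pr as (recall, precision) points.
           pairs = sorted(zip(scores, gt), reverse=True)   # highest score first
           P = sum(gt)
           N = len(gt) - P
           tp = fp = 0
           roc, pr = [], []
           for i, (s, y) in enumerate(pairs):
               tp += y
               fp += 1 - y
               # Emit a point only after the last example of a run of tied scores.
               if i + 1 == len(pairs) or pairs[i + 1][0] != s:
                   roc.append((fp / N, tp / P))
                   pr.append((tp / P, tp / (tp + fp)))
           return roc, pr

       # Data from slide 9.
       gt     = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
       scores = [0.9, 0.7, 0.7, 0.7, 0.7, 0.6, 0.5, 0.5, 0.5, 0.5, 0.2]
       roc_pts, pr_pts = curve_points(gt, scores)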

  10. Which Is Better?
     ROC doesn't change if the ratio P/N changes
       Makes it easy to visualize maximum performance when the ratio of true to false changes
     P-R illustrates search results
       Clearly illustrates the effect of false positives on performance

  11. AdaBoost and RankBoost
     AdaBoost:  classification, ROC
     RankBoost: ranking, P-R

  12. AdaBoost
     What is boosting? A linear combination of weak classifiers, with proven bounds on performance.
     A weak classifier is one that does better than random guessing.
     AdaBoost:
       h(x): weak binary classifier, h : X → {−1, +1}
       H(x): strong classifier, H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

  13. Nonlinear Separation
     The combination of weak classifiers is linear, but the weak classifiers themselves are non-linear.
     Example: x = (x_1, x_2)ᵀ
       H(x) = sign( 0.5·[[x_1 > 1]] + 0.5·[[x_2 > 2]] )
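
     A tiny sketch of this example in Python (my own code; it assumes the [[·]] tests are the ±1 weak classifiers of slide 12, and sends the tied case f(x) = 0 to −1):

       def h1(x):
           # Weak classifier on the first coordinate, mapping to {-1, +1}.
           return 1 if x[0] > 1 else -1

       def h2(x):
           # Weak classifier on the second coordinate.
           return 1 if x[1] > 2 else -1

       def H(x):
           # Strong classifier: sign of the weighted combination of weak classifiers.
           f = 0.5 * h1(x) + 0.5 * h2(x)
           return 1 if f > 0 else -1

       # The positive region is the axis-aligned corner where both stumps fire,
       # which no single stump (a half-plane) could produce on its own.
       print(H((2.0, 3.0)), H((0.0, 0.0)), H((2.0, 0.0)))   # 1 -1 -1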

  14. How to Pick Weak Classifiers and Their Weights
     Greedy: pick the best weak classifier repeatedly.
     Focus on the points misclassified by the previous classifier (not on overall performance).
     Following frames from http://cseweb.ucsd.edu/~yfreund/adaboost/index.html

  15. How to Reweight the Training Data
     D_1 = 1/m, where m is the number of training points.
     D_{t+1}(i) = D_t(i) · ♠ / Z_t
     What to use for ♠? (Dropping the t subscript for a while...)
     Idea #1: ♠ = [[ y_i ≠ h(x_i) ]]
       Problem: correctly classified data is totally forgotten.
     Idea #2: ♠ = a if y_i ≠ h(x_i),  1/a if y_i = h(x_i)
       Tada! This is what AdaBoost does!

  16. AdaBoost's ♠
     Idea #2: ♠ = a if y_i ≠ h(x_i),  1/a if y_i = h(x_i)
     AdaBoost: ♠ = e^α if y_i ≠ h(x_i),  e^{−α} if y_i = h(x_i),
       or equivalently ♠ = exp[ −α y_i h(x_i) ]
     What is α? How do you choose it? We will come back to this...
     We shall see that the classifier has nice properties no matter how we choose α.

  17. The Basic AdaBoost Algorithm
     (t subscripts are back)
     D_1 = 1/m
     for t = 1, ..., T:
       Choose h_t based on the data weighted by D_t
       Reweight: D_{t+1}(i) = D_t(i) · exp[ −α_t y_i h_t(x_i) ] / Z_t
       where Z_t = Σ_i D_t(i) · exp[ −α_t y_i h_t(x_i) ]
     H(x) = sign( Σ_t α_t h_t(x) ),   f(x) = Σ_t α_t h_t(x)
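
     A hedged Python sketch of this loop (my own code, not the presenter's). It assumes a helper best_weak_classifier(X, y, D) returning a ±1 weak classifier and its weighted error ε_t (one possible version is sketched after slide 20), and it plugs in the α_t formula from slide 20:

       import math

       def adaboost(X, y, T, best_weak_classifier):
           # X: list of feature vectors; y: labels in {-1, +1}; T: number of rounds.
           # Returns (H, f, Zs): the strong classifier, the margin function
           # f(x) = sum_t alpha_t h_t(x), and the per-round normalizers Z_t.
           m = len(X)
           D = [1.0 / m] * m                                # D_1 = 1/m
           hs, alphas, Zs = [], [], []
           for t in range(T):
               h, eps = best_weak_classifier(X, y, D)       # weighted error eps_t
               alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))   # slide 20
               w = [D[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(m)]
               Z = sum(w)                                   # Z_t
               D = [wi / Z for wi in w]                     # D_{t+1}
               hs.append(h); alphas.append(alpha); Zs.append(Z)

           def f(x):
               return sum(a * h(x) for a, h in zip(alphas, hs))

           def H(x):
               return 1 if f(x) >= 0 else -1

           return H, f, Zs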

  18. Why Can We Just Use α as the Weight?
     It allows a nice bound on the error:
       (1/m) Σ_i [[ H(x_i) ≠ y_i ]] ≤ (1/m) Σ_i exp[ −y_i f(x_i) ]
     since, point by point,
       [[ H(x_i) ≠ y_i ]] ≤ exp[ −y_i f(x_i) ]   and   [[ f(x_i) y_i ≤ 0 ]] ≤ exp[ −y_i f(x_i) ]
     and in fact
       (1/m) Σ_i [[ H(x_i) ≠ y_i ]] ≤ (1/m) Σ_i exp[ −y_i f(x_i) ] = Π_t Z_t
     That looks pretty. But what does it mean?
     Z_t is the normalizing constant for the weights on each point.
     Roughly, Z_t ≈ Σ_i D_t(i) [[ y_i ≠ h_t(x_i) ]], the cost of misclassifying the weighted points in the t-th round.
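
     The bound is easy to check numerically on any training run of the sketch above; err_train, the exponential loss, and Π_t Z_t are computed exactly as on this slide (illustrative code, my own names):

       import math

       def check_bound(H, f, Zs, X, y):
           m = len(X)
           err_train = sum(H(x) != yi for x, yi in zip(X, y)) / m
           exp_loss = sum(math.exp(-yi * f(x)) for x, yi in zip(X, y)) / m
           Z_prod = math.prod(Zs)
           # err_train <= exp_loss, and exp_loss equals prod_t Z_t up to rounding.
           assert err_train <= exp_loss <= Z_prod + 1e-9
           return err_train, exp_loss, Z_prod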

  19. That's a Pretty Nice Bound. How Do We Know It's True?
     Unroll the weight update, e.g. after three rounds:
       D_4(i) = D_1 · exp[ −α_1 y_i h_1(x_i) ]/Z_1 · exp[ −α_2 y_i h_2(x_i) ]/Z_2 · exp[ −α_3 y_i h_3(x_i) ]/Z_3
     In general,
       D_{T+1}(i) = D_1 Π_t exp[ −α_t y_i h_t(x_i) ] / Π_t Z_t = (1/m) exp[ −Σ_t α_t y_i h_t(x_i) ] / Π_t Z_t = (1/m) exp[ −y_i f(x_i) ] / Π_t Z_t
     Since the weights are normalized,
       1 = Σ_i D_{T+1}(i) = (1/m) Σ_i exp[ −y_i f(x_i) ] / Π_t Z_t
     so
       err_train ≤ (1/m) Σ_i exp[ −y_i f(x_i) ] = Π_t Z_t

  20. We Can Almost Implement This. Now What About α?
     Choose α_t to minimize Z_t: setting ∂Z_t/∂α = 0 gives
       α_t = (1/2) ln( (1 − ε_t) / ε_t )
     where ε_t is the weighted error of the classifier at the t-th stage.
     To compute the weak learners, create an ROC curve using each dimension of the data alone; take the best point on the ROC curve and use that as your classifier. This completes the algorithm.
     Thinking back to ROC curves: the goal of AdaBoost is to minimize classifier error, which is the same as trying to push the best operating point of the ROC curve as close to the top-left corner as possible.
     How can we maximize the area under the P-R curve instead? (MAP: mean average precision)
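
     One plausible reading of the weak-learner recipe above is an exhaustive decision-stump search: for each feature dimension, try every observed value as a threshold, in both polarities, and keep the stump with the lowest D-weighted error. A hedged sketch of such a best_weak_classifier helper for the loop on slide 17 (names and details are my own):

       def best_weak_classifier(X, y, D):
           # Exhaustive decision-stump search over dimensions, thresholds, polarities.
           best = None
           for d in range(len(X[0])):
               for thresh in sorted({x[d] for x in X}):
                   for polarity in (+1, -1):
                       def h(x, d=d, thresh=thresh, polarity=polarity):
                           return polarity if x[d] > thresh else -polarity
                       # D-weighted error of this stump.
                       eps = sum(Di for Di, xi, yi in zip(D, X, y) if h(xi) != yi)
                       if best is None or eps < best[1]:
                           best = (h, eps)
           return best   # (weak classifier h_t, weighted error eps_t)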

  21. RankBoost
     The goal is to find an ordering, not a "quality" score for each point.
     RankBoost does not attempt to directly maximize MAP, but it is successful at doing so anyhow.
     Error is defined in terms of the number of data pairs that are out of order:
       D(x_0, x_1) = c if x_0 should be ranked below x_1, and D(x_0, x_1) = 0 otherwise, with Σ_{x_0, x_1} D(x_0, x_1) = 1.
     More complicated D can be used to emphasize really important pairs.

  22. Example
     Example 1: the true rank should be rank(x_1) < rank(x_2) < rank(x_3).
       D      x_1   x_2   x_3
       x_1    0     1/3   1/3
       x_2    0     0     1/3
       x_3    0     0     0
     Example 2: the true rank should be rank(x_1) < rank(x_2) = rank(x_3).
       D      x_1   x_2   x_3
       x_1    0     1/3   1/3
       x_2    0     0     0
       x_3    0     0     0
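
     The same two matrices written out in code (a sketch; the convention D[i][j] = weight of the pair "x_{i+1} should be ranked below x_{j+1}" follows the slide):

       from fractions import Fraction

       third = Fraction(1, 3)

       # Example 1: rank(x1) < rank(x2) < rank(x3); three crucial pairs.
       D_ex1 = [[0, third, third],
                [0, 0,     third],
                [0, 0,     0    ]]

       # Example 2: rank(x1) < rank(x2) = rank(x3); only pairs involving x1 matter.
       D_ex2 = [[0, third, third],
                [0, 0,     0    ],
                [0, 0,     0    ]]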

  23. Basic RankBoost Algorithm
     Given: rank matrix D with the true ranks of the training data
     D_1 = D
     For t = 1, ..., T:
       Train h_t : X → {−1, +1}
       Choose α_t (more suspense...)
       Update D_{t+1}(x_0, x_1) = D_t(x_0, x_1) · exp[ α_t ( h_t(x_0) − h_t(x_1) ) ] / Z_t
     Final ranking: H(x) = Σ_t α_t h_t(x)
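
     A hedged Python sketch of this loop (my own code and names). Here D is a dict mapping a pair (x0, x1) to its weight; the weak-ranker trainer and the α_t rule are passed in as functions, with a concrete α rule given on the next slide:

       import math

       def rankboost(D, T, train_weak_ranker, choose_alpha):
           # D: {(x0, x1): weight}; train_weak_ranker(D) -> h with h(x) in {-1, +1};
           # choose_alpha(D, h) -> alpha_t.  Returns the final ranking function H.
           hs, alphas = [], []
           for t in range(T):
               h = train_weak_ranker(D)
               alpha = choose_alpha(D, h)
               # Pairs the current ranker already orders correctly lose weight.
               w = {pair: Dv * math.exp(alpha * (h(pair[0]) - h(pair[1])))
                    for pair, Dv in D.items()}
               Z = sum(w.values())
               D = {pair: wv / Z for pair, wv in w.items()}
               hs.append(h); alphas.append(alpha)

           def H(x):
               return sum(a * h(x) for a, h in zip(alphas, hs))

           return H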

  24. Bounds and α
     As before, err_train ≤ Π_t Z_t.
     Selection of α is different. Now
       α = (1/2) ln( (1 + r) / (1 − r) )
     where r = W_− − W_+ = Σ D(x_0, x_1) ( h(x_1) − h(x_0) ), W_+ is the weight of the pairs for which h(x_0) > h(x_1), and W_− is the weight of the pairs for which h(x_0) < h(x_1).
     Selection of the weak classifier also needs to be redone. It turns out we need to maximize r, which can be written as
       r = Σ_x h(x) s(x) v(x) · Σ_{x' : s(x') ≠ s(x)} v(x')
     where v(x) [should be] as given above, and
       s(x) = +1 if x ∈ X_0,  −1 if x ∈ X_1.
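
     A sketch of this α rule in the same dict-of-pairs representation (my own code; it computes r from the weight sums W_+ and W_−, and could be passed as choose_alpha to the loop above):

       import math

       def rankboost_alpha(D, h):
           # W_plus: total weight of pairs with h(x0) > h(x1);
           # W_minus: total weight of pairs with h(x0) < h(x1) (cf. slide 24).
           W_plus = sum(Dv for (x0, x1), Dv in D.items() if h(x0) > h(x1))
           W_minus = sum(Dv for (x0, x1), Dv in D.items() if h(x0) < h(x1))
           r = W_minus - W_plus
           return 0.5 * math.log((1 + r) / (1 - r))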

  25. Sorry!
     I ran out of time!

  26. Acknowledgements
     I wish to thank my advisor, Prof. Avi Kak, for the helpful observation that ROC curves and PR curves are parametric. His presentation on retrieval in the summer of 2011 also presents an entertaining comparison of P-R curves for random and ideal retrieval.
     I also wish to thank my lab members for pointing out errors in this presentation, which have hopefully all been corrected in this version. Note that we covered RankBoost in 5 or 10 minutes, so errors probably remain on slides 11 through 30.
