

slide-1
SLIDE 1

ADABOOST AND RANKBOOST

Josiah Yoder

School of Electrical and Computer Engineering

RVL Seminar, 4 Feb 2011 1/31

slide-2
SLIDE 2

OVERVIEW

ROC & Precision-Recall Curves
AdaBoost
RankBoost
Not really any of my research

2/31

slide-3
SLIDE 3

ROC & PRECISION-RECALL

ROC – Receiver Operating Characteristics
Good with skewed distribution
Good with unequal costs
Good for visualizing maximum performance with any given threshold

Precision-Recall Curve
Good for evaluating ranked search results
Consistent for comparing performance

3/31

slide-4
SLIDE 4

PERFORMANCE METRICS

                    Actual Class
                    P                 N
Estimated   P       True Positive     False Positive
Class       N       False Negative    True Negative

Adapted from Fawcett (2003)

4/31

slide-5
SLIDE 5

THE AXES

ROC Curve
(TP, FP, TN, FN as in the confusion matrix on slide 4; P, N are the actual class totals)

fp rate = FP / N
tp rate = TP / P

P-R (Precision-Recall) Curve

recall = TP / P
precision = TP / (TP + FP)

5/31
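The axes defined above can be sketched directly. A minimal Python sketch; the function names and example counts are illustrative, not from the slides.

```python
# Minimal sketch of the two sets of axes defined on this slide.
def roc_axes(TP, FP, TN, FN):
    P = TP + FN              # total actual positives
    N = FP + TN              # total actual negatives
    return FP / N, TP / P    # (fp rate, tp rate)

def pr_axes(TP, FP, TN, FN):
    P = TP + FN
    return TP / P, TP / (TP + FP)   # (recall, precision)

print(roc_axes(TP=8, FP=2, TN=6, FN=4))   # fp rate 0.25, tp rate ≈ 0.667
print(pr_axes(TP=8, FP=2, TN=6, FN=4))    # recall ≈ 0.667, precision 0.8
```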

slide-6
SLIDE 6

DISCRETE CLASSIFIERS

ROC Points (fp rate = FP / N, tp rate = TP / P)

[Figure: discrete classifiers A, B, C, D, E plotted as points in ROC space. Adapted from Fawcett (2003).]

P-R (Precision-Recall) Points (recall vs. precision)

[Figure: the same classifiers A, B, C, D, E plotted as points in precision-recall space, with 50% P, 50% N. Adapted from Fawcett (2003).]

6/31

slide-7
SLIDE 7

NUMERIC CLASSIFIERS

Output: a numeric value (score, probability, rank, ...)
Thresholding the output gives a discrete classifier — a single point on the ROC or PR curve
“Sweeping” the threshold from ∞ to −∞ traces out a curve (there is a more efficient way to do it)
Note that ROC and PR curves are parametric

7/31
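The sweep can be sketched as follows: sort the scores once, descending, and move the threshold past one example at a time; each prefix yields one ROC point and one PR point. The labels below are illustrative (my own choice), not the slide's ground truth.

```python
# Sketch: trace ROC and PR curves by sweeping a threshold over sorted scores.
def sweep(scores, labels):                 # labels in {0, 1}
    P = sum(labels)
    N = len(labels) - P
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    roc, pr = [(0.0, 0.0)], []
    TP = FP = 0
    for i in order:                        # threshold just below scores[i]
        if labels[i] == 1:
            TP += 1
        else:
            FP += 1
        roc.append((FP / N, TP / P))                 # (fp rate, tp rate)
        pr.append((TP / P, TP / (TP + FP)))          # (recall, precision)
    return roc, pr

roc, pr = sweep([0.9, 0.7, 0.6, 0.5, 0.2], [1, 1, 0, 1, 0])
```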

slide-8
SLIDE 8

DISCRETE CLASSIFIERS

ROC Curve (tp rate vs. fp rate)

[Figure: ROC curve traced by sweeping the threshold. Adapted from Fawcett (2003).]

P-R (Precision-Recall) Curve (precision vs. recall)

[Figure: the corresponding precision-recall curve; 50% P, 50% N.]

Example data:
GT:   1   1   1
feat: 0.9 0.7 0.6 0.5 0.2

8/31

slide-9
SLIDE 9

DISCRETE CLASSIFIERS

ROC Curve (tp rate vs. fp rate)

[Figure: ROC curve traced by sweeping the threshold. Adapted from Fawcett (2003).]

P-R (Precision-Recall) Curve (precision vs. recall)

[Figure: the corresponding precision-recall curve; 20% P, 80% N.]

Example data:
GT:   1   1   1
feat: 0.9 0.7 0.7 0.7 0.7 0.6 0.5 0.5 0.5 0.5 0.2

9/31

slide-10
SLIDE 10

WHICH IS BETTER?

ROC doesn’t change if ratio P/N changes

Makes it easy to visualize maximum performance when ratio of true to false changes

P-R illustrates search results

Clearly illustrates effect of false positives on performance

10/31

slide-11
SLIDE 11

ADABOOST AND RANKBOOST

AdaBoost  — Classification — ROC
RankBoost — Ranking        — P-R

11/31

slide-12
SLIDE 12

ADABOOST

What is Boosting?

A linear combination of weak classifiers, with proven bounds on performance. A weak classifier is one that does better than random guessing.

AdaBoost:
h(x) — weak binary classifier, h : X → {−1, 1}
H(x) — strong classifier, H(x) = sign( ∑_{t=1}^T αt ht(x) )

12/31
slide-13
SLIDE 13

NONLINEAR SEPARATION

The combination of weak classifiers is linear, but the weak classifiers themselves are non-linear.

Example:
x = (x1, x2)
H(x) = sign( 0.5·[[x1 > 1]] + 0.5·[[x2 > 2]] )

[Figure: the resulting decision regions in the (x1, x2) plane.]

13/31
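The example can be checked numerically, assuming [[·]] is the 0/1 indicator and taking sign(0) = −1, a convention the slide leaves unspecified.

```python
# The slide's example classifier H(x) = sign(0.5*[[x1 > 1]] + 0.5*[[x2 > 2]]).
# [[.]] is treated as the 0/1 indicator; sign(0) is taken as -1 (my convention).
def H(x1, x2):
    s = 0.5 * (x1 > 1) + 0.5 * (x2 > 2)
    return 1 if s > 0 else -1

print(H(2, 3))   # both stumps fire -> +1
print(H(2, 0))   # one stump fires  -> +1 (the boundary is non-linear)
print(H(0, 0))   # neither fires    -> -1
```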

slide-14
SLIDE 14

HOW TO PICK WEAK CLASSIFIERS AND THEIR WEIGHTS

Greedy: pick the best weak classifier repeatedly
Focus on the points misclassified by the previous classifier

(Not on overall performance)

Following frames from http://cseweb.ucsd.edu/~yfreund/adaboost/index.html 14/31

slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20

HOW TO REWEIGHT THE TRAINING DATA

D1(i) = 1/m, where m is the number of training points.
Dt+1(i) = Dt(i)·♠ / Zt

What to use for ♠? (dropping the t subscript for a while...)

Idea #1: ♠ = [[yi ≠ h(xi)]]
Problem: correctly classified data is totally forgotten

Idea #2: ♠ = a if yi ≠ h(xi), 1/a if yi = h(xi)

Tada! This is what AdaBoost does!

20/31

slide-21
SLIDE 21

ADABOOST’S ♠

Idea #2: ♠ = a if yi ≠ h(xi), 1/a if yi = h(xi)

AdaBoost: ♠ = e^α if yi ≠ h(xi), e^{−α} if yi = h(xi)
or, equivalently, ♠ = exp[−α·yi·h(xi)]

What is α? How do you choose it? We will come back to this...

We shall see that the classifier has nice properties no matter how we choose α.

21/31
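AdaBoost's ♠ can be sketched in a few lines (the names are mine; the rule is Dt+1(i) ∝ Dt(i)·exp[−α·yi·h(xi)] as above, so misclassified points are multiplied by e^α and correct ones by e^−α before renormalizing):

```python
import math

# Sketch of the AdaBoost reweighting rule with spade = exp(-alpha*y*h(x)).
def reweight(D, ys, hs, alpha):
    up = [d * math.exp(-alpha * y * h) for d, y, h in zip(D, ys, hs)]
    Z = sum(up)                       # normalizing constant
    return [w / Z for w in up]

m = 4
D1 = [1 / m] * m                      # uniform initial weights
ys = [1, 1, -1, -1]                   # true labels
hs = [1, -1, -1, -1]                  # weak-classifier outputs: point 1 is wrong
D2 = reweight(D1, ys, hs, alpha=0.5)  # weight shifts onto the mistake
```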

slide-22
SLIDE 22

THE BASIC ADABOOST ALGORITHM

(t subscripts are back)

D1 = 1/m
For t = 1, ..., T:
  Choose ht based on data weighted by Dt
  Reweight: Dt+1(i) = Dt(i)·exp[−αt·yi·ht(xi)] / Zt
  Zt = ∑i Dt(i)·exp[−αt·yi·ht(xi)]

H(x) = sign( ∑t αt ht(x) )
f(x) = ∑t αt ht(x)

22/31
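The whole loop fits in a short sketch. The weak learner below is a 1-D threshold stump, which is my own choice; the slides defer the weak-learner details to a later slide. Labels are in {−1, +1}.

```python
import math

def stump(theta, s):
    # Weak classifier: predicts s if x > theta, else -s, with s in {-1, +1}.
    return lambda x: s * (1 if x > theta else -1)

def train(xs, ys, T):
    m = len(xs)
    D = [1 / m] * m
    ensemble = []                         # list of (alpha_t, h_t)
    for _ in range(T):
        # Greedy step: pick the stump with the smallest weighted error eps.
        best = None
        for theta in xs:
            for s in (+1, -1):
                h = stump(theta, s)
                eps = sum(d for d, x, y in zip(D, xs, ys) if h(x) != y)
                if best is None or eps < best[0]:
                    best = (eps, h)
        eps, h = best
        eps = max(eps, 1e-12)             # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Reweight: D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h(x_i)) / Z_t
        D = [d * math.exp(-alpha * y * h(x)) for d, x, y in zip(D, xs, ys)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble

def predict(ensemble, x):
    f = sum(a * h(x) for a, h in ensemble)   # f(x) = sum_t alpha_t h_t(x)
    return 1 if f > 0 else -1                # H(x) = sign(f(x))

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, 1, 1, 1, -1]            # no single stump separates this
model = train(xs, ys, T=5)
print([predict(model, x) for x in xs])   # → [-1, -1, 1, 1, 1, -1]
```

Note that no single stump classifies this data, but the weighted vote of five stumps does.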

slide-23
SLIDE 23

WHY CAN WE JUST USE α AS THE WEIGHT?

It allows a nice bound on the error:

(1/m)·∑i [[H(xi) ≠ yi]] ≤ (1/m)·∑i exp[−yi·f(xi)]

because, pointwise,

[[H(xi) ≠ yi]] ≤ [[f(xi)·yi ≤ 0]] ≤ exp[−yi·f(xi)]

and in fact

(1/m)·∑i exp[−yi·f(xi)] = ∏t Zt

That looks pretty. But what does it mean?

Zt is the normalizing constant for the weights on each point. Roughly, Zt ≈ ∑i Dt(i)·[[yi ≠ ht(xi)]], the cost of misclassifying the weighted points in the tth round.

23/31

slide-24
SLIDE 24

THAT’S A PRETTY NICE BOUND.

How do we know it’s true?

D4(i) = D1(i) · (exp[−α1·yi·h1(xi)] / Z1) · (exp[−α2·yi·h2(xi)] / Z2) · (exp[−α3·yi·h3(xi)] / Z3)

DT+1(i) = D1(i)·∏t exp[−αt·yi·ht(xi)] / ∏t Zt
        = (1/m)·exp[−∑t αt·yi·ht(xi)] / ∏t Zt
        = (1/m)·exp[−yi·f(xi)] / ∏t Zt

1 = ∑i DT+1(i) = (1/m)·∑i exp[−yi·f(xi)] / ∏t Zt

so ∏t Zt = (1/m)·∑i exp[−yi·f(xi)], and therefore errtrain ≤ ∏t Zt.

24/31

slide-25
SLIDE 25

WE CAN ALMOST IMPLEMENT THIS. NOW WHAT ABOUT α?

Choose αt to minimize Zt:

∂Zt/∂α = 0  ⟹  αt = (1/2)·ln( (1 − εt) / εt )

where εt is the weighted error of the classifier at the tth stage.

To compute the weak learners, create an ROC curve using each dimension of the data alone; take the best point on the ROC curve and use that as your classifier. This completes the algorithm.

Thinking back to ROC curves: the goal of AdaBoost is to minimize the classifier error, which is the same as pushing the best point of the ROC curve as close to the top-left corner as possible. How can we maximize the area under the P-R curve? (MAP — mean average precision)

25/31
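The closed forms can be checked numerically. The identity Zt = 2·sqrt(εt·(1 − εt)) at the minimizing αt is a standard consequence of plugging αt back into Zt, not something stated on the slide; it is what drives the bound ∏t Zt down, since Zt < 1 whenever εt < 1/2.

```python
import math

# alpha_t that minimizes Z_t, and the minimized Z_t itself.
def alpha(eps):
    return 0.5 * math.log((1 - eps) / eps)

def Z(eps):
    # Value of Z_t at the minimizing alpha_t (assumed identity, see lead-in).
    return 2 * math.sqrt(eps * (1 - eps))

print(alpha(0.25))   # ≈ 0.549
print(Z(0.25))       # ≈ 0.866 < 1, so each round shrinks the error bound
```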

slide-26
SLIDE 26

RANKBOOST

The goal is to find an ordering, not a “quality” score for each point. RankBoost does not attempt to directly maximize MAP, but is successful at doing this anyhow.

Error is defined in terms of the number of data pairs which are out of order:
D(x0, x1) = c if x0 should be ranked below x1, D(x0, x1) = 0 otherwise.
∑_{x0,x1} D(x0, x1) = 1

A more complicated D can be used to emphasize really important pairs.

26/31

slide-27
SLIDE 27

EXAMPLE

e.g. 1. True rank should be rank(x1) < rank(x2) < rank(x3).

D     x1    x2    x3
x1          1/3   1/3
x2                1/3
x3

e.g. 2. True rank should be rank(x1) < rank(x2) = rank(x3).

D     x1    x2    x3
x1          1/3   1/3
x2
x3

27/31

slide-28
SLIDE 28

BASIC RANKBOOST ALGORITHM

Given: rank matrix D with the true ranks of the training data
D1 = D
For t = 1, ..., T:
  Train ht : X → {−1, 1}
  Choose αt (more suspense...)
  Update: Dt+1(x0, x1) = Dt(x0, x1)·exp[αt(ht(x0) − ht(x1))] / Zt

Final ranking: H(x) = ∑t αt ht(x)

28/31
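The pair-weight update can be sketched as below. Representing D as a dict over (x0, x1) pairs is my own choice; a pair's weight grows when the weak ranker fails to put x1 above x0.

```python
import math

# Sketch of the RankBoost update:
# D_{t+1}(x0, x1) = D_t(x0, x1) * exp(alpha * (h(x0) - h(x1))) / Z_t
def rank_reweight(D, h, alpha):
    up = {pair: w * math.exp(alpha * (h(pair[0]) - h(pair[1])))
          for pair, w in D.items()}
    Z = sum(up.values())
    return {pair: w / Z for pair, w in up.items()}

h = lambda x: 1 if x > 1.5 else -1           # toy weak ranker
D = {(1.0, 2.0): 0.5,                        # correctly separated by h
     (2.0, 3.0): 0.5}                        # h ties this pair
D2 = rank_reweight(D, h, alpha=0.5)          # weight shifts to the tied pair
```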

slide-29
SLIDE 29

BOUNDS AND α

As before, errtrain ≤ ∏t Zt.

The selection of α is different. Now

α = (1/2)·ln( (1 + r) / (1 − r) )

where r = W− − W+ (proportional to ∑_{x0,x1} D(x0, x1)·(h(x1) − h(x0))), W+ is the weight of the pairs for which h(x0) > h(x1), and W− is the weight of the pairs for which h(x0) < h(x1).

The selection of the weak classifier also needs to be redone. It turns out we need to maximize r, which can be written as

r = ∑x h(x)·s(x)·v(x)·( ∑_{x′ : s(x′) ≠ s(x)} v(x′) )

where v(x) [should be] given above and s(x) = +1 for x ∈ X0, −1 for x ∈ X1.

29/31
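The r = W− − W+ form can be computed directly from the pair weights. The toy D and h below are mine, reusing the three-point example from the earlier slide.

```python
import math

# r = W_minus - W_plus, where W_plus weighs pairs h ranks the wrong way
# (h(x0) > h(x1)) and W_minus weighs pairs h ranks the right way.
def rankboost_alpha(D, h):
    W_plus  = sum(w for (x0, x1), w in D.items() if h(x0) > h(x1))
    W_minus = sum(w for (x0, x1), w in D.items() if h(x0) < h(x1))
    r = W_minus - W_plus
    return r, 0.5 * math.log((1 + r) / (1 - r))

D = {(1, 2): 1/3, (1, 3): 1/3, (2, 3): 1/3}   # x1 below x2 below x3
h = lambda x: 1 if x >= 2 else -1             # separates x1 from x2, x3
r, a = rankboost_alpha(D, h)                  # r = 2/3, a = 0.5*ln(5)
```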

slide-30
SLIDE 30

SORRY!

I ran out of time! 30/31

slide-31
SLIDE 31

ACKNOWLEDGEMENTS

I wish to thank my advisor, Prof. Avi Kak, for the helpful observation that ROC curves and PR curves are parametric. His presentation on retrieval in the summer of 2011 also presents an entertaining comparison of PR curves for random and ideal retrieval. I also wish to thank my lab-members for pointing out errors in this presentation that have hopefully all been corrected in this version.

Note that we covered RankBoost in 5 or 10 minutes so errors probably remain on slides 11 through 30.

31/31