Revisiting the Area under the ROC (PowerPoint presentation by Berry de Bruijn)



SLIDE 1

Revisiting the Area under the ROC

Berry de Bruijn

Institute for Information Technology National Research Council, Canada

SLIDE 2

Purpose

  • Take a look at the Area under the ROC curve from a different perspective…
  • give it an additional interpretation…
  • which might lead to options for extending the AUC.

So, an old story with a new twist…

SLIDE 3

Introduction: Tests and Classifiers

30-second tutorial

SLIDE 4

Introduction: Tests and Classifiers

Fresh or not fresh!?!?…. Sniff test !!

SLIDE 5

Introduction: Classifiers

Sniff test: all subjects sniffed & scored…

[Figure: subjects labeled with their scores: 0.999, 0.801, 0.722, 0.879, 0.544, 0.666, 0.8, 0.305]

… then rank-ordered

SLIDE 6

Introduction: Classifiers

All subjects: rank-order by score, then apply a threshold

[Figure: rank-ordered subjects with scores 0.879, 0.999, 0.801, 0.722; the value 0.6 is marked, apparently as the threshold]

             Fresh   Not Fresh
Eaten         TP        FP
Not Eaten     FN        TN

Sensitivity = fresh shrimps eaten / all fresh shrimps
Specificity = non-fresh shrimps not eaten / all non-fresh shrimps
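The two definitions can be checked on a toy example. The sketch below reuses the eight scores from the sniff-test slide and the value 0.6 as threshold; the freshness labels are hypothetical (the slides do not give them), and treating "score at or above the threshold" as "eaten" is an assumption.

```python
def confusion_counts(scores, is_fresh, threshold):
    """Count TP, FP, FN, TN when every shrimp scoring >= threshold is eaten."""
    tp = fp = fn = tn = 0
    for score, fresh in zip(scores, is_fresh):
        eaten = score >= threshold  # assumption: high score means "smells fresh"
        if eaten and fresh:
            tp += 1
        elif eaten and not fresh:
            fp += 1
        elif not eaten and fresh:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(scores, is_fresh, threshold):
    """Sensitivity = TP / all fresh; specificity = TN / all non-fresh."""
    tp, fp, fn, tn = confusion_counts(scores, is_fresh, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Scores from the slide; labels are made up for illustration only.
scores = [0.999, 0.801, 0.722, 0.879, 0.544, 0.666, 0.8, 0.305]
is_fresh = [True, True, False, True, False, True, False, False]

print(confusion_counts(scores, is_fresh, 0.6))         # (4, 2, 0, 2)
print(sensitivity_specificity(scores, is_fresh, 0.6))  # (1.0, 0.5)
```

With these made-up labels, every fresh shrimp is eaten (sensitivity 1.0) but half of the non-fresh ones are eaten too (specificity 0.5).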

SLIDE 7

Introduction: Classifiers

All sensitivity/specificity pairs form the ROC curve
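One way to see how the pairs arise is to sweep the threshold over every observed score and record one point per threshold. This is a minimal sketch, not the author's code; it plots the usual ROC coordinates (1 − specificity, sensitivity), and the labels are again hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (1 - specificity, sensitivity), one per threshold."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is selected
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points


# Scores from the sniff-test slide; labels are made up for illustration only.
scores = [0.999, 0.801, 0.722, 0.879, 0.544, 0.666, 0.8, 0.305]
labels = [True, True, False, True, False, True, False, False]

for fpr, tpr in roc_points(scores, labels):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```

Lowering the threshold one score at a time moves the curve up when the newly selected subject is a positive and to the right when it is a negative.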

SLIDE 8

Introduction: Classifiers

[Figure: ROC curve with the area under the curve labeled AUC]

All sensitivity/specificity pairs form the ROC curve.

AUC = 0.9332 → one metric about the performance of the classifier or test.
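A standard way to read this single number: the AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. The sketch below computes it by brute force over all positive/negative pairs; the labels are hypothetical, so the result is unrelated to the 0.9332 on the slide.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Scores from the sniff-test slide; labels are made up for illustration only.
scores = [0.999, 0.801, 0.722, 0.879, 0.544, 0.666, 0.8, 0.305]
labels = [True, True, False, True, False, True, False, False]

print(auc_pairwise(scores, labels))  # 0.875 (14 of 16 pairs ranked correctly)
```

The brute-force loop is quadratic in the number of cases, which is fine for a toy example; rank-based formulas give the same value faster on large data.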

SLIDE 9

The new part….

Our classifier can be modeled with a stochastic process:

  • model: sampling, without replacement, from a biased urn with marbles → marbles do not have an equal chance of being drawn
  • distribution: Fisher Non-Central Hypergeometric Distribution
  • TP = f(k, Pos, Neg, bias)
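To make the urn model concrete, here is a minimal sketch of the probability mass function of Fisher's noncentral hypergeometric distribution: the chance of finding exactly tp positives among the k top-ranked cases. The function names and the `odds` parameter are illustrative; the slides call the parameter "bias" and do not say how it maps to the odds ratio used here, so that mapping is an assumption.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_nchg_pmf(k, pos, neg, odds):
    """Return {tp: P(TP = tp)} for k draws from pos + neg marbles.

    P(TP = x) is proportional to C(pos, x) * C(neg, k - x) * odds**x.
    """
    lo, hi = max(0, k - neg), min(k, pos)
    xs = range(lo, hi + 1)
    log_w = [log_comb(pos, x) + log_comb(neg, k - x) + x * math.log(odds)
             for x in xs]
    peak = max(log_w)  # subtract the peak so exp() cannot overflow
    weights = [math.exp(v - peak) for v in log_w]
    total = sum(weights)
    return {x: w / total for x, w in zip(xs, weights)}


def expected_tp(k, pos, neg, odds):
    """Mean number of positives among the k selected cases."""
    return sum(x * p for x, p in fisher_nchg_pmf(k, pos, neg, odds).items())


# Small urn: 4 positives, 4 negatives, 4 draws; odds = 3.0 is hypothetical.
print(expected_tp(4, 4, 4, 1.0))  # unbiased urn: about 2 positives expected
print(expected_tp(4, 4, 4, 3.0))  # biased towards positives: more than 2
```

With odds = 1 the distribution reduces to the ordinary hypergeometric distribution, which corresponds to a classifier that ranks cases at random.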

SLIDE 10

Statistical modeling

‘cond.-vs.-poss.’ data:

Observed:

  • 1054 cases (171 positives, 883 negatives)
  • AUC = 0.9332

Fisher NCHypG distr. curve:

TP = f(Pos, Neg, k, bias)

  • k = [0 .. 1054]
  • Pos = 171
  • Neg = 883
  • bias = 0.9332*
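A curve of this kind can be sketched by computing the expected TP for every k from 0 to Pos + Neg and converting each value to an ROC point. Pos and Neg below are taken from the slide. The footnote explaining the "bias" value is not part of this transcript, so the `ODDS` constant is a hypothetical placeholder, and whether the author built the curve exactly this way is an assumption.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_tp(k, pos, neg, odds):
    """Mean of Fisher's noncentral hypergeometric distribution for k draws."""
    lo, hi = max(0, k - neg), min(k, pos)
    xs = range(lo, hi + 1)
    log_w = [log_comb(pos, x) + log_comb(neg, k - x) + x * math.log(odds)
             for x in xs]
    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


def synthesized_roc(pos, neg, odds):
    """ROC points (FP / Neg, TP / Pos) from the expected TP at each k."""
    points = []
    for k in range(pos + neg + 1):
        tp = expected_tp(k, pos, neg, odds)
        points.append(((k - tp) / neg, tp / pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


POS, NEG = 171, 883  # from the slide
ODDS = 14.0          # hypothetical value, NOT taken from the slides

roc = synthesized_roc(POS, NEG, ODDS)
print(f"area under the synthesized curve: {trapezoid_area(roc):.4f}")
```

As a sanity check, odds = 1 yields the diagonal with area 0.5, and larger odds push the curve towards the top-left corner.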

SLIDE 11

Statistical modeling

See the paper for actual and synthesized ROCs from other data sets.

SLIDE 12

Conclusions

  • AUC + non-central hypergeometric distribution = new? interpretation of AUC, stronger theoretical support.
  • Additional statistical properties can be useful for comparing classifiers on the same data set.
  • Opens door to extensions for multi-class classification and non-uniform populations.

Tusen takk - Thank you

SLIDE 13

Bonus features…

‘binormal’ approximation
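The slide only names the binormal approximation. As a reminder of the standard model (not necessarily the exact form used in the talk): if positive and negative scores are each normally distributed, the ROC curve and the AUC have closed forms. The parameter names and example values below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binormal_auc(mu_pos, sigma_pos, mu_neg, sigma_neg):
    """AUC = Phi((mu_pos - mu_neg) / sqrt(sigma_pos^2 + sigma_neg^2))."""
    z = (mu_pos - mu_neg) / (sigma_pos ** 2 + sigma_neg ** 2) ** 0.5
    return STD_NORMAL.cdf(z)


def binormal_tpr(fpr, mu_pos, sigma_pos, mu_neg, sigma_neg):
    """Sensitivity at a given false-positive rate (0 < fpr < 1)."""
    a = (mu_pos - mu_neg) / sigma_pos
    b = sigma_neg / sigma_pos
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpr))


# Hypothetical score distributions: positives N(2, 1), negatives N(0, 1).
print(binormal_auc(2.0, 1.0, 0.0, 1.0))
print(binormal_tpr(0.1, 2.0, 1.0, 0.0, 1.0))
```

When both distributions coincide, the AUC is 0.5 and the curve is the diagonal, as expected for an uninformative test.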