Revisiting the Area under the ROC, Berry de Bruijn

Sep 16, 2022

SLIDE 1

Revisiting the Area under the ROC

Berry de Bruijn

Institute for Information Technology National Research Council, Canada

SLIDE 2

Purpose

Take a look at the area under the ROC curve (AUC) from a different perspective…
give it an additional interpretation…
which might lead to options for extending the AUC.

So, an old story with a new twist…

SLIDE 3

Introduction: Tests and Classifiers

30-second tutorial

SLIDE 4

Introduction: Tests and Classifiers

Fresh or not fresh?! … Sniff test!

SLIDE 5

Introduction: Classifiers

Sniff test: all subjects sniffed and scored

Scores: 0.999, 0.801, 0.722, 0.879, 0.544, 0.666, 0.8, 0.305

… then rank ordered

SLIDE 6

Introduction: Classifiers

All subjects - rank order by score, then apply a threshold

0.879 0.999 0.801 0.722

0.6



            Fresh   Not Fresh
Eaten       TP      FP
Not Eaten   FN      TN

Sensitivity = fresh shrimps eaten / all fresh shrimps
Specificity = non-fresh shrimps not eaten / all non-fresh shrimps
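
The thresholding step can be sketched in Python as follows. This is only an illustration: the scores are the eight values from the sniff-test slide, but the fresh / not-fresh labels and the threshold of 0.75 are invented here and do not come from the presentation.

```python
# Scores from the sniff-test slide; the labels are hypothetical (True = fresh).
scores = [0.999, 0.879, 0.801, 0.8, 0.722, 0.666, 0.544, 0.305]
is_fresh = [True, True, True, False, True, False, False, False]


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when every subject scoring >= threshold is eaten."""
    tp = fp = fn = tn = 0
    for score, fresh in zip(scores, labels):
        eaten = score >= threshold
        if eaten and fresh:
            tp += 1
        elif eaten:
            fp += 1
        elif fresh:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# The threshold value is an assumption chosen for this example.
sens, spec = sensitivity_specificity(scores, is_fresh, threshold=0.75)
print(sens, spec)
```

Moving the threshold up or down trades sensitivity against specificity, which is what the ROC curve on the next slide summarizes.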

SLIDE 7

Introduction: Classifiers

All sensitivity/specificity pairs form the ROC curve
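
A minimal sketch of how those pairs can be collected, assuming the usual convention of plotting sensitivity against 1 - specificity. The labels below are made up for illustration; only the scores appear in the slides.

```python
# Scores from the sniff-test slide; the labels are hypothetical (True = fresh).
scores = [0.999, 0.879, 0.801, 0.8, 0.722, 0.666, 0.544, 0.305]
is_fresh = [True, True, True, False, True, False, False, False]


def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    points = [(0.0, 0.0)]  # threshold above every score: nothing is eaten
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Subjects tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


points = roc_points(scores, is_fresh)
for x, y in points:
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```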

SLIDE 8

Introduction: Classifiers

All sensitivity/specificity pairs form the ROC curve
AUC = 0.9332 → one metric about the performance of the classifier or test.
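
One well-known way to read this number: the AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The sketch below computes it that way on toy data. The labels are hypothetical, so the result is not the 0.9332 reported on the slide.

```python
# Scores from the sniff-test slide; the labels are hypothetical (True = fresh).
scores = [0.999, 0.879, 0.801, 0.8, 0.722, 0.666, 0.544, 0.305]
is_fresh = [True, True, True, False, True, False, False, False]


def auc_by_pairs(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


auc = auc_by_pairs(scores, is_fresh)
print(auc)
```

The double loop is quadratic in the number of subjects, which is fine for a toy example; a rank-based formula would be the usual choice for larger data.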

SLIDE 9

The new part…

Our classifier can be modeled with a stochastic process:
model: sampling, without replacement, from a biased urn with marbles → marbles do not have an equal chance of being drawn
distribution: Fisher Non-Central Hypergeometric Distribution
TP = f(k, Pos, Neg, bias)
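
For reference, the probability mass function of Fisher's non-central hypergeometric distribution can be sketched as below. The function and parameter names (`fisher_nchg_pmf`, `odds`) are chosen here for illustration, and the code takes the bias as an odds ratio; how the presentation's "bias" maps to that odds ratio is not stated on the slide.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_nchg_pmf(pos, neg, k, odds):
    """Return {tp: P(TP = tp)} when k marbles are taken from pos + neg marbles.

    P(TP = x) is proportional to C(pos, x) * C(neg, k - x) * odds**x.
    Weights are built in log space so large urns do not overflow.
    """
    lo = max(0, k - neg)  # fewest positives that can be among the k
    hi = min(k, pos)      # most positives that can be among the k
    log_w = {
        x: log_comb(pos, x) + log_comb(neg, k - x) + x * math.log(odds)
        for x in range(lo, hi + 1)
    }
    peak = max(log_w.values())
    weights = {x: math.exp(v - peak) for x, v in log_w.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# With odds = 1 the urn is unbiased and this is the ordinary hypergeometric.
print(fisher_nchg_pmf(pos=2, neg=2, k=2, odds=1.0))
# A larger odds value shifts probability toward drawing positives.
print(fisher_nchg_pmf(pos=2, neg=2, k=2, odds=5.0))
```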

SLIDE 10

Statistical modeling

‘cond.-vs.-poss.’ data:

Observed:
1054 cases (171 positives, 883 negatives)
AUC = 0.9332

Fisher NCHypG distr. curve:
TP = f(Pos, Neg, k, bias)
k = [0 .. 1054],
Pos = 171,
Neg = 883,
bias = 0.9332*
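
A rough sketch of how such a synthesized curve could be generated: for each k, take the expected TP under Fisher's non-central hypergeometric distribution and convert it to a (1 - specificity, sensitivity) point. The conversion `ODDS = BIAS / (1 - BIAS)` is an assumption made for this example; the slide marks the bias with an asterisk whose footnote is not available here, so the author's actual mapping may differ.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_tp(pos, neg, k, odds):
    """Mean of Fisher's non-central hypergeometric distribution for k draws."""
    lo, hi = max(0, k - neg), min(k, pos)
    xs = range(lo, hi + 1)
    log_w = [log_comb(pos, x) + log_comb(neg, k - x) + x * math.log(odds)
             for x in xs]
    peak = max(log_w)
    w = [math.exp(v - peak) for v in log_w]  # rescaled to avoid overflow
    return sum(x * wx for x, wx in zip(xs, w)) / sum(w)


def synthesized_roc(pos, neg, odds):
    """One (1 - specificity, sensitivity) point per k in 0 .. pos + neg."""
    points = []
    for k in range(pos + neg + 1):
        tp = expected_tp(pos, neg, k, odds)
        fp = k - tp  # the remaining accepted cases are false positives
        points.append((fp / neg, tp / pos))
    return points


POS, NEG = 171, 883        # counts from the slide
BIAS = 0.9332              # value from the slide
ODDS = BIAS / (1 - BIAS)   # ASSUMPTION: bias-to-odds mapping is not given
curve = synthesized_roc(POS, NEG, ODDS)
print(len(curve), curve[0], curve[-1])
```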

SLIDE 11

Statistical modeling

See the paper for actual and synthesized ROCs from other data sets.

SLIDE 12

Conclusions

AUC + non-central hypergeometric distribution = new? interpretation of AUC, stronger theoretical support.
Additional statistical properties can be useful for comparing classifiers on the same data set.
Opens door to extensions for multi-class classification and non-uniform populations.

Tusen takk - Thank you

SLIDE 13

Bonus features…

‘binormal’ approximation