True positive rate = # of true positives / # of known positives - PowerPoint PPT Presentation

SLIDE 1

True positive rate (Sensitivity)

$$\text{true positive rate} = \frac{\#\text{ of true positives}}{\#\text{ of known positives}}$$

(Proportion of actual positives that are correctly identified)
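As a quick illustration, the sketch below computes the true positive rate from two logical vectors. The vectors `known` and `predicted`, their values, and the helper `true_positive_rate()` are made up for this example; they are not from the worksheet.

```r
# Toy data (hypothetical): TRUE = known positive (e.g., malignant)
known     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
# TRUE = the model calls the sample positive
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

# true positive rate = # of true positives / # of known positives
true_positive_rate <- function(predicted, known) {
  sum(predicted & known) / sum(known)
}

tpr <- true_positive_rate(predicted, known)
tpr  # 2 of the 4 known positives are called positive: 0.5
```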

SLIDE 2

True negative rate (Specificity)

$$\text{true negative rate} = \frac{\#\text{ of true negatives}}{\#\text{ of known negatives}}$$

(Proportion of actual negatives that are correctly identified)
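The same toy setup gives the true negative rate; only the roles of positives and negatives swap. Again, the data and the helper `true_negative_rate()` are hypothetical.

```r
# Toy data (hypothetical): TRUE = known positive / called positive
known     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

# true negative rate = # of true negatives / # of known negatives
true_negative_rate <- function(predicted, known) {
  sum(!predicted & !known) / sum(!known)
}

tnr <- true_negative_rate(predicted, known)
tnr  # 3 of the 4 known negatives are called negative: 0.75
```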

SLIDE 3

False positive rate (1 – Specificity)

$$\text{false positive rate} = \frac{\#\text{ of false positives}}{\#\text{ of known negatives}}$$

(Proportion of actual negatives that are incorrectly identified)
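Why "1 – Specificity"? Every known negative is either a true negative or a false positive, so the two rates share a denominator and add up to 1. A small check on made-up data (the vectors and helper functions are hypothetical):

```r
# Toy data (hypothetical)
known     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

false_positive_rate <- function(predicted, known) {
  sum(predicted & !known) / sum(!known)
}
true_negative_rate <- function(predicted, known) {
  sum(!predicted & !known) / sum(!known)
}

fpr <- false_positive_rate(predicted, known)  # 1 of 4 negatives called positive
tnr <- true_negative_rate(predicted, known)   # 3 of 4 negatives called negative

# false positive rate = 1 - specificity
all.equal(fpr, 1 - tnr)  # TRUE
```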

SLIDE 4

Sensitivity and specificity depend on a chosen cutoff

[Figure: predictor values for malignant and benign samples split by a cutoff; false positives and false negatives marked]

SLIDE 5

Sensitivity and specificity depend on a chosen cutoff

[Figure: malignant and benign samples with the cutoff in a different position; false negatives and false positives marked]
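To see the dependence on the cutoff numerically, the sketch below sweeps a few cutoffs over made-up scores. The scores, the variable names, and the rule "call a sample malignant if its score >= cutoff" are assumptions for illustration. Raising the cutoff lowers sensitivity and raises specificity.

```r
# Hypothetical predictor values; higher values are meant to indicate malignancy
score     <- c(9, 8, 7, 5, 4, 6, 3, 2, 2, 1)
malignant <- c(rep(TRUE, 5), rep(FALSE, 5))

# Sensitivity and specificity when samples with score >= cutoff are called malignant
rates_at_cutoff <- function(cutoff, score, known) {
  predicted <- score >= cutoff
  c(cutoff      = cutoff,
    sensitivity = sum(predicted & known) / sum(known),
    specificity = sum(!predicted & !known) / sum(!known))
}

cutoffs <- c(2, 4, 6, 8)
result  <- t(sapply(cutoffs, rates_at_cutoff, score = score, known = malignant))
print(result)
```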

SLIDE 6

Do Part 1 of the worksheet now

SLIDE 7

We usually plot the true positive rate vs. the false positive rate for all possible cutoffs.

ROC curve: Receiver Operating Characteristic curve
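A bare-bones way to get the points of an ROC curve, assuming made-up scores where higher means "more likely malignant" and treating every distinct score as a cutoff. This is a hand-rolled illustration, not a description of how geom_roc() computes its curve internally.

```r
# Hypothetical scores and known classes
score     <- c(9, 8, 7, 5, 4, 6, 3, 2, 2, 1)
malignant <- c(rep(TRUE, 5), rep(FALSE, 5))

# Every distinct score is a candidate cutoff; Inf (nothing called positive)
# makes the curve start at (0, 0), the lowest score makes it end at (1, 1)
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))

roc_points <- data.frame(
  cutoff = cutoffs,
  fpr = sapply(cutoffs, function(k) sum(score >= k & !malignant) / sum(!malignant)),
  tpr = sapply(cutoffs, function(k) sum(score >= k &  malignant) / sum(malignant))
)
print(roc_points)
```

Plotting `tpr` against `fpr`, for example with `plot(tpr ~ fpr, data = roc_points, type = "l")`, draws the curve.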

SLIDE 8

Image from: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

SLIDE 9

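The area under the ROC curve (AUC) tells us how good a model’s predictions are: an AUC of 1 means the model separates positives from negatives perfectly, and an AUC of 0.5 means it does no better than random guessing.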

[Figure: example ROC curves labeled worst case, good, and perfect]
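One common way to compute the AUC is the trapezoidal rule over the ROC points. The sketch below assumes the cutoff convention "score >= cutoff is positive"; the helper `auc_trapezoid()` and all scores are hypothetical. It evaluates a predictor with some overlap, a perfect one, and an uninformative one.

```r
auc_trapezoid <- function(score, known) {
  # ROC points for every cutoff, from (0, 0) to (1, 1)
  cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
  fpr <- sapply(cutoffs, function(k) sum(score >= k & !known) / sum(!known))
  tpr <- sapply(cutoffs, function(k) sum(score >= k &  known) / sum(known))
  # add up the trapezoids between consecutive ROC points
  n <- length(fpr)
  sum((fpr[-1] - fpr[-n]) * (tpr[-1] + tpr[-n]) / 2)
}

known <- c(rep(TRUE, 5), rep(FALSE, 5))
auc_trapezoid(c(9, 8, 7, 5, 4, 6, 3, 2, 2, 1), known)  # some overlap: 0.92
auc_trapezoid(10:1, known)                             # perfect separation: 1
auc_trapezoid(rep(1, 10), known)                       # uninformative: 0.5
```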

SLIDE 10

Let’s look at the performance of several different models for the biopsy data set

SLIDE 11

| Predictor          | M1 |
|--------------------|----|
| clump_thickness    | ✔  |
| normal_nucleoli    |    |
| marg_adhesion      |    |
| bare_nuclei        |    |
| uniform_cell_shape |    |
| bland_chromatin    |    |

SLIDE 12

| Predictor          | M1 | M2 |
|--------------------|----|----|
| clump_thickness    | ✔  | ✔  |
| normal_nucleoli    |    | ✔  |
| marg_adhesion      |    |    |
| bare_nuclei        |    |    |
| uniform_cell_shape |    |    |
| bland_chromatin    |    |    |

SLIDE 13

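| Predictor          | M1 | M2 | M3 |
|--------------------|----|----|----|
| clump_thickness    | ✔  | ✔  | ✔  |
| normal_nucleoli    |    | ✔  | ✔  |
| marg_adhesion      |    |    | ✔  |
| bare_nuclei        |    |    |    |
| uniform_cell_shape |    |    |    |
| bland_chromatin    |    |    |    |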

SLIDE 14

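| Predictor          | M1 | M2 | M3 | M4 |
|--------------------|----|----|----|----|
| clump_thickness    | ✔  | ✔  | ✔  | ✔  |
| normal_nucleoli    |    | ✔  | ✔  | ✔  |
| marg_adhesion      |    |    | ✔  | ✔  |
| bare_nuclei        |    |    |    | ✔  |
| uniform_cell_shape |    |    |    |    |
| bland_chromatin    |    |    |    |    |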

SLIDE 15

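| Predictor          | M1 | M2 | M3 | M4 | M5 |
|--------------------|----|----|----|----|----|
| clump_thickness    | ✔  | ✔  | ✔  | ✔  | ✔  |
| normal_nucleoli    |    | ✔  | ✔  | ✔  | ✔  |
| marg_adhesion      |    |    | ✔  | ✔  | ✔  |
| bare_nuclei        |    |    |    | ✔  | ✔  |
| uniform_cell_shape |    |    |    |    | ✔  |
| bland_chromatin    |    |    |    |    | ✔  |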
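The five models are nested: each adds predictors to the previous one. A sketch of building their formulas programmatically with base R's `reformulate()`, assuming the response column is called `outcome` as in the glm() code later in this deck; the objects `predictors`, `model_terms`, and `formulas` are illustrative names.

```r
# Predictor columns in the order they are added (M1 to M5)
predictors <- c("clump_thickness", "normal_nucleoli", "marg_adhesion",
                "bare_nuclei", "uniform_cell_shape", "bland_chromatin")

# M5 adds the last two predictors at once
model_terms <- list(
  M1 = predictors[1],
  M2 = predictors[1:2],
  M3 = predictors[1:3],
  M4 = predictors[1:4],
  M5 = predictors[1:6]
)

# One formula per model, e.g. outcome ~ clump_thickness + normal_nucleoli for M2
formulas <- lapply(model_terms, reformulate, response = "outcome")
all.vars(formulas$M3)
```

Each formula can then be passed to `glm(..., data = biopsy, family = binomial)` in place of the hand-written `outcome ~ clump_thickness`.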

SLIDE 16

| Model | Area Under Curve (AUC) |
|-------|------------------------|
| M1    | 0.909                  |
| M2    | 0.968                  |
| M3    | 0.985                  |
| M4    | 0.995                  |
| M5    | 0.996                  |
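A standard way to read these numbers: the AUC equals the probability that a randomly chosen malignant sample gets a higher predictor value than a randomly chosen benign one, with ties counting half. By that reading, M1's 0.909 means about 91% of malignant/benign pairs are ranked correctly. The sketch checks the pairwise definition on made-up scores (not the biopsy predictions); `pair_auc()` is a hypothetical helper.

```r
# Hypothetical scores, not the biopsy predictions
malignant_scores <- c(9, 8, 7, 5, 4)
benign_scores    <- c(6, 3, 2, 2, 1)

# Compare every malignant score with every benign score;
# a correctly ordered pair counts 1, a tie counts 0.5
pair_auc <- function(pos, neg) {
  diffs <- outer(pos, neg, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

pair_auc(malignant_scores, benign_scores)  # 23 of 25 pairs ordered correctly: 0.92
```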

SLIDE 17

Things usually look much worse in real life

Keller, Mis, Jia, Wilke. Genome Biol. Evol. 4:80-88, 2012

Best AUC (solid line): 0.70

SLIDE 18

Calculating ROC curves in R

SLIDE 19

Using geom_roc() from the plotROC package

SLIDE 20

Using geom_roc() from the plotROC package

```r
# fit a logistic regression model
glm_out <- glm(outcome ~ clump_thickness, data = biopsy, family = binomial)
```

SLIDE 21

Using geom_roc() from the plotROC package

```r
# fit a logistic regression model
glm_out <- glm(outcome ~ clump_thickness, data = biopsy, family = binomial)

# prepare data for ROC plotting
df <- data.frame(predictor = predict(glm_out, biopsy),
                 known_truth = biopsy$outcome,
                 model = 'M1')
```
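By default `predict(glm_out, biopsy)` returns predictions on the log-odds (link) scale, not probabilities. For an ROC curve this should not matter: the curve depends only on how the samples are ranked, and `type = "response"` applies the monotone logistic transform, which preserves that ranking. The sketch checks this on simulated data; the data frame `sim` and the coefficients used to generate it are invented stand-ins for `biopsy`.

```r
set.seed(1)
# Simulated stand-in for the biopsy data (hypothetical)
sim <- data.frame(clump_thickness = sample(1:10, 200, replace = TRUE))
sim$outcome <- factor(
  ifelse(runif(200) < plogis(-5 + 0.9 * sim$clump_thickness), "malignant", "benign"),
  levels = c("benign", "malignant")
)

glm_out <- glm(outcome ~ clump_thickness, data = sim, family = binomial)

log_odds <- predict(glm_out, sim)                     # default: type = "link"
prob     <- predict(glm_out, sim, type = "response")  # probabilities in (0, 1)

# Same ranking of samples, hence the same ROC curve and AUC
all(rank(log_odds) == rank(prob))  # TRUE
```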

SLIDE 22

Using geom_roc() from the plotROC package

```r
library(ggplot2)
library(plotROC)

# fit a logistic regression model
glm_out <- glm(outcome ~ clump_thickness, data = biopsy, family = binomial)

# prepare data for ROC plotting
df <- data.frame(predictor = predict(glm_out, biopsy),
                 known_truth = biopsy$outcome,
                 model = 'M1')

# the aesthetic names are not the most intuitive:
# `d` (disease) holds the known truth
# `m` (marker) holds the predictor values
p <- ggplot(df, aes(d = known_truth, m = predictor)) +
  geom_roc(n.cuts = 0) +
  coord_fixed()
p # make plot
```
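The `model = 'M1'` column suggests a pattern for comparing models: stack the prediction data frames of several models and map `model` to color, giving one curve per model. A minimal sketch, assuming simulated data in place of `biopsy` (the data frame `sim` and its generating coefficients are invented); the plotting part is guarded so it only runs when ggplot2 and plotROC are installed.

```r
set.seed(2)
n <- 300
# Simulated stand-in for the biopsy data (hypothetical)
sim <- data.frame(clump_thickness = sample(1:10, n, replace = TRUE),
                  normal_nucleoli = sample(1:10, n, replace = TRUE))
sim$outcome <- factor(
  ifelse(runif(n) < plogis(-7 + 0.6 * sim$clump_thickness + 0.6 * sim$normal_nucleoli),
         "malignant", "benign"),
  levels = c("benign", "malignant"))

m1 <- glm(outcome ~ clump_thickness, data = sim, family = binomial)
m2 <- glm(outcome ~ clump_thickness + normal_nucleoli, data = sim, family = binomial)

# One row per sample per model; `model` identifies the curve
df <- rbind(
  data.frame(predictor = predict(m1, sim), known_truth = sim$outcome, model = "M1"),
  data.frame(predictor = predict(m2, sim), known_truth = sim$outcome, model = "M2")
)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotROC", quietly = TRUE)) {
  library(ggplot2)
  library(plotROC)
  p <- ggplot(df, aes(d = known_truth, m = predictor, color = model)) +
    geom_roc(n.cuts = 0) +
    coord_fixed()
  print(calc_auc(p))  # one row (group) per model
}
```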

SLIDE 23

Calculating the area under the curve (AUC)

```r
# the function calc_auc needs to be called on a plot object
# that uses geom_roc():
calc_auc(p)
#   PANEL group      AUC
# 1     1    -1 0.908878
# Warning message:
# In verify_d(data$d) :
#   D not labeled 0/1, assuming benign = 0 and malignant = 1!
```
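The warning appears because `known_truth` holds the labels benign/malignant rather than 0/1, so plotROC has to guess which label is the positive class (here it guessed malignant, which is what we want). One way to make the choice explicit, sketched with a hypothetical factor standing in for `biopsy$outcome`, is to convert the outcome to 0/1 before building the data frame.

```r
# Hypothetical stand-in for biopsy$outcome
outcome <- factor(c("benign", "malignant", "malignant", "benign"),
                  levels = c("benign", "malignant"))

# 1 = malignant (the positive class), 0 = benign
known_truth <- as.integer(outcome == "malignant")
known_truth  # 0 1 1 0
```

Using `known_truth = as.integer(biopsy$outcome == "malignant")` in the `data.frame()` call should then give the same AUC without the warning.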

SLIDE 24

Do Part 2 of the worksheet now