

SLIDE 1

A Reader Study on a 14‐head Microscope

Brandon D. Gallas, Qi Gong
FDA/CDRH/OSEL/DIDSR, Silver Spring, MD, US
Jamal Benhamida, Matthew G. Hanna, S. Joseph Sirintrapun, Kazuhiro Tabata, Yukako Yagi
Memorial Sloan Kettering Cancer Center (MSKCC), Pathology Informatics, New York, NY, US
Partha P. Mitra
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, US

5/23/2018 www.fda.gov

SLIDE 2


Purpose

  • Purpose of this work
– Demonstration … proof of concept … technology demonstration … method development
  • Technology evaluation, not clinical performance
  • Task-based evaluation of image quality
– Task: detection and classification of mitotic figures (MFs)
– Images: glass slides and whole-slide images (WSI)
– Readers: pathologists
– Performance: within- and between-reader agreement


Clinically relevant task: part of every pathologist's training. Challenging task (substantial reader variability). Convenient samples. Agreement, because there is no ground truth: count differences (calibration), pairwise concordance (correlation). "MRMC" analyses account for variability from Multiple Readers and Multiple Cases.

SLIDE 3

Microscope still the gold standard

Remove search from the technology evaluation

  • Eliminate location variability for faster and more precise results.
  • eeDAP: Evaluation Environment for Digital and Analog Pathology
  • Registration allows pathologists to evaluate the same fields of view (see the registration sketch below)
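The registration step can be sketched as a coordinate mapping. A minimal illustration in Python, assuming made-up fiducial points and a simple affine fit; eeDAP's actual registration procedure is more involved than this:

import numpy as np

# Hypothetical fiducial points: WSI pixel locations and the matching
# motorized-stage locations (um). Values are made up for illustration.
wsi_pts = np.array([[100.0, 200.0], [5000.0, 300.0], [800.0, 4000.0]])
stage_pts = np.array([[25.0, 50.0], [1250.0, 75.0], [200.0, 1000.0]])

# Fit stage = [wsi, 1] @ coef (affine map via least squares).
X = np.hstack([wsi_pts, np.ones((len(wsi_pts), 1))])
coef, *_ = np.linalg.lstsq(X, stage_pts, rcond=None)

def wsi_to_stage(xy):
    """Map one WSI pixel coordinate to stage coordinates (um)."""
    return np.hstack([xy, 1.0]) @ coef

# Drive the stage here so every pathologist sees the same field of view.
print(wsi_to_stage(np.array([2000.0, 1000.0])))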


Clinical practice: Pathologists choose which fields of view to evaluate (Pathologists 1-4 each select their own).

Technology evaluation: All pathologists evaluate the same fields of view, matched between the camera image of the glass slide and the WSI patch.

SLIDE 4

NIH Mitotic Counting Study

  • NIH study data (Mark Simpson)
– FOV locations saved for each pathologist in digital mode
– Preliminary agreement results given during the WSIWG meeting


Counts come from different tissue! Clinical practice vs. technology evaluation.

[Images: pHH3 20x, H&E 20x, H&E 40x]

SLIDE 5

eeDAP on the road last year …


[Photo: monitor, computer, motorized stage with joystick, microscope with mounted camera, reticle in the eyepiece]

SLIDE 6

Mitotic Counting and Classification

Install, demo, and train at MSKCC. Study design:

  • 4 slides from Mark Simpson at NCI
– H&E: canine oral melanoma
  • 10 ROIs per slide from tumor
– ROI = 800 x 800 pixels @ 0.25 um/pixel = 200 um x 200 um = 17% of the entire FOV (0.24 mm2); quick check below
  • 4 pathologists from MSKCC

eeDAP on loan to MSKCC.
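A quick check of the ROI arithmetic above (Python; the 0.24 mm2 FOV area is taken from the slide):

# ROI geometry from the slide: 800 x 800 pixels at 0.25 um/pixel.
px, um_per_px = 800, 0.25
side_um = px * um_per_px               # 200 um per side
roi_mm2 = (side_um / 1000.0) ** 2      # 0.04 mm^2
fov_mm2 = 0.24                         # eyepiece field of view (slide value)
print(f"{side_um:.0f} um square, {roi_mm2:.2f} mm^2, "
      f"{100 * roi_mm2 / fov_mm2:.0f}% of the FOV")   # -> 17%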

SLIDE 7

Quick look at first study

  • Circles: mitotic figures identified by pathologists.
  • "Candidate MFs" = marked cells
  • Each color corresponds to a different pathologist.

[WSI image]

SLIDE 8

Readers per Candidate MF

Readers Per Candidate, total = 92

[Histogram (readersPerCandidate1): candidates marked by 1 reader: 45; 2 readers: 12; 3 readers: 14; 4 readers: 21]

  • 45/92 = 49% marked by only one reader
  • 21/92 = 23% unanimously marked
  • Build these candidate MFs into the next study: classification task
  • Need some low-probability candidates from ROIs with zero or one candidates -> yield 34 more
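The tallies above are easy to reproduce once marks are clustered into candidates. A minimal sketch, assuming a hypothetical table of (candidate_id, reader_id) pairs rather than the study's actual data format:

from collections import Counter

# Hypothetical (candidate_id, reader_id) pairs; the real pairs would come
# from clustering the pathologists' marks in the eeDAP output.
marks = {("c1", "r1"), ("c1", "r2"), ("c1", "r3"), ("c1", "r4"),
         ("c2", "r2"), ("c3", "r1"), ("c3", "r4")}

readers_per_candidate = Counter(cand for cand, _ in marks)
hist = Counter(readers_per_candidate.values())  # {n readers: n candidates}

total = len(readers_per_candidate)
print(f"marked by one reader: {hist[1] / total:.0%}")   # 45/92 = 49% in study
print(f"unanimously marked:   {hist[4] / total:.0%}")   # 21/92 = 23% in study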


SLIDE 9

Can we use eeDAP on this multi‐headed microscope?

  • Same microscope frame … 14 heads!

  • Stage mounts fine
  • Camera mounts fine
  • Let’s do it.


SLIDE 10

Mitotic Counting and Classification: Multi-head Microscope

High-throughput reader study. Study design:


  • 4 slides from Mark Simpson at NCI
– H&E: canine oral melanoma
  • 10 ROIs per slide from tumor
– ROI = 800 x 800 pixels @ 0.25 um/pixel = 200 um x 200 um = 17% of the entire FOV (0.24 mm2)
  • 126 (= 92 + 34) candidate MFs
  • 10 pathologists*
  • Collect data on paper
– ~1 hour training
– ~2 hours for data collection

SLIDE 11

Mitotic Counting and Classification: Multi-head Microscope

High-throughput reader study. Workflow:


  • Mark and count in ROI
  • Classify candidates in same ROI
SLIDE 12

Readers per Candidate: Multi‐head study

  • Similar characteristics as before
  • 79/158 = 50% marked by only one reader
  • 21/158 = 13% unanimously marked
– 13 agree with the previous study, 8 are new


Readers Per Candidate, total = 158

[Histogram (readersPerCandidate2): candidates marked by 1 reader: 79; 2: 12; 3: 7; 4: 6; 5: 5; 6: 9; 7: 4; 8: 9; 9: 6; 10: 21]

SLIDE 13

Counting Results

  • Each point =
– one ROI and a pair of readers
– appears twice (x and y transposed)
– has noise added for visualization
  • How do we summarize this?


[Figure: between-reader scatter plot of MF counts, 14-head microscope vs. 14-head microscope]

Agreement, because there is no ground truth: count differences (calibration), pairwise concordance (correlation). "MRMC" analyses account for variability from Multiple Readers and Multiple Cases.

SLIDE 14

Results: Count Differences

  • Rotate 45° and rescale the x-axis -> Bland-Altman plot


[Figure: between-reader scatter plot of MF counts, 14-head microscope vs. 14-head microscope]

SLIDE 15

Results: Count differences

[Figure: Bland-Altman plot, Between-Reader Differences: micro14, N = 3420; x-axis: count averages, y-axis: count differences]

  • Rotate 45° and rescale the x-axis -> Bland-Altman plot
  • Limits of agreement
– Characterize the spread of the differences
– σ = 1.07
  • Not the standard error
– The SE characterizes the spread of the mean difference


"MRMC" analyses account for variability from Multiple Readers and Multiple Cases (sketch below).
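A minimal sketch of the difference summaries on synthetic counts (a readers-by-ROIs array stands in for the study data); the real MRMC analysis, not shown here, estimates the SE while accounting for reader and case correlations:

import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(1.5, size=(10, 40))  # synthetic reader-by-ROI MF counts

# Every ordered reader pair on every ROI (each pair appears twice,
# matching the scatter plot above).
i, j = np.nonzero(~np.eye(counts.shape[0], dtype=bool))
diffs = (counts[i] - counts[j]).ravel()        # count differences (y-axis)
means = ((counts[i] + counts[j]) / 2).ravel()  # count averages (x-axis)

sigma = diffs.std(ddof=1)                      # spread of the differences
print(f"sigma = {sigma:.2f}, limits of agreement ~ +/-{1.96 * sigma:.2f}")

# Caution: sigma / sqrt(len(diffs)) is NOT a valid SE for the mean
# difference; the differences share readers and ROIs, so an MRMC variance
# analysis is needed to get an honest standard error.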

SLIDE 18

Results: Count Differences

  • Study 1:
– More MFs with the microscope
– Count differences were larger with digital
  • Study 2:
– Microscope results consistent with Study 1


                               Average Counts   SE of Average Counts   Std of Between-Reader Count Differences
Study 1: Digital                    1.22              0.23                       1.29
Study 1: Microscope                 1.48              0.27                       1.12
Study 1: Microscope - Digital       0.26              0.12                       1.20
Study 2: 14-head Microscope         1.54              0.25                       1.07

SLIDE 19

Pairwise Concordance

A probability that tracks with correlation

  • Select two ROIs
  • Consider the counts from two pathologists
– Pathologist 1: X1, X2
– Pathologist 2: Y1, Y2
  • Possible outcomes (table below)


Concordance:                X1 > X2, Y1 > Y2
Discordance:                X1 > X2, Y1 < Y2
Tie for Pathologist 1:      X1 = X2, Y1 ≠ Y2
Tie for Pathologist 2:      X1 ≠ X2, Y1 = Y2
Tie for both pathologists:  X1 = X2, Y1 = Y2
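A minimal sketch of the pairwise-concordance computation for one pair of pathologists (Python; how the study's estimator weights the tie outcomes is not shown on the slide, so ties are simply tallied separately):

from itertools import combinations

def pairwise_concordance(x, y):
    """Fraction of concordant ROI pairs for two readers' counts."""
    outcomes = {"concordant": 0, "discordant": 0,
                "tie_reader1": 0, "tie_reader2": 0, "tie_both": 0}
    for a, b in combinations(range(len(x)), 2):
        dx, dy = x[a] - x[b], y[a] - y[b]
        if dx == 0 and dy == 0:
            outcomes["tie_both"] += 1
        elif dx == 0:
            outcomes["tie_reader1"] += 1
        elif dy == 0:
            outcomes["tie_reader2"] += 1
        elif (dx > 0) == (dy > 0):
            outcomes["concordant"] += 1
        else:
            outcomes["discordant"] += 1
    n_pairs = sum(outcomes.values())
    return outcomes["concordant"] / n_pairs, outcomes

# Example: counts from two pathologists over five ROIs
print(pairwise_concordance([3, 1, 4, 1, 5], [2, 0, 5, 1, 4]))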

[Figure: between-reader scatter plot of MF counts, 14-head microscope vs. 14-head microscope]

No time for concordance results

SLIDE 20

Classification scores

[Figure: between-reader scatter plot of classification scores (0-100), 14-head microscope vs. 14-head microscope]

  • Summarize with concordance
  • For some cases:
– "Definitely not a MF"
– "Definitely is a MF"
  • Can this impact AI training?
  • Can still binarize this data (sketch below)
– 15% in the red zone

No time for concordance results
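A sketch of the binarization idea on made-up scores; the 50 cut point and the 30-70 "red zone" band are assumptions for illustration, not the study's definitions:

import numpy as np

scores = np.array([2, 15, 48, 55, 71, 88, 97])  # synthetic 0-100 scores

is_mf = scores >= 50                       # binarized MF call (assumed cut)
red_zone = (scores > 30) & (scores < 70)   # ambiguous middle band (assumed)
print(f"called MF: {is_mf.mean():.0%}; in the red zone: {red_zone.mean():.0%}")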

SLIDE 21

Generalize to evaluating computational pathology

  • FDA qualification of images with annotations
– MDDT: Medical Device Development Tools
– Support FDA submissions of computational pathology
  • Generate candidates from pathologists AND algorithm(s) (merging sketch below)
  • Candidates cover a range in the likelihood that the candidate is a MF
  • Use the same agreement measures

5/23/2018 www.fda.gov 21

Readers Per Candidate, total = 158

[Histogram (readersPerCandidate2): candidates marked by 1 reader: 79; 2: 12; 3: 7; 4: 6; 5: 5; 6: 9; 7: 4; 8: 9; 9: 6; 10: 21]

Pooling candidates reduces bias in the comparison. A similar plot would show AI likelihood instead of readers per candidate.
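One way to pool candidates from pathologists and algorithms is to take the union of marked locations and merge near-duplicates by distance. A minimal sketch; the 10 um matching radius and the greedy merge are assumptions, not the study's method:

import numpy as np

def merge_candidates(path_xy, algo_xy, radius_um=10.0):
    """Union of pathologist and algorithm candidate locations (um).

    Candidates closer than `radius_um` are treated as the same cell.
    """
    merged = list(map(tuple, path_xy))
    for x, y in algo_xy:
        d = [np.hypot(x - mx, y - my) for mx, my in merged]
        if not d or min(d) > radius_um:
            merged.append((x, y))
    return merged

# Example: two pathologist marks, two algorithm detections (one overlaps)
print(merge_candidates([(10.0, 10.0), (50.0, 40.0)],
                       [(12.0, 9.0), (80.0, 80.0)]))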

SLIDE 22

Summary

  • Collected and analyzing:
– MF counts, locations, and classifications
  • Agreement analyses
– MRMC analysis
– Calibration
– Correlation
– Unit of analysis: cells > ROIs > slides
  • Limitations
– Anecdotal feedback
  • Pathologists felt rushed
  • Focus handling not perfect
  • No reticles in eyepieces
– No ground truth
  • Future work
– Generalize to other ROIs?
– Generalize to other specimens (organs)?
  • Evaluate AI algorithms
– Use a similar study design
– Use similar analysis tools
– Need "candidates" from algorithms and pathologists for an unbiased evaluation
  • FDA qualification of images with annotations
– MDDT: Medical Device Development Tools
– Test sets for FDA submissions of computational pathology
