
SLIDE 1

TITLE

Statistical analysis of labelling patterns of mammary carcinoma cell nuclei on histological sections

AUTHORS

Torsten Mattfeldt (1), Stefanie Eckel (2), Frank Fleischer (3), Volker Schmidt (2)

DEPARTMENTS

(1) Department of Pathology, Ulm University
(2) Institute of Stochastics, Ulm University
(3) Medical Data Services/Biostatistics, Boehringer-Ingelheim Pharma GmbH & Co. KG

Joint workshop of the working groups on Bayesian Methodology, Spatial Statistics, and Ecology and Environment, 27–29 September 2007, Schloss Reisensburg

SLIDE 2

OVERVIEW OF THE LECTURE

Explorative point pattern analysis
Point process modelling
Distance-dependent Simpson indices
Monte Carlo rank tests

SLIDE 3

Ductal invasive mammary carcinoma
Immunohistochemistry for MIB-1

SLIDE 4

Ductal invasive mammary carcinoma
Immunohistochemistry for MIB-1: Detection of nuclei

SLIDE 5

Case 3, Image 1: Point pattern of unlabelled nuclei

SLIDE 6

Case 3, Image 1: Point pattern of labelled nuclei

SLIDE 7

Case 3, Image 1: Point pattern of unlabelled and labelled nuclei

SLIDE 8

Ductal invasive mammary carcinoma
Immunohistochemistry for MIB-1

SLIDE 9

Ductal invasive mammary carcinoma
Immunohistochemistry for MIB-1: Detection of nuclei

SLIDE 10

MATERIAL AND METHODS

CASES

Breast cancer: 20 routine cases
Operative specimens
Domains with invasive ductal adenocarcinomas

MICROSCOPY

Paraffin sections
Light microscopy
Immunohistochemistry

Ki-67: nuclear protein associated with proliferation
MIB-1: monoclonal antibody against Ki-67

SLIDE 11

IMAGE EVALUATION

Sampling

Two rectangular fields per case
Size: 1240 × 1000 pixels = 440 µm × 354 µm

Explorative point pattern analysis

Interactive detection of centres of tumour cell nuclei on sections
Labelled and unlabelled nuclei → marked point pattern
Estimation of g(r) for r = 1–200 pixels using kernel methods
Epanechnikov kernel
Bandwidth: $h = 0.1/\sqrt{\hat{\lambda}}$

Software

Library geostoch under Java (Mayer et al., 2004; http://www.geostoch.de)
Package spatstat under R 2.2.0 under Linux (Baddeley & Turner, 2005)

SLIDE 12

EXPLORATIVE ANALYSIS OF PLANAR POINT PATTERNS

Stationary planar point process X = {Xn} with intensity λ
Second order K-function, reduced second moment function K(r)

$$K(r) = \frac{E(\text{number of other points with distance} \le r \mid (x, y) \in X)}{\lambda}$$

$$K_{\mathrm{Poi}}(r) = \pi r^2$$

Pair correlation function g(r)

$$g(r) = \frac{\varrho^{(2)}(r)}{\lambda^2} = \frac{1}{2\pi r}\,\frac{dK(r)}{dr}$$

$$g_{\mathrm{Poi}}(r) = 1$$
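
As a short consistency check (a worked step added here, not taken from the slides), the Poisson value of g(r) follows directly from the Poisson K-function and the derivative relation between g and K:

```latex
g_{\mathrm{Poi}}(r)
  = \frac{1}{2\pi r}\,\frac{d}{dr}\,K_{\mathrm{Poi}}(r)
  = \frac{1}{2\pi r}\,\frac{d}{dr}\left(\pi r^2\right)
  = \frac{2\pi r}{2\pi r}
  = 1
```

Values g(r) > 1 therefore indicate more point pairs at distance r than under complete spatial randomness, and values g(r) < 1 indicate fewer.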

SLIDE 13

ESTIMATION OF THE PAIR CORRELATION FUNCTION

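Estimation of the product density

$$\hat{\varrho}^{(2)}(r) = \frac{1}{2\pi r} \sum_{\substack{X_i, X_j \in W \\ i \neq j}} \frac{k_h(r - \|X_i - X_j\|)}{|W_{X_i} \cap W_{X_j}|}, \qquad k_h(x) = \frac{3}{4h}\left(1 - \frac{x^2}{h^2}\right) \mathbf{1}_{(-h,h)}(x)$$

Estimation of the squared intensity

$$\widehat{\lambda^2} = \frac{X(W)\,(X(W) - 1)}{|W|^2}$$

Estimation of g(r)

$$\hat{g}(r) = \frac{\hat{\varrho}^{(2)}(r)}{\widehat{\lambda^2}}$$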
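
The estimator on this slide can be sketched in Python for a rectangular window. This is a minimal illustration under stated assumptions, not the geostoch or spatstat implementation used in the study: the function names are hypothetical, NumPy is assumed, and the overlap area |W_Xi ∩ W_Xj| is taken as (width − |dx|)(height − |dy|), which holds for a rectangle. The bandwidth uses the rule h = 0.1/√λ̂ from the image evaluation slide.

```python
import numpy as np

def epanechnikov(x, h):
    """Epanechnikov kernel k_h(x) = 3/(4h) * (1 - x^2/h^2) on (-h, h), else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < h, 0.75 / h * (1.0 - (x / h) ** 2), 0.0)

def pcf_estimate(points, radii, width, height, c=0.1):
    """Kernel estimate of g(r) for points in a width x height rectangle.

    Hypothetical helper: translation edge correction for a rectangular
    window, bandwidth h = c / sqrt(lambda_hat). Requires radii > 0 and
    max(radii) + h smaller than both side lengths.
    """
    pts = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(pts)
    area = width * height
    h = c / np.sqrt(n / area)

    # Unordered pairs i < j; each one stands for two ordered pairs.
    i, j = np.triu_indices(n, k=1)
    dx = np.abs(pts[i, 0] - pts[j, 0])
    dy = np.abs(pts[i, 1] - pts[j, 1])
    d = np.hypot(dx, dy)

    # Only pairs close enough to receive kernel weight at some radius.
    near = d < radii.max() + h
    d = d[near]
    overlap = (width - dx[near]) * (height - dy[near])

    lam2 = n * (n - 1) / area ** 2  # estimate of the squared intensity
    rho2 = np.array([
        2.0 * np.sum(epanechnikov(r - d, h) / overlap) / (2.0 * np.pi * r)
        for r in radii
    ])
    return rho2 / lam2

# Usage: a simulated Poisson pattern should give values scattered around 1.
rng = np.random.default_rng(1)
pattern = rng.uniform(0.0, 1000.0, size=(1000, 2))
print(pcf_estimate(pattern, [20.0, 30.0, 40.0, 50.0], 1000.0, 1000.0))
```

Restricting the sum to pairs within max(r) + h of each other keeps the overlap areas strictly positive and avoids evaluating the kernel on pairs that cannot contribute.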

SLIDE 14

Case 3, Image 1: g-function of unlabelled nuclei

SLIDE 15

Case 3, Image 1: g-function of labelled nuclei

SLIDE 16

Case 3, Image 1: g-functions of unlabelled and labelled (dashed line) nuclei

SLIDE 17

Case 3: Mean g-functions of unlabelled and labelled (dashed line) nuclei

SLIDE 18

Estimated g-functions of unlabelled and labelled (dashed line) nuclei
Mean values of all 20 cases

SLIDE 19

Local comparisons of g-functions
Mean values for labelled and unlabelled nuclei

r (pixel)   Unlabelled ḡ(r)   Labelled ḡ(r)   D         Level of significance
5           0.00000           0.29258         0.29258   p < 0.001
10          0.00127           0.70074         0.69947   p < 0.05
15          0.24768           1.33178         1.08410   p < 0.001
20          1.13593           1.94351         0.80758   p < 0.001
25          1.46384           2.33754         0.87370   p < 0.001
30          1.35353           2.29150         0.93797   p < 0.001
35          1.22940           2.00732         0.77792   p < 0.001
40          1.17549           1.77813         0.60265   p < 0.001
45          1.16746           1.61141         0.44395   N.S.
50          1.15192           1.50813         0.35621   N.S.
55          1.13997           1.42883         0.28886   N.S.
60          1.13603           1.35491         0.21888   N.S.
65          1.12911           1.30302         0.17391   N.S.
70          1.11672           1.28056         0.16385   N.S.
75          1.10675           1.27387         0.16712   N.S.
80          1.09886           1.28003         0.18117   N.S.
85          1.10246           1.26078         0.15832   N.S.
90          1.10575           1.22666         0.12091   N.S.
95          1.08833           1.17965         0.09132   N.S.
100         1.10102           1.13606         0.03504   N.S.
150         1.05846           1.13965         0.08120   N.S.
200         1.04896           1.07587         0.02691   N.S.

SLIDE 20

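Summary characteristics read from the g-function curve

(rmax, gmax): maximum of g(r), marked + on the curve
(rmin, gmin): subsequent minimum of g(r), marked + on the curve

∆g = gmax − gmin

M = (gmax − gmin)/(rmin − rmax)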
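
A possible way to read these characteristics off an estimated curve is sketched below in Python. The slide does not say how the extrema were located, so the rule used here is an assumption: (rmax, gmax) is the global maximum and (rmin, gmin) is the first local minimum after it. The function name `g_summary` is hypothetical. The usage example applies it to the mean labelled g-function tabulated on the previous slide for r = 5, ..., 100 pixels.

```python
import numpy as np

def g_summary(radii, g):
    """Return rmax, gmax, rmin, gmin, delta_g and M for a sampled g-function.

    Assumed rule (not stated on the slides): global maximum, followed by
    the first point after it at which the curve stops decreasing. If the
    curve decreases to the end, the last sample is used as the minimum.
    """
    radii = np.asarray(radii, dtype=float)
    g = np.asarray(g, dtype=float)
    i_max = int(np.argmax(g))
    i_min = len(g) - 1
    for i in range(i_max + 1, len(g) - 1):
        if g[i] <= g[i + 1]:
            i_min = i
            break
    if i_min == i_max:
        raise ValueError("no minimum found after the maximum")
    delta_g = g[i_max] - g[i_min]
    return {
        "rmax": radii[i_max], "gmax": g[i_max],
        "rmin": radii[i_min], "gmin": g[i_min],
        "delta_g": delta_g,
        "M": delta_g / (radii[i_min] - radii[i_max]),
    }

# Usage: mean g-function of labelled nuclei, r = 5, 10, ..., 100 pixels.
r_values = np.arange(5, 101, 5)
g_labelled = [0.29258, 0.70074, 1.33178, 1.94351, 2.33754, 2.29150, 2.00732,
              1.77813, 1.61141, 1.50813, 1.42883, 1.35491, 1.30302, 1.28056,
              1.27387, 1.28003, 1.26078, 1.22666, 1.17965, 1.13606]
print(g_summary(r_values, g_labelled))
```

On this coarse 5-pixel grid the rule selects rmax = 25 and rmin = 75 pixels. These values come from the averaged curve and need not match the per-pattern means reported in the group comparison, which average the characteristics of individual curves; noisy single-pattern curves would likely need smoothing before applying such a rule.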

SLIDE 21

Group comparisons of explorative summary characteristics

Estimate             Unlabelled nuclei        Labelled nuclei           Level of
                     Mean        SD           Mean         SD           significance
N (nuclei/field)     741         217          89           47           p < 0.001
λ (points/pixel²)    0.0005957   0.000175     0.00007177   0.0000379    p < 0.001
r0 (pixel)           14.75       1.18         18.25        1.81         p < 0.001
rmax (pixel)         25.70       3.55         27.13        5.66         N.S.
gmax                 1.582       0.295        2.559        0.655        p < 0.001
rmin (pixel)         40.46       6.53         53.18        12.69        p < 0.001
gmin                 1.113       0.162        1.244        0.377        N.S.
M                    0.035       0.017        0.051        0.030        N.S.
∆g                   0.469       0.191        1.314        0.699        p < 0.001

SLIDE 22

Estimated K-functions of unlabelled and labelled (dashed line) nuclei
Mean values of all 20 cases versus Poisson process

SLIDE 23

POINT PROCESS MODELLING

Model

Gibbs processes
Stationary Strauss hard core process

Methods

Package spatstat under R 2.2.0 under Linux (Baddeley & Turner, 2005)
Explorative statistics (λ, r0, g(r))
Parametric model fitting

SLIDE 24

FITTING OF THE STATIONARY STRAUSS HARD CORE MODEL

PROPERTIES

Parameters λ, r0, R, γ
r < r0: No point pairs within minimal interpoint distance r0
r0 ≤ r < R: Interval of interaction distances
  γ < 1: Repulsion
  γ = 1: Classical hard core point process
  γ > 1: Clustering
r ≥ R: 'Some radius beyond which influence is inconceivable'

IRREGULAR PARAMETERS

Hard core distance r0
Estimator: minimum interpoint distance
Interaction radius R
Method: profile maximum pseudolikelihood

REGULAR PARAMETER

Interaction parameter γ
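
The interaction structure listed above can be illustrated with a small Python sketch. It evaluates the unnormalised log density of a Strauss hard core pattern and the minimum interpoint distance used as the hard core estimator. The function names are hypothetical, and the first-order term is written with an activity parameter `beta`, as is usual for Gibbs models; how that parameter relates to the intensity λ on the slide is left open here. Model fitting by profile maximum pseudolikelihood is not attempted.

```python
import math
import numpy as np

def pair_distances(points):
    """Distances of all unordered point pairs (i < j)."""
    pts = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)
    return np.hypot(pts[i, 0] - pts[j, 0], pts[i, 1] - pts[j, 1])

def hard_core_estimate(points):
    """Estimate of r0: the minimum interpoint distance of the pattern."""
    return float(pair_distances(points).min())

def strauss_hardcore_log_density(points, beta, gamma, r0, big_r):
    """Unnormalised log density n*log(beta) + s*log(gamma).

    s counts the pairs with r0 <= distance < big_r. Any pair closer than
    r0 violates the hard core, so the density is 0 (log density -inf).
    """
    d = pair_distances(points)
    if np.any(d < r0):
        return -math.inf
    s = int(np.count_nonzero(d < big_r))
    return len(points) * math.log(beta) + s * math.log(gamma)

# Usage: three points, exactly one pair inside the interaction range.
pts = [[0.0, 0.0], [3.0, 0.0], [0.0, 10.0]]
print(hard_core_estimate(pts))
print(strauss_hardcore_log_density(pts, beta=1.0, gamma=2.0, r0=2.0, big_r=5.0))
```

With γ = 1 the pair term vanishes and only the hard core constraint remains, which matches the classical hard core case on the slide; γ > 1 rewards close pairs (clustering) and γ < 1 penalises them (repulsion).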

SLIDE 25

Group comparisons of model parameters

                     Unlabelled nuclei        Labelled nuclei           Level of
                     Mean        SD           Mean         SD           significance
Intensity
N (nuclei/field)     741         217          89           47           p < 0.001
λ (points/pixel²)    0.0005957   0.000175     0.00007177   0.0000379    p < 0.001
Strauss hard core model
r0 (pixel)           14.75       1.18         18.25        1.81         p < 0.001
R (pixel)            39.12       18.94        44.52        14.28        N.S.
γ                    0.874       0.334        3.164        2.010        p < 0.001

SLIDE 26

Distance-independent characteristics of diversity

Simpson index D

$$D = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2}{\lambda^2}$$

Definition

Probability that a randomly selected point pair belongs to different components

Measure of diversity
Distance-independent → Generalisation to distance-dependent Simpson indices
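
A minimal Python sketch of the Simpson index, assuming all components are observed in the same window, so that λ_i/λ equals the count ratio n_i/n. The function name is hypothetical. The usage example plugs in the mean nuclei counts per field from the group comparison (741 unlabelled, 89 labelled) purely as an illustration; it is not a value reported in the talk.

```python
def simpson_index(counts):
    """Simpson index D = 1 - sum_i (n_i / n)^2 for component counts n_i."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Usage: mean counts per field of unlabelled and labelled nuclei.
print(round(simpson_index([741, 89]), 4))  # about 0.1915
```

For two components D ranges from 0 (only one component present) to 0.5 (both equally frequent), so the strongly unequal labelling fraction in this material places D well below its maximum.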

SLIDE 27

Distance-dependent characteristics of diversity

$$\alpha(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2 K_{ii}(r)}{\lambda^2 K(r)}$$

Probability that a randomly selected point pair belongs to different components, conditional on the event that its distance is less than r

Random labelling → α(r) = D
α(r) < D: point pattern has smaller diversity for distances below r than in the case of random labelling
α(r) > D: point pattern has larger diversity for distances below r than in the case of random labelling

SLIDE 28

Distance-dependent characteristics of diversity

$$\beta(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2 g_{ii}(r)}{\lambda^2 g(r)}$$

Probability that a randomly selected point pair belongs to different components, conditional on the event that its distance is r

Random labelling → β(r) = D
β(r) < D: point pattern has smaller diversity at distance r than in the case of random labelling
β(r) > D: point pattern has larger diversity at distance r than in the case of random labelling
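
The statement that random labelling gives α(r) = D and β(r) = D can be spelled out in one step. The derivation below is added for clarity and uses the standard fact that under random labelling each component is a random thinning of the whole pattern, so its second-order functions coincide with those of the unmarked pattern.

```latex
% Random labelling: K_{ii}(r) = K(r) and g_{ii}(r) = g(r) for every component i
\alpha(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2\, K(r)}{\lambda^2\, K(r)}
          = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2}{\lambda^2} = D,
\qquad
\beta(r)  = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2\, g(r)}{\lambda^2\, g(r)}
          = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2}{\lambda^2} = D
```

Clustering of like-labelled points makes K_ii(r) exceed K(r) at short distances, which pushes α(r) and β(r) below D.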

SLIDE 29

Distance-dependent characteristics α(r)
Simpson index D, 95% CI of α(r), and α(r) for pattern 1 of case 1 of mammary cancer

SLIDE 30

Distance-dependent characteristics α(r)
Simpson index D, 95% CI of α(r), and α(r) for 20 cases of mammary cancer

SLIDE 31

Distance-dependent characteristics β(r)
Simpson index D, 95% CI of β(r), and β(r) for 20 cases of mammary cancer

SLIDE 32

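Testing individual point patterns for random labelling

Random labelling

$$K_{11}(r) = K_{22}(r)$$

Test statistic with equal weights

$$T^{(1)} = \sum_{l=1}^{s} \left| \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right|$$

Weighted test statistic

$$T^{(2)} = \sum_{l=1}^{s} w(r_l) \left( \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right)^2, \qquad w(r_l) = \left( \widehat{\mathrm{Var}}\left( \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right) \right)^{-1}$$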

SLIDE 33

Monte Carlo rank test for random labelling

Simulate 9999 realisations based on independent labelling: given the realisation of locations and the observed values of the marks, assign the marks completely at random to the locations

Estimate the functions K11 and K22 at distances r1, ..., rs
Compute the values of the test statistic T1, ..., T9999 for the 9999 simulations and T for the single real pattern

Sort the resulting 10000 values in ascending order
Determine the rank of T in this sequence
Reject the hypothesis of random labelling if the rank of T ∈ [9501, 10000]
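
The steps above can be sketched in Python with the equal-weights statistic. This is an illustrative sketch under stated assumptions rather than the code used in the study: function names are hypothetical, marks are coded 1 and 2, each mark must occur at least twice, and K is estimated naively without edge correction. Ties are resolved by giving T the lowest possible rank, which is a conservative choice the slides do not specify.

```python
import numpy as np

def pair_distance_matrix(points):
    """Full symmetric matrix of interpoint distances."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def k_naive(dist, radii, area):
    """Naive K estimate (no edge correction) from a distance matrix."""
    n = dist.shape[0]
    d = dist[np.triu_indices(n, k=1)]  # unordered pairs
    counts = np.array([2 * np.count_nonzero(d <= r) for r in radii])
    return area * counts / (n * (n - 1))

def t_equal_weights(dist, labels, radii, area):
    """Sum over r_l of |K11(r_l) - K22(r_l)| for marks coded 1 and 2."""
    idx1 = np.flatnonzero(labels == 1)
    idx2 = np.flatnonzero(labels == 2)
    k11 = k_naive(dist[np.ix_(idx1, idx1)], radii, area)
    k22 = k_naive(dist[np.ix_(idx2, idx2)], radii, area)
    return float(np.abs(k11 - k22).sum())

def random_labelling_rank_test(points, labels, radii, area,
                               n_sim=9999, alpha=0.05, seed=0):
    """Monte Carlo rank test: locations fixed, marks permuted at random."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    dist = pair_distance_matrix(points)
    t_obs = t_equal_weights(dist, labels, radii, area)
    t_sim = [t_equal_weights(dist, rng.permutation(labels), radii, area)
             for _ in range(n_sim)]
    # Rank of the observed value among all n_sim + 1 values (ascending).
    rank = 1 + sum(t < t_obs for t in t_sim)
    n_total = n_sim + 1
    reject = rank > n_total - round(alpha * n_total)
    return t_obs, rank, reject

# Usage: 10 x 10 grid whose label-1 points form one compact 3 x 3 block.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
grid = np.column_stack([xs.ravel(), ys.ravel()])
marks = np.where((grid[:, 0] < 3) & (grid[:, 1] < 3), 1, 2)
print(random_labelling_rank_test(grid, marks, [1.0, 2.0, 3.0], 100.0, n_sim=199))
```

Permuting the observed marks keeps both the coordinates and the labelling fraction identical to the real pattern, as required by the simulation scheme. With the defaults n_sim = 9999 and alpha = 0.05 the rejection region is rank 9501 to 10000, as on the slide.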

SLIDE 34

Simulation of random labelling

Case 3, image 1: Real pattern
Case 3, image 1: Simulated pattern #1 of 9999

Coordinates identical
Labelling fraction identical
Labels random

SLIDE 35

Case 3, image 1: K-functions of unlabelled and labelled nuclei, estimated from simulation #1 of 9999
Case 3, image 1: K-functions of unlabelled and labelled nuclei, estimated from the real pattern

SLIDE 36

Case 3, image 1: g-functions of unlabelled and labelled nuclei, estimated from the real pattern
Case 3, image 1: g-functions of unlabelled and labelled nuclei, estimated from simulation #1 of 9999

SLIDE 37

Monte Carlo tests on random labelling

Test statistic                                        Number of           Number of
                                                      rejected patterns   accepted patterns
K-function
Absolute integral deviation, equal weights            22                  18
Squared deviations, weighting related to variance     25                  15
g-function
Absolute integral deviation, equal weights            28                  12

SLIDE 38

SUMMARY OF RESULTS

Explorative second-order statistics

mean K-function for labelled nuclei ↑
mean g-function for labelled nuclei ↑

Parameters of the stationary Strauss hard core model

mean interaction parameter γ for labelled nuclei ↑

Distance-dependent Simpson indices
Observed labelling is significantly different from random labelling

positive spatial correlation of the labelled points

Monte Carlo tests

null hypothesis of random labelling rejected for the majority of patterns

SLIDE 39

Statistical inference from K(r) and g(r)

Case 3, Image 1: K-function of labelled nuclei
Case 3, Image 1: g-function of labelled nuclei

Nonparametric evaluation: rmax, gmax, rmin, gmin, ∆g, M, ...

SLIDE 40

Linux cluster pacioli.mathematik.uni-ulm.de
16 nodes with 2 AMD Opterons
8 GB RAM

SLIDE 41

CONCLUSIONS

Labelling of nuclei

not purely random
tendency towards spatial clustering

Statistical methodology

Concordant results from 4 approaches
Battery of methods → more reliability
Practical advantages of g(r) as compared to K(r)

Computational aspects

Data structure in biomedical studies: many samples
Modelling, simulation in spatial statistics → computer-intensive methods
Parallel computations on multiple patterns in computer clusters
Conventional programs in batch mode (e.g. PBS)
Parallel programming of source code not mandatory

SLIDE 42

REFERENCES

Diggle, P.J. (2003) Statistical Analysis of Spatial Point Patterns. Second edition. London, Arnold.

Eckel, S., Fleischer, F., Grabarnik, P. & Schmidt, V. (2007) An investigation on the spatial correlations for relative purchasing power in Baden-Württemberg. Adv. Statist. Anal. (Preprint).

Mattfeldt, T., Eckel, S., Fleischer, F. & Schmidt, V. (2006) Statistical analysis of reduced pair correlation functions of capillaries in the prostate gland. J. Microsc. 223, 107–119.

Mattfeldt, T., Eckel, S., Fleischer, F. & Schmidt, V. (2007) Statistical modelling of the geometry of planar sections of prostatic capillaries on the basis of stationary Strauss hard-core processes. J. Microsc. (in press).