SLIDE 1
TITLE Statistical analysis of labelling patterns of mammary carcinoma cell nuclei on histological sections
AUTHORS Torsten Mattfeldt (1), Stefanie Eckel (2), Frank Fleischer (3), Volker Schmidt (2)
DEPARTMENTS
(1) Department of Pathology, Ulm University
(2) Institute of Stochastics, Ulm University
(3) Medical Data Services/Biostatistics, Boehringer-Ingelheim Pharma GmbH & Co. KG
Joint workshop of the working groups Bayes Methodology, Spatial Statistics, and Ecology and Environment, 27–29 September 2007, Schloss Reisensburg
SLIDE 2 OVERVIEW OF THE LECTURE
- Explorative point pattern analysis
- Point process modelling
- Distance-dependent Simpson indices
- Monte Carlo rank tests
SLIDE 3
Ductal invasive mammary carcinoma Immunohistochemistry for MIB-1
SLIDE 4
Ductal invasive mammary carcinoma Immunohistochemistry for MIB: Detection of nuclei
SLIDE 5
Case 3, Image 1: Point pattern of unlabelled nuclei
SLIDE 6
Case 3, Image 1: Point pattern of labelled nuclei
SLIDE 7
Case 3, Image 1: Point pattern of unlabelled and labelled nuclei
SLIDE 8
Ductal invasive mammary carcinoma Immunohistochemistry for MIB
SLIDE 9
Ductal invasive mammary carcinoma Immunohistochemistry for MIB: Detection of nuclei
SLIDE 10 MATERIAL AND METHODS
CASES
- Breast cancer: 20 routine cases
- Operative specimens
- Domains with invasive ductal adenocarcinomas
MICROSCOPY
- Paraffin sections
- Light microscopy
- Immunohistochemistry
  Ki-67: nuclear protein associated with proliferation
  MIB-1: monoclonal antibody against Ki-67
SLIDE 11 IMAGE EVALUATION
Sampling
- Two rectangular fields per case
- Size: 1240 × 1000 pixels = 440 µm × 354 µm
Explorative point pattern analysis
- Interactive detection of centres of tumour cell nuclei on sections
- Labelled and unlabelled nuclei → marked point pattern
- Estimation of g(r) for r = 1–200 pixels using kernel methods
- Epanechnikov kernel
- Bandwidth: h = 0.1/√λ
Software
- Library geostoch under Java (Mayer et al., 2004; http://www.geostoch.de)
- Package spatstat under R 2.2.0 under Linux (Baddeley & Turner, 2005)
SLIDE 12 EXPLORATIVE ANALYSIS OF PLANAR POINT PATTERNS
- Stationary planar point process X = {X_n} with intensity λ
- Second-order K-function (reduced second moment function) K(r)
  $K(r) = \frac{E(\text{number of other points with distance} \le r \mid (x, y) \in X)}{\lambda}, \qquad K_{\mathrm{Poi}}(r) = \pi r^2$
- Pair correlation function g(r)
  $g(r) = \frac{\varrho^{(2)}(r)}{\lambda^2} = \frac{1}{2\pi r}\,\frac{dK(r)}{dr}, \qquad g_{\mathrm{Poi}}(r) = 1$
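As a consistency check of the two Poisson reference values, one can substitute the Poisson K-function into the relation between g and K. This is a short worked step, not a new result:

```latex
g_{\mathrm{Poi}}(r)
  = \frac{1}{2\pi r}\,\frac{d}{dr}\left(\pi r^{2}\right)
  = \frac{2\pi r}{2\pi r}
  = 1
```

Values g(r) > 1 therefore indicate more point pairs at distance r than in a Poisson pattern of the same intensity, and values g(r) < 1 indicate fewer.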
SLIDE 13 ESTIMATION OF THE PAIR CORRELATION FUNCTION
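- Estimation of the product density
  $\hat{\varrho}^{(2)}(r) = \frac{1}{2\pi r} \sum_{i \ne j} \frac{k_h(r - \|X_i - X_j\|)}{|W_{X_i} \cap W_{X_j}|}, \qquad k_h(x) = \frac{3}{4h}\left(1 - \frac{x^2}{h^2}\right) \mathbf{1}_{(-h,h)}(x)$
- Estimation of the squared intensity
  $\widehat{\lambda^2} = \frac{X(W)\,(X(W) - 1)}{|W|^2}$
- Estimation of g(r)
  $\hat{g}(r) = \frac{\hat{\varrho}^{(2)}(r)}{\widehat{\lambda^2}}$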
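The three estimation steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the geostoch or spatstat code used in the study: the window is taken to be a rectangle [0, width] × [0, height], so that the area of the intersection of two shifted windows is (width − |dx|)(height − |dy|), and the names `epanechnikov`, `pair_correlation` and the constant `c` are hypothetical.

```python
import math
import random


def epanechnikov(x, h):
    """Epanechnikov kernel k_h(x) = 3/(4h) * (1 - x^2/h^2) on (-h, h)."""
    if abs(x) >= h:
        return 0.0
    return 0.75 / h * (1.0 - (x / h) ** 2)


def pair_correlation(points, width, height, radii, c=0.1):
    """Estimate g(r) at each r in radii for points in [0, width] x [0, height]."""
    n = len(points)
    area = width * height
    h = c / math.sqrt(n / area)            # bandwidth h = c / sqrt(lambda)
    lambda_sq = n * (n - 1) / area ** 2    # estimator of lambda^2
    rho = [0.0] * len(radii)
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            dx = abs(xi - points[j][0])
            dy = abs(yi - points[j][1])
            d = math.hypot(dx, dy)
            # Area of the intersection of the two shifted windows
            # (assumes a rectangular window).
            overlap = (width - dx) * (height - dy)
            if overlap <= 0.0:
                continue
            for l, r in enumerate(radii):
                # Factor 2: the sum in the estimator runs over ordered pairs.
                rho[l] += 2.0 * epanechnikov(r - d, h) / overlap
    return [rho[l] / (2.0 * math.pi * r) / lambda_sq
            for l, r in enumerate(radii)]


# Illustrative data only: 1000 uniformly scattered points in the unit square.
rng = random.Random(0)
demo_points = [(rng.random(), rng.random()) for _ in range(1000)]
demo_g = pair_correlation(demo_points, 1.0, 1.0, [0.05, 0.10])
```

For uniformly scattered points the estimates should lie close to the Poisson value 1. With the mean intensity of unlabelled nuclei reported in the group comparison (0.0005957 points/pixel²), the bandwidth rule h = 0.1/√λ would give h ≈ 4.1 pixels.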
SLIDE 14
Case 3, Image 1: g-function of unlabelled nuclei
SLIDE 15
Case 3, Image 1: g-function of labelled nuclei
SLIDE 16
Case 3, Image 1: g-functions of unlabelled and labelled (dashed line) nuclei
SLIDE 17
Case 3: Mean g-functions of unlabelled and labelled (dashed line) nuclei
SLIDE 18
Estimated g-functions of unlabelled and labelled (dashed line) nuclei
Mean values of all 20 cases
SLIDE 19 Local comparisons of g-functions
Mean values for labelled and unlabelled nuclei (D = labelled − unlabelled)

r (pixel)   Unlabelled mean g   Labelled mean g   D         Level of significance
  5         0.00000             0.29258           0.29258   p < 0.001
 10         0.00127             0.70074           0.69947   p < 0.05
 15         0.24768             1.33178           1.08410   p < 0.001
 20         1.13593             1.94351           0.80758   p < 0.001
 25         1.46384             2.33754           0.87370   p < 0.001
 30         1.35353             2.29150           0.93797   p < 0.001
 35         1.22940             2.00732           0.77792   p < 0.001
 40         1.17549             1.77813           0.60265   p < 0.001
 45         1.16746             1.61141           0.44395   N.S.
 50         1.15192             1.50813           0.35621   N.S.
 55         1.13997             1.42883           0.28886   N.S.
 60         1.13603             1.35491           0.21888   N.S.
 65         1.12911             1.30302           0.17391   N.S.
 70         1.11672             1.28056           0.16385   N.S.
 75         1.10675             1.27387           0.16712   N.S.
 80         1.09886             1.28003           0.18117   N.S.
 85         1.10246             1.26078           0.15832   N.S.
 90         1.10575             1.22666           0.12091   N.S.
 95         1.08833             1.17965           0.09132   N.S.
100         1.10102             1.13606           0.03504   N.S.
150         1.05846             1.13965           0.08120   N.S.
200         1.04896             1.07587           0.02691   N.S.
SLIDE 20
Characteristic points of the g-function: first maximum (rmax, gmax) and subsequent minimum (rmin, gmin)
∆g = gmax − gmin
M = (gmax − gmin)/(rmin − rmax)
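How these characteristics might be read off a tabulated g-function can be sketched as follows. The slide does not state how the extrema were located, so the rule used here (global maximum, then the first local minimum to its right) is an assumption, and `g_summary` is a hypothetical helper. The input values are the mean g-function of unlabelled nuclei for r = 5 to 100 pixels from the local comparison table.

```python
def g_summary(r, g):
    """Return rmax, gmax, rmin, gmin, delta_g and M for a tabulated g-function."""
    # Assumed rule: the maximum is the global maximum of the tabulated curve.
    i_max = max(range(len(g)), key=lambda i: g[i])
    i_min = i_max
    # Walk right while g keeps decreasing: first local minimum after the peak.
    while i_min + 1 < len(g) and g[i_min + 1] < g[i_min]:
        i_min += 1
    if i_min == i_max:
        raise ValueError("no minimum found to the right of the maximum")
    delta_g = g[i_max] - g[i_min]
    return {
        "rmax": r[i_max], "gmax": g[i_max],
        "rmin": r[i_min], "gmin": g[i_min],
        "delta_g": delta_g,
        "M": delta_g / (r[i_min] - r[i_max]),
    }


# Mean g-function of unlabelled nuclei, r = 5, 10, ..., 100 pixels.
r_values = list(range(5, 105, 5))
g_unlabelled = [0.00000, 0.00127, 0.24768, 1.13593, 1.46384, 1.35353, 1.22940,
                1.17549, 1.16746, 1.15192, 1.13997, 1.13603, 1.12911, 1.11672,
                1.10675, 1.09886, 1.10246, 1.10575, 1.08833, 1.10102]
summary = g_summary(r_values, g_unlabelled)
```

On this averaged curve the sketch returns rmax = 25, rmin = 80 and ∆g ≈ 0.365. Because the curve is a mean over all cases and is sampled every 5 pixels, these values need not coincide with means of characteristics determined separately for each field.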
SLIDE 21
Group comparisons of explorative summary characteristics (x̄ = mean, SD = standard deviation)

Estimate             Unlabelled nuclei        Labelled nuclei          Level of significance
                     x̄           SD          x̄           SD
N (nuclei/field)     741         217         89          47           p < 0.001
λ (points/pixel²)    0.0005957   0.000175    0.00007177  0.0000379    p < 0.001
r0 (pixel)           14.75       1.18        18.25       1.81         p < 0.001
rmax (pixel)         25.70       3.55        27.13       5.66         N.S.
gmax                 1.582       0.295       2.559       0.655        p < 0.001
rmin (pixel)         40.46       6.53        53.18       12.69        p < 0.001
gmin                 1.113       0.162       1.244       0.377        N.S.
M                    0.035       0.017       0.051       0.030        N.S.
∆g                   0.469       0.191       1.314       0.699        p < 0.001
SLIDE 22
Estimated K-functions of unlabelled and labelled (dashed line) nuclei
Mean values of all 20 cases versus Poisson process
SLIDE 23
POINT PROCESS MODELLING
Model
- Gibbs processes
- Stationary Strauss hard core process
Methods
- Package spatstat under R 2.2.0 under Linux (Baddeley & Turner, 2005)
- Explorative statistics (λ, r0, g(r))
- Parametric model fitting
SLIDE 24
FITTING OF THE STATIONARY STRAUSS HARD CORE MODEL
PROPERTIES
Parameters: λ, r0, R, γ
- r < r0: no point pairs within the minimal interpoint distance r0 (hard core)
- r0 ≤ r < R: interval of interaction distances
  γ < 1: repulsion
  γ = 1: classical hard core point process
  γ > 1: clustering
- r ≥ R: no interaction ('some radius beyond which influence is inconceivable')
IRREGULAR PARAMETERS
- Hard core distance r0. Estimator: minimum interpoint distance
- Interaction radius R. Method: profile maximum pseudolikelihood
REGULAR PARAMETER
- Interaction parameter γ
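The three distance ranges of the model can be written as a pair interaction function. The sketch below is illustrative only and is not the spatstat fitting code: `pair_interaction` and `unnormalised_density` are hypothetical names, the first-order parameter is called `beta` here by assumption, and the normalising constant of the Gibbs density is deliberately left out. The numerical values are the mean fitted parameters for unlabelled nuclei from the group comparison of model parameters.

```python
import math


def pair_interaction(r, r0, R, gamma):
    """Strauss hard core pair interaction at interpoint distance r."""
    if r < r0:
        return 0.0      # hard core: such pairs cannot occur
    if r < R:
        return gamma    # interaction range: repulsion (<1) or clustering (>1)
    return 1.0          # beyond R: no interaction


def unnormalised_density(points, beta, r0, R, gamma):
    """Unnormalised Gibbs density: beta^n times the product over all pairs."""
    density = beta ** len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            density *= pair_interaction(d, r0, R, gamma)
    return density


# Mean fitted values for unlabelled nuclei (distances in pixels).
R0, R_INT, GAMMA = 14.75, 39.12, 0.874

# Three points on a line with pairwise distances 20, 50 and 70 pixels:
# only the pair at distance 20 falls into the interaction range.
example = unnormalised_density([(0, 0), (20, 0), (70, 0)], 1.0, R0, R_INT, GAMMA)
```

A configuration containing any pair closer than r0 has density zero, which is why the minimum interpoint distance is a natural estimator of r0.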
SLIDE 25
Group comparisons of model parameters (x̄ = mean, SD = standard deviation)

                     Unlabelled nuclei        Labelled nuclei          Level of significance
                     x̄           SD          x̄           SD
Intensity
N (nuclei/field)     741         217         89          47           p < 0.001
λ (points/pixel²)    0.0005957   0.000175    0.00007177  0.0000379    p < 0.001
Strauss hard core model
r0 (pixel)           14.75       1.18        18.25       1.81         p < 0.001
R (pixel)            39.12       18.94       44.52       14.28        N.S.
γ                    0.874       0.334       3.164       2.010        p < 0.001
SLIDE 26 Distance-independent characteristics of diversity
Simpson index D
$D = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2}{\lambda^2}$
Probability that a randomly selected point pair belongs to different components
- Measure of diversity
- Distance-independent → Generalisation to distance-dependent Simpson indices
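A small numerical illustration of the Simpson index, assuming two components observed in the same window so that the intensities are proportional to the point counts. The function name `simpson_index` is hypothetical, and the counts are the mean numbers of unlabelled and labelled nuclei per field from the group comparison, used here only as example input.

```python
def simpson_index(intensities):
    """Simpson index D = 1 - sum_i lambda_i^2 / lambda^2, lambda = sum_i lambda_i."""
    total = sum(intensities)
    return 1.0 - sum(x * x for x in intensities) / total ** 2


# Mean counts per field: 741 unlabelled and 89 labelled nuclei.
d_example = simpson_index([741, 89])
```

For these counts D ≈ 0.191, far below the maximum of 0.5 that two equally frequent components would give, because labelled nuclei are a small minority.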
SLIDE 27 Distance-dependent characteristics of diversity
$\alpha(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2 K_{ii}(r)}{\lambda^2 K(r)}$
- Probability that a randomly selected point pair belongs to different components, given that its distance is less than r
- Random labelling → α(r) = D
- α(r) < D: point pattern has smaller diversity for distances below r than in the case of random labelling
- α(r) > D: point pattern has larger diversity for distances below r than in the case of random labelling
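The probabilistic interpretation of α(r) suggests a simple plug-in estimate: among all point pairs closer than r, take the fraction whose two points carry different labels. The sketch below assumes exactly this naive estimator without edge correction, which is a simplification and not necessarily the estimator used in the study; `alpha_hat` is a hypothetical name.

```python
import math


def alpha_hat(points, labels, r):
    """Fraction of point pairs with distance <= r that have different labels.

    Returns None when no pair lies within distance r.
    """
    same = 0
    total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                total += 1
                if labels[i] == labels[j]:
                    same += 1
    if total == 0:
        return None
    return 1.0 - same / total


# Toy pattern: three points close together, one far away.
toy_points = [(0, 0), (1, 0), (0, 1), (10, 10)]
toy_labels = ["unlabelled", "unlabelled", "labelled", "labelled"]
toy_alpha = alpha_hat(toy_points, toy_labels, 1.5)
```

In the toy pattern three pairs lie within distance 1.5 and two of them are mixed, so the estimate is 2/3. Comparing such values with D over a range of r shows at which scales the labels are more or less mixed than under random labelling.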
SLIDE 28 Distance-dependent characteristics of diversity
$\beta(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^2 g_{ii}(r)}{\lambda^2 g(r)}$
- Probability that a randomly selected point pair belongs to different components, given that its distance is r
- Random labelling → β(r) = D
- β(r) < D: point pattern has smaller diversity at distance r than in the case of random labelling
- β(r) > D: point pattern has larger diversity at distance r than in the case of random labelling
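Why random labelling leads back to the Simpson index can be seen in one line. The step assumes the standard property that under random labelling every component has the same second-order structure as the whole pattern, i.e. K_ii(r) = K(r) and g_ii(r) = g(r) for all i, with m denoting the number of components:

```latex
\beta(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^{2}\, g_{ii}(r)}{\lambda^{2}\, g(r)}
         = 1 - \sum_{i=1}^{m} \frac{\lambda_i^{2}}{\lambda^{2}} = D,
\qquad
\alpha(r) = 1 - \sum_{i=1}^{m} \frac{\lambda_i^{2}\, K_{ii}(r)}{\lambda^{2}\, K(r)}
          = 1 - \sum_{i=1}^{m} \frac{\lambda_i^{2}}{\lambda^{2}} = D
```

Deviations of α(r) or β(r) from D are therefore attributable to the spatial arrangement of the labels rather than to their overall proportions.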
SLIDE 29
Distance-dependent characteristics α(r)
Simpson index D, 95% CI of α(r), α(r) for pattern 1 of case 1 of mammary cancer
SLIDE 30
Distance-dependent characteristics α(r)
Simpson index D, 95% CI of α(r), α(r) for 20 cases of mammary cancer
SLIDE 31
Distance-dependent characteristics β(r)
Simpson index D, 95% CI of β(r), β(r) for 20 cases of mammary cancer
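SLIDE 32 Testing individual point patterns on random labelling
Under random labelling: $K_{11}(r) = K_{22}(r)$
- Test statistic with equal weights
  $T^{(1)} = \sum_{l=1}^{s} \left| \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right|$
- Test statistic with weights related to the variance
  $T^{(2)} = \sum_{l=1}^{s} w(r_l) \left( \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right)^2, \qquad w(r_l) = \left( \widehat{\mathrm{Var}}\left( \hat{K}_{11}(r_l) - \hat{K}_{22}(r_l) \right) \right)^{-1}$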
SLIDE 33 Monte Carlo rank test on random labelling
- Simulate 9999 realisations based on independent labelling: given the realisation of locations and the observed values of the marks, assign the marks completely randomly to the locations
- Estimate the functions K11 and K22 at distances r1, ..., rs
- Compute values of the test statistic T1, ..., T9999 for the 9999 simulations and T for the single real pattern
- Sort the resulting 10000 values in ascending order
- Determine the rank of T in this sequence
- Reject the hypothesis of random labelling, if the rank of T ∈ [9501, 10000]
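The procedure above can be sketched in plain Python with the equal-weights statistic (sum of absolute differences between the two estimated K-functions). Several simplifications are assumptions made for brevity: K is estimated without edge correction, labels are coded 0 and 1, ties are resolved in favour of the null hypothesis, and the demo uses 999 instead of 9999 relabellings to keep the run short. All function names and the toy pattern are hypothetical.

```python
import math
import random


def close_pairs(points, radii):
    """List (i, j, l): l indexes the smallest radius (ascending) >= distance."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            for l, r in enumerate(radii):
                if d <= r:
                    pairs.append((i, j, l))
                    break
    return pairs


def statistic_t1(pairs, labels, n_radii, area):
    """Sum over radii of |K11 - K22|, naive K estimates (no edge correction)."""
    counts = {0: [0] * n_radii, 1: [0] * n_radii}
    for i, j, l in pairs:
        if labels[i] == labels[j]:
            counts[labels[i]][l] += 1
    n = {m: labels.count(m) for m in (0, 1)}
    cumulative = {0: 0, 1: 0}
    t = 0.0
    for l in range(n_radii):
        k = {}
        for m in (0, 1):
            cumulative[m] += counts[m][l]
            # K = |W| * (ordered pairs within r) / (n (n - 1))
            k[m] = area * 2 * cumulative[m] / (n[m] * (n[m] - 1))
        t += abs(k[1] - k[0])
    return t


def monte_carlo_rank_test(points, labels, radii, area,
                          n_sim=9999, alpha=0.05, seed=0):
    """Return (rank of observed T, reject flag) under random relabelling."""
    rng = random.Random(seed)
    pairs = close_pairs(points, radii)
    t_obs = statistic_t1(pairs, labels, len(radii), area)
    shuffled = list(labels)
    smaller = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)      # same locations, same label counts
        if statistic_t1(pairs, shuffled, len(radii), area) < t_obs:
            smaller += 1
    rank = smaller + 1
    n_reject = round(alpha * (n_sim + 1))
    return rank, rank > n_sim + 1 - n_reject


# Toy pattern: 80 random points; the 15 nearest to the origin are "labelled".
rng = random.Random(1)
toy_points = [(rng.random(), rng.random()) for _ in range(80)]
order = sorted(range(80), key=lambda i: math.hypot(*toy_points[i]))
toy_labels = [0] * 80
for i in order[:15]:
    toy_labels[i] = 1
demo_rank, demo_reject = monte_carlo_rank_test(
    toy_points, toy_labels, [0.05, 0.10, 0.15, 0.20, 0.25], 1.0, n_sim=999)
```

With `n_sim=9999` and `alpha=0.05` the rejection rule reduces to the one stated above, rank in [9501, 10000]. Since only the labels are permuted, the interpoint distances are computed once and reused for every simulation.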
SLIDE 34
Simulation of random labelling (Case 3, image 1)
Panels: real pattern; simulated pattern #1 of 9999
- Coordinates identical
- Labelling fraction identical
- Labels random
SLIDE 35
Case 3, image 1: K-functions of unlabelled and labelled nuclei
Panels: estimated from simulation #1 of 9999; estimated from the real pattern
SLIDE 36
Case 3, image 1: g-functions of unlabelled and labelled nuclei
Panels: estimated from the real pattern; estimated from simulation #1 of 9999
SLIDE 37
Monte Carlo tests on random labelling
Test statistic                                         Number of rejected patterns   Number of accepted patterns
K-function
  Absolute integral deviation, equal weights           22                            18
  Squared deviations, weighting related to variance    25                            15
g-function
  Absolute integral deviation, equal weights           28                            12
SLIDE 38 SUMMARY OF RESULTS
- Explorative second-order statistics
  mean K-function for labelled nuclei ↑
  mean g-function for labelled nuclei ↑
- Parameters of the stationary Strauss hard core model
  mean interaction parameter γ for labelled nuclei ↑
- Distance-dependent Simpson indices
  observed labelling is significantly different from random labelling
  positive spatial correlation of the labelled points
- Monte Carlo rank tests
  null hypothesis of random labelling rejected for the majority of patterns
SLIDE 39
Statistical inference from K(r) and g(r)
Panels (Case 3, Image 1, labelled nuclei): K-function; g-function
Nonparametric evaluation: rmax, gmax, rmin, gmin, ∆g, M, ...
SLIDE 40
Linux cluster pacioli.mathematik.uni-ulm.de
16 nodes with 2 AMD Opterons, 8 GB RAM
SLIDE 41 CONCLUSIONS
- Labelling pattern not purely random; tendency towards spatial clustering
- Concordant results from 4 approaches
- Battery of methods → more reliability
- Practical advantages of g(r) as compared to K(r)
- Data structure in biomedical studies: many samples
- Modelling, simulation in spatial statistics → computer-intensive methods
- Parallel computations on multiple patterns in computer clusters
- Conventional programs in batch mode (e.g. PBS)
- Parallel programming of source code not mandatory
SLIDE 42 REFERENCES
Diggle, P.J. (2003) Statistical Analysis of Spatial Point Patterns. Second edition. London, Arnold.
Eckel, S., Fleischer, F., Grabarnik, P. & Schmidt, V. (2007) An investigation on the spatial correlations for relative purchasing power in Baden-Württemberg. Adv. Statist. Anal. (Preprint).
Mattfeldt, T., Eckel, S., Fleischer, F. & Schmidt, V. (2006) Statistical analysis of reduced pair correlation functions of capillaries in the prostate gland. J. Microsc. 223, 107–119.
Mattfeldt, T., Eckel, S., Fleischer, F. & Schmidt, V. (2007) Statistical modelling of the geometry of planar sections of prostatic capillaries on the basis of stationary Strauss hard-core processes.