A tutorial in spatial statistics for microscopy data analysis
Ed Cohen, Department of Mathematics, Imperial College London
wwwf.imperial.ac.uk/ eakc07
QBI 2019
Spatial Statistics
Spatial Statistics: Statistical theory and methodology for modelling and analysing spatial data. Fluorescence microscopy is concerned with imaging objects. We are interested in understanding the spatial organisation of objects to inform our understanding of biological mechanisms and processes. Therefore we will restrict ourselves to spatial point patterns.
Spatial Point Pattern
Data in the form of a set of points, irregularly distributed in a region of space, are called a spatial point pattern. They arise in many different contexts, e.g.
◮ Location of trees in a forest
◮ Location of ants' nests in a compact geographical region
◮ Location of a particular protein of interest in a cellular environment.
Spatial Point Pattern
Mathematically, we can represent a spatial point pattern as a set of locations $\Phi = \{s_1, s_2, \ldots\}$, where each event $s_i$ belongs to $X$, a (locally compact) subset of $\mathbb{R}^d$. For example, in fluorescence microscopy imaging, $X$ is typically a square region in $\mathbb{R}^2$, and each fluorophore/event has a true position $s_i = (x_i, y_i)$.
Spatial point processes
Informally, a point process is a stochastic mechanism that generates a countable set of events, i.e. a spatial point pattern. It is the probabilistic framework that governs how many events there are and where they occur. It is analogous to a probability distribution for random variables.
(Figure: a stochastic mechanism and its realizations.)
◮ Rolling two dice: bivariate discrete uniform distribution on {1,2,3,4,5,6} x {1,2,3,4,5,6}
◮ Clathrin-coated pits on a cell membrane: Poisson process with intensity λ
Describing and characterizing spatial point processes
We typically represent a spatial point process by $N$, where $N(A)$ is a random number indicating the number of events within some set $A$. (Figure: example counts of 8, 0 and 4 events.) It is the probability distribution of $N(A)$ for all (nice) sets $A$ that characterizes a spatial point process.
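As a small illustration of the counting notation, the sketch below computes N(A) for rectangular sets A from one observed pattern. The pattern, the helper name `count_events` and the choice of rectangles are all made up for this example.

```python
def count_events(pattern, xmin, xmax, ymin, ymax):
    """Return N(A) for the rectangle A = [xmin, xmax) x [ymin, ymax)."""
    return sum(1 for (x, y) in pattern
               if xmin <= x < xmax and ymin <= y < ymax)

# Hypothetical pattern of five events in the unit square.
pattern = [(0.10, 0.20), (0.15, 0.80), (0.40, 0.45), (0.70, 0.30), (0.90, 0.95)]

# Counts in the left and right halves of the window.
n_left = count_events(pattern, 0.0, 0.5, 0.0, 1.0)
n_right = count_events(pattern, 0.5, 1.0, 0.0, 1.0)
print(n_left, n_right)
```

A different realization of the same process would give different counts for the same sets, which is why N(A) is treated as a random variable.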
Characterizing spatial point processes
Intensity (localized rate of events):
$$\lambda(s) = \lim_{|ds| \downarrow 0} \frac{E\{N(ds)\}}{|ds|}.$$
The second-order intensity of a spatial point process $N$ at points $s, u \in X$ is
$$\gamma(s, u) = \lim_{|ds|, |du| \downarrow 0} \frac{E\{N(ds)\,N(du)\}}{|ds||du|}.$$
The second-order covariance of a spatial point process $N$ at points $s, u \in X$ is
$$c(s, u) = \gamma(s, u) - \lambda(s)\lambda(u),$$
by analogy with $\mathrm{cov}(X, Y) = E(XY) - E(X)E(Y)$.
Pair correlation function:
$$g(s, u) = \frac{\gamma(s, u)}{\lambda(s)\lambda(u)}.$$
Characterizing first and second order moments of spatial point processes
Homogeneity: $\lambda(s)$ is constant for all $s \in X$. Translates as: the chance of getting an event at any particular point in space is the same across $X$.
Stationarity and isotropy: $\gamma(s, u) = \gamma(\|s - u\|) = \gamma(r)$. Translates as: the covariance between any two points in space having an event or not depends only on the distance between them.
These assumptions are not as restrictive as they first seem. If the heterogeneity is itself random then these notions can still hold.
Poisson process
A spatial point process $N$ is Poisson if the following hold:
◮ For every (nice) subset $A \subset X$, the number of events $N(A)$ is Poisson distributed with expected value $\mu(A) = \int_A \lambda(s)\,ds$.
◮ For any collection of disjoint sets $A_1, \ldots, A_n$, the random variables $N(A_1), \ldots, N(A_n)$ are independent of one another.
Poisson processes have a memoryless property: all events are independent of each other. Homogeneous Poisson processes are known as completely spatially random (CSR).
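The two defining properties suggest a simple way to simulate a homogeneous Poisson process on a rectangle: draw a Poisson number of events with mean λ times the area, then place them independently and uniformly. The sketch below uses only the Python standard library; the function names and parameter values are illustrative choices, not part of the original slides.

```python
import random

def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential
    arrivals that fall in the interval [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_csr(intensity, width, height, rng):
    """Simulate a homogeneous Poisson process on [0, width] x [0, height]."""
    n = sample_poisson(intensity * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n)]

rng = random.Random(1)
pattern = simulate_csr(100.0, 1.0, 1.0, rng)  # expected number of events: 100
print(len(pattern))
```

The exponential-arrival method for the Poisson count is adequate for moderate means; for very large means a dedicated sampler would be a better choice.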
Complete spatial randomness
(Figure: grid of 0/1 values illustrating complete spatial randomness.)
CSR vs Clustering vs Inhibition/Regularity
(Figure: three example point patterns, labelled CSR, Clustered and Inhibited.)
Ripley’s K-function
Ripley's K-function is used extensively across the sciences, including microscopy, to detect and characterize clustering behavior. It is a theoretical property of the point process.
$$K(r) \equiv \int_0^r 2\pi r' g(r')\,dr' = \lambda^{-1} E\{\text{number of events within distance } r \text{ of an arbitrary event}\}.$$
Its widespread use lies in its interpretability and the ease with which it can be estimated with robust, well studied estimators from a single point pattern. Many of the recent developments in spatial data analysis have this function at their heart.
Worked example: Poisson process
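The second-order intensity of a homogeneous Poisson process $N$ at points $s, u \in X$ ($s \neq u$) is
$$\gamma(s, u) = \lim_{|ds|, |du| \downarrow 0} \frac{E\{N(ds)\,N(du)\}}{|ds||du|} = \lim_{|ds|, |du| \downarrow 0} \frac{E\{N(ds)\}\,E\{N(du)\}}{|ds||du|} = \lambda(s)\lambda(u),$$
where the second equality uses the independence of counts in disjoint sets. Hence
$$g(s, u) = \frac{\gamma(s, u)}{\lambda(s)\lambda(u)} = 1, \qquad K(r) = \int_0^r 2\pi r'\,dr' = \pi r^2, \qquad L(r) - r \equiv \sqrt{K(r)/\pi} - r = 0.$$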
K-function for different types of process
(Figure: $L(r) - r$ plotted against $r$ for CSR, clustered and inhibited processes.)
Estimation
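REMEMBER: we are interested in knowing the properties of the PROCESS. We need to estimate them from the pattern, and typically we only get one pattern with which to estimate them.
$$\hat{K}(r) = \frac{A}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, I(0 < d_{ij} < r),$$
where $A$ is the area of the observation window, $n$ is the number of observed events, $d_{ij}$ is the distance between events $i$ and $j$, and $w_{ij}$ is an edge-correction weight.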
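A minimal sketch of this estimator is given below, under the simplifying assumption that every edge-correction weight is 1 (no edge correction), which biases the estimate downwards for events near the window boundary. The function names and the four-point toy pattern are made up for illustration.

```python
import math

def k_hat(pattern, r, area):
    """Estimate K(r) as A / (n (n - 1)) * sum_i sum_j I(0 < d_ij < r),
    with all edge-correction weights w_ij set to 1 (an assumption)."""
    n = len(pattern)
    pairs = 0
    for (xi, yi) in pattern:
        for (xj, yj) in pattern:
            d = math.hypot(xi - xj, yi - yj)
            if 0.0 < d < r:  # d = 0 excludes the pair (i, i)
                pairs += 1
    return area * pairs / (n * (n - 1))

def l_minus_r(pattern, r, area):
    """Estimate L(r) - r, which is 0 in theory for a CSR process."""
    return math.sqrt(k_hat(pattern, r, area) / math.pi) - r

# Toy pattern: four events on a regular grid in the unit square.
grid = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
print(k_hat(grid, 0.4, 1.0), k_hat(grid, 0.6, 1.0), k_hat(grid, 0.8, 1.0))
```

For the grid, no pair is closer than 0.5, so the estimate is zero at small r; this is the signature of inhibition, where the estimated L(r) - r falls below 0.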
Testing
Inference is typically performed through hypothesis testing:
$H_0$: the process is CSR vs $H_A$: the process is not CSR.
For this we need a test statistic and its distribution under the null (CSR).
$$T = \max_r \{|\hat{L}(r) - r|\}.$$
Lagache et al, Analysis of the Spatial Organization of Molecules with Robust Statistics, PLOS One, 2013.
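One generic way to approximate the null distribution of T is Monte Carlo simulation: simulate CSR patterns with the same number of events in the same window and recompute T each time. The sketch below does this on the unit square with no edge correction. It is a standard Monte Carlo test written for illustration, not the analytical approach of the paper cited above, and the function names, radii and toy data are assumptions.

```python
import math
import random

def max_abs_deviation(pattern, radii, area):
    """T = max over r of |L_hat(r) - r|, with no edge correction."""
    n = len(pattern)
    dists = [math.hypot(pattern[i][0] - pattern[j][0],
                        pattern[i][1] - pattern[j][1])
             for i in range(n) for j in range(i + 1, n)]
    t = 0.0
    for r in radii:
        pairs = 2 * sum(1 for d in dists if 0.0 < d < r)  # ordered pairs
        k = area * pairs / (n * (n - 1))
        t = max(t, abs(math.sqrt(k / math.pi) - r))
    return t

def monte_carlo_csr_test(pattern, radii, n_sim, rng):
    """Return (T_obs, p-value) for H0: CSR on the unit square,
    conditioning on the observed number of events."""
    n = len(pattern)
    t_obs = max_abs_deviation(pattern, radii, 1.0)
    exceed = 0
    for _ in range(n_sim):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        if max_abs_deviation(sim, radii, 1.0) >= t_obs:
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_sim)

rng = random.Random(0)
radii = [0.02 * k for k in range(1, 11)]
# Toy data: 40 events packed tightly around the centre (strong clustering).
clustered = [(0.5 + rng.gauss(0.0, 0.02), 0.5 + rng.gauss(0.0, 0.02))
             for _ in range(40)]
t_obs, p_value = monte_carlo_csr_test(clustered, radii, 99, rng)
print(t_obs, p_value)
```

Because the simulated patterns pass through the same uncorrected estimator as the data, the edge-effect bias affects both sides of the comparison equally.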
Testing
(Figure: three panels of $L(r) - r$ plotted against $r$.)
Clustering
Clustering: the act of identifying and characterising clusters:
◮ size
◮ shape
◮ number of events in a cluster.
Caution should be taken when trying to extract these properties from the K-function.
◮ Rubin-Delanchy et al, Bayesian cluster identification in single-molecule localization microscopy data, Nature Methods, 2015.
◮ Griffié et al, 3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse.
◮ Staszowska et al, The Rényi divergence enables accurate and precise cluster analysis for localization microscopy.
Colocalization
Inference is typically performed through hypothesis testing:
$H_0$: the two processes are independent vs $H_A$: the two processes are not independent.
(Figure: two example patterns, labelled INDEPENDENT and COLOCALIZED.)
Colocalization
Test statistic based on the estimator of the cross K-function
$$K_{12}(r) = \lambda_2^{-1} E\{\text{number of events of type 2 within distance } r \text{ of an arbitrary event of type 1}\}.$$
For two independent processes, $K_{12}(r) = \pi r^2$. However, it is notoriously troublesome to get the distribution of $\hat{K}_{12}(r)$ under the null.
Lagache et al, Mapping molecular assemblies with fluorescence microscopy and object-based spatial statistics, Nature Communications, 2018.
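A naive plug-in estimate of the cross K-function replaces the intensity of type 2 by n2 / A and the expectation by an average over the observed type 1 events. The sketch below implements that, again with no edge correction; the function name and the two toy patterns are hypothetical.

```python
import math

def cross_k_hat(type1, type2, r, area):
    """Estimate K12(r) as A / (n1 n2) * sum_i sum_j I(d_ij < r), where
    d_ij is the distance from type 1 event i to type 2 event j.
    No edge correction is applied (an assumption of this sketch)."""
    count = 0
    for (x1, y1) in type1:
        for (x2, y2) in type2:
            if math.hypot(x1 - x2, y1 - y2) < r:
                count += 1
    return area * count / (len(type1) * len(type2))

# Toy data in the unit square: each type 1 event has a type 2 event nearby.
type1 = [(0.2, 0.2), (0.8, 0.8)]
type2 = [(0.25, 0.2), (0.8, 0.75), (0.5, 0.5)]

r = 0.1
k12 = cross_k_hat(type1, type2, r, 1.0)
print(k12, math.pi * r ** 2)  # estimate vs. value under independence
```

An estimate well above πr² at small r is what colocalization looks like in this summary, but judging significance still requires the null distribution discussed above.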
Parametric models
◮ Poisson cluster process: Matérn, Neyman-Scott process, Thomas process
◮ Markov point process: Strauss process
◮ Cox process: log-Gaussian Cox, fibre-driven Cox
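To make one of these models concrete, the sketch below simulates a Thomas process on the unit square: parent locations form a homogeneous Poisson process, each parent has a Poisson number of offspring, and each offspring is displaced from its parent by an isotropic Gaussian. As a simplification, parents are generated only inside the window, so clusters centred just outside it are ignored. All names and parameter values are illustrative.

```python
import random

def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential
    arrivals that fall in the interval [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_thomas(kappa, mu, sigma, rng):
    """Thomas process on the unit square.
    kappa: parent intensity, mu: mean offspring per parent,
    sigma: standard deviation of the Gaussian displacement."""
    points = []
    for _ in range(sample_poisson(kappa, rng)):  # window area is 1
        px, py = rng.random(), rng.random()      # parent location (not kept)
        for _ in range(sample_poisson(mu, rng)):
            x = px + rng.gauss(0.0, sigma)
            y = py + rng.gauss(0.0, sigma)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:  # keep offspring in window
                points.append((x, y))
    return points

rng = random.Random(2)
pattern = simulate_thomas(10.0, 10.0, 0.03, rng)
print(len(pattern))
```

Only the offspring are observed; the parents are a latent part of the model, which is one reason cluster properties are hard to read directly from summary functions.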
Resources
Books
◮ P. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns.
◮ J. Illian et al. Statistical Analysis and Modelling of Spatial Point Patterns.
◮ N. Cressie. Statistics for Spatial Data.
◮ Chiu and Stoyan. Stochastic Geometry and its Applications.