
SLIDE 1

An unbiased logistic regression estimating function for spatial Gibbs point processes

Rasmus Waagepetersen, Department of Mathematical Sciences, Aalborg University. Joint work (in progress!) with Adrian Baddeley, Jean-Francois Coeurjolly and Ege Rubak.

SLIDE 2

Gibbs point process and conditional intensity

Point process X: random point pattern. Conditional intensity λ(u, X): for a small region A with u ∈ A,

  |A| λ(u, X) ≈ P(X has a point in A | X \ A)

GNZ formula:

  E Σ_{u∈X} f(u, X \ u) = ∫ E[ f(u, X) λ(u, X) ] du

for non-negative functions f. X observed in a bounded region W.
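For a Poisson process the conditional intensity does not depend on X, so the GNZ formula reduces to the Campbell formula, which makes a quick Monte Carlo sanity check possible. A minimal Python sketch (toy code, not from the slides), with λ(u, X) ≡ 50 on the unit square W and f(u) taken as the x-coordinate of u, so the expected sum is 50 · ∫₀¹ x dx = 25:

```python
import math
import random

random.seed(0)

LAM = 50.0  # intensity of a homogeneous Poisson process on the unit square W

def rpoispp_unit_square(lam, rng=random):
    """Simulate a homogeneous Poisson process on [0,1]^2 (Knuth's method for the count)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while p > l:
        k += 1
        p *= rng.random()
    n = k - 1  # n ~ Poisson(lam)
    return [(rng.random(), rng.random()) for _ in range(n)]

# GNZ with f(u, X\u) = f(u) = x-coordinate of u; for Poisson, lambda(u, X) = LAM,
# so E sum_{u in X} f(u) should equal LAM * integral_W x du = LAM / 2 = 25.
reps = 2000
avg = sum(sum(x for x, y in rpoispp_unit_square(LAM)) for _ in range(reps)) / reps
print(avg)  # close to 25
```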

SLIDE 3

Parametric model λ_θ(u, X) for the conditional intensity.

Strauss: λ_θ(u, X) = β γ^{n_R(u,X)}, β > 0, 0 < γ ≤ 1, where

  n_R(u, X) = Σ_{v∈X\u} 1[‖u − v‖ ≤ R]

is the number of neighboring points within distance R from u.

Log-linear: λ_θ(u, X) = exp[θ · t(u, X)] for some statistic t(u, X). E.g. (Strauss): t(u, X) = (1, n_R(u, X)) with θ = (log β, log γ).
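As a concrete illustration (a toy Python sketch, not part of the slides), the Strauss conditional intensity and its log-linear form can be evaluated directly; the point configuration below is made up:

```python
import math

def n_R(u, X, R):
    """Number of points of X (excluding u itself) within distance R of u."""
    return sum(1 for v in X if v != u and math.dist(u, v) <= R)

def strauss_cond_intensity(u, X, beta, gamma, R):
    """Strauss conditional intensity lambda_theta(u, X) = beta * gamma**n_R(u, X)."""
    return beta * gamma ** n_R(u, X, R)

X = [(0.10, 0.10), (0.12, 0.10), (0.50, 0.50)]
u = (0.11, 0.10)
beta, gamma, R = 200.0, 0.5, 0.05

print(n_R(u, X, R))                                  # 2 neighbors within R = 0.05
print(strauss_cond_intensity(u, X, beta, gamma, R))  # 200 * 0.5**2 = 50.0

# log-linear form: exp(theta . t(u, X)) with theta = (log beta, log gamma), t = (1, n_R)
theta = (math.log(beta), math.log(gamma))
t = (1, n_R(u, X, R))
print(math.exp(theta[0] * t[0] + theta[1] * t[1]))   # also 50.0
```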

SLIDE 4

Pseudo-likelihood

Disjoint subdivision W = ∪_{i=1}^m C_i into ‘cells’ C_i. Random indicator variables: N_i = 1[X ∩ C_i ≠ ∅] (presence/absence of points in C_i).

  P(N_i = 1 | X \ C_i) ≈ |C_i| λ_θ(u_i, X),  u_i ∈ C_i \ X.

SLIDE 5

The log binary pseudo-likelihood based on the N_i's takes logistic regression form:

  Σ_{i=1}^m N_i log[ |C_i| λ_θ(u_i, X) ] + (1 − N_i) log[ 1 − |C_i| λ_θ(u_i, X) ]
  ≈ Σ_{i=1}^m N_i log( |C_i| λ_θ(u_i, X) / (1 + |C_i| λ_θ(u_i, X)) ) + (1 − N_i) log( 1 / (1 + |C_i| λ_θ(u_i, X)) )

Log-linear case:

  |C_i| λ_θ(u_i, X) / (1 + |C_i| λ_θ(u_i, X)) = exp(θ · t(u_i, X)) / ( exp(θ · t(u_i, X)) + |C_i|⁻¹ )
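The log-linear identity on this slide can be verified numerically; a small sketch with illustrative values for θ, t(u_i, X) and |C_i| (all made up):

```python
import math

# Numeric check of the log-linear identity:
# |C| lam / (1 + |C| lam) = exp(theta.t) / (exp(theta.t) + |C|^{-1})  when lam = exp(theta.t)
theta = (math.log(200.0), math.log(0.5))   # (log beta, log gamma) for a Strauss model
t = (1.0, 3.0)                             # t(u_i, X) = (1, n_R(u_i, X)) with 3 neighbors
cell_area = 0.01                           # |C_i|

lam = math.exp(theta[0] * t[0] + theta[1] * t[1])  # 200 * 0.5**3 = 25
lhs = cell_area * lam / (1 + cell_area * lam)
rhs = lam / (lam + 1 / cell_area)
print(lhs, rhs)  # both 0.2
```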

SLIDE 6

The binary pseudo-likelihood converges to the spatial point process pseudo-likelihood as |C_i| → 0:

  Σ_{u∈X} log λ_θ(u, X \ u) − ∫_W λ_θ(u, X) du

with score

  Σ_{u∈X} λ′_θ(u, X \ u) / λ_θ(u, X \ u) − ∫_W λ′_θ(u, X) du

(unbiased due to the GNZ formula)

SLIDE 7

Bias problems with pseudo-likelihood

◮ Binary pseudo-likelihood: biased due to the approximation

  P(N_i = 1 | X \ C_i) ≈ |C_i| λ_θ(u_i, X)

◮ Spatial point process pseudo-likelihood: the score requires numerical approximation

  Σ_{u∈X} λ′_θ(u, X \ u) / λ_θ(u, X \ u) − ∫_W λ′_θ(u, X) du ≈ Σ_{u∈X} λ′_θ(u, X \ u) / λ_θ(u, X \ u) − Σ_{v∈Q} λ′_θ(v, X) w(v)

for quadrature points v ∈ Q with weights w(v) ⇒ bias.

SLIDE 8

Unbiased ‘logistic regression’ estimating function

  s(θ) = Σ_{u∈X} ρ(u) λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) [λ_θ(u, X \ u) + ρ(u)] ) − Σ_{u∈D} λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) )

D: dummy point process of intensity ρ(·), independent of X (random ‘quadrature points’).

s(θ) is the derivative of the ‘logistic regression’ log-likelihood

  Σ_{u∈X} log( λ_θ(u, X \ u) / (λ_θ(u, X \ u) + ρ(u)) ) + Σ_{u∈D} log( 1 / (λ_θ(u, X) + ρ(u)) )
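That s(θ) is the derivative of this logistic log-likelihood can be checked term by term with finite differences. A sketch for scalar θ and a log-linear model λ_θ = exp(θ t), holding the point configurations fixed so that λ depends on θ only (the numbers t, ρ, θ are illustrative):

```python
import math

# Finite-difference check: the two kinds of terms in s(theta) are the theta-derivatives
# of the corresponding logistic log-likelihood terms, for lam_theta = exp(theta * t).
t, rho, theta, h = 2.0, 400.0, 1.5, 1e-6

def lam(th):
    return math.exp(th * t)   # so lam' = t * lam

# data-point term: log( lam / (lam + rho) ), derivative rho*lam' / (lam*(lam+rho))
def ll_data(th): return math.log(lam(th) / (lam(th) + rho))
score_data = rho * t * lam(theta) / (lam(theta) * (lam(theta) + rho))
fd_data = (ll_data(theta + h) - ll_data(theta - h)) / (2 * h)

# dummy-point term: log( 1 / (lam + rho) ), derivative -lam' / (lam+rho)
def ll_dummy(th): return math.log(1.0 / (lam(th) + rho))
score_dummy = -t * lam(theta) / (lam(theta) + rho)
fd_dummy = (ll_dummy(theta + h) - ll_dummy(theta - h)) / (2 * h)

print(score_data, fd_data)    # agree to finite-difference accuracy
print(score_dummy, fd_dummy)  # agree to finite-difference accuracy
```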

SLIDE 9

Advantages:

◮ unbiased by the GNZ and Campbell formulae (later slide)
◮ formally a logistic regression score - computation easy using glm with logistic link function
◮ tractable asymptotic distribution of the parameter estimate in the stationary case
◮ fast computation - parametric bootstrap feasible in the inhomogeneous case

SLIDE 10

Dummy point process

Should be easy to simulate and mathematically tractable. Possibilities:

  1. Poisson process
  2. binomial point process (fixed number of independent points)
  3. stratified binomial point process (stratrand() in spatstat)

Stratified:

[figure: stratified dummy points, one in each cell of a regular grid]
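A stratified binomial sampler in the spirit of spatstat's stratrand() can be sketched in a few lines of Python; the function name here just mirrors the spatstat one and this is a stand-in, not the spatstat implementation:

```python
import random

def stratrand(nx, ny, rng=random):
    """One uniform dummy point in each cell of an nx-by-ny grid on the unit square,
    i.e. a stratified binomial point process with nx*ny points."""
    pts = []
    for i in range(nx):
        for j in range(ny):
            # cell (i, j) is [i/nx, (i+1)/nx) x [j/ny, (j+1)/ny)
            pts.append(((i + rng.random()) / nx, (j + rng.random()) / ny))
    return pts

random.seed(1)
D = stratrand(4, 4)
print(len(D))  # 16 dummy points, exactly one per cell
```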

SLIDE 11

Relation to pseudo-likelihood

We can rewrite the logistic regression score:

  s(θ) = Σ_{u∈X} λ′_θ(u, X \ u) / λ_θ(u, X \ u) − Σ_{u∈X∪D} λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) + ρ(u) )

By GNZ and Campbell:

  E Σ_{u∈X∪D} λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) + ρ(u) ) = E ∫_W λ′_θ(u, X) du.   (1)

Hence Σ_{u∈X∪D} λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) + ρ(u) ) is an unbiased Monte Carlo approximation of the last term in the pseudo-likelihood score:

  Σ_{u∈X} λ′_θ(u, X \ u) / λ_θ(u, X \ u) − ∫_W λ′_θ(u, X) du
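The rewriting rests on the pointwise identity ρλ′/(λ(λ+ρ)) = λ′/λ − λ′/(λ+ρ) applied to each data point. A numeric sketch with made-up (λ, λ′) values confirming that the two forms of s(θ) coincide:

```python
import math

# (lam, lam') pairs at the data points of X and the dummy points of D; constant rho.
# All numbers below are illustrative, not from any fitted model.
rho = 400.0
data = [(50.0, 100.0), (25.0, 75.0)]   # (lam_theta(u, X\u), lam'_theta(u, X\u)) for u in X
dummy = [(12.5, 25.0), (50.0, 150.0)]  # (lam_theta(u, X), lam'_theta(u, X)) for u in D

# original form of s(theta): weighted data sum minus dummy sum
s_orig = sum(rho * dl / (l * (l + rho)) for l, dl in data) \
       - sum(dl / (l + rho) for l, dl in dummy)

# rewritten form: sum over X of lam'/lam minus sum over X u D of lam'/(lam+rho)
s_rewr = sum(dl / l for l, dl in data) \
       - sum(dl / (l + rho) for l, dl in data + dummy)

print(s_orig, s_rewr)  # identical up to rounding
```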

SLIDE 12

Decomposition of logistic regression score

  s(θ) = Σ_{u∈X} ρ(u) λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) [λ_θ(u, X \ u) + ρ(u)] ) − Σ_{u∈D} λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) )
       = [ Σ_{u∈X} ρ(u) λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) [λ_θ(u, X \ u) + ρ(u)] ) − ∫_W ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du ]
       + [ ∫_W ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du − Σ_{u∈D} λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) ]
       = T1 + T2

Since

  ∫_W ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du = E[ Σ_{u∈D} λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) | X ]

⇒ E[T2 | X] = 0.

SLIDE 13

By the GNZ formula for X,

  E Σ_{u∈X} ρ(u) λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) [λ_θ(u, X \ u) + ρ(u)] ) = E ∫_W ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du

so

  E T1 = E[ Σ_{u∈X} ρ(u) λ′_θ(u, X \ u) / ( λ_θ(u, X \ u) [λ_θ(u, X \ u) + ρ(u)] ) − ∫_W ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du ] = 0

SLIDE 14

T1 only depends on X and E[T2 | X] = 0 ⇒ T1 and T2 are uncorrelated:

  Cov[T1, T2] = E Cov[T1, T2 | X] + Cov[ E[T1 | X], E[T2 | X] ] = 0

(given X, T1 is constant, so Cov[T1, T2 | X] = 0; and E[T2 | X] = 0 kills the second term)
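The argument holds for any T1 = g(X) together with a conditionally mean-zero T2, and it can be checked exactly on a discrete toy example (invented here purely for illustration):

```python
from itertools import product

# Toy check of the law of total covariance: T1 = g(X) and E[T2 | X] = 0 imply Cov[T1, T2] = 0.
# X uniform on {0, 1, 2}; given X, T2 = +/-(X + 1) with probability 1/2 each.
outcomes = [(x, s * (x + 1), (1 / 3) * (1 / 2)) for x, s in product([0, 1, 2], [-1, 1])]
g = lambda x: x * x  # T1 = g(X), an arbitrary function of X

E = lambda f: sum(p * f(x, t2) for x, t2, p in outcomes)  # exact expectation by enumeration
cov = E(lambda x, t2: g(x) * t2) - E(lambda x, t2: g(x)) * E(lambda x, t2: t2)
print(cov)  # 0.0
```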

SLIDE 15

Approximate distribution of parameter estimate

The parameter estimate θ̂ is the solution of s(θ) = 0. First order Taylor approximation:

  s(θ) ≈ S(θ̂ − θ)  ⇔  θ̂ ≈ θ + S⁻¹ s(θ)

where S = −E[ d s(θ)/dθ ].

SLIDE 16

Approximate distribution of parameter estimate (cont.)

Thus, using that T1 and T2 are uncorrelated,

  Var θ̂ ≈ S⁻¹ Var s(θ) S⁻¹ = S⁻¹ Var T1 S⁻¹ + S⁻¹ Var T2 S⁻¹ = Σ1 + Σ2

NB: T1 depends only on X while T2 involves both X and D. Var T2 → 0 if ρ(·) → ∞ (dense D). Hence Σ2 is the extra variance due to D.

SLIDE 17

Asymptotic normality

Restrict attention to stationary X and an increasing observation window W.

◮ T1 asymptotically N(0, Σ1) by a CLT for Gibbs point process innovations (Coeurjolly et al., 2012).
◮ T2 | X asymptotically N(0, Σ2) by a CLT for independent but not identically distributed random variables. NB: Σ2 does not depend on X!

SLIDE 18

Asymptotic normality (cont.)

Case of stratified dummy points:

  T2 = − Σ_{i=1}^m [ λ′_θ(U_i, X) / ( λ_θ(U_i, X) + ρ(U_i) ) − ∫_{C_i} ρ(u) λ′_θ(u, X) / ( λ_θ(u, X) + ρ(u) ) du ]

where W = ∪_i C_i and U_i is uniform on C_i.

SLIDE 19

Asymptotic normality (cont.)

Conclusion:

  θ̂ ≈ θ + S⁻¹ T1 + S⁻¹ T2 ≈ N(θ, S⁻¹[Σ1 + Σ2] S⁻¹)

SLIDE 20

Preliminary simulation results: Strauss process

Strauss process with β = 200, γ = 0.5, R = 0.05 on the unit square.

◮ Compare numerically approximated pseudo-likelihood (implementation in spatstat) with the logistic regression score.
◮ Variance decomposition.

Mean number of Poisson quadrature/dummy points: 625, 2500, 10000, 40000.

SLIDE 21

Bias and variance

Mean and std. dev. of parameter estimates, and proportion of variance due to D in %, for the logistic regression score. Numbers in [ ] are with the spatstat implementation of pseudo-likelihood. True values: β = 200, γ = 0.5.

  ρ     | E β̂           | sd β̂ | %  | E γ̂          | sd γ̂ | %
  625   | 204.2 [168.9] | 37.8  | 13 | 0.502 [0.62]  | 0.114 | 5.8
  2500  | 203.4 [188.9] | 35.9  | 4  | 0.502 [0.55]  | 0.112 | 1.5
  100²  | 203.4 [202.1] | 35.3  | 1  | 0.502 [0.51]  | 0.111 | 0.4
  200²  | 203.3 [205.5] | 35.2  |    | 0.502 [0.505] | 0.111 | 0.1

Bias is small and does not depend on the intensity of dummy points for the logistic regression score. Even smaller variance contributions from D if stratified dummy points are used! Bias problems with spatstat!

SLIDE 22

To be done / work in progress:

◮ further simulation studies
◮ applications to data examples
◮ implementation of the estimation procedure and computation of the asymptotic covariance matrix in spatstat is on the way!

Thanks for your attention!