Pseudo Bayesian inference for intensity-dependent point processes - - PowerPoint PPT Presentation

pseudo bayesian inference for intensity dependent point
SMART_READER_LITE
LIVE PREVIEW

Pseudo Bayesian inference for intensity-dependent point processes - - PowerPoint PPT Presentation

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen 1 1 Department of Mathematical Sciences Aalborg University Joint work with Mari Myllym aki Avignon, May 2012 1/35 Pseudo Bayesian inference for


slide-1
SLIDE 1

1/35

Pseudo Bayesian inference for intensity-dependent point processes

Kasper K. Berthelsen1

1Department of Mathematical Sciences

Aalborg University Joint work with Mari Myllym¨ aki

Avignon, May 2012

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-2
SLIDE 2

2/35

Motivation

We are considering marked point patterns {(xi, mi)}, where {xi} denotes the locations of objects (trees) in “window” W , and {mi} denotes the corresponding marks (stem diameter at breast height (DBH)).

20 40 60 80 100 120 20 40 60 80 100 5 10 15 20 25 30 35 0.7 0.8 0.9 1.0 1.1

Stoyan’s kmm(r)

r kmm(r)

◮ We want to construct a reasonable model for the marking

(and distribution) of points

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-3
SLIDE 3

3/35

Hainich data

Data: Location of 650 trees marked by dbh in a 118.5m × 93.75m

  • region. The trees belong to a mixed broad-leaved forest in Hainich

in Western Thuringia (Germany), as so-called selection forest (Plenterwald).

20 40 60 80 100 120 20 40 60 80 100 0.02 0.04 0.06 0.08 0.10 10 20 30 40 50 60 70 estimated intensity marks

◮ Left plot suggests inhomogeneous point distribution. ◮ Right plot suggests mark distribution depends on point

intensity.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-4
SLIDE 4

4/35

Intensity dependent marking

We consider a situation where there is a relation between the marks and the intensity of the point pattern. Two examples where this is relevant

◮ Preferential sampling: One makes more measurements where

the measured value (i.e. the mark) is high, e.g. pollution, see [Diggle et al., 2010]

◮ Density-dependence in plant ecology: In areas with relatively

many trees the trees tend to be small, and vice versa. See [Myllym¨ aki and Penttinen, 2009].

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-5
SLIDE 5

5/35

The model

We consider a model with a density π((xi, mi}|β) = 1 c(β, θm, θϕ)

n

  • i=1

β(xi)π(mi|β(xi)) ×

  • i<j

ϕ((xi, mi), (xj, mj); θϕ), (1) w.r.t. a Poisson process on W × R+. β : W → R+ is the first order term. Conditional on β and {xi} the marks are then distributed as mi|xi, β ∼ π(mi|β(xi), θm), i.e. the distribution of mark mi depends on β evaluated at the location xi and parameters θm. Here where ϕ : (W × M) × (W × M) → [0, 1] is the interaction function.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-6
SLIDE 6

6/35

Specifying the interaction function ϕ

Specifically we choose ϕ((xi, mi), (xj, mj)) =

  • γ

if xi − xj ≤ R(mi + mj) 1

  • therwise,

where R ≥ 0 controls the interaction range and γ ∈ [0, 1] controls the strength of the interaction. Interpretation:

◮ Circular influence zones, where the diameter of the influence

zone centred at xi is proportional to mi (DBH).

◮ The interaction parameter γ specifies the degree of “penalty”

  • n each pair of overlapping influence zones.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-7
SLIDE 7

7/35

Mark distribution

Regarding the mark distribution, we assume mi − m0|θ, β(xi) ∼ Γ

  • c, 1

c

  • a +

b

  • β(xi)
  • ,

where m0 ≥ 0 is the minimum mark size, and Γ(k, θ) denotes the gamma distribution with shape parameter k and scale parameter θ. Hence E[mi−m0|θ, β(xi)] = a+ b

  • β(xi)

and Var[mi − m0|θ, β(xi)] (E[mi − m0])2 = 1 c . The special case, where m0 = 0 and a = 0 we obtain a situation which is similar to location dependent scaling considered by [Hahn et al., 2003].

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-8
SLIDE 8

8/35

Bayesian inference

We perform Bayesian posterior inference for

◮ β

the first order term

◮ a, b, c

parameters of the mark distribution

◮ R, γ

the interaction parameters Priors

◮ For a, b, c, R and γ we assume uniform priors on a bounded

interval.

◮ For β we assume a non-parametric approach

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-9
SLIDE 9

9/35

Prior distribution for β

As a prior on β we use a shot noise style prior β(x) =

  • c∈C

λK(x − c), where λ > 0, C is a Poisson process on R2 and K is a kernel, i.e. a probability density on R2. This is the prior used by [Berthelsen and Møller, 2008] (in the 1-dimensional case). One alternative is a log Gaussian random field. This is the prior considered by H&S (2008) and M&P (2009)

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-10
SLIDE 10

10/35

Approximative prior

For the remainder we focus of the shot-noise prior: β(x) =

  • c∈C

λK(x − c). For simulation purposes we replace the Poisson process C on R2 by a Poisson process C+ on an extended window W+ = {x ∈ R2 : δ(x, W ) ≤ ∆}, ∆ ≥ 0, where δ(A, B) = inf

x∈A,y∈B x − y ,

A, B ⊆ R2. Further, we assume C+ has intensity βC, and that K is the density

  • f a bivariate normal distribution with covariance matrix σ2I.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-11
SLIDE 11

11/35

How to choose ∆

The prior mean of β is [β(x)] = λβC. When restricting C to W+ the prior mean is (obviously) reduced. But by how much? Let D denoted the (missed) contribution for kernels centred

  • utside W+:

D =

  • W
  • c∈C\W+

λK(x, c)dc.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-12
SLIDE 12

12/35

How to choose ∆

The prior mean of β is [β(x)] = λβC. When restricting C to W+ the prior mean is (obviously) reduced. But by how much? Let D denoted the (missed) contribution for kernels centred

  • utside W+:

D =

  • W
  • c∈C\W+

λK(x, c)dc. Then the expected value of D is E[D] = λβC

  • R2\W+
  • W

K(x, c)dxdc

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-13
SLIDE 13

13/35

How to choose ∆

The prior mean of β is [β(x)] = λβC. When restricting C to W+ the prior mean is (obviously) reduced. But by how much? Let D denoted the (missed) contribution for kernels centred

  • utside W+:

D =

  • W
  • c∈C\W+

λK(x, c)dc. Then the expected value of D is E[D] = λβC

  • R2\W+
  • W

K(x, c)dxdc ≤ λβC

  • R2\W+
  • W

k(x, c)dxdc, where k(x, c) ≥ K(x, c) for all (x, c) ∈ W × (R2\W+) is chosen to make integration easier.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-14
SLIDE 14

14/35

Choosing k

Following “B&M 2008”, the function k(x, c) is chosen so that it is constant on W : k(x, c) = 1 2πσ2 exp

  • −δ(c, W )2

2σ2

  • Illustration of the 1-dimensional case:

x ∆ W ∆ c K(x, c) k(x, c) Note: The 1-dimensional case is consider by B&M (2008) where the introduction of bounding function k is not needed.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-15
SLIDE 15

15/35

Bound on E[D]

E[D] ≤ λβC

  • R2\W+
  • W

k(x, c)dxdc = λβC|W | ∞

  • 2(a + b)/(2πσ2) + r/σ2

e−r2/(2σ2)dr

a b ∆ r

The proportion contribution missed: E[D]

  • W E[β(x)]dx =

  • 2(a + b)/(2πσ2) + r/σ2

e−r2/(2σ2)dr Finally, ∆ is determined using numerical methods.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-16
SLIDE 16

16/35

Posterior simulations

We want to explore the posterior distribution, π(θ, β|x) ∝ π(x|θ, β)π(θ, β), using MCMC. For convenience we write the likelihood as π((x, m)|θ, β) = c−1(θ, β)f (x|θ, β), where c−1 is the unknown normalising constant of f (y|θ).

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-17
SLIDE 17

17/35

Posterior simulations

We want to explore the posterior distribution, π(θ, β|x) ∝ π(x|θ, β)π(θ, β), using MCMC. For convenience we write the likelihood as π((x, m)|θ, β) = c−1(θ, β)f (x|θ, β), where c−1 is the unknown normalising constant of f (y|θ). Using (conventional) Metropolis-Hastings updates involves evaluating the Hastings ratio: H(θ, θ′) = c−1(θ′, β′)f (x; θ′, β′)π(θ′, β′)q(θ′, β′; θ, β) c−1(θ, β)f (x; θ, β)π(θ, β)q(θ, β; θ′, β′) Notice this involves evaluating a ratio of unknown normalising constants.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-18
SLIDE 18

18/35

Difficulty of avoiding the normalising constant

◮ There are several ways of circumventing the problem of ratios

  • f unknown normalising constants, e.g. the approaches by

[Møller et al., 2006] or [Murray et al., 2006].

◮ For both of these approaches, each MCMC step involves

simulating (perfectly) a realisation of the mark point process conditional on the proposed values of β, γ, R, a and b.

◮ These perfect realisations be achieved by perfect sampling

(dominating coupling from the past [Kendall and Møller, 2000]).

◮ For many relevant problems however, perfect sampling is

  • infeasible. Instead we we consider a Pseudo Bayesian

approach.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-19
SLIDE 19

19/35

The Pseudo likelihood

In a pseudo Bayesian approach the likelihood is (simply) replaced by the pseudo likelihood: PL(θ|(x, m)) =

n

  • i=1

λθ((xi, mi); (x, m))× exp

  • W
  • M

λθ((y, l); (x, m))dldy

  • ,

where λθ is the Papangelou conditional intensity: λθ((y, l), (x, m)) = β(y)π(l|β(y), θ) ×

n

  • i=1

ϕ((y, l), (xi, mi)). Usually the integral in the PL-function is approximated using some discretisation scheme.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-20
SLIDE 20

20/35

Integral over W

◮ A discretisation of the (location) space W is done by dividing

W into disjoint “cells” Wj and associating each cell with a dummy point yj ∈ W , j = 1, . . . , J.

◮ The integral over W is then approximated by assuming that

the integrand is constant on each cell, with a value obtained at the corresponding dummy point.

◮ Notice that the usual “practical pseudo-likelihood approach”

  • f including the data point in the grid of dummy points

introduces bias (unless you do a clever correction).

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-21
SLIDE 21

21/35

Integral over M

The integral over the markspace M = R+ is

  • M

π(l|β(x), θm)

n

  • i=1

ϕ((yj, l), (xi, mi))dl The product of interaction functions can be written as

n

  • i=1

ϕ((yj, l), (xi, mi)) = γSR((yj,l),(x,m)).(∗∗) where SR((yj, l), (x, m)) =

n

  • i=1

1 yj − xi mi + l < R

  • In other word, (**) is a decreasing step function of l, where each

step is a factor γ lower than the previous step.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-22
SLIDE 22

22/35

Steps

In summary

n

  • i=1

ϕ((yj, l), (xi, mi)) = γSR((yj,l),(x,m)) is a step function. For the dummy point yj steps happen at dj,1, dj,2, . . . , dj,n, where dj,i = max

  • 0, xi − yj

R − mi

  • In the following we assume that dj,1 ≤ dj,2 ≤ . . . ≤ dj,n, and

dj,0 = 0 and dj,n+1 = ∞.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-23
SLIDE 23

23/35

Rewriting integral

Assume F(m|θm, β(x)) is the distribution function corresponding to the mark density π(m|θm, β(x)). The integral

  • M

π(l|β(x), θm)

n

  • i=1

ϕ((yj, l), (xi, mi))dl can now be written as

n+1

  • i=1

(F(dj,i; β(yj), θm) − F(dj,i−1; β(yj), θm)) γi−1. The dj,is are to be pre-calculated for each dummy point.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-24
SLIDE 24

24/35

Approximate pseudo likelihood

The pseudo likelihood can now be approximated by PL(θ) ≈

n

  • i=1

λθ((xi, mi), (x, m))× exp

J

  • j=1

|Wj|β(yj) n+1

  • i=1

(F(dj,i; β(yj), θm) − F(dj,i−1; β(yj), θm))γi−1 With this in place we turn to an example.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-25
SLIDE 25

25/35

Simulation example

In this example W = [0, 118.5] × [0, 93.7], a = 0.2, b = 2, c = 2.5, γ = 0.1 R = 0.02, λ = 20, βC = 0.003. Data β mi vs β(xi)

20 40 60 80 100 120 20 40 60 80 100 V2 20 40 60 80 100 20 40 60 80 100 yseq 0.02 0.04 0.06 0.08 0.10

0.03 0.03 . 3 0.04 0.04 0.04 0.04 . 5 0.05 . 5 0.06 . 6 . 6 . 7 0.07 . 8 0.09

0.04 0.06 0.08 10 20 30 40 50 60 70 sim_par[, 6] * (sim_par[, 4] + sim_par[, 5]/sqrt(beta.seq2))

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-26
SLIDE 26

26/35

Posterior distribution: Interaction

γ R c

0.1 0.2 0.3 0.4 1000 2000 3000 4000 5000 6000 7000 0.018 0.019 0.020 0.021 0.022 0.023 2000 4000 6000 8000 10000 12000 2.2 2.4 2.6 2.8 3.0 3.2 2000 4000 6000 8000 10000

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-27
SLIDE 27

27/35

In details: Posterior distribution of R

Recall pseudo likelihood:

PL(θ) ≈

n

  • i=1

λθ((xi, mi), (x, m))× exp

J

  • j=1

|Wj|β(yj) n+1

  • i=1

(F(dj,i; . . .) − F(dj,i−1; . . .))γi−1

As a function of R, n

i=1 λθ((xi, mi), (x, m)) is a decreasing

stepfunction.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-28
SLIDE 28

28/35

In details: Posterior distribution of R

Recall pseudo likelihood:

PL(θ) ≈

n

  • i=1

λθ((xi, mi), (x, m))× exp

J

  • j=1

|Wj|β(yj) n+1

  • i=1

(F(dj,i; . . .) − F(dj,i−1; . . .))γi−1

As a function of R, n

i=1 λθ((xi, mi), (x, m)) is a decreasing

  • stepfunction. On the other hand, as a function of R

n

  • i=1

(F(dj,i; β(x), θm) − F(dj,i−1; β(x), θm))γi−1 is a continuous, decreasing function. As a result PL becomes “saw toothed”.

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-29
SLIDE 29

29/35

Posterior distribution: β

True β: Posterior mean β:

20 40 60 80 100 20 40 60 80 100 yseq 0.02 0.04 0.06 0.08 0.10

0.03 0.03 . 3 0.04 0.04 0.04 0.04 . 5 0.05 . 5 0.06 . 6 . 6 . 7 0.07 . 8 0.09

20 40 60 80 100 20 40 60 80 100 yseq 0.02 0.04 0.06 0.08 0.10 0.12

0.025 0.03 0.035 0.04 0.04 . 4 0.045 . 4 5 0.045 0.05 0.05 . 5 0.055 0.055 . 5 5 0.06 . 6 0.065 0.07 0.075 . 8 0.085

  • Std. error

(ˆ β − β)/std. error

20 40 60 80 100 20 40 60 80 100 yseq 0.02 0.03 0.04 0.05 0.06

0.015 . 2 0.025 0.025 0.03 0.03 . 3 . 3 5 0.035 0.035 0.035 0.035 . 3 5 . 3 5 0.04 0.04 0.04 . 4 . 4 . 4 0.045 . 5 . 5 5

20 40 60 80 100 20 40 60 80 100 yseq −2 −1 1 2 3 4

−0.01 −0.01 − . 1 −0.01 −0.005 −0.005 −0.005 0.005 . 5 0.005 0.005 0.01 0.01 . 1 0.01 0.015 0.015 . 2

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-30
SLIDE 30

30/35

Posterior distribution: β dependence

a

1 2 3 4 5 5000 10000 15000

b

1.0 1.5 2.0 2000 4000 6000 8000

mi vs ˆ β(x) E[mi] vs a +

b

ˆ β

0.04 0.06 0.08 10 20 30 40 50 60 70 sim_par[, 6] * (sim_par[, 4] + sim_par[, 5]/sqrt(beta.seq2))

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-31
SLIDE 31

31/35

Hainich data

Data: Location of 650 trees marked by dbh in a 118.5m × 93.75m

  • region. The trees belong to a mixed broad-leaved forest in Hainich

in Western Thuringia (Germany), as so-called selection forest (Plenterwald).

20 40 60 80 100 120 20 40 60 80 100 0.02 0.04 0.06 0.08 0.10 10 20 30 40 50 60 70 estimated intensity marks

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-32
SLIDE 32

32/35

Posterior distribution: Interaction

γ R c

0.0 0.2 0.4 0.6 0.8 1.0 1000 2000 3000 4000 5000 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 5000 10000 15000 20000 25000 2.0 2.2 2.4 2.6 2.8 3.0 1000 2000 3000 4000 5000 6000

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-33
SLIDE 33

33/35

Posterior distribution: β

Data Posterior mean β:

  • Std. error

20 40 60 80 100 120 20 40 60 80 100 V2 20 40 60 80 100 20 40 60 80 100 yseq 0.03 0.04 0.05 0.06 0.07 0.08

0.04 . 4 . 4 . 4 5 . 4 5 . 4 5 . 5 . 5 . 5 5 . 6 0.065 0.07

20 40 60 80 100 20 40 60 80 100 yseq 0.015 0.020 0.025 0.030 0.035 0.040

0.015 . 2 0.02 0.02 . 2 0.025 . 2 5 0.025 0.025 0.025 0.025 0.03 . 3 0.03 0.035

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-34
SLIDE 34

34/35

Posterior distribution: β dependence

a

2 4 6 8 2000 4000 6000 8000

b

0.0 0.5 1.0 1.5 2.0 2.5 2000 4000 6000 8000 10000

a

0.03 0.04 0.05 0.06 0.07 0.08 0.09 10 20 30 40 50 60 70 mean(theta[, 7]) * (mean(theta[, 1]) + mean(theta[, 2])/sqrt(beta.seq2))

Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen

slide-35
SLIDE 35

35/35

Berthelsen, K. K. and Møller, J. (2008). Non-parametric bayesian inference for inhomogeneous Markov point processes. Australian & New Zealand Journal of Statistics, 50:257–272. Diggle, P. J., Menezes, R., and Su, T.-L. (2010). Geostatistical inference under preferential sampling. Journal of the Royal Statistical Society, Ser. B, 59:191–232. Hahn, U., Jensen, E. B. V., van Lieshout, M.-C., and Nielsen, L. S. (2003). Inhomogeneous spatial point processes by location dependent scaling.

  • Adv. Appl. Prob., 35:319–336.

Ho, L. P. and Stoyan, D. (2008). Modelling marked point patterns by intensity-marked cox processes. Statistics and Probability Letters, 78:1194–1199. Kendall, W. S. and Møller, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes.

  • Adv. Appl. Prob., 32:844–865.

Møller, J., Pettitt, A. N., Reeves, R., and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93:451–458. Murray, I., Ghahramani, Z., and MacKay., D. J. C. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), Arlington,

  • Virginia. AUAI Press.

Myllym¨ aki, M. and Penttinen, A. (2009). Pseudo Bayesian inference for intensity-dependent point processes Kasper K. Berthelsen