SLIDE 1

Log-Gaussian Cox Process for London crime data

Jan Povala

with

Louis Ellam, Dr Seppo Virtanen, Prof Mark Girolami

July 24, 2018

SLIDE 2

Outline

◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps

Motivation 2

SLIDE 3

Aims and Objectives

◮ Modelling of crime and short-term forecasting.
◮ Two stages:
  1. Inference: what is the underlying process that generated the observations?
  2. Prediction: use the inferred process's properties to forecast future values.

SLIDE 4

Burglary

[Map: burglary counts across the London grid, colour scale 5 to 45]
SLIDE 5

Theft from the person

[Map: theft-from-the-person counts across the London grid, colour scale 20 to 160]
SLIDE 6

Outline

◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps

SLIDE 7

Cox Process

A Cox process is a natural choice for an environmentally driven point process (Diggle et al., 2013).

Definition

A Cox process Y(x) is defined by two postulates:
  1. Λ(x) is a nonnegative-valued stochastic process;
  2. conditional on the realisation λ(x) of the process Λ(x), the point process Y(x) is an inhomogeneous Poisson process with intensity λ(x).
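As a minimal illustration of the second postulate, the sketch below simulates an inhomogeneous Poisson process on the unit square by Lewis-Shedler thinning, for a hypothetical intensity λ(x) (a Gaussian bump, not the crime-data intensity):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical intensity surface on [0, 1]^2: a bump centred at (0.5, 0.5),
# bounded above by lam_max (an assumption for this illustration)
def lam(x):
    d2 = (x[..., 0] - 0.5) ** 2 + (x[..., 1] - 0.5) ** 2
    return 50.0 * np.exp(-d2 / 0.1)

lam_max = 50.0

# Step 1: simulate a homogeneous Poisson process at rate lam_max
n = rng.poisson(lam_max)            # expected count = rate x unit area
pts = rng.uniform(size=(n, 2))      # candidate event locations

# Step 2 (thinning): keep each point with probability lam(x) / lam_max
keep = rng.uniform(size=n) < lam(pts) / lam_max
events = pts[keep]
```

Conditioning on a random draw of the intensity surface instead of a fixed λ(x) is exactly what turns this into a Cox process.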

SLIDE 8

Log-Gaussian Cox Process

◮ Cox process with intensity driven by a fixed component Z_x⊤β and a latent function f(x):

  Λ(x) = exp(Z_x⊤β + f(x)),

where f(x) ∼ GP(0, k_θ(·, ·)), Z_x are socio-economic indicators, and β are their coefficients.

◮ Discretised version of the model:

  y_i ∼ Poisson(exp(Z_{x_i}⊤β + f(x_i))).
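The discretised model can be read as a generative process. A sketch under toy assumptions (a 1-D grid rather than the 2-D London grid, a squared-exponential kernel in place of the Matérn, and one synthetic covariate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D grid of n cells
n = 50
x = np.linspace(0.0, 1.0, n)

# Squared-exponential covariance as a simple stand-in for the Matérn kernel
ell, sigma2 = 0.2, 1.0
K = sigma2 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)

# Latent field f ~ GP(0, K); jitter keeps the covariance numerically PSD
f = rng.multivariate_normal(np.zeros(n), K + 1e-8 * np.eye(n))

# Fixed effects: one hypothetical covariate Z with coefficient beta
Z = rng.normal(size=(n, 1))
beta = np.array([0.5])

# Discretised LGCP: y_i ~ Poisson(exp(Z_i^T beta + f_i))
lam = np.exp(Z @ beta + f)
y = rng.poisson(lam)
```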

SLIDE 9

Inference

We would like to infer the posterior distributions of β, θ, and f:

  p(f, β, θ|y) = p(y|f, β) p(f|θ) p(θ) p(β) / p(y),

where

  p(y) = ∫∫∫ p(y|f, β) p(f|θ) p(β) p(θ) dθ dβ df,

which is intractable.

Solutions

  1. Laplace approximation
  2. Markov Chain Monte Carlo sampling
  3. …

SLIDE 10

Markov Chain Monte Carlo (MCMC)

◮ Sampling from the joint posterior distribution:

p(f, β, θ|y) ∝ p(y|f, β)p(f|θ)p(θ)p(β), using Hamiltonian Monte Carlo (HMC).

◮ Challenges:

  – θ, f, and β are strongly correlated.
  – High dimensionality of f: every iteration requires the inverse and the determinant of K.
  – Choosing the mass matrix in the HMC algorithm.
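For concreteness, a minimal HMC transition with an identity mass matrix (the mass-matrix choice noted above is a tuning decision in practice), applied to a toy standard-normal target rather than the LGCP posterior:

```python
import numpy as np

rng = np.random.default_rng(4)

def hmc_step(x, grad_logp, logp, step=0.1, n_leap=20):
    # One HMC transition with identity mass matrix
    p = rng.normal(size=x.shape)            # resample momentum
    x_new = x.copy()
    # Leapfrog integration of Hamiltonian dynamics
    p_new = p + 0.5 * step * grad_logp(x_new)
    for _ in range(n_leap - 1):
        x_new = x_new + step * p_new
        p_new = p_new + step * grad_logp(x_new)
    x_new = x_new + step * p_new
    p_new = p_new + 0.5 * step * grad_logp(x_new)
    # Metropolis accept/reject on the Hamiltonian
    h_old = -logp(x) + 0.5 * p @ p
    h_new = -logp(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < h_old - h_new else x

# Toy target: standard normal in 2 dimensions
logp = lambda x: -0.5 * x @ x
grad_logp = lambda x: -x

x = np.zeros(2)
samples = []
for _ in range(2000):
    x = hmc_step(x, grad_logp, logp)
    samples.append(x.copy())
samples = np.array(samples)
```

For the LGCP posterior, `logp` and `grad_logp` would be the joint log density above and its gradient, which is where the cost of inverting K enters.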

SLIDE 11

Computation

Flaxman et al. (2015), Saatçi (2012)

◮ The calculations above require O(n³) operations and O(n²) space.
◮ Cheaper linear algebra is available if separable kernel functions are assumed, e.g. in D = 2 dimensions:

  k((x1, x2), (x1′, x2′)) = k1(x1, x1′) k2(x2, x2′)

implies that K = K1 ⊗ K2.
◮ Applying the above properties, the inference can be performed using O(D n^((D+1)/D)) operations and O(D n^(2/D)) space.
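The saving comes from identities such as (K1 ⊗ K2) b = vec(K1 B K2⊤), which never forms the full n × n matrix. A small numerical check, with RBF factors as stand-ins for the Matérn kernels:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(x, ell=0.3):
    # Squared-exponential kernel on a 1-D grid (stand-in for a Matérn
    # factor), with jitter on the diagonal
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * d**2 / ell**2) + 1e-6 * np.eye(len(x))

# Separable kernel over a small 2-D grid: K = K1 ⊗ K2
n1, n2 = 6, 5
K1 = rbf(np.linspace(0, 1, n1))
K2 = rbf(np.linspace(0, 1, n2))
K = np.kron(K1, K2)          # formed here only to check the identity

b = rng.normal(size=n1 * n2)

# Kronecker matvec: reshape b to an n1 x n2 matrix, multiply by the
# factors, and flatten back (row-major vec convention)
B = b.reshape(n1, n2)
fast = (K1 @ B @ K2.T).ravel()
dense = K @ b
```

The `fast` path costs O(n1·n2·(n1+n2)) instead of O((n1·n2)²), which is what makes grid-structured inference feasible.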

SLIDE 12

Outline

◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps

SLIDE 13

Experiment

Model

◮ Factorisable covariance function (product of two Matérns).
◮ Uninformative prior for θ.
◮ N(0, 10I) prior for β.

Dataset

◮ Burglary and Theft from the person data for 2016.
◮ Grid: 59×46; one cell is an area of 1 km by 1 km.
◮ Missing locations are treated with a special noise model.

Inferred random variables

◮ Coefficients (β) for various socio-economic indicators.
◮ Two hyperparameters θ: lengthscale (ℓ) and marginal variance (σ²).
◮ Latent field f.

SLIDE 14

Socio-economic indicators

[Posterior histograms of β for intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, and night_economy_places_per_pop; burglary vs theft-from-the-person]

SLIDE 15

Hyperparameters

[Posterior histograms of log variance and log lengthscale; burglary vs theft-from-the-person]

SLIDE 16

Latent field - Burglary

[Maps: (a) mean and (b) standard deviation of the latent field for burglary]

SLIDE 17

Latent field - Theft from the person

[Maps: (c) mean and (d) standard deviation of the latent field for theft from the person]

SLIDE 18

Model Fit - RMSE

We compare our model against a Poisson regression (GLM) baseline using the root-mean-square error metric:

         Burglary    Theft from the person
MCMC      6.59224     4.71420
GLM      30.39759    69.61551
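The metric itself is straightforward; a minimal implementation for reference:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error over grid cells
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```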

SLIDE 19

Discussion

◮ Effects missing in the GLM model are spatially correlated. This could imply two possibilities:
  – The model is missing a covariate that is spatially correlated.
  – The true process driving criminal activity is spatially correlated.
◮ Socio-economic indicators from the census data are 'static' and might struggle to explain more 'dynamic' crime types, e.g. burglary vs. violence against the person.

SLIDE 20

Outline

◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps

SLIDE 21

Next steps

◮ Benchmark against INLA (Lindgren, Rue, and Lindström, 2011).
◮ Investigate extending the model to the spatio-temporal case.

SLIDE 22

Bibliography I

Diggle, Peter J. et al. (2013). "Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm". In: Statistical Science 28.4, pp. 542–563. ISSN: 0883-4237. DOI: 10.1214/13-STS441. URL: http://projecteuclid.org/euclid.ss/1386078878.

Flaxman, Seth et al. (2015). "Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods". In: Proceedings of the 32nd International Conference on Machine Learning. Vol. 37. ICML'15. Lille, France: JMLR.org, pp. 607–616.

SLIDE 23

Bibliography II

Lindgren, Finn, Håvard Rue, and Johan Lindström (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.4, pp. 423–498. ISSN: 1467-9868. DOI: 10.1111/j.1467-9868.2011.00777.x. URL: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x/abstract.

Saatçi, Yunus (2012). "Scalable inference for structured Gaussian process models". PhD Thesis. University of Cambridge.

Wilson, Andrew Gordon et al. (2014). "Fast Kernel Learning for Multidimensional Pattern Extrapolation". In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS'14. Cambridge, MA, USA: MIT Press, pp. 3626–3634. URL: http://dl.acm.org/citation.cfm?id=2969033.2969231.

SLIDE 24

β traceplots

[Traceplots of β for intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, and night_economy_places_per_pop]

Extra slides 24

SLIDE 25

θ traceplots

[Traceplots of log variance and log lengthscale]

SLIDE 26

f traceplots

[Traceplots of latent-field components 191, 775, 918, and 1188]

SLIDE 27

Laplace Approximation

Flaxman et al. (2015)

◮ For simplicity, we assume a non-parametric model (no fixed term) and treat θ as a point estimate obtained by maximising the marginal likelihood.
◮ Approximate the posterior distribution of the latent surface by

  p(f|y, θ) ≈ N(f̂, −(∇∇Ψ(f)|_{f̂})⁻¹),

where

  Ψ(f) := log p(f|y, θ) = log p(y|f, θ) + log p(f|θ) + const

is the unnormalised log posterior, and f̂ is the mode of the distribution.
◮ Newton's method is used to find f̂.
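A toy sketch of the Newton iteration for the mode f̂, assuming a 1-D grid, an exponential (Matérn ν = 1/2) kernel as a stand-in, and the Poisson observation model y_i ∼ Poisson(exp(f_i)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: latent f ~ N(0, K) on a 1-D grid, y_i ~ Poisson(exp(f_i))
n = 20
x = np.linspace(0.0, 1.0, n)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2) + 1e-6 * np.eye(n)
K_inv = np.linalg.inv(K)
y = rng.poisson(2.0, size=n)

# Newton's method on Psi(f) = log p(y|f) + log p(f|theta)
f = np.zeros(n)
for _ in range(50):
    grad = (y - np.exp(f)) - K_inv @ f     # gradient of Psi
    hess = -np.diag(np.exp(f)) - K_inv     # Hessian of Psi (negative definite)
    step = np.linalg.solve(hess, grad)
    f = f - step
    if np.linalg.norm(step) < 1e-10:
        break

# Laplace approximation: p(f|y, theta) ~ N(f_hat, -(Hessian at f_hat)^{-1})
cov = np.linalg.inv(np.diag(np.exp(f)) + K_inv)
```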

SLIDE 28

Matérn Covariance Function

  k(r) = (2^(1−ν) / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ)

We fix ν = 2.5, as it is difficult to jointly estimate ℓ and ν due to identifiability issues.
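For ν = 5/2 the Matérn function has the standard closed form k(r) = σ² (1 + s + s²/3) e^(−s) with s = √5 r / ℓ, which avoids evaluating the Bessel function K_ν; a small sketch:

```python
import numpy as np

def matern52(r, ell=1.0, sigma2=1.0):
    # Matern covariance with nu = 5/2 in closed form:
    # k(r) = sigma2 * (1 + s + s^2 / 3) * exp(-s),  s = sqrt(5) r / ell
    s = np.sqrt(5.0) * np.abs(np.asarray(r, dtype=float)) / ell
    return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

r = np.linspace(0.0, 3.0, 100)
k = matern52(r)
```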

SLIDE 29

Kronecker Algebra

Saatçi (2012)

◮ Matrix-vector multiplication (⊗_d A_d) b in O(n) time and space.
◮ Matrix inverse: (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹.
◮ Let K_d = Q_d Λ_d Q_d⊤ be the eigendecomposition of K_d. Then the eigendecomposition of K = ⊗_d K_d is given by QΛQ⊤, where Q = ⊗_d Q_d and Λ = ⊗_d Λ_d. The number of steps required is O(D n^(3/D)).
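A numerical check of the inverse and eigendecomposition identities, using small random symmetric positive-definite factors as stand-ins for the kernel matrices:

```python
import numpy as np

rng = np.random.default_rng(3)

def spd(n):
    # Random symmetric positive-definite matrix standing in for a kernel factor
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

K1, K2 = spd(4), spd(3)
K = np.kron(K1, K2)

# Per-factor eigendecompositions: K_d = Q_d Lambda_d Q_d^T
w1, Q1 = np.linalg.eigh(K1)
w2, Q2 = np.linalg.eigh(K2)

# Eigenvectors of K1 (x) K2 are Q1 (x) Q2; eigenvalues are all products w1_i * w2_j
Q = np.kron(Q1, Q2)
lam = np.kron(w1, w2)
K_rebuilt = Q @ np.diag(lam) @ Q.T

# Inverse via the Kronecker identity, avoiding the full-size inversion
K_inv = np.kron(np.linalg.inv(K1), np.linalg.inv(K2))
```

Only the small per-factor decompositions are ever computed, which is the source of the O(D n^(3/D)) cost above.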

SLIDE 30

Incomplete grids

Wilson et al. (2014)

We have y_i ∼ Poisson(exp(f_i)). For the points of the grid that are not in the domain D, we let y_i ∼ N(f_i, ε⁻¹) and ε → 0. Hence,

  p(y|f) = ∏_{i∈D} e^{f_i y_i} e^{−exp(f_i)} / y_i!  ×  ∏_{i∉D} (2πε⁻¹)^(−1/2) exp(−ε(y_i − f_i)²/2).

The log-likelihood is thus

  Σ_{i∈D} [y_i f_i − exp(f_i) + const] − (1/2) Σ_{i∉D} ε(y_i − f_i)².

The gradient of the log-likelihood is

  ∇ log p(y|f)_i = y_i − exp(f_i)  if i ∈ D,   ε(y_i − f_i)  if i ∉ D,

and the Hessian of the log-likelihood is

  ∇∇ log p(y|f)_ii = −exp(f_i)  if i ∈ D,   −ε  if i ∉ D.
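The case analysis maps directly onto a masked implementation; a sketch with a hypothetical 5-cell slice of the grid (values chosen for illustration only):

```python
import numpy as np

# Hypothetical slice of the grid: True where the cell lies inside the domain D
in_domain = np.array([True, True, False, True, False])
y = np.array([3.0, 1.0, 0.0, 2.0, 0.0])   # zero pseudo-observations outside D
f = np.zeros_like(y)
eps = 1e-6                                 # small precision standing in for eps -> 0

# Gradient of log p(y|f): Poisson term inside D, Gaussian term outside
grad = np.where(in_domain, y - np.exp(f), eps * (y - f))

# Diagonal of the Hessian of log p(y|f)
hess_diag = np.where(in_domain, -np.exp(f), -eps)
```

Because the likelihood terms are independent across cells, both quantities stay diagonal and fit unchanged into the Kronecker-structured inference.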
