Likelihood inference in complex settings — a PowerPoint presentation by Nancy Reid, with Uyen Hoang, Wei Lin, Ximing Xu


SLIDE 1

Likelihood inference for simple problems · Higher order approximation · Harder problems · Approximations to likelihoods

Likelihood inference in complex settings

Nancy Reid with Uyen Hoang, Wei Lin, Ximing Xu

1 / 30

SLIDE 2

Likelihood inference for simple problems · Higher order approximation · Harder problems · Approximations to likelihoods

SLIDE 3

Why likelihood?

  • the likelihood function depends on the data only through sufficient statistics
  • "likelihood map is sufficient" (Fraser & Naderi, 2006)
  • provides summary statistics with known limiting distributions
  • leading to approximate pivotal functions, based on the normal distribution
  • in some models the likelihood function gives exact inference
  • "likelihood function as pivotal" (Hinkley, 1980)
  • likelihood function + sample space derivative gives better approximate inference

SLIDE 4

Summary statistics and approximate pivotals

  • model: f(y; θ), y ∈ ℝⁿ, θ ∈ ℝᵈ
  • log-likelihood function: ℓ(θ; y) = log f(y; θ) + a(y)
  • score function: u(θ) = ∂ℓ(θ; y)/∂θ
  • maximum likelihood estimate: θ̂ = arg sup_θ ℓ(θ; y)
  • log-likelihood ratio: w(θ) = 2{ℓ(θ̂; y) − ℓ(θ; y)}
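These quantities can be computed directly in a simple model; the sketch below uses an exponential sample (the data, sample size, and rate parameterization are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative data: n i.i.d. Exponential(theta) observations, rate parameterization.
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=50)   # true theta = 1

def loglik(theta):
    # l(theta; y) = n log(theta) - theta * sum(y), dropping a(y)
    return len(y) * np.log(theta) - theta * y.sum()

# Maximum likelihood estimate by direct maximization; for this model the
# closed form is theta_hat = 1/mean(y), which the optimizer should recover.
res = minimize_scalar(lambda t: -loglik(t), bounds=(1e-6, 100.0), method="bounded")
theta_hat = res.x

# Score at the MLE is near zero; log-likelihood ratio at the true value:
score = len(y) / theta_hat - y.sum()
w = 2 * (loglik(theta_hat) - loglik(1.0))
```

The same pattern (write ℓ, maximize numerically, evaluate u and w) carries over to models without closed-form estimates.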

SLIDE 5

Approximate pivotals

  • √n(θ̂ − θ) ∼ N_d{0, j⁻¹(θ̂)} (approximately)
  • w(θ) = 2{ℓ(θ̂) − ℓ(θ)} ∼ χ²_d (approximately)
  • n^{−1/2} U(θ) ∼ N_d{0, j(θ̂)} (approximately)
  • n^{−1/2} U(θ) → N_d{0, I(θ)} (in distribution)
  • where j(θ̂) = −ℓ″(θ̂)/n and I(θ) = E{j(θ)}
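A quick simulation check of the χ²_d approximation for w(θ), using an illustrative exponential model where the MLE has a closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta0, reps = 40, 1.0, 5000

# Exponential(rate theta) samples: the MLE is theta_hat = 1/ybar, so
# w(theta0) = 2n{log(theta_hat/theta0) - ybar(theta_hat - theta0)}
# has a closed form for each simulated sample.
ybar = rng.exponential(scale=1 / theta0, size=(reps, n)).mean(axis=1)
theta_hat = 1 / ybar
w = 2 * n * (np.log(theta_hat / theta0) - ybar * (theta_hat - theta0))

# Against the chi-squared(1) reference: P{w < 3.841} should be near 0.95.
coverage = np.mean(w < 3.841)
```

With n = 40 the first-order χ² approximation is already close to nominal; higher-order methods tighten it further at small n.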

SLIDE 6

...approximate pivotals

[Figure: log-likelihood function plotted against θ]

SLIDE 7

...approximate pivotals

[Figure: log-likelihood function against θ, with θ̂ marked]

SLIDE 8

...approximate pivotals

[Figure: log-likelihood function against θ, annotated with θ̂ and θ̂ − θ]

SLIDE 9

...approximate pivotals

[Figure: log-likelihood function against θ, annotated with θ̂ and θ̂ − θ]

SLIDE 10

...approximate pivotals

[Figure: log-likelihood function against θ, with the cutoff w(θ)/2 = 1.92 marked]

SLIDE 11

...approximate pivotals

w(θ) = 2{ℓ(θ̂) − ℓ(θ)} ∼ χ²_d (approximately)

[Figure: panel of plots labelled m and M with levels 1–4; the content is not recoverable from the extraction]

SLIDE 12

Likelihood as pivotal

  • Example: location model f(y; θ) = ∏_{i=1}^n f₀(yᵢ − θ), θ ∈ ℝ
  • Fisher (1934): f(θ̂ | a; θ) = exp{ℓ(θ; y)} / ∫ exp{ℓ(θ; y)} dθ
  • (y₁, …, yₙ) ↔ (θ̂, a₁, …, aₙ), with aᵢ = yᵢ − θ̂
  • exact (conditional) distribution of the maximum likelihood estimator given by the renormalized likelihood function
  • p* approximation: p*(θ̂ | a; θ) = c(θ, a) |j(θ̂)|^{1/2} exp{ℓ(θ; θ̂, a) − ℓ(θ̂; θ̂, a)}
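In the normal location model the renormalized likelihood is exactly the density of θ̂, which gives a cheap numerical check of Fisher's formula; n and ȳ below are illustrative:

```python
import numpy as np

# For the N(theta, 1) location model the conditional density of theta_hat is
# the renormalized likelihood itself, so the p* formula is exact here.
n, ybar = 10, 0.3
theta = np.linspace(-2.0, 2.0, 4001)
dx = theta[1] - theta[0]
loglik = -0.5 * n * (ybar - theta) ** 2          # log-likelihood up to a constant
lik = np.exp(loglik - loglik.max())
pstar = lik / (lik.sum() * dx)                    # renormalized likelihood on the grid
exact = np.sqrt(n / (2 * np.pi)) * np.exp(-0.5 * n * (theta - ybar) ** 2)
err = np.max(np.abs(pstar - exact))               # should be near zero
```

For non-normal location families the renormalization is the same computation; only the exact-density comparison is lost.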

SLIDE 13

A simpler approach

  • avoid the transformation (y₁, …, yₙ) ↔ (θ̂, a)
  • define a derivative: φ(θ) ≡ ℓ_{;V}(θ; y⁰) = ∂ℓ(θ; y)/∂V(y) |_{y = y⁰}
  • a directional derivative on the sample space
  • used along with ℓ(θ; y⁰), the observed log-likelihood function
  • can be extended to a derivative of the mean likelihood, usable in a wider context (Fraser & Reid, Biometrika, 2009)

SLIDE 14

Tangent exponential model

  • a continuous model f(y; θ) on ℝⁿ can be approximated by an exponential family model on ℝᵈ:
      f_TEM(s; θ) ds = exp{φ(θ)ᵀ s + ℓ⁰(θ)} h(s) ds   (1)
  • s is a score variable on ℝᵈ: s(y) = −ℓ_φ(θ̂⁰; y)
  • ℓ⁰(θ) = ℓ(θ; y⁰) is the observed log-likelihood function
  • φ(θ) = φ(θ; y⁰) is the directional derivative ℓ_{;V}(θ; y⁰)
  • (1) approximates the original model to O(n⁻¹)
  • gives an approximation to the p-value for testing θ
  • the p-value is accurate to O(n^{−3/2})

SLIDE 15

[Figure: Cauchy density and its tangent exponential model (TEM) approximation; y on the horizontal axis, density on the vertical axis]

SLIDE 16

Example: microscopic fluorescence

  • "tracking of microscopic fluorescent particles attached to biological specimens" (Hughes et al., AOAS, 2010)
  • "CCD (charge-coupled device) camera attached to a microscope used to observe the specimens repeatedly"
  • "we introduce an improved technique for analyzing such images over time"
  • model for counts: Zᵢ ∼ N(fᵢ, fᵢ + ψ), with fᵢ ≈ B + Σⱼ Aⱼ exp[−{(xᵢ − xⱼ)² + (yᵢ − yⱼ)²} / S²]
  • fᵢ is developed from a model for photon emission; normal approximation to the Poisson; ψ captures the instrument error
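A sketch of simulating from this count model on a small pixel grid; the background B, instrument error ψ, spot width S, and the two spot positions/amplitudes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Count model Z_i ~ N(f_i, f_i + psi) on a 10x10 pixel grid, with
# f_i = B + sum_j A_j exp(-((x_i - x_j)^2 + (y_i - y_j)^2) / S^2).
B, psi, S = 5.0, 2.0, 1.5
spots = [(3.0, 3.0, 40.0), (7.0, 6.0, 25.0)]          # (x_j, y_j, A_j), two fluorophores

xx, yy = np.meshgrid(np.arange(10.0), np.arange(10.0))
f = np.full_like(xx, B)                                # constant background
for xj, yj, Aj in spots:
    f += Aj * np.exp(-((xx - xj) ** 2 + (yy - yj) ** 2) / S ** 2)

Z = rng.normal(loc=f, scale=np.sqrt(f + psi))          # normal approx. to Poisson + noise
```

Fitting then amounts to maximizing the resulting normal log-likelihood over (B, ψ, S) and the spot parameters, with the number of spots chosen by model selection as in the paper.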

SLIDE 17

... microscopic fluorescence

  • "Our method, which applies maximum likelihood principles, improves the fit to the data, derives accurate standard errors from the data with minimal computation, and uses model-selection criteria to 'count' the fluorophores in an image"
  • "likelihood ratio tests are used to select the final model"
  • potential for improved inference using likelihood methods?

SLIDE 18

... a simpler model

  • Yᵢ ∼ N(μᵢ, μᵢ + ψ), μᵢ = exp(β₀ + β₁xᵢ)
  • the approximate pivot r* constructed from ℓ(θ; y⁰) and φ(θ; y⁰) should follow a N(0, 1) distribution; checked by simulation

[Figure: normal Q-Q plot of simulated pivot values; sample quantiles against theoretical quantiles]
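A minimal simulate-and-fit sketch for this simpler model; the true parameter values, the design, and the log-scale parameterization of ψ are illustrative choices, and the fit below is first-order maximum likelihood, not the r* construction:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulate Y_i ~ N(mu_i, mu_i + psi) with mu_i = exp(b0 + b1*x_i).
b0, b1, psi = 1.0, 0.5, 2.0
x = np.linspace(0.0, 2.0, 200)
mu = np.exp(b0 + b1 * x)
y = rng.normal(mu, np.sqrt(mu + psi))

def negloglik(par):
    t0, t1, lpsi = par
    m = np.exp(t0 + t1 * x)
    v = m + np.exp(lpsi)          # variance mu + psi, psi > 0 via log scale
    return 0.5 * np.sum(np.log(v) + (y - m) ** 2 / v)

fit = minimize(negloglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
b0_hat, b1_hat = fit.x[0], fit.x[1]
```

A first-order Wald or signed-root pivot from this fit is the natural comparison point for the r* simulations on the slide.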

SLIDE 19

More realistic models

  • for example, analytic inference for survey data
  • stochastic processes in space or space-time
  • extremes in several dimensions
  • frailty models for survival data
  • longitudinal data
  • family-based genetic data and other forms of clustering
  • estimation of recombination rates from SNP data
  • ...

SLIDE 20

Example: Gaussian random field

  • scalar output y at a p-dimensional input x = (x₁, …, x_p)
  • y(x) = φ(x)ᵀβ + Z(x), with Z(x) a Gaussian process on ℝᵖ
  • Cov{Z(x₁), Z(x₂)} = σ² ∏_{i=1}^p R(|x_{1i} − x_{2i}|; θ)
  • R(|x_{1i} − x_{2i}|) = exp{−γᵢ |x_{1i} − x_{2i}|^α}
  • anisotropic covariance matrix for inputs on different scales
  • application to computer experiments (Ximing Xu, U Toronto; Derek Bingham, SFU)
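The separable (product) correlation over input dimensions can be sketched as follows; the γᵢ values and α = 2 are illustrative:

```python
import numpy as np

# Separable correlation over the p input dimensions:
# R(x1, x2) = prod_i exp(-gamma_i |x1_i - x2_i|^alpha);
# the product becomes a sum inside the exponent.
def product_correlation(X, gamma, alpha=2.0):
    # X: (n, p) inputs; gamma: (p,) positive per-dimension scales
    d = np.abs(X[:, None, :] - X[None, :, :]) ** alpha   # (n, n, p) distances
    return np.exp(-np.sum(gamma * d, axis=-1))

rng = np.random.default_rng(4)
X = rng.uniform(size=(30, 3))
R = product_correlation(X, gamma=np.array([1.0, 5.0, 0.2]))
```

Letting γᵢ differ across dimensions is what makes the covariance anisotropic: a large γᵢ means the field decorrelates quickly along that input.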

SLIDE 21

... Gaussian random field

  • yⁿ = (y₁, …, yₙ) = {y(x₁), …, y(xₙ)}, at n locations xᵢ in ℝᵖ
  • ℓ(β, σ, θ) = −(1/2){n log σ² + log |R(θ)| + (1/σ²)(yⁿ − Φβ)ᵀ R⁻¹(θ)(yⁿ − Φβ)}
  • computation of R⁻¹ is O(n³); n is typically in the hundreds or thousands
  • solution: make the correlation matrix sparse
  • solution: simplify the likelihood function
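The O(n³) cost enters through factorizing R(θ); a standard Cholesky-based evaluation of this log-likelihood, checked against the direct formula on a small illustrative example:

```python
import numpy as np

# Gaussian random-field log-likelihood via a Cholesky factor of R(theta);
# the factorization is the O(n^3) step the slide refers to.
def grf_loglik(y, Phi, beta, sigma2, R):
    n = len(y)
    L = np.linalg.cholesky(R)                  # O(n^3)
    z = np.linalg.solve(L, y - Phi @ beta)     # triangular solve
    logdetR = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (n * np.log(sigma2) + logdetR + z @ z / sigma2)

# Small example with an illustrative exponential correlation in one dimension.
rng = np.random.default_rng(5)
n = 20
x = np.sort(rng.uniform(size=n))
R = np.exp(-5.0 * np.abs(x[:, None] - x[None, :]))
Phi = np.column_stack([np.ones(n), x])
beta, sigma2 = np.array([1.0, -2.0]), 0.5
y = rng.normal(size=n)

sign, logdet = np.linalg.slogdet(R)
resid = y - Phi @ beta
direct = -0.5 * (n * np.log(sigma2) + logdet + resid @ np.linalg.solve(R, resid) / sigma2)
```

The two evaluations agree; the point is that every likelihood evaluation at a new θ repeats the O(n³) factorization, which is what sparse-matrix and simplified-likelihood approaches avoid.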

SLIDE 22

Example: spatial GLM

  • generalized linear geostatistical model: E{Y(x) | Z(x)} = g{φ(x)ᵀβ + Z(x)}, x ∈ ℝ² or ℝ³
  • random intercept Z(x) a stationary Gaussian process
  • observed at n locations: y(xᵢ), i = 1, …, n
  • joint density: f(y; θ) = ∫_{ℝⁿ} ∏_{i=1}^n f(yᵢ | zᵢ; θ) f(z; θ) dz₁ … dzₙ
  • all random effects are correlated
  • simulation methods to evaluate the integral (MCMC, etc.)
  • simplify the likelihood function using bivariate integrals
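In tiny examples the integral over the correlated random effects can be approximated by plain Monte Carlo; a toy Poisson log-linear version at n = 3 locations, with β, Σ, and y invented for illustration:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)

# f(y; theta) = E_Z[ prod_i Poisson(y_i; exp(beta + Z_i)) ],  Z ~ N(0, Sigma).
beta = 1.0
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
y = np.array([3, 2, 4])

Z = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
lam = np.exp(beta + Z)                          # conditional Poisson means
lik = poisson.pmf(y, lam).prod(axis=1).mean()   # Monte Carlo estimate of f(y; theta)
```

This brute-force average is exactly what becomes infeasible as n grows, since the integral is n-dimensional and the random effects do not factor; hence MCMC, or composite likelihoods built from bivariate integrals.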

SLIDE 23

Composite likelihood

  • an m-dimensional vector variable Y with model f(y; θ)
  • a set of marginal or conditional events {A₁, …, A_K} with associated "sub" log-likelihoods ℓ_k(θ; y) = log f(y ∈ A_k; θ) + a(y)
  • composite log-likelihood: ℓ_C(θ; y) = Σ_{k=1}^K ℓ_k(θ; y) + a
  • an inference function obtained by pretending the sub-models are independent (Lindsay, 1988)
  • with a set of non-negative weights w₁, …, w_K: ℓ_C(θ; y) = Σ_{k=1}^K w_k ℓ_k(θ; y)

SLIDE 24

... composite likelihood

  • Example: pairwise log-likelihood
      ℓ_pair(θ) = Σ_{r=1}^m Σ_{s>r} log f₂(y_r, y_s; θ)
  • Example: Besag's pseudo-likelihood
      ℓ_pseudo(θ) = Σ_{r=1}^m log f(y_r | {y_s : y_s a neighbour of y_r}; θ)
  • Example: Gaussian random field, with σ² = 1:
      −(1/2) Σ_{r=1}^{n−1} Σ_{s=r+1}^n {log |R_{r,s}| + (y_{r,s} − Φ_{r,s}β)ᵀ R⁻¹_{r,s} (y_{r,s} − Φ_{r,s}β)}
    where y_{r,s} = (y_r, y_s), with 2 × 2 correlation matrix R_{r,s}
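A direct implementation of the pairwise log-likelihood for the Gaussian random field with σ² = 1 and a constant mean β; each term involves only a 2 × 2 matrix, so no O(n³) inverse appears (the correlation function and data are illustrative):

```python
import numpy as np

def pairwise_loglik(y, x, beta, gamma):
    # Sum over all pairs (r, s), r < s, of bivariate normal log-densities
    # (up to constants), using only each pair's 2x2 correlation matrix.
    n = len(y)
    total = 0.0
    for r in range(n - 1):
        for s in range(r + 1, n):
            rho = np.exp(-gamma * abs(x[r] - x[s]))
            Rrs = np.array([[1.0, rho], [rho, 1.0]])
            e = np.array([y[r] - beta, y[s] - beta])
            total += -0.5 * (np.log(1.0 - rho ** 2) + e @ np.linalg.solve(Rrs, e))
    return total

# One simulated realization of the field, with true mean beta = 2.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 15)
R = np.exp(-3.0 * np.abs(x[:, None] - x[None, :]))
y = 2.0 + np.linalg.cholesky(R) @ rng.normal(size=15)
```

The pairwise surface prefers parameter values near the truth, so it can be maximized like an ordinary log-likelihood; its curvature, however, no longer estimates the asymptotic variance directly (next slide).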

SLIDE 25

Estimation from composite likelihood

  • ℓ_C(θ) = Σ_{k=1}^K ℓ_k(θ; y)
  • U_C(θ) = ℓ′_C(θ) is an unbiased estimating function
  • the estimate θ̂_C from U_C(θ̂_C) = 0 is asymptotically normally distributed: θ̂_C ∼ N{θ, G⁻¹(θ)} (approximately)
  • asymptotic variance given by the Godambe information G(θ) = E{−U′_C(θ)} Var{U_C(θ)}⁻¹ E{−U′_C(θ)}
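A sandwich (Godambe-style) variance computation in the simplest possible setting, with K independent replicates of an unbiased estimating function; the toy score below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)

# With K independent replicates of U_i(theta) = y_i - theta, the estimate
# solves sum U_i = 0, and avar(theta_hat) = H^{-1} J H^{-1} / K with
# H = E{-U'} and J = var{U}. Data are deliberately non-normal, so the
# sandwich form, not a model-based variance, is the right one.
K = 2000
y = rng.standard_t(df=5, size=K)
theta_hat = y.mean()                       # root of the estimating equation

U = y - theta_hat                          # scores evaluated at the estimate
H = 1.0                                    # -dU_i/dtheta = 1 for each term
J = U.var()
avar = (1.0 / H) * J * (1.0 / H) / K       # sandwich variance of theta_hat
```

For a composite likelihood the same recipe applies with U_C in place of U, except that H and J must be estimated from the correlated sub-scores, which is where the practical difficulty lies.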

SLIDE 26

Inference from composite likelihood

  • inference function ℓ_C(θ)
  • "log-likelihood ratio statistic": w_C(θ) = 2{ℓ_C(θ̂_C) − ℓ_C(θ)}
  • complicated asymptotic distribution: w_C(θ) ∼ Σ_{i=1}^d λᵢ χ²_{1i} (approximately)
  • the λᵢ are eigenvalues of H⁻¹(θ)G(θ), where H(θ) = E{−U′_C(θ)} and G(θ) = H(θ) J⁻¹(θ) H(θ)
  • rescaling based on the score function can restore the χ²_d distribution for w_C (Pace, Salvan & Sartori, 2011)
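The weighted chi-squared limit is easy to simulate, which makes the miscalibration of the naive χ²_d reference concrete; the eigenvalues below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)

# w_C is asymptotically sum_i lambda_i * chi2_1; its mean is sum(lambda),
# not d, so treating it as chi2_d gives tests with the wrong size.
lam = np.array([2.0, 1.0, 0.25])
wC = (lam * rng.chisquare(1, size=(100_000, 3))).sum(axis=1)

mean_wC = wC.mean()                       # near lam.sum() = 3.25, not d = 3
q95_naive = 7.815                         # chi2_3 0.95 quantile
actual_level = np.mean(wC > q95_naive)    # nominal 5% test rejects more often
```

The rescaling referred to on the slide adjusts w_C so that its limiting distribution is again χ²_d, removing exactly this size distortion.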

SLIDE 27

Connections to inference from surveys?

  • descriptive parameters defined through an estimating equation: Σ_{i∈P} Uᵢ(θ_P) = 0
  • the estimating equation might be motivated by a model, e.g. a superpopulation model
  • "model assisted inference"
  • estimating equation from the sample: Σ_{i=1}^n wᵢ Uᵢ(θ̂) = 0
  • for example, wᵢ = 1/πᵢ or wᵢ = 1/(πᵢqᵢ)
  • sandwich estimate of variance
  • it's all in the weights... (Wei Lin, Changbao Wu)
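A toy version of the design-weighted estimating equation with wᵢ = 1/πᵢ under Poisson sampling; the population and design are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(10)

# Population mean via U_i(theta) = y_i - theta, with known inclusion
# probabilities pi_i and design weights w_i = 1/pi_i.
N = 10_000
Y = rng.gamma(shape=2.0, scale=3.0, size=N)     # finite population; mean near 6
pi = rng.uniform(0.02, 0.2, size=N)             # unequal inclusion probabilities
take = rng.uniform(size=N) < pi                 # Poisson sampling indicator
w, y = 1.0 / pi[take], Y[take]

# Solve sum_i w_i (y_i - theta_hat) = 0:
theta_hat = np.sum(w * y) / np.sum(w)           # design-weighted (Hajek-type) estimate
```

Changing the weights changes which population quantity the equation targets and how efficiently, which is the sense in which "it's all in the weights".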

SLIDE 28

Guidance from composite likelihood?

  • in composite likelihood inference, some surprises
  • optimal weights may be non-computable, or even negative (Lindsay, Yi, Sun)
  • choice of sub-likelihoods needs some care
  • in some models, including more sub-likelihood terms leads to poorer inference
  • in some models, including higher-dimensional sub-components leads to poorer inference (Ximing Xu)
  • both the choice of weights and the choice of component likelihoods need care

SLIDE 29

Approximate likelihood inference in surveys

  • example: empirical likelihood for nonparametric models
  • ℓ(F) = Σ log pᵢ, with constraints pᵢ > 0, Σ pᵢ = 1, Σ pᵢ yᵢ = θ
  • for inference about θ = E_F(Y), or more generally for parameters defined by estimating functions
  • Chen, Sitter, Wu: pseudo-empirical likelihood
  • design-assisted modelling
  • replace Σ log pᵢ by Σ wᵢ log pᵢ, and the constraint by post-stratification such as Σ_{i=1}^n pᵢ xᵢ = X̄_P
  • confidence intervals using a profile pseudo-empirical likelihood
  • needs adjustment to have an asymptotic χ² distribution: rescaling by the design effect

SLIDE 30

Likelihood for complex models

  • Approximate Bayesian Computation (ABC)
  • "an essential tool for the analysis of complex stochastic models" (Robert et al., 2011, PNAS)
  • generate θ′ from the prior π(θ)
  • generate z from the model p(z | θ′)
  • compare S(z) to S(y) using some distance measure ρ{S(z), S(y)}; if ρ < ε then θ′ is a sample from the posterior π(θ | y)
  • actually a sample from π(θ | y, z), but this is assumed ≈ π(θ | y)
  • Robert et al. show that the method can be poor if "S(·) is far from sufficient"
  • especially for choosing between models
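The steps above can be sketched as a rejection sampler; a conjugate normal example where S(y) = ȳ is sufficient, so ABC can be checked against the exact posterior (n, ε, the prior, and the number of draws are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)

# Rejection ABC for a N(theta, 1) model with a N(0, 10^2) prior.
n, eps = 50, 0.02
y = rng.normal(2.0, 1.0, size=n)                      # "observed" data
S_obs = y.mean()                                      # sufficient summary here

theta_prime = rng.normal(0.0, 10.0, size=400_000)     # theta' ~ prior
# S(z) for z ~ p(z | theta'): the mean of n draws can be sampled directly.
S_z = rng.normal(theta_prime, 1.0 / np.sqrt(n))
accepted = theta_prime[np.abs(S_z - S_obs) < eps]     # rho < eps: keep theta'

# Exact conjugate posterior for comparison:
post_var = 1.0 / (n + 1.0 / 100.0)
post_mean = post_var * n * S_obs
```

Shrinking ε trades acceptance rate for accuracy; with a non-sufficient S the accepted sample converges to π{θ | S(y)} rather than π(θ | y), which is the failure mode Robert et al. highlight.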