Compound Random Measures Jim Griffin (joint work with Fabrizio - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Compound Random Measures

Jim Griffin (joint work with Fabrizio Leisen)

University of Kent

slide-2
SLIDE 2

Introduction: Two clinical studies

[Figure: two panels, CALGB8881 and CALGB9160, each with axes β0 and β1.]

slide-3
SLIDE 3

Infinite mixture models

These data could be analysed using two infinite mixture models,

$$\text{CALGB8881: } f_1(y) = \sum_{j=1}^{\infty} w^{(1)}_j\, k(y|\theta_j) \quad\text{and}\quad \text{CALGB9160: } f_2(y) = \sum_{j=1}^{\infty} w^{(2)}_j\, k(y|\theta_j),$$

where

  • $w^{(k)}_j > 0$ for $j = 1, 2, \ldots$ and $\sum_{j=1}^{\infty} w^{(k)}_j = 1$ for $k = 1, 2$.
  • $k(y|\theta)$ is a p.d.f. for $y$ with parameter $\theta$.

We need to put a prior on $w^{(1)}$, $w^{(2)}$ and $\theta$ (random probability measures).

slide-4
SLIDE 4

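Some dependent random probability measures: stick-breaking

The $\theta_j$ are i.i.d. and $w^{(k)}_j = V^{(k)}_j \prod_{i<j}\left(1 - V^{(k)}_i\right)$.

  • Hierarchical Dirichlet Process (Teh et al, 2006):
$$V^{(k)}_j \sim \mathrm{Be}\left(\alpha_0\beta_j,\ \alpha_0\left(1 - \sum_{l=1}^{j}\beta_l\right)\right),\qquad \beta'_j \sim \mathrm{Be}(1, \gamma),\qquad \beta_j = \beta'_j \prod_{l=1}^{j-1}\left(1 - \beta'_l\right).$$
  • Probit stick-breaking processes, etc.: $\left(V^{(1)}_j, V^{(2)}_j\right)$ are correlated and independent of $\left(V^{(1)}_i, V^{(2)}_i\right)$ for $i \neq j$.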
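A minimal Python sketch of the hierarchical Dirichlet process weights, truncated after a finite number of sticks. The truncation level, the values of γ and α0, and the function names are illustrative assumptions; both studies share the top-level weights $\beta_j$, and each study draws its own $V^{(k)}_j$ given them.

```python
import random


def top_level_weights(gamma, truncation, rng):
    """beta_j = beta'_j * prod_{l<j} (1 - beta'_l) with beta'_j ~ Be(1, gamma)."""
    beta, remaining = [], 1.0
    for _ in range(truncation):
        b_prime = rng.betavariate(1.0, gamma)
        beta.append(b_prime * remaining)
        remaining *= 1.0 - b_prime
    return beta


def group_weights(alpha0, beta, rng):
    """w_j = V_j * prod_{i<j} (1 - V_i) with
    V_j ~ Be(alpha0 * beta_j, alpha0 * (1 - sum_{l<=j} beta_l))."""
    w, remaining, cumulative = [], 1.0, 0.0
    for b in beta:
        cumulative += b
        # Guard against round-off: both Beta parameters must stay positive.
        a_param = max(alpha0 * b, 1e-300)
        b_param = max(alpha0 * (1.0 - cumulative), 1e-300)
        v = rng.betavariate(a_param, b_param)
        w.append(v * remaining)
        remaining *= 1.0 - v
    return w


rng = random.Random(1)
beta = top_level_weights(gamma=2.0, truncation=100, rng=rng)
w1 = group_weights(alpha0=5.0, beta=beta, rng=rng)  # study k = 1
w2 = group_weights(alpha0=5.0, beta=beta, rng=rng)  # study k = 2
```

Because both studies centre on the same $\beta_j$, their weights are dependent; larger α0 pulls each $w^{(k)}$ closer to β.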
slide-5
SLIDE 5

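Completely random measures

$\tilde\mu$ is a completely random measure (CRM) on $\Theta$ if, for any disjoint subsets $A_1, \ldots, A_n$, the random variables $\tilde\mu(A_1), \ldots, \tilde\mu(A_n)$ are mutually independent. We concentrate on CRMs which can be represented in terms of jump sizes $J_i$ and jump locations $\theta_i$ as
$$\tilde\mu = \sum_{i=1}^{\infty} J_i\,\delta_{\theta_i},$$
where $\delta_\theta$ is the Dirac delta measure at $\theta$, and which have the Lévy–Khintchine representation
$$\mathrm{E}\left[e^{-\int f(\theta)\,\tilde\mu(d\theta)}\right] = \exp\left\{-\int_{\Theta}\int_0^{\infty}\left[1 - e^{-s f(\theta)}\right]\alpha(d\theta)\,\rho(ds)\right\},$$
where $\alpha$ and $\rho$ are measures for which $\int_\Theta \alpha(d\theta) < \infty$.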
slide-6
SLIDE 6

Completely random measures

The points $(\theta_i, J_i)$ form a Poisson process with intensity $\alpha(d\theta)\,\rho(ds)$.

[Figure: a realization of the jumps J.]
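One standard way to draw the largest jumps of such a Poisson process is the Ferguson–Klass method: invert the tail mass of the intensity at the arrival times of a unit-rate Poisson process. The sketch below assumes a σ-stable intensity $\rho(ds) = \frac{\sigma}{\Gamma(1-\sigma)} s^{-1-\sigma}\,ds$ and $\alpha$ equal to a mass parameter times Uniform(0, 1), because then the tail mass inverts in closed form; the function name and parameter values are hypothetical.

```python
import math
import random


def stable_crm_ferguson_klass(sigma, mass, n_jumps, rng):
    """Largest n_jumps jumps of a sigma-stable CRM (0 < sigma < 1).

    Intensity alpha(dtheta) rho(ds) with alpha = mass * Uniform(0, 1) and
    rho(ds) = sigma / Gamma(1 - sigma) * s^(-1 - sigma) ds, whose tail mass is
    N(x) = mass * x^(-sigma) / Gamma(1 - sigma).  Solving N(J_i) = G_i at the
    arrival times G_1 < G_2 < ... of a unit-rate Poisson process gives the
    jumps in decreasing order.
    """
    jumps, locations, arrival = [], [], 0.0
    for _ in range(n_jumps):
        arrival += rng.expovariate(1.0)
        jumps.append((mass / (math.gamma(1.0 - sigma) * arrival)) ** (1.0 / sigma))
        locations.append(rng.random())  # theta_i i.i.d. from alpha / alpha(Theta)
    return jumps, locations


jumps, locations = stable_crm_ferguson_klass(sigma=0.5, mass=1.0, n_jumps=200, rng=random.Random(2))
```

Stopping after a fixed number of jumps discards infinitely many small ones; since they come out in decreasing order, the discarded mass is the smallest possible for that budget.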

slide-7
SLIDE 7

Examples of CRMs

Many processes that we use in Bayesian nonparametrics are CRMs:

  • Gamma process: $\rho(ds) = s^{-1}\exp\{-s\}\,ds$.
  • Beta process: $\rho(ds) = \beta s^{-1}(1 - s)^{\beta-1}\,ds$, $0 < s < 1$.

Other processes can be derived from CRMs:

  • Normalizing a gamma process, i.e. taking $\tilde p = \tilde\mu/\tilde\mu(\Theta)$, leads to a Dirichlet process.
  • A beta process prior for $p_1, p_2, \ldots$ can be used to define an Indian buffet process.
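A quick way to see the normalization step numerically is a finite approximation of the gamma process (an assumption of this sketch, not the exact process): place N atoms with independent Ga(M/N, 1) jumps at i.i.d. Uniform(0, 1) locations. After normalizing, the weights are exactly Dirichlet(M/N, ..., M/N), and the normalized measure approaches a Dirichlet process with mass M as N grows. M, N and the function names are illustrative.

```python
import random


def finite_gamma_process(total_mass, n_atoms, rng):
    """Approximate gamma process: n_atoms jumps J_i ~ Ga(M / N, 1) at
    i.i.d. Uniform(0, 1) locations (M = total_mass, N = n_atoms)."""
    jumps = [rng.gammavariate(total_mass / n_atoms, 1.0) for _ in range(n_atoms)]
    locations = [rng.random() for _ in range(n_atoms)]
    return jumps, locations


def normalize(jumps):
    """p = mu / mu(Theta): divide each jump by the total mass."""
    total = sum(jumps)
    return [j / total for j in jumps]


rng = random.Random(42)
jumps, locations = finite_gamma_process(total_mass=2.0, n_atoms=500, rng=rng)
weights = normalize(jumps)  # Dirichlet(M/N, ..., M/N) weights at `locations`
```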

slide-8
SLIDE 8

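Vectors of CRMs

It is useful to define $d$ related CRMs. Suppose that $\tilde\mu_1, \ldots, \tilde\mu_d$ are CRMs on $\Theta$ with marginal Lévy intensities $\bar\nu_j(ds, d\theta) = \nu_j(ds)\,\alpha(d\theta)$. Then $\tilde\mu_1, \ldots, \tilde\mu_d$ are a vector of CRMs if there is a Lévy–Khintchine representation of the form
$$\mathrm{E}\left[e^{-\tilde\mu_1(f_1) - \cdots - \tilde\mu_d(f_d)}\right] = e^{-\psi^\star_{\rho,d}(f_1, \ldots, f_d)},$$
where
$$\psi^\star_{\rho,d}(f_1, \ldots, f_d) = \int_\Theta \int_{(\mathbb{R}^+)^d}\left[1 - e^{-s_1 f_1(\theta) - \cdots - s_d f_d(\theta)}\right]\alpha(d\theta)\,\rho_d(ds_1, \ldots, ds_d)$$
and $\nu_j(ds) = \int_{(\mathbb{R}^+)^{d-1}} \rho_d(ds_1, \ldots, ds_d)$, the integral being over all coordinates except $s_j$.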
slide-9
SLIDE 9

Compound Random Measures: Definition

A compound random measure (CoRM) is a vector of CRMs with intensity
$$\rho_d(ds_1, \ldots, ds_d) = \int z^{-d}\, h\!\left(\frac{s_1}{z}, \ldots, \frac{s_d}{z}\right) ds_1 \cdots ds_d\, \nu^\star(dz),$$
where

  • $s_1, \ldots, s_d$ are called scores.
  • $H$ is a score distribution with density $h$.
  • $\nu^\star$ is the Lévy intensity of a directing Lévy process,

which satisfies the condition
$$\int_{(\mathbb{R}^+)^d}\int \min(1, \|s\|)\, z^{-d}\, h\!\left(\frac{s_1}{z}, \ldots, \frac{s_d}{z}\right)\nu^\star(dz)\, ds < \infty,$$
where $\|s\|$ is the Euclidean norm of the vector $s = (s_1, \ldots, s_d)$.

slide-10
SLIDE 10

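A representation of a CoRM

Realizations of a CoRM can be expressed as
$$\tilde\mu_j = \sum_{i=1}^{\infty} m_{j,i}\, J_i\, \delta_{\theta_i}, \qquad j = 1, \ldots, d,$$
where

  • $(m_{1,i}, \ldots, m_{d,i}) \overset{\text{i.i.d.}}{\sim} H$ for $i = 1, 2, \ldots$
  • $\tilde\eta = \sum_{i=1}^{\infty} J_i\,\delta_{\theta_i}$ is a CRM with Lévy intensity $\nu^\star(ds)\,\alpha(d\theta)$.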
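The representation suggests a direct simulation: draw the directing CRM $\tilde\eta$, then attach an independent score vector to each atom. The sketch below is a truncated version under several assumptions that are not part of the slide: independent Ga(φ, 1) scores, the directing intensity $M z^{-1}(1-z)^{\phi-1}$ on (0, 1) (a beta process, with a hypothetical mass parameter M), $\alpha$ proportional to Uniform(0, 1), and only jumps larger than a cut-off ε kept. Substituting $u = -\log z$ turns $M z^{-1}\,dz$ into a constant rate M, and thinning each point with probability $(1-z)^{\phi-1}$ (valid for φ ≥ 1) restores the beta factor.

```python
import math
import random


def simulate_corm(d, phi, mass, eps, rng):
    """Truncated CoRM draw: directing intensity mass * z^{-1} (1 - z)^{phi - 1} on
    (eps, 1), alpha = Uniform(0, 1), and d independent Ga(phi, 1) scores per atom."""
    if phi < 1.0:
        raise ValueError("the thinning step below assumes phi >= 1")
    upper = -math.log(eps)
    jumps = []
    u = rng.expovariate(mass)  # u = -log z is a rate-`mass` Poisson process on (0, upper)
    while u < upper:
        z = math.exp(-u)
        if rng.random() < (1.0 - z) ** (phi - 1.0):  # thin to get the (1 - z)^{phi-1} factor
            jumps.append(z)
        u += rng.expovariate(mass)
    locations = [rng.random() for _ in jumps]
    scores = [[rng.gammavariate(phi, 1.0) for _ in jumps] for _ in range(d)]  # m_{j,i}
    measures = [[m * J for m, J in zip(row, jumps)] for row in scores]  # masses m_{j,i} J_i
    return jumps, locations, measures


rng = random.Random(0)
jumps, locations, measures = simulate_corm(d=3, phi=2.0, mass=1.0, eps=1e-6, rng=rng)
```

Dividing each row of `measures` by its sum gives the weights of a (truncated) normalized CoRM at the shared `locations`.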

slide-11
SLIDE 11

CoRMs with independent gamma scores

We will concentrate on the class of CoRMs for which
$$h\!\left(\frac{s_1}{z}, \ldots, \frac{s_d}{z}\right) = \prod_{j=1}^{d} f\!\left(\frac{s_j}{z}\right),$$
where $f$ is the p.d.f. of a gamma distribution with shape $\phi$,
$$f(x) = \frac{1}{\Gamma(\phi)}\, x^{\phi-1}\exp\{-x\}.$$

slide-12
SLIDE 12

Properties of CoRMs with independent score distributions

  • The Lévy copula can be expressed as a univariate integral.
  • Let $M^f_z(t) = \int e^{ts} z^{-1} f(s/z)\, ds$ be the moment generating function of $z^{-1} f(s/z)$; then
$$\psi_{\rho,d}(\lambda_1, \ldots, \lambda_d) = \int_{(\mathbb{R}^+)^d}\left[1 - e^{-s_1\lambda_1 - \cdots - s_d\lambda_d}\right]\rho_d(ds_1, \ldots, ds_d) = \int\left[1 - \prod_{j=1}^{d} M^f_z(-\lambda_j)\right]\nu^\star(z)\, dz.$$
  • This expression can be used to calculate quantities such as $\mathrm{Corr}(\tilde\mu_k(A), \tilde\mu_m(A))$.
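As a hedged numerical illustration, the sketch below evaluates the univariate integral for independent Ga(φ, 1) scores, for which $M^f_z(-\lambda) = (1 + z\lambda)^{-\phi}$, using a simple midpoint rule on (0, 1). The directing intensity $z^{-1}(1-z)^{\phi-1}$ is borrowed from the gamma-marginal example later in the deck, so for $d = 1$ the result should match the gamma-process exponent $\log(1+\lambda)$. The correlation helper uses a standard consequence of the Lévy–Khintchine formula (not stated on the slide): for independent scores and $\int z^2\nu^\star(dz) < \infty$, $\mathrm{Cov}(\tilde\mu_k(A), \tilde\mu_m(A)) = \alpha(A)\,\mathrm{E}[m]^2\int z^2\nu^\star(dz)$ and $\mathrm{Var}(\tilde\mu_k(A)) = \alpha(A)\,\mathrm{E}[m^2]\int z^2\nu^\star(dz)$, so the correlation is $\mathrm{E}[m]^2/\mathrm{E}[m^2] = \phi/(\phi+1)$ for Ga(φ, 1) scores. Function names and grid size are illustrative.

```python
import math


def psi_gamma_scores(lams, phi, nu_star, n_grid=20000):
    """psi_{rho,d}(lambda) = int_0^1 [1 - prod_j (1 + z lambda_j)^(-phi)] nu_star(z) dz,
    approximated by the midpoint rule (assumes nu_star is supported on (0, 1))."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        z = (i + 0.5) * h
        prod = 1.0
        for lam in lams:
            prod *= (1.0 + z * lam) ** (-phi)  # M_z(-lambda) for Ga(phi, 1) scores
        total += (1.0 - prod) * nu_star(z)
    return total * h


def beta_directing(phi):
    """Directing Levy intensity z^{-1} (1 - z)^{phi - 1}, 0 < z < 1."""
    return lambda z: (1.0 - z) ** (phi - 1.0) / z


def corr_gamma_scores(phi):
    """Corr(mu_k(A), mu_m(A)), k != m, for independent Ga(phi, 1) scores:
    E[m]^2 / E[m^2] = phi^2 / (phi * (phi + 1))."""
    return phi / (phi + 1.0)


phi = 2.0
print(psi_gamma_scores([1.5], phi, beta_directing(phi)), math.log(2.5))
print(psi_gamma_scores([1.5, 0.5], phi, beta_directing(phi)))
```

The φ/(φ+1) formula suggests why larger φ makes the dimensions of the CoRM look more alike.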

slide-13
SLIDE 13

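CoRMs with gamma distributed scores

Consider a CoRM process with independent Ga(φ, 1) distributed scores. If the CoRM process has gamma process marginals then
$$\rho_d(s_1, \ldots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi-1}}{[\Gamma(\phi)]^{d-1}}\, |s|^{-\frac{d\phi+1}{2}}\, e^{-\frac{|s|}{2}}\, W_{\frac{(d-2)\phi+1}{2},\, -\frac{d\phi}{2}}(|s|), \tag{1}$$
where $|s| = s_1 + \cdots + s_d$ and $W$ is the Whittaker function. If the CoRM process has σ-stable process marginals then
$$\rho_d(s_1, \ldots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi-1}}{[\Gamma(\phi)]^{d-1}}\, \frac{\sigma\,\Gamma(\sigma + d\phi)}{\Gamma(\sigma+\phi)\,\Gamma(1-\sigma)}\, |s|^{-\sigma-d\phi}. \tag{2}$$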

slide-14
SLIDE 14

CoRMs with exponentially distributed scores

Consider a CoRM process with independent exponentially distributed scores. If the CoRM has gamma process marginals we recover the multivariate Lévy intensity of Leisen et al (2013),
$$\rho_d(s_1, \ldots, s_d) = \sum_{j=0}^{d-1} \frac{(d-1)!}{(d-1-j)!}\, |s|^{-j-1} e^{-|s|}.$$
Otherwise, if σ-stable marginals are considered, we recover the multivariate vector introduced in Leisen and Lijoi (2011) and Zhu and Leisen (2014),
$$\rho_d(s_1, \ldots, s_d) = \frac{(\sigma)_d}{\Gamma(1-\sigma)}\, |s|^{-\sigma-d},$$
where $(\sigma)_d = \sigma(\sigma+1)\cdots(\sigma+d-1)$ is the rising factorial.
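A small numerical sanity check, written as an illustrative sketch: with $d = 2$, integrating $s_2$ out of the gamma-marginal intensity should give back the gamma process intensity $s_1^{-1}e^{-s_1}$, and with $d = 1$ the σ-stable formula should reduce to $\frac{\sigma}{\Gamma(1-\sigma)}s^{-1-\sigma}$. The midpoint grid and the cut-off of the $s_2$ integral at 60 are arbitrary choices, and the function names are hypothetical.

```python
import math


def rho_exp_gamma(s):
    """Exponential scores, gamma marginals:
    sum_{j=0}^{d-1} (d-1)! / (d-1-j)! * |s|^(-j-1) * exp(-|s|)."""
    d, total = len(s), sum(s)
    series = sum(
        math.factorial(d - 1) / math.factorial(d - 1 - j) * total ** (-j - 1)
        for j in range(d)
    )
    return series * math.exp(-total)


def rho_exp_stable(s, sigma):
    """Exponential scores, sigma-stable marginals: (sigma)_d / Gamma(1 - sigma) * |s|^(-sigma-d)."""
    d, total = len(s), sum(s)
    rising = math.prod(sigma + i for i in range(d))  # (sigma)_d
    return rising / math.gamma(1.0 - sigma) * total ** (-sigma - d)


def integrate_out_s2(rho, s1, upper=60.0, n=60000):
    """Midpoint rule for int_0^upper rho((s1, s2)) ds2."""
    h = upper / n
    return h * sum(rho((s1, (i + 0.5) * h)) for i in range(n))


print(integrate_out_s2(rho_exp_gamma, 1.0), math.exp(-1.0))
```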

slide-15
SLIDE 15

CoRMs with independent gamma scores: specific marginals

The Lévy intensity of $\tilde\mu_j$ is
$$\nu_j(ds) = \int z^{-1} f(s/z)\, ds\, \nu^\star(dz) = \nu(ds).$$
If we have independent gamma scores, the directing Lévy intensity $\nu^\star$ is linked to the marginal Lévy intensity by
$$\nu^\star\!\left(\frac{1}{t}\right) = t^{2-\phi}\, \mathcal{L}^{-1}\!\left\{\frac{\Gamma(\phi)}{s^{\phi-1}}\,\nu(s)\right\}(t),$$
where $\mathcal{L}^{-1}$ is the inverse Laplace transform.

slide-16
SLIDE 16

CoRMs with independent gamma scores: specific marginals

The intensity of the directing Lévy process
$$\nu^\star(z) = z^{-1}(1-z)^{\phi-1}, \qquad 0 < z < 1,$$
leads to a marginal gamma process for which
$$\nu(s) = s^{-1}\exp\{-s\}, \qquad s > 0.$$

Remarks

  • $\nu^\star$ is the Lévy intensity of a beta process.
  • If $\nu^\star$ is the Lévy intensity of a Stable-Beta process (Teh and Görür, 2009), the marginal process is a generalized gamma process.

slide-17
SLIDE 17

NCoRM: Gamma marginal, φ = 1

[Figure: panels DLP, Dim. 1, Dim. 2 and Dim. 3.]

slide-18
SLIDE 18

NCoRM: Gamma marginal, φ = 10

[Figure: panels DLP, Dim. 1, Dim. 2 and Dim. 3.]

slide-19
SLIDE 19

NCoRM: Gamma marginal, φ = 50

[Figure: panels DLP, Dim. 1, Dim. 2 and Dim. 3.]

slide-20
SLIDE 20

Marginal beta process

A CoRM with beta process marginals ($\nu(s) = \beta s^{-1}(1-s)^{\beta-1}$) can be constructed using

  • a beta score distribution with parameters $\alpha$ and 1;
  • a directing Lévy intensity
$$\nu^\star(z) = \beta z^{-1}(1-z)^{\beta-1} + \frac{\beta(\beta-1)}{\alpha}(1-z)^{\beta-2},$$
i.e. a superposition of a beta process and a compound Poisson process with beta jump distribution.
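The construction can be checked numerically, as a sketch under illustrative parameter choices (α = 2, β = 3 and s = 0.3 are arbitrary, and the function names are hypothetical). With Be(α, 1) scores, $f(x) = \alpha x^{\alpha-1}$ on (0, 1), the marginal intensity is $\nu(s) = \int_s^1 z^{-1} f(s/z)\,\nu^\star(z)\,dz$ (the lower limit is $s$ because the score $s/z$ must lie below 1), which should equal $\beta s^{-1}(1-s)^{\beta-1}$. The construction assumes β > 1, so the compound Poisson part has finite total mass β/α.

```python
def nu_star_beta_marginal(z, alpha, beta):
    """Beta-process part plus compound Poisson part with Be(1, beta - 1) jumps."""
    return (beta / z * (1.0 - z) ** (beta - 1.0)
            + beta * (beta - 1.0) / alpha * (1.0 - z) ** (beta - 2.0))


def marginal_intensity(s, alpha, beta, n=20000):
    """nu(s) = int_s^1 z^{-1} f(s/z) nu_star(z) dz with f(x) = alpha x^(alpha-1), midpoint rule."""
    h = (1.0 - s) / n
    total = 0.0
    for i in range(n):
        z = s + (i + 0.5) * h
        score_density = alpha * (s / z) ** (alpha - 1.0)  # f(s/z)
        total += score_density / z * nu_star_beta_marginal(z, alpha, beta)
    return total * h


alpha, beta, s = 2.0, 3.0, 0.3
numeric = marginal_intensity(s, alpha, beta)
exact = beta / s * (1.0 - s) ** (beta - 1.0)
```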

slide-21
SLIDE 21

Links to other processes

Other processes can be expressed as CoRMs:

  • Superpositions/thinning: e.g. Griffin et al (2013), Chen et al (2014), Lijoi and Nipoti (2014), Lijoi et al (2014a, b), using mixture score distributions $h(s) = \pi\delta_{s=0} + (1-\pi)h^\star(s)$.
  • Lévy copulae: e.g. Leisen and Lijoi (2011), Leisen et al (2013), Zhu and Leisen (2014).

slide-22
SLIDE 22

Normalized Compound Random Measures (NCoRM)

A vector of random probability measures can be defined by normalizing each dimension of the CoRM so that
$$p_k = \frac{\tilde\mu_k}{\tilde\mu_k(\Theta)} = \sum_{j=1}^{\infty} w^{(k)}_j\,\delta_{\theta_j}.$$

slide-23
SLIDE 23

CoRMs on more general spaces

For a more general space $X$, we define $\tilde\mu(\cdot\,; x)$ to be a completely random measure for $x \in X$. The collection $\{\tilde\mu(\cdot\,; x) \mid x \in X\}$ can be given a CoRM prior with
$$\tilde\mu(\cdot\,; x) = \sum_{j=1}^{\infty} m_j(x)\, J_j\, \delta_{\theta_j},$$
where $m_j(x)$ is a realisation of a random process on $X$.

Example: $X = \mathbb{R}^p$, $m_j(x) = \exp\{r_j(x)\}$ where $r_j(x)$ is given a zero-mean Gaussian process prior (see Ranganath and Blei, 2015).

slide-24
SLIDE 24

Inference with NCoRMs

We assume that the data are $(x_1, y_1), \ldots, (x_n, y_n)$ and are modelled as
$$y_i \mid \zeta_i \overset{\text{ind.}}{\sim} k(y_i|\zeta_i), \qquad \zeta_i \sim p(\cdot\,; x_i) = \frac{\tilde\mu(\cdot\,; x_i)}{\tilde\mu(\Theta; x_i)}, \qquad i = 1, 2, \ldots, n,$$
where $k(y|\theta)$ is a probability density function for $y$ with parameter $\theta$ and $\{p(\cdot\,; x) \mid x \in X\}$ is given an NCoRM prior.

slide-25
SLIDE 25

MCMC inference for infinite mixture models

Introducing allocation variables $c_1, \ldots, c_n$, the posterior is proportional to
$$p(y, c \mid m, J, \theta) = \prod_{i=1}^{n} \frac{k(y_i|\theta_{c_i})\, J_{c_i}\, m_{c_i}(x_i)}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)}.$$
This form is not tractable due to the infinite sum in the denominator of each term. This can be addressed using the identity
$$\frac{1}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)} = \int_0^{\infty} \exp\left\{-v_i \sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\} dv_i.$$
slide-26
SLIDE 26

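MCMC inference for infinite mixture models

Introducing latent variables $v_i$ leads to a suitable form of augmented posterior for MCMC:
$$p(y, c, v \mid m, J, \theta) = \prod_{i=1}^{n} k(y_i|\theta_{c_i})\, J_{c_i}\, m_{c_i}(x_i) \exp\left\{-v_i\sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\}$$
$$= \prod_{j=1}^{K}\left[\prod_{\{i|c_i=j\}} k(y_i|\theta_j)\right] J_j^{a_j}\left[\prod_{\{i|c_i=j\}} m_j(x_i)\right]\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\},$$
where there are $K$ distinct values of $c_i$ and $a_j = \sum_{i=1}^{n} I(c_i = j)$.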
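The second line only regroups factors of the first: the $n$ factors $J_{c_i}$ collapse to $J_j^{a_j}$, and the $n$ exponentials combine into one. A small sketch that checks this on random inputs, assuming a truncation to finitely many atoms, a unit-variance normal kernel for $k(y|\theta)$, and hypothetical function names; cluster labels are atom indices, so the product over $j$ runs over occupied atoms.

```python
import math
import random


def normal_pdf(y, mean, sd=1.0):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def per_observation(y, c, v, m, J, theta):
    """First line: product over observations i (m[l][i] holds m_l(x_i))."""
    out = 1.0
    for i, yi in enumerate(y):
        j = c[i]
        total = sum(J[l] * m[l][i] for l in range(len(J)))
        out *= normal_pdf(yi, theta[j]) * J[j] * m[j][i] * math.exp(-v[i] * total)
    return out


def per_cluster(y, c, v, m, J, theta):
    """Second line: product over occupied atoms j, then a single exponential."""
    out = 1.0
    for j in set(c):
        members = [i for i, ci in enumerate(c) if ci == j]
        out *= J[j] ** len(members)  # J_j^{a_j}
        for i in members:
            out *= normal_pdf(y[i], theta[j]) * m[j][i]
    exponent = sum(J[l] * sum(v[i] * m[l][i] for i in range(len(y))) for l in range(len(J)))
    return out * math.exp(-exponent)


rng = random.Random(3)
n, n_atoms = 6, 4
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
c = [rng.randrange(n_atoms) for _ in range(n)]
v = [rng.random() for _ in range(n)]
J = [rng.gammavariate(1.0, 1.0) for _ in range(n_atoms)]
m = [[rng.gammavariate(2.0, 1.0) for _ in range(n)] for _ in range(n_atoms)]
theta = [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]
lhs = per_observation(y, c, v, m, J, theta)
rhs = per_cluster(y, c, v, m, J, theta)
```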

slide-27
SLIDE 27

MCMC inference for infinite mixture models: finite X, independent scores

In this case, we can define a marginal sampler (e.g. Favaro and Teh, 2013) by integrating over $J$ and $m$.

  • Integrals of the form $\int J^{a}\,\nu^\star(J)\,dJ$ are typical for marginal samplers of normalized random measure mixtures.
  • Integrals of $\prod_{\{i|c_i=j\}} m_j(x_i)$ will be a product of moments of the score distribution.
  • $\mathrm{E}\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right]$ can be evaluated either exactly or as a univariate integral.
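For the second bullet, a sketch assuming independent Ga(φ, 1) scores on a finite covariate space: the scores $m_j(x)$ at different $x$ are independent, and observations in cluster $j$ sharing a covariate value contribute a higher moment of the same score, so the integral is $\prod_{x} \Gamma(\phi + n_{j,x})/\Gamma(\phi)$, where $n_{j,x}$ counts the observations with $c_i = j$ and $x_i = x$. The function name is illustrative; working on the log scale avoids overflow for large clusters.

```python
import math
from collections import Counter


def log_score_moment(cluster_xs, phi):
    """log E[prod_{i: c_i = j} m_j(x_i)] for independent Ga(phi, 1) scores on a finite X.
    Scores at distinct x are independent; n repeats of x give E[m^n] = Gamma(phi + n) / Gamma(phi)."""
    counts = Counter(cluster_xs)  # n_{j,x} for each covariate value x in the cluster
    return sum(math.lgamma(phi + n) - math.lgamma(phi) for n in counts.values())


# Cluster whose observations have covariates "a", "a", "b"; with phi = 1 this is E[m^2] E[m].
value = math.exp(log_score_moment(["a", "a", "b"], 1.0))
```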

slide-28
SLIDE 28

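MCMC inference for infinite mixture models: general X

Pseudo-marginal methods (Andrieu and Roberts, 2009) are useful for a target density of the form $\pi(\theta) \propto f(\theta)\, g(\theta)$, where $g(\theta)$ cannot be directly evaluated. If a Metropolis–Hastings sampler targets $\hat\pi(\theta) \propto f(\theta)\,\hat g(\theta)$, where $\hat g(\theta) \geq 0$ satisfies $\mathrm{E}[\hat g(\theta)] = g(\theta)$ and the estimate at the current state is kept rather than recomputed, the samples of $\theta$ still have the distribution $\pi$. In our target, the problem is evaluating
$$\mathrm{E}\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right] = \exp\{-\psi(v)\}.$$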
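A toy sketch of the pseudo-marginal idea (not the sampler used for NCoRMs): a random-walk Metropolis–Hastings chain in which $g$ is only available through a noisy, non-negative, unbiased estimate. The target is assumed to be $f(\theta)g(\theta) \propto \exp\{-\theta^2/2\}$, split as $f(\theta) = g(\theta) = \exp\{-\theta^2/4\}$, and $\hat g$ multiplies $g$ by log-normal noise with mean 1; the names, step size and noise level are hypothetical. The key detail is that the stored estimate for the current state is reused in every acceptance ratio.

```python
import math
import random


def pseudo_marginal_mh(log_f, g_hat, theta0, n_iter, step, rng):
    """Random-walk MH for pi(theta) propto f(theta) g(theta), with g replaced by a
    non-negative unbiased estimate g_hat(theta, rng)."""
    theta, g_cur = theta0, g_hat(theta0, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)
        g_prop = g_hat(prop, rng)  # fresh estimate at the proposal only
        if g_prop > 0.0:
            log_ratio = log_f(prop) + math.log(g_prop) - log_f(theta) - math.log(g_cur)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta, g_cur = prop, g_prop  # the accepted estimate is stored and reused
        chain.append(theta)
    return chain


def noisy_g(theta, rng, noise_sd=0.5):
    """Unbiased for g(theta) = exp(-theta^2 / 4): E[exp(noise_sd Z - noise_sd^2 / 2)] = 1."""
    return math.exp(-theta ** 2 / 4.0 + noise_sd * rng.gauss(0.0, 1.0) - noise_sd ** 2 / 2.0)


chain = pseudo_marginal_mh(lambda t: -t * t / 4.0, noisy_g, 0.0, 50000, 1.5, random.Random(7))
kept = chain[1000:]
mean = sum(kept) / len(kept)
var = sum((t - mean) ** 2 for t in kept) / len(kept)
```

Refreshing `g_cur` at every iteration would break the exactness argument, and the chain would then target a different distribution.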
slide-29
SLIDE 29

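Unbiased estimation of the Laplace transform

The Poisson estimator (see Papaspiliopoulos, 2011) of $L_\phi = \exp\left\{-\int_D \phi(x)\,dx\right\}$ is
$$\hat L_\phi = \prod_{i=1}^{K}\left(1 - \frac{\phi(x_i)}{a\,C\,\kappa(x_i)}\right),$$
where $\kappa$ is a p.d.f. on $D$, $C > \phi(x)/\kappa(x)$ for $x \in D$, $a > 1$, $K \sim \mathrm{Pn}(aC)$ and $x_i \overset{\text{i.i.d.}}{\sim} \kappa$. Then $\mathrm{E}[\hat L_\phi] = \exp\left\{-\int_D \phi(x)\,dx\right\}$ and
$$\mathrm{V}[\hat L_\phi] = L_\phi^2\left(\exp\left\{\frac{1}{aC}\int_D \frac{\phi(x)^2}{\kappa(x)}\,dx\right\} - 1\right) < \infty.$$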
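A direct transcription of the estimator, as a sketch: the test integrand $\phi(x) = x$ on $D = (0, 1)$, the uniform $\kappa$, $C = 1.01$ and $a = 2$ are illustrative choices for which $L_\phi = e^{-1/2}$ and the variance formula gives $e^{-1}\left(e^{1/(3aC)} - 1\right)$. Poisson draws use Knuth's multiplication method, which is adequate for the small means used here; function names are hypothetical. For non-negative $\phi$, $a > 1$ and $C > \phi/\kappa$ put every factor in $(1 - 1/a, 1]$, so the estimate is positive, as a pseudo-marginal sampler requires.

```python
import math
import random
import statistics


def poisson_draw(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) variate."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_estimator(phi, kappa_sample, kappa_pdf, C, a, rng):
    """One draw of prod_{i=1}^K [1 - phi(x_i) / (a C kappa(x_i))], K ~ Pn(a C), x_i ~ kappa."""
    estimate = 1.0
    for _ in range(poisson_draw(a * C, rng)):
        x = kappa_sample(rng)
        estimate *= 1.0 - phi(x) / (a * C * kappa_pdf(x))
    return estimate


rng = random.Random(11)
C, a = 1.01, 2.0
draws = [
    poisson_estimator(lambda x: x, lambda r: r.random(), lambda x: 1.0, C, a, rng)
    for _ in range(100000)
]
target_mean = math.exp(-0.5)
target_var = math.exp(-1.0) * (math.exp(1.0 / (3.0 * a * C)) - 1.0)
print(statistics.fmean(draws), target_mean)
print(statistics.pvariance(draws), target_var)
```

Increasing $a$ lowers the variance at the cost of more factors per estimate, since $K$ has mean $aC$.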
slide-30
SLIDE 30

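Unbiased estimation of exp{−ψ(v)}

Assuming that $x_1, x_2, \ldots, x_n$ are distinct, $m^\star_i = m(x_i)$ and $m^\star = (m^\star_1, \ldots, m^\star_n)$, $\exp\{-\psi_{\rho,d}(v)\}$ can be re-expressed as
$$\exp\left\{-\int_{(\mathbb{R}^+)^n}\int_0^{\infty}\left[1 - \exp\left\{-z\sum_{i=1}^{n} v_i\, m^\star_i\right\}\right] h(m^\star)\,\nu^\star(z)\,dz\,dm^\star\right\} = \prod_{k=1}^{n} L_k,$$
where
$$L_k = \exp\left\{-\int_{(\mathbb{R}^+)^n}\int_0^{\infty} v_k\, m^\star_k\, h(m^\star)\exp\left\{-t\sum_{i=1}^{n} v_i\, m^\star_i\right\} T_{\nu^\star}(t)\,dt\,dm^\star\right\}$$
and $T_{\nu^\star}(t) = \int_t^{\infty}\nu^\star(z)\,dz$ (tail mass function). The factorization follows from $1 - e^{-zS} = \int_0^z S e^{-tS}\,dt$ with $S = \sum_{i=1}^{n} v_i m^\star_i$, exchanging the order of integration and splitting $S$ into its $n$ terms.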

slide-31
SLIDE 31

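Unbiased estimation of the Laplace transform

$L_k$ can be estimated using the Poisson estimator with $x = (t, m^\star)$, $D = (0, \infty) \times (\mathbb{R}^+)^n$ and
$$\phi(t, m^\star) = v_k\, m^\star_k\, h(m^\star)\exp\left\{-t\sum_{i=1}^{n} v_i\, m^\star_i\right\} T_{\nu^\star}(t).$$
A suitable approximating density is
$$\kappa(t, m^\star) = \tilde\kappa_\nu(t)\,\frac{m^\star_k\, h(m^\star)}{\mathrm{E}[m^\star_k]},$$
where $\kappa_\nu(t) > T_{\nu^\star}(t)$ for all $t \in \mathbb{R}^+$.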

slide-32
SLIDE 32

A sampler for more general processes

A pseudo-marginal sampler is used with

  • $\exp\{-\psi_{\rho,d}(v)\}$ estimated by the Poisson estimator.
  • The jumps are not integrated out, and values for empty clusters are proposed from $h(m, J) \propto h(m_1/z, \ldots, m_K/z)\, z \exp\{-vz\}\,\nu^\star(z)$.
  • An interweaving scheme for $m$ and $z$ (Yu and Meng, 2011).
slide-33
SLIDE 33

Two clinical studies

[Figure: two panels, CALGB8881 and CALGB9160, each with axes β0 and β1.]

slide-34
SLIDE 34

Two clinical studies: Posterior mean densities

Results using a CoRM with independent gamma scores.

[Figure: posterior mean densities for CALGB8881 and CALGB9160, each with axes β0 and β1.]

slide-35
SLIDE 35

Example: Nonparametric regression

We consider the classic motorcycle data, which record head acceleration at different times after impact.
$$f(y|x) = \sum_{j=1}^{\infty} w_j(x)\, \mathrm{N}(y|\mu_j, \sigma^2_j),$$
where

  • $w_j(x) = \dfrac{\exp\{r_j(x)\}\,J_j}{\sum_{m=1}^{\infty}\exp\{r_m(x)\}\,J_m}$.
  • $r_m(x)$ are given independent Gaussian process priors with squared exponential covariance function.
  • $J_1, J_2, \ldots$ follow a gamma process with Lévy intensity $M x^{-1}\exp\{-x\}$.
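A sketch of how these covariate-dependent weights can be drawn on a grid of $x$ values, under assumptions that are not in the slides: the gamma process is replaced by the finite approximation with $N$ jumps $J_j \sim \mathrm{Ga}(M/N, 1)$, each $r_j$ is a zero-mean Gaussian process with covariance $\exp\{-(x-x')^2/(2\ell^2)\}$ (unit variance, length-scale $\ell$), and a small jitter is added before a hand-written Cholesky factorization. The grid, $M$, $N$, $\ell$ and the function names are illustrative.

```python
import math
import random


def sq_exp_cov(xs, length, jitter=1e-8):
    """Squared exponential covariance exp{-(x - x')^2 / (2 length^2)} plus diagonal jitter."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * length ** 2)) + (jitter if i == k else 0.0)
             for k, b in enumerate(xs)] for i, a in enumerate(xs)]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def covariate_weights(xs, total_mass, n_atoms, length, rng):
    """Entry [i][j] is w_j(x_i) = e^{r_j(x_i)} J_j / sum_m e^{r_m(x_i)} J_m."""
    low = cholesky(sq_exp_cov(xs, length))
    jumps = [rng.gammavariate(total_mass / n_atoms, 1.0) for _ in range(n_atoms)]
    r = []  # r[j][i] = r_j(x_i), one independent GP draw per atom
    for _ in range(n_atoms):
        eps = [rng.gauss(0.0, 1.0) for _ in xs]
        r.append([sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(len(xs))])
    weights = []
    for i in range(len(xs)):
        unnorm = [math.exp(r[j][i]) * jumps[j] for j in range(n_atoms)]
        total = sum(unnorm)
        weights.append([u / total for u in unnorm])
    return weights


rng = random.Random(5)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = covariate_weights(xs, total_mass=2.0, n_atoms=50, length=1.0, rng=rng)
```

Because the jumps $J_j$ are shared across $x$ and only $r_j(x)$ varies smoothly, nearby covariate values get similar weight vectors.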

slide-36
SLIDE 36


slide-37
SLIDE 37

[Figure: three panels labelled M, ? and L.]

slide-38
SLIDE 38

Example: Nonparametric variable selection

The classic Boston housing data record the median value of owner-occupied homes in 506 areas of Boston and the values of 13 attributes that are thought to affect house prices.

The covariance function is
$$k(x, x') = \exp\left\{-\sum_{j=1}^{p} w_j\,(x_j - x'_j)^2\right\}$$
and $p(w_j) \propto (1 + w_j)^{-1}$.

slide-39
SLIDE 39

Posterior median and 95% credible intervals for $w_j$

[Figure: intervals for CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B and LSTAT.]

slide-40
SLIDE 40

Summary

  • CoRM processes are a unifying framework for a wide range of proposed vectors of CRMs.
  • CoRM processes are vectors of CRMs which are constructed in terms of a (univariate) CRM and a distribution (which defines the dependence).
  • Several MCMC methods for NCoRM mixture models have been developed. These include methods which depend on the availability of analytical forms for some integrals with respect to the score distribution and methods which do not.
  • Modelling dependence through distributions allows a wide range of dependent nonparametric models to be developed (e.g. regression, time series, etc.).

slide-41
SLIDE 41

References

  • C. Andrieu and G. O. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37, 697–725.
  • S. Favaro and Y. W. Teh (2013). MCMC for Normalized Random Measure Mixture Models. Statistical Science, 28, 335–359.
  • J. E. Griffin, M. Kolossiatis and M. F. J. Steel (2013). Comparing Distributions By Using Dependent Normalized Random-Measure Mixtures. Journal of the Royal Statistical Society, Series B, 75, 499–529.
  • F. Leisen and A. Lijoi (2011). Vectors of Poisson-Dirichlet processes. J. Multivariate Anal., 102, 482–495.
  • F. Leisen, A. Lijoi and D. Spano (2013). A Vector of Dirichlet processes. Electronic Journal of Statistics, 7, 62–90.
  • A. Lijoi and B. Nipoti (2014). A class of hazard rate mixtures for combining survival data from different experiments. Journal of the American Statistical Association, 109, 802–814.
  • A. Lijoi, B. Nipoti and I. Prünster (2014a). Bayesian inference with dependent normalized completely random measures. Bernoulli, 20, 1260–1291.
  • A. Lijoi, B. Nipoti and I. Prünster (2014b). Dependent mixture models: clustering and borrowing information. Computational Statistics and Data Analysis, 71, 417–433.

slide-42
SLIDE 42

References

  • O. Papaspiliopoulos (2011). A methodological framework for Monte Carlo probabilistic inference for diffusion processes. In Bayesian Time Series Models (D. Barber, A. Taylan Cemgil and S. Chiappa, Eds.), Cambridge University Press.
  • R. Ranganath and D. M. Blei (2015). Correlated Random Measures. arXiv:1507.00720.
  • Y. W. Teh and D. Görür (2009). Indian Buffet Processes with Power-law Behavior. In Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams and A. Culotta, Eds.), 1838–1846.
  • Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.
  • Y. Yu and X.-L. Meng (2011). To Center or Not to Center: That is Not the Question – An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency. Journal of Computational and Graphical Statistics, 20, 531–570.
  • W. Zhu and F. Leisen (2014). A multivariate extension of a vector of Poisson-Dirichlet processes. To appear in the Journal of Nonparametric Statistics.