Compound Random Measures

Jim Griffin (joint work with Fabrizio Leisen), University of Kent
Introduction: Two clinical studies

[Figure: two panels plotting β1 against β0 for the CALGB8881 and CALGB9160 studies.]
Infinite mixture models

These data could be analysed using two infinite mixture models,

$$\text{CALGB8881: } f_1(y) = \sum_{j=1}^{\infty} w^{(1)}_j k(y \mid \theta_j) \qquad \text{CALGB9160: } f_2(y) = \sum_{j=1}^{\infty} w^{(2)}_j k(y \mid \theta_j)$$

where

- $w^{(k)}_j > 0$ for $j = 1, 2, \ldots$ and $\sum_{j=1}^{\infty} w^{(k)}_j = 1$ for $k = 1, 2$;
- $k(y \mid \theta)$ is a p.d.f. for $y$ with parameter $\theta$.

We need to put a prior on the $w^{(1)}$, $w^{(2)}$ and $\theta$ (a random probability measure).
Some dependent random probability measures: stick-breaking

The $\theta$ are i.i.d. and

$$w^{(k)}_j = V^{(k)}_j \prod_{i < j} \left(1 - V^{(k)}_i\right).$$

- Hierarchical Dirichlet Process (Teh et al., 2006):
  $$V^{(k)}_j \sim \mathrm{Be}\!\left(\alpha_0 \beta_j,\; \alpha_0 \Big(1 - \sum_{l=1}^{j} \beta_l\Big)\right), \qquad \beta'_j \sim \mathrm{Be}(1, \gamma), \qquad \beta_j = \beta'_j \prod_{l=1}^{j-1} (1 - \beta'_l).$$
- Probit stick-breaking processes, etc.: $V^{(1)}_j, V^{(2)}_j$ are correlated and independent of $V^{(1)}_i, V^{(2)}_i$ for $i \neq j$.
Completely random measures

$\tilde\mu$ is a completely random measure (CRM) on $\Theta$ if, for any disjoint subsets $A_1, \ldots, A_n$, $\tilde\mu(A_1), \ldots, \tilde\mu(A_n)$ are mutually independent. We concentrate on CRMs which can be represented in terms of jump sizes $J_i$ and jump locations $\theta_i$ as

$$\tilde\mu = \sum_{i=1}^{\infty} J_i \delta_{\theta_i}$$

where $\delta$ is Dirac's delta function, and which have the Lévy–Khintchine representation

$$E\!\left[e^{-\int f(\theta)\, \tilde\mu(d\theta)}\right] = e^{-\int_0^{\infty} \int_{\Theta} \left[1 - e^{-s f(\theta)}\right] \alpha(d\theta)\, \rho(ds)}$$

where $\alpha$ and $\rho$ are measures for which $\int \alpha(d\theta) < \infty$.
Completely random measures

The jumps $(\theta_i, J_i)$ are the points of a Poisson process with intensity $\alpha(d\theta)\, \rho(ds)$.

[Figure: a realization of a CRM, shown as jump heights J against locations θ.]
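To make the jump representation concrete, the following sketch simulates the largest jumps of a homogeneous gamma CRM by the standard inverse Lévy measure (Ferguson–Klass) construction. The function name, the truncation level and the choice of a uniform base measure α on [0, 1] are illustrative assumptions, not part of the talk.

```python
import numpy as np
from scipy.special import exp1        # exponential integral E1(s)
from scipy.optimize import brentq

def gamma_crm_jumps(mass=1.0, n_jumps=30, rng=None):
    """Largest n_jumps jumps of a gamma CRM with intensity
    mass * s^{-1} e^{-s} ds, via Ferguson-Klass: the decreasing jumps
    solve N(J_i) = xi_i, where N(s) = mass * E1(s) is the expected
    number of jumps above s and xi_i are unit-rate Poisson arrivals."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.cumsum(rng.exponential(size=n_jumps))          # arrival times
    # N is strictly decreasing, so invert each arrival by root-finding.
    return np.array([brentq(lambda s: mass * exp1(s) - x, 1e-300, 50.0)
                     for x in xi])

rng = np.random.default_rng(0)
J = gamma_crm_jumps(mass=1.0, n_jumps=30, rng=rng)  # decreasing jump sizes
theta = rng.uniform(size=J.size)                    # locations from alpha = U(0, 1)
```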
Examples of CRMs

Many processes that we use in Bayesian nonparametrics are CRMs:

- Gamma process: $\rho(ds) = s^{-1} \exp\{-s\}\, ds$.
- Beta process: $\rho(ds) = \beta s^{-1} (1 - s)^{\beta - 1}\, ds$.

Other priors can be derived from CRMs:

- Normalizing a gamma process, i.e. taking $\tilde p = \tilde\mu / \tilde\mu(\Theta)$, leads to a Dirichlet process.
- A beta process prior for $p_1, p_2, \ldots$ can be used to define an Indian buffet process.
Vectors of CRMs

It is useful to define $d$ related CRMs. Suppose that $\tilde\mu_1, \ldots, \tilde\mu_d$ are CRMs on $\Theta$ with marginal Lévy intensities $\bar\nu_j(ds, d\theta) = \nu_j(ds)\, \alpha(d\theta)$. Then $\tilde\mu_1, \ldots, \tilde\mu_d$ form a vector of CRMs if there is a Lévy–Khintchine representation of the form

$$E\!\left[e^{-\tilde\mu_1(f_1) - \cdots - \tilde\mu_d(f_d)}\right] = e^{-\psi^{\star}_{\rho, d}(f_1, \ldots, f_d)}$$

where

$$\psi^{\star}_{\rho, d}(f_1, \ldots, f_d) = \int_{\Theta} \int_{(\mathbb{R}^+)^d} \left[1 - e^{-s_1 f_1(\theta) - \cdots - s_d f_d(\theta)}\right] \alpha(d\theta)\, \rho_d(ds_1, \ldots, ds_d)$$

and $\nu_j(ds_j) = \int \rho_d(ds_1, \ldots, ds_d)$, integrating over the other $d - 1$ coordinates.
Compound Random Measures: Definition

A compound random measure (CoRM) is a vector of CRMs with intensity

$$\rho_d(ds_1, \ldots, ds_d) = \int z^{-d}\, h\!\left(\frac{s_1}{z}, \ldots, \frac{s_d}{z}\right) ds_1 \cdots ds_d\; \nu^{\star}(dz)$$

where

- $s_1, \ldots, s_d$ are called scores;
- $H$ is a score distribution with density $h$;
- $\nu^{\star}$ is the Lévy intensity of a directing Lévy process;

and which satisfies the condition

$$\int \min(1, \|s\|)\, z^{-d}\, h\!\left(\frac{s_1}{z}, \ldots, \frac{s_d}{z}\right) \nu^{\star}(dz)\, ds < \infty$$

where $\|s\|$ is the Euclidean norm of the vector $s = (s_1, \ldots, s_d)$.
A representation of a CoRM

Realizations of a CoRM can be expressed as

$$\tilde\mu_j = \sum_{i=1}^{\infty} m_{j,i}\, J_i\, \delta_{\theta_i}$$

where

- $m_{1,i}, \ldots, m_{d,i} \overset{\text{i.i.d.}}{\sim} H$;
- $\tilde\eta = \sum_{i=1}^{\infty} J_i \delta_{\theta_i}$ is a CRM with Lévy intensity $\nu^{\star}(ds)\, \alpha(d\theta)$.

A simulation sketch of this representation follows below.
CoRMs with independent gamma scores

We will concentrate on the class of CoRMs for which

$$h(s_1/z, \ldots, s_d/z) = \prod_{j=1}^{d} f(s_j/z)$$

where $f$ is the p.d.f. of a gamma distribution with shape $\phi$,

$$f(x) = \frac{1}{\Gamma(\phi)}\, x^{\phi - 1} \exp\{-x\}.$$
Properties of CoRMs with independent score distributions

- The Lévy copula can be expressed as a univariate integral.
- Let $M^{f}_{z}(t) = \int e^{ts}\, z^{-1} f(s/z)\, ds$ be the moment generating function associated with the density $z^{-1} f(s/z)$; then
  $$\psi_{\rho, d}(\lambda_1, \ldots, \lambda_d) = \int_{(\mathbb{R}^+)^d} \left[1 - e^{-s_1 \lambda_1 - \cdots - s_d \lambda_d}\right] \rho_d(ds_1, \ldots, ds_d) = \int \left[1 - \prod_{j=1}^{d} M^{f}_{z}(-\lambda_j)\right] \nu^{\star}(z)\, dz.$$
- This expression can be used to calculate quantities such as $\mathrm{Corr}(\tilde\mu_k(A), \tilde\mu_m(A))$ (see the sketch below).
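As a quick illustration of such calculations (my own derivation using standard moment formulas for Poisson integrals, not a result from the slides): with independent Ga(φ, 1) scores, Cov(μ̃ₖ(A), μ̃ₘ(A)) involves E[m]² ∫ z² ν*(dz) while Var(μ̃ₖ(A)) involves E[m²] ∫ z² ν*(dz), so the correlation should reduce to φ/(φ + 1) regardless of ν*. The sketch checks this for the beta-type directing intensity used later in the talk.

```python
import numpy as np
from scipy.integrate import quad

phi = 2.5
# Directing intensity assumed for the check: nu*(z) = z^{-1}(1-z)^{phi-1}, 0 < z < 1
I2, _ = quad(lambda z: z**2 * z**-1 * (1 - z)**(phi - 1), 0, 1)

# Campbell's formula for the jointly driven Poisson integrals gives
#   Cov(mu_k(A), mu_m(A)) = alpha(A) * E[m]^2 * I2   (scores on different dims independent)
#   Var(mu_k(A))          = alpha(A) * E[m^2] * I2
Em, Em2 = phi, phi * (phi + 1)          # moments of Ga(phi, 1)
corr = (Em**2 * I2) / (Em2 * I2)        # I2 and alpha(A) cancel
print(corr, phi / (phi + 1))            # both ~ 0.714: free of nu*
```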
CoRMs with gamma distributed scores

Consider a CoRM process with independent Ga(φ, 1) distributed scores. If the CoRM process has gamma process marginals then

$$\rho_d(s_1, \ldots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi - 1}}{[\Gamma(\phi)]^{d-1}}\, |s|^{-\frac{d\phi + 1}{2}}\, e^{-\frac{|s|}{2}}\, W_{\frac{(d-2)\phi + 1}{2},\, -\frac{d\phi}{2}}(|s|) \tag{1}$$

where $|s| = s_1 + \cdots + s_d$ and $W$ is the Whittaker function. If the CoRM process has σ-stable process marginals then

$$\rho_d(s_1, \ldots, s_d) = \frac{\left(\prod_{j=1}^{d} s_j\right)^{\phi - 1}}{[\Gamma(\phi)]^{d-1}}\, \frac{\sigma\, \Gamma(\sigma + d\phi)}{\Gamma(\sigma + \phi)\, \Gamma(1 - \sigma)}\, |s|^{-\sigma - d\phi}. \tag{2}$$
CoRMs with exponentially distributed scores

Consider a CoRM process with independent exponentially distributed scores. If the CoRM has gamma process marginals we recover the multivariate Lévy intensity of Leisen et al (2013),

$$\rho_d(s_1, \ldots, s_d) = \sum_{j=0}^{d-1} \frac{(d-1)!}{(d-1-j)!}\, |s|^{-j-1} e^{-|s|}.$$

Otherwise, if σ-stable marginals are considered then we recover the multivariate vector introduced in Leisen and Lijoi (2011) and Zhu and Leisen (2014),

$$\rho_d(s_1, \ldots, s_d) = \frac{(\sigma)_d}{\Gamma(1 - \sigma)}\, |s|^{-\sigma - d}$$

where $(\sigma)_d$ is the Pochhammer symbol.
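For the gamma-marginal case, the exponential-score intensity can be checked numerically against the compound form. This consistency check is my own addition: it assumes the φ = 1 case of the directing intensity given below, ν*(z) = z⁻¹ on (0, 1), with h the product of d unit exponential densities.

```python
import numpy as np
from scipy.integrate import quad
from math import factorial, exp

def rho_closed(s_abs, d):
    """Leisen et al (2013): sum_{j=0}^{d-1} (d-1)!/(d-1-j)! |s|^{-j-1} e^{-|s|}."""
    return sum(factorial(d - 1) / factorial(d - 1 - j)
               * s_abs**(-j - 1) * exp(-s_abs) for j in range(d))

def rho_compound(s_abs, d):
    """CoRM form with exponential scores, h(x_1, ..., x_d) = e^{-|x|}, and
    directing intensity nu*(z) = z^{-1} on (0, 1) (the phi = 1 case):
    int_0^1 z^{-d} e^{-|s|/z} z^{-1} dz."""
    val, _ = quad(lambda z: z**(-d - 1) * np.exp(-s_abs / z), 0, 1)
    return val

print(rho_closed(2.0, 3), rho_compound(2.0, 3))    # both ~ 0.1692
```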
CoRMs with independent gamma scores: specific marginals

The Lévy intensity of $\tilde\mu_j$ is

$$\nu_j(ds) = \int z^{-1} f(s/z)\, ds\; \nu^{\star}(dz) = \nu(ds).$$

If we have independent gamma scores, the directing Lévy intensity $\nu^{\star}$ is linked to the marginal Lévy intensity by

$$\nu^{\star}\!\left(\frac{1}{t}\right) = t^{2 - \phi}\, \mathcal{L}^{-1}\!\left[\frac{\Gamma(\phi)}{s^{\phi - 1}}\, \nu(s)\right]\!(t)$$

where $\mathcal{L}^{-1}$ is the inverse Laplace transform.
CoRMs with independent gamma scores: specific marginals

The directing Lévy intensity

$$\nu^{\star}(z) = z^{-1} (1 - z)^{\phi - 1}, \qquad 0 < z < 1,$$

leads to a marginal gamma process for which

$$\nu(s) = s^{-1} \exp\{-s\}, \qquad s > 0.$$

Remarks

- $\nu^{\star}$ is the Lévy intensity of a beta process; a worked check via the inverse Laplace transform link follows below.
- If $\nu^{\star}$ is the Lévy intensity of a stable-beta process (Teh and Görür, 2009), the marginal process is a generalized gamma process.
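As a consistency check (added here, not on the slides), the inverse Laplace transform relation of the previous slide can be verified directly for this pair:

```latex
% With nu(s) = s^{-1} e^{-s}, the argument of the inverse transform is
\frac{\Gamma(\phi)}{s^{\phi-1}}\,\nu(s) = \Gamma(\phi)\, s^{-\phi} e^{-s},
\qquad
\mathcal{L}^{-1}\!\left[\Gamma(\phi)\, s^{-\phi} e^{-s}\right](t)
  = (t - 1)^{\phi - 1}\, \mathbf{1}\{t > 1\},
% so the relation gives
\nu^{\star}\!\left(\tfrac{1}{t}\right)
  = t^{2-\phi}\, (t - 1)^{\phi - 1}\, \mathbf{1}\{t > 1\},
% and substituting z = 1/t recovers
\nu^{\star}(z) = z^{-1} (1 - z)^{\phi - 1}, \qquad 0 < z < 1.
```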
NCoRM: Gamma marginal

[Figures: simulated realizations for gamma marginals with φ = 1, φ = 10 and φ = 50; each shows the directing Lévy process (DLP) alongside dimensions 1–3.]
Marginal beta process

A CoRM with beta process marginals ($\nu(s) = \beta s^{-1} (1 - s)^{\beta - 1}$) can be constructed using

- a beta score distribution with parameters $\alpha$ and 1;
- a directing Lévy intensity
  $$\nu^{\star}(z) = \beta z^{-1} (1 - z)^{\beta - 1} + \frac{\beta(\beta - 1)}{\alpha} (1 - z)^{\beta - 2},$$
  i.e. a superposition of a beta process and a compound Poisson process with beta jump distribution.
Links to other processes

Other processes can be expressed as CoRMs:

- Superpositions/thinning: e.g. Griffin et al (2013), Chen et al (2014), Lijoi and Nipoti (2014), Lijoi et al (2014a, b), using mixture score distributions $h(s) = \pi \delta_{s=0} + (1 - \pi) h^{\star}(s)$.
- Lévy copulae: e.g. Leisen and Lijoi (2011), Leisen et al (2013), Zhu and Leisen (2014).
Normalized Compound Random Measures (NCoRM)

A vector of random probability measures can be defined by normalizing each dimension of the CoRM, so that

$$p_k = \frac{\tilde\mu_k}{\tilde\mu_k(\Theta)} = \sum_{j=1}^{\infty} w^{(k)}_j \delta_{\theta_j}.$$
CoRMs on more general spaces

For a more general space $X$, we define $\tilde\mu(\cdot\,; x)$ to be a completely random measure for $x \in X$. The collection $\{\tilde\mu(\cdot\,; x) \mid x \in X\}$ can be given a CoRM prior with

$$\tilde\mu(\cdot\,; x) = \sum_{j=1}^{\infty} m_j(x)\, J_j\, \delta_{\theta_j}$$

where $m_j(x)$ is a realisation of a random process on $X$.

Example: $X = \mathbb{R}^p$ and $m_j(x) = \exp\{r_j(x)\}$, where $r_j(x)$ is given a zero-mean Gaussian process prior (see Ranganath and Blei, 2015).
Inference with NCoRMs

We assume that the data are $(x_1, y_1), \ldots, (x_n, y_n)$ and are modelled as

$$y_i \mid \zeta_i \overset{\text{ind.}}{\sim} k(y_i \mid \zeta_i), \qquad \zeta_i \sim p(\cdot\,; x_i) = \frac{\tilde\mu(\cdot\,; x_i)}{\tilde\mu(\Theta; x_i)}, \qquad i = 1, 2, \ldots, n,$$

where $k(y \mid \theta)$ is a probability density function for $y$ with parameter $\theta$ and $\{p(\cdot\,; x) \mid x \in X\}$ is given an NCoRM prior.
MCMC inference for infinite mixture models

Introducing allocation variables $c_1, \ldots, c_n$, the posterior is proportional to

$$p(y, c \mid m, J, \theta) = \prod_{i=1}^{n} k(y_i \mid \theta_{c_i})\, \frac{J_{c_i}\, m_{c_i}(x_i)}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)}.$$

This form is not tractable due to the infinite sum in the denominator of each term. This can be addressed using the identity

$$\frac{1}{\sum_{l=1}^{\infty} J_l\, m_l(x_i)} = \int_0^{\infty} \exp\left\{-v_i \sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\} dv_i.$$
MCMC inference for infinite mixture models

Introducing latent variables $v_i$ leads to a suitable form of augmented posterior for MCMC:

$$p(y, c, v \mid m, J, \theta) = \prod_{i=1}^{n} k(y_i \mid \theta_{c_i})\, J_{c_i}\, m_{c_i}(x_i) \exp\left\{-v_i \sum_{l=1}^{\infty} J_l\, m_l(x_i)\right\}$$

$$= \prod_{j=1}^{K} \left[\prod_{\{i \mid c_i = j\}} k(y_i \mid \theta_j)\right] J_j^{a_j} \left[\prod_{\{i \mid c_i = j\}} m_j(x_i)\right] \exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}$$

where there are $K$ distinct values of $c_i$ and $a_j = \sum_{i=1}^{n} I(c_i = j)$.
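To see how the augmented form is used, here is a schematic single Gibbs sweep under a crude truncation to L jumps. The truncation, the Gaussian kernel and all values are illustrative assumptions of mine; the samplers described next avoid or correct for this. Reading off the augmented posterior: conditional on v, the allocations satisfy P(c_i = j) ∝ k(y_i | θ_j) J_j m_j(x_i), and each v_i appears only through exp{−v_i Σ_l J_l m_l(x_i)}, so it is conditionally exponential.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Toy truncated state (illustrative values, not from the talk)
L, n = 20, 50
y = rng.normal(size=n)                      # data
theta = rng.normal(size=L)                  # component means, k = N(y | theta, 1)
J = rng.gamma(0.5, size=L)                  # truncated jumps
m = rng.gamma(1.0, size=(L, n))             # scores m_l(x_i), one column per obs.

def gibbs_sweep(y, theta, J, m, rng):
    # 1. Allocations: P(c_i = j) prop. to k(y_i | theta_j) J_j m_j(x_i)
    logp = (norm.logpdf(y[None, :], loc=theta[:, None], scale=1.0)
            + np.log(J)[:, None] + np.log(m))
    p = np.exp(logp - logp.max(axis=0))
    p /= p.sum(axis=0)
    c = np.array([rng.choice(len(J), p=p[:, i]) for i in range(len(y))])
    # 2. Latent variables: v_i | rest ~ Exponential(sum_l J_l m_l(x_i))
    v = rng.exponential(1.0 / (J @ m))
    return c, v

c, v = gibbs_sweep(y, theta, J, m, rng)
```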
MCMC inference for infinite mixture models: Finite X, independent scores

In this case, we can define a marginal sampler (e.g. Favaro and Teh, 2013) by integrating over $J$ and $m$.

- Integrals of the form $\int J^a\, \nu^{\star}(J)\, dJ$ are typical for marginal samplers of normalized random measure mixtures.
- Integrals of $\prod_{\{i \mid c_i = j\}} m_j(x_i)$ will be a product of moments of the score distribution.
- $E\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right]$ can be evaluated either exactly or as a univariate integral.
MCMC inference for infinite mixture models: General X

Pseudo-marginal methods (Andrieu and Roberts, 2009) are useful for a target density of the form $\pi(\theta) \propto f(\theta)\, g(\theta)$ where $g(\theta)$ cannot be directly evaluated. Samples from the target density $\hat\pi(\theta) \propto f(\theta)\, \hat g(\theta)$, where $E[\hat g(\theta)] = g(\theta)$, will have the distribution $\pi$. In our target, the problem is evaluating

$$E\left[\exp\left\{-\sum_{l=1}^{\infty} J_l \sum_{i=1}^{n} v_i\, m_l(x_i)\right\}\right] = \exp\{-\psi(v)\}.$$
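A generic pseudo-marginal Metropolis–Hastings step, sketched for a target π(θ) ∝ f(θ) g(θ) with a nonnegative unbiased estimator ĝ. All names are illustrative. The key detail is that the estimate is carried with the state and recycled in the acceptance ratio, never re-drawn at the current point; this is what makes the chain target π exactly.

```python
import numpy as np

def pseudo_marginal_mh(log_f, g_hat, theta0, n_iter, step, rng):
    """Random-walk pseudo-marginal MH. g_hat(theta, rng) must return a
    nonnegative unbiased estimate of g(theta)."""
    theta, g_cur = theta0, g_hat(theta0, rng)
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=np.shape(theta))
        g_prop = g_hat(prop, rng)                  # fresh estimate at proposal
        log_ratio = (log_f(prop) + np.log(g_prop)
                     - log_f(theta) - np.log(g_cur))
        if np.log(rng.uniform()) < log_ratio:
            theta, g_cur = prop, g_prop            # accept: keep new estimate
        samples.append(theta)
    return np.array(samples)

# Sanity check: g == 1 estimated by noisy Exp(1) draws (unbiased), so the
# chain should still target N(0, 1) despite the noise in the estimates.
rng = np.random.default_rng(3)
draws = pseudo_marginal_mh(lambda t: -0.5 * t**2,
                           lambda t, r: r.exponential(1.0),
                           0.0, 5000, 1.0, rng)
print(draws.mean(), draws.std())                   # ~ 0 and ~ 1
```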
Unbiased estimation of the Laplace transform

The Poisson estimator (see Papaspiliopoulos, 2011) of $L_{\phi} = \exp\left\{-\int_D \phi(x)\, dx\right\}$ is

$$\hat L_{\phi} = \prod_{i=1}^{K} \left(1 - \frac{\phi(x_i)}{a\, C\, \kappa(x_i)}\right)$$

where $\kappa$ is a p.d.f. on $D$, $C > \phi(x)/\kappa(x)$ for $x \in D$, $a > 1$, $K \sim \mathrm{Pn}(a C)$ and $x_i \overset{\text{i.i.d.}}{\sim} \kappa$. Then

$$E[\hat L_{\phi}] = \exp\left\{-\int_D \phi(x)\, dx\right\} \quad \text{and} \quad V[\hat L_{\phi}] = L_{\phi}^2 \left(\exp\left\{\frac{1}{a C} \int_D \frac{\phi(x)^2}{\kappa(x)}\, dx\right\} - 1\right) < \infty.$$
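A direct transcription of this estimator, checked by Monte Carlo on a simple φ with a known integral. Here D = (0, 1), κ is uniform, and the constants a and C are illustrative choices satisfying the stated conditions.

```python
import numpy as np

def poisson_estimator(phi, kappa_sample, kappa_pdf, C, a, rng):
    """Unbiased estimator of exp{-int_D phi(x) dx}:
    draw K ~ Poisson(a*C) and x_i ~ kappa, then return
    prod_i (1 - phi(x_i) / (a * C * kappa(x_i)))."""
    K = rng.poisson(a * C)
    x = kappa_sample(K, rng)
    return np.prod(1.0 - phi(x) / (a * C * kappa_pdf(x)))

rng = np.random.default_rng(4)
phi = lambda x: x**2                        # int_0^1 x^2 dx = 1/3
est = np.mean([poisson_estimator(phi,
                                 lambda k, r: r.uniform(size=k),
                                 lambda x: np.ones_like(x),
                                 C=1.5,     # C > sup phi/kappa = 1
                                 a=2.0, rng=rng)
               for _ in range(20000)])
print(est, np.exp(-1.0 / 3.0))              # both ~ 0.7165
```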
Unbiased estimation of exp{−ψ(v)}

Assuming that $x_1, x_2, \ldots, x_n$ are distinct, write $m^{\star}_i = m(x_i)$ and $m^{\star} = (m^{\star}_1, \ldots, m^{\star}_n)$. Then $\exp\{-\psi_{\rho, d}(v)\}$ can be re-expressed as

$$\exp\left\{-\int_{(\mathbb{R}^+)^n} \int_0^{\infty} \left[1 - \exp\left\{-z \sum_{i=1}^{n} v_i\, m^{\star}_i\right\}\right] h(m^{\star})\, \nu^{\star}(z)\, dz\, dm^{\star}\right\} = \prod_{k=1}^{n} L_k$$

where

$$L_k = \exp\left\{-\int_{(\mathbb{R}^+)^n} \int_0^{\infty} v_k\, m^{\star}_k\, h(m^{\star}) \exp\left\{-t \sum_{i=1}^{n} v_i\, m^{\star}_i\right\} T_{\nu^{\star}}(t)\, dt\, dm^{\star}\right\}$$

and $T_{\nu^{\star}}(t) = \int_t^{\infty} \nu^{\star}(z)\, dz$ (the tail mass function).
Unbiased estimation of the Laplace transform

$L_k$ can be estimated using the Poisson estimator with $x = (t, m^{\star})$, $D = (0, \infty) \times (\mathbb{R}^+)^n$ and

$$\phi(t, m^{\star}) = v_k\, m^{\star}_k\, h(m^{\star}) \exp\left\{-t \sum_{i=1}^{n} v_i\, m^{\star}_i\right\} T_{\nu^{\star}}(t) < \infty.$$

A suitable approximating density is

$$\kappa(t, m^{\star}) = \kappa_{\nu}(t)\, \frac{m^{\star}_k\, h(m^{\star})}{E[m^{\star}_k]}$$

where $\kappa_{\nu}(t) > T_{\nu^{\star}}(t)$ for all $t \in \mathbb{R}^+$.
A sampler for more general processes

A pseudo-marginal sampler is used with:

- $\exp\{-\psi_{\rho, d}(v)\}$ estimated by the Poisson estimator;
- the jumps not integrated out, with values for empty clusters proposed from $h(m, J) \propto h(m_1/z, \ldots, m_K/z)\, z \exp\{-vz\}\, \nu^{\star}(z)$;
- an interweaving scheme for $m$ and $z$ (Yu and Meng, 2011).
Two clinical studies

[Figure: two panels plotting β1 against β0 for the CALGB8881 and CALGB9160 studies.]

Two clinical studies: Posterior mean densities

Results using a CoRM with independent gamma scores.

[Figure: posterior mean densities of (β0, β1) for CALGB8881 and CALGB9160.]
Example: Nonparametric regression

We consider the classic motorcycle data, which record head acceleration at different times after impact. The model is

$$f(y) = \sum_{j=1}^{\infty} w_j(x)\, N(y \mid \mu_j, \sigma^2_j)$$

where

- $w_j(x) = \dfrac{\exp\{r_j(x)\}\, J_j}{\sum_{m=1}^{\infty} \exp\{r_m(x)\}\, J_m}$;
- the $r_m(x)$ are given independent Gaussian process priors with squared exponential covariance function;
- $J_1, J_2, \ldots$ follow a gamma process with Lévy intensity $M x^{-1} \exp\{-x\}$.

A sketch of simulating these dependent weights follows below.
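A minimal sketch of these covariate-dependent weights on a grid, drawing the r_m as independent GP paths under a squared exponential kernel. The truncation to a finite number of atoms, the length-scale and all other constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 100)                       # covariate grid
n_atoms, ell = 15, 0.2                           # truncation, GP length-scale

# Squared exponential covariance on the grid (jitter for numerical stability)
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2) + 1e-8 * np.eye(x.size)
Lc = np.linalg.cholesky(K)
r = Lc @ rng.normal(size=(x.size, n_atoms))      # independent GP draws r_m(x)

J = rng.gamma(1.0, size=n_atoms)                 # gamma-process jumps (truncated)
unnorm = np.exp(r) * J                           # exp{r_m(x)} J_m at each x
w = unnorm / unnorm.sum(axis=1, keepdims=True)   # w_m(x): rows sum to one
print(w.shape, w.sum(axis=1)[:3])                # (100, 15), all ~ 1
```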
[Figures: the fitted regression for the motorcycle data (head acceleration against time) and MCMC trace plots of hyperparameters, including M.]
Example: Nonparametric variable selection

The classic Boston housing data record the median value of owner-occupied homes in 506 areas of Boston and the values of 14 attributes that are thought to affect house prices. The covariance function is

$$k(x, x') = \exp\left\{-\sum_{j=1}^{p} w_j (x_j - x'_j)^2\right\}$$

and $p(w_j) \propto (1 + w_j)^{-1}$.
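The covariance function itself is a one-liner; a minimal sketch (function name and the use of 13 covariates, as labelled in the figure below, are my assumptions). Covariates whose posterior w_j concentrates near zero effectively drop out of the regression, which is what makes this a variable selection device.

```python
import numpy as np

def ard_cov(x1, x2, w):
    """k(x, x') = exp{- sum_j w_j (x_j - x'_j)^2}: per-covariate weights w_j
    control how much each covariate moves the regression function."""
    diff = x1[:, None, :] - x2[None, :, :]          # pairwise differences
    return np.exp(-np.einsum('ijk,k->ij', diff**2, w))

x = np.random.default_rng(6).normal(size=(5, 13))   # 13 covariates, as in the figure
print(ard_cov(x, x, np.full(13, 0.1)).shape)        # (5, 5) Gram matrix
```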
[Figure: posterior medians and 95% credible intervals for the $w_j$ of the covariates CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B and LSTAT.]
Summary

- CoRM processes are a unifying framework for a wide range of proposed vectors of CRMs.
- CoRM processes are vectors of CRMs constructed in terms of a (univariate) CRM and a score distribution (which defines the dependence).
- Several MCMC methods for NCoRM mixture models are developed, including methods which depend on the availability of analytical forms for some integrals with respect to the score distribution and methods which do not.
- Modelling dependence through distributions allows a wide range of dependent nonparametric models to be developed (e.g. regression, time series, etc.).
References

- C. Andrieu and G. O. Roberts (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics, 37, 697–725.
- S. Favaro and Y. W. Teh (2013). MCMC for normalized random measure mixture models. Statistical Science, 28, 335–359.
- J. E. Griffin, M. Kolossiatis and M. F. J. Steel (2013). Comparing distributions by using dependent normalized random-measure mixtures. Journal of the Royal Statistical Society, Series B, 75, 499–529.
- F. Leisen and A. Lijoi (2011). Vectors of Poisson–Dirichlet processes. Journal of Multivariate Analysis, 102, 482–495.
- F. Leisen, A. Lijoi and D. Spano (2013). A vector of Dirichlet processes. Electronic Journal of Statistics, 7, 62–90.
- A. Lijoi and B. Nipoti (2014). A class of hazard rate mixtures for combining survival data from different experiments. Journal of the American Statistical Association, 109, 802–814.
- A. Lijoi, B. Nipoti and I. Prünster (2014a). Bayesian inference with dependent normalized completely random measures. Bernoulli, 20, 1260–1291.
- A. Lijoi, B. Nipoti and I. Prünster (2014b). Dependent mixture models: clustering and borrowing information. Computational Statistics and Data Analysis, 71, 417–433.
- O. Papaspiliopoulos (2011). A methodological framework for Monte Carlo probabilistic inference for diffusion processes. In Bayesian Time Series Models (D. Barber, A. T. Cemgil and S. Chiappa, Eds.), Cambridge University Press.
- R. Ranganath and D. M. Blei (2015). Correlated random measures. arXiv:1507.00720.
- Y. W. Teh and D. Görür (2009). Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams and A. Culotta, Eds.), 1838–1846.
- Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, 1566–1581.
- Y. Yu and X.-L. Meng (2011). To center or not to center: that is not the question – an ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency. Journal of Computational and Graphical Statistics, 20, 531–570.
- W. Zhu and F. Leisen (2014). A multivariate extension of a vector of Poisson–Dirichlet processes. Journal of Nonparametric Statistics, to appear.