Constructing dependent random probability measures from completely random measures

Changyou Chen¹, Vinayak Rao², Wray Buntine¹, Yee Whye Teh³, presented by Sinead Williamson⁴

¹NICTA, ²Duke University, ³University of Oxford, ⁴UT Austin

ICML 2013

Introduction

Two strands of research in NPBayes modelling of random probability measures (RPMs):

Priors that are more expressive than the Dirichlet Process, e.g. power-law behaviour or more uncertainty on the number of clusters:

◮ Normalized Random Measures [James et al., 2005, Kingman, 1975] (e.g. normalized generalized Gamma process)
◮ Poisson-Kingman processes [Pitman, 2003] (e.g. Pitman-Yor process [Pitman and Yor, 1997])

Priors that model more structured data, for data violating the assumption of exchangeability:

◮ Time-series, spatial data, conditional density modelling
◮ Research traces back to the work of [MacEachern, 1999] on dependent RPMs

This talk:

Flexible constructions for dependent RPMs with flexible marginals

Chen, Rao, Buntine, and Teh (Duke) Dependent RPMs from CRMs June, 2013 2 / 23

Relevant work

There is a rich literature on dependent RPMs:

◮ The seminal work of [MacEachern, 1999] on dependent DPs
◮ Existing work that is directly relevant: [Rao and Teh, 2009, Nipoti, 2010, Lijoi et al., 2012, Foti et al., 2012, Lin and Fisher, 2012, Griffin et al., 2013]
◮ C. Chen, V. Rao, W. Buntine and Y. W. Teh (2013). Dependent Normalized Random Measures.

Completely random measures (CRMs)

A random measure $\mu$ on some space $(X, \Sigma_X)$ such that $\mu(A) \perp\!\!\!\perp \mu(B)$ if $A$ and $B$ are disjoint.
The measure $\mu$ is atomic: $\mu = \sum_i w_i \delta_{x_i}$.
$(x_i, w_i)$: events of a Poisson process on the space $X \times W$, where $W = [0, \infty)$.
The Poisson process has intensity $\nu(w, x) = \rho(w) h(x)$, where $\rho(w)$ is the Lévy intensity of the CRM, and $h(x)$ is the base probability density.

Normalized random measures

Poisson process $\{w_i, x_i\}$
CRM $\mu \equiv \{w_i, x_i\}$
Normalize to construct a random probability measure $G$: $G(\cdot) = \mu(\cdot) / \mu(X)$
In the following, we set $\rho(w) = \alpha w^{-\sigma-1} \exp(-\tau w)$, corresponding to the generalized Gamma process.
We want: dependent normalized random measures, $G_t$
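The construction above (Poisson process, then CRM, then normalization) can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: atoms with weight below a truncation level `eps` are discarded, so the result is an approximation; the base density $h$ is taken to be Uniform(0, 1); and $\sigma > 0$. Candidate weights are drawn from the intensity $\alpha w^{-\sigma-1}$ on $[\text{eps}, \infty)$ and kept with probability $\exp(-\tau w)$, which yields a Poisson process with intensity $\alpha w^{-\sigma-1} \exp(-\tau w)$ on that range. All function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate arrivals in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def sample_truncated_ggp(alpha, sigma, tau, eps, rng):
    """Atoms (w_i, x_i) of a generalized Gamma CRM with weights w_i >= eps.

    Assumes sigma > 0 and a Uniform(0, 1) base density h (illustrative choices).
    """
    # Total mass of the dominating intensity alpha * w**(-sigma - 1) on [eps, inf).
    proposal_mass = alpha * eps ** (-sigma) / sigma
    atoms = []
    for _ in range(sample_poisson(proposal_mass, rng)):
        # Inverse-CDF draw from the density proportional to w**(-sigma - 1) on [eps, inf).
        w = eps * (1.0 - rng.random()) ** (-1.0 / sigma)
        # Keep the candidate with probability exp(-tau * w).
        if rng.random() < math.exp(-tau * w):
            atoms.append((w, rng.random()))
    return atoms


def normalize(atoms):
    """Turn CRM atoms into NRM atoms: G = mu / mu(X)."""
    total = sum(w for w, _ in atoms)
    if total <= 0.0:
        raise ValueError("CRM has no mass; cannot normalize")
    return [(w / total, x) for w, x in atoms]


rng = random.Random(0)
crm_atoms = sample_truncated_ggp(alpha=1.0, sigma=0.5, tau=1.0, eps=1e-3, rng=rng)
nrm_atoms = normalize(crm_atoms)
```

Shrinking `eps` keeps more of the small atoms at the cost of more candidates, since the expected number of candidates grows like `eps ** (-sigma)`.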

Dependent normalized random measures

Define a common latent CRM/Poisson process.
Define dependent measures via transformations of this process:

◮ Superposition [Rao and Teh, 2009, Griffin et al., 2013]
◮ Rescaling
◮ Thinning [Lin et al., 2010, Lin and Fisher, 2012]

Normalize these dependent CRMs to produce dependent NRMs.

Superposition theorem

The superposition of two independent Poisson processes with intensities $\nu_i(\cdot)$, $i = 1, 2$, is a Poisson process with intensity $\nu_1(\cdot) + \nu_2(\cdot)$.
The resulting CRM has Lévy measure $\rho = \rho_1 + \rho_2$.
The projection of a Poisson process from $X \times W \times A$ to $X \times W$ is a Poisson process with intensity $\int_A \nu(dx, dw, da)$.
If $\nu(\cdot)$ factors as $\rho(w) h(x) \nu_a(a)$, then the resulting CRM has Lévy intensity $\left(\int_A \nu_a(a)\,da\right) \rho(w) h(x)$.
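The superposition statement can be checked numerically in a toy setting. The sketch below is an illustration, not part of the talk: it uses two homogeneous Poisson processes on $[0, 1)$ with assumed rates 2 and 3, takes the union of their points, and compares the empirical mean and variance of the count with the value 5 expected for a Poisson process with the summed intensity. The function names are hypothetical.

```python
import random


def sample_poisson_points(rate, rng):
    """Points of a homogeneous Poisson process with the given rate on [0, 1)."""
    points = []
    position = rng.expovariate(rate)
    while position < 1.0:
        points.append(position)
        position += rng.expovariate(rate)
    return points


def superpose(points_a, points_b):
    """Superposition: the union of the two point sets."""
    return sorted(points_a + points_b)


rng = random.Random(1)
rate_1, rate_2 = 2.0, 3.0
num_trials = 20000
counts = [
    len(superpose(sample_poisson_points(rate_1, rng), sample_poisson_points(rate_2, rng)))
    for _ in range(num_trials)
]
mean_count = sum(counts) / num_trials
var_count = sum((c - mean_count) ** 2 for c in counts) / num_trials
# For a Poisson count with intensity rate_1 + rate_2, mean and variance both equal 5.
```

Matching mean and variance is only a necessary sign of a Poisson count, so this is a sanity check rather than a proof.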

Spatial Normalized Gamma processes [Rao and Teh, 2009]

A measure-valued stochastic process $G_t$, $t \in T$, where $T$ is an arbitrary space.
Instantiate a Poisson process on some augmented space.
Associate each $t$ with a subset $X \times W \times A_t$.
Restrict to $A_t$, and project onto the original space, defining an NRM.
Dependency across NRMs is controlled by the amount of overlap of the $A_t$'s.

Dependent normalized random measures ([Griffin et al., 2013])

$G_t \propto \sum_{r=1}^{R} z_{tr} \mu_r$, $z_{tr} \in \{0, 1\}$ (1)

Mixed normalized random measures

A simple generalization: allow $z_{tr} \in \mathbb{R}^+$.
$z\mu(\cdot)$ belongs to the same class of CRMs as $\mu(\cdot)$.
Poisson mapping theorem:

◮ If $\{w_i\}$ is a sample from a Poisson process with intensity $\nu(w)$, then $\{z w_i\}$ is a Poisson process with intensity $z^{-1} \nu(w/z)$.

$z_{tr}$ governs how strongly atoms of CRM $r$ contribute to covariate $t$.
Given a set $\{z_{tr}\}$, for each $t$, $G_t$ is an NRM.
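As a worked instance of the mapping theorem, take the generalized Gamma intensity $\rho(w) = \alpha w^{-\sigma-1} \exp(-\tau w)$ used earlier in the talk. The following short derivation is added for illustration and substitutes it into $z^{-1} \rho(w/z)$ for a fixed $z > 0$:

```latex
\begin{align*}
z^{-1}\rho(w/z)
  &= z^{-1}\,\alpha\,(w/z)^{-\sigma-1}\exp(-\tau w/z) \\
  &= \alpha z^{\sigma}\, w^{-\sigma-1}\exp\!\big(-(\tau/z)\,w\big).
\end{align*}
```

The result has the same functional form with $\alpha$ replaced by $\alpha z^{\sigma}$ and $\tau$ replaced by $\tau / z$, while $\sigma$ is unchanged. This is one way to see why rescaling a generalized Gamma CRM stays within the same class.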

Thinning theorem

If $\{w_i\}$ is a sample from a Poisson process with intensity $\nu(w)$, then $\{z_i w_i\}$, where $z_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)$ (i.e. keeping only the points with $z_i = 1$), is Poisson with intensity $p\nu(w)$.
Suggests independently thinning atoms of a CRM to form a new CRM ([Lin et al., 2010]).
Corresponds to SNRM with an exponential number of CRMs.
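A quick numerical check of the thinning statement, in a toy setting chosen for illustration only: a homogeneous Poisson process on $[0, 1)$ with an assumed rate of 10 is thinned with $p = 0.3$, and the empirical mean and variance of the surviving count are compared with $p \times \text{rate} = 3$. The function names are hypothetical.

```python
import random


def sample_poisson_points(rate, rng):
    """Points of a homogeneous Poisson process with the given rate on [0, 1)."""
    points = []
    position = rng.expovariate(rate)
    while position < 1.0:
        points.append(position)
        position += rng.expovariate(rate)
    return points


def thin(points, keep_prob, rng):
    """Keep each point independently with probability keep_prob (z_i ~ Bernoulli)."""
    return [x for x in points if rng.random() < keep_prob]


rng = random.Random(2)
rate, keep_prob = 10.0, 0.3
num_trials = 20000
counts = [
    len(thin(sample_poisson_points(rate, rng), keep_prob, rng))
    for _ in range(num_trials)
]
mean_count = sum(counts) / num_trials
var_count = sum((c - mean_count) ** 2 for c in counts) / num_trials
# A Poisson process with intensity keep_prob * rate has count mean and variance 3.
```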

Thinned Normalized Random Measures

Spatial NRM characterized by a set $\{z_{tr} \in \{0, 1\} \;\forall t \in T, r \in R\}$.
$z_{tr}$ specifies whether or not all atoms of CRM $r$ are present at covariate $t$.
Thinned NRM: introduce indicator variables $z_{rtk} \in \{0, 1\}$ for each atom: $z_{rtk} \sim \mathrm{Bernoulli}(q_{rt})$, $k = 1, 2, \cdots$
Then, the probability measure at covariate $t$ is given by $\mu_t(d\theta) = \hat{\mu}_t(d\theta) / \hat{\mu}_t(\Theta)$, where

$\hat{\mu}_t(d\theta) = \sum_{r=1}^{R} \sum_{k=1}^{\infty} z_{rtk} w_{rk} \delta_{\theta_{rk}}(d\theta)$ (2)

Proposition

Conditioned on the set of $q_{rt}$'s, each random probability measure $\mu_t$ defined in (2) is marginally distributed as a normalized random measure with Lévy measure $\sum_r q_{rt} \nu_r(dw, d\theta)$.
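The thinned construction in (2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' sampler: each CRM is replaced by a finite approximation with `num_atoms` weights drawn from Gamma(alpha / num_atoms, 1), atom locations come from a Uniform(0, 1) base density, and the values in `q` are made up. All names are hypothetical.

```python
import random


def sample_finite_gamma_crm(alpha, num_atoms, rng):
    """Finite stand-in for a gamma CRM: weights ~ Gamma(alpha / num_atoms, 1).

    Atom locations theta are drawn from Uniform(0, 1) (an assumption).
    """
    return [
        (rng.gammavariate(alpha / num_atoms, 1.0), rng.random())
        for _ in range(num_atoms)
    ]


def thinned_nrm(crms, keep_probs, rng):
    """One thinned NRM: keep atom k of CRM r with probability keep_probs[r]."""
    kept = [
        (w, theta)
        for atoms, q_rt in zip(crms, keep_probs)
        for (w, theta) in atoms
        if rng.random() < q_rt  # z_rtk ~ Bernoulli(q_rt)
    ]
    total = sum(w for w, _ in kept)
    if total <= 0.0:
        raise ValueError("no mass survived thinning; cannot normalize")
    return [(w / total, theta) for w, theta in kept]


rng = random.Random(3)
crms = [sample_finite_gamma_crm(alpha=5.0, num_atoms=200, rng=rng) for _ in range(3)]
# q[t][r]: probability that an atom of CRM r appears at covariate t (made-up values).
q = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]
measures = [thinned_nrm(crms, q_t, rng) for q_t in q]
```

Because both measures are thinned from the same three sets of atoms, they share atom locations, and the overlap is governed by how similar the rows of `q` are.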


Inference

[Lin et al., 2010] proposed a similar model at NIPS.
They provide a marginal sampler for posterior inference.
Unfortunately, this sampler is incorrect. A similar error exists in [Lin and Fisher, 2012].
At a high level:

◮ One can superimpose 3 CRMs to construct 2 dependent RPMs, each marginally an NRM.
◮ However, given observations from one, the other is no longer an NRM.
◮ It becomes a mixture of NRMs.

Inference (a marginal sampling approach)

Following [James et al., 2005], introduce auxiliary variables $u_t$ $\forall t \in T$.
Conditionally marginalize the CRMs $\mu_r$ to obtain a generalized CRP.
Alternately resample the partition given $\{u_t\}$, and then $\{u_t\}$ given the partition.
Works for SNRM and MNRM, but impractical for TNRM.

Inference (a slice sampling approach)

Following [Walker, 2007], introduce auxiliary variables $s_r$ $\forall r \in R$.
Instantiate atoms of the CRM $\mu_r$ larger than $s_r$.
Conditionally sample the partition of observations, and the associated $z$'s.
Alternately resample $\mu_r$ given $\{s_r\}$, and then $\{s_r\}$ given $\mu_r$.

Application: Document modelling

Four corpora of documents (ICML, Person, TPAMI, NIPS).
Documents organized by year.
Largest corpus: NIPS

◮ 17 years, 2483 documents, 3.28M words and a vocabulary of 14K

Use a nonparametric topic model.

Application to document modelling

[Figure: test perplexity on four different corpora (panels: ICML, Person, TPAMI, NIPS); small is good. Models compared: HDP, HNGG, TNGG, MNGG, HSNGG, HTNGG, HMNGG, HMNGP.]

[Figure: ESS/1000 samples (panels: ICML, Person, TPAMI, NIPS). Samplers compared: MNGG (marg), MNGG (slice), TNGG (slice).]

[Figure: time/1000 samples (panels: ICML (s), Person (s), TPAMI (h), NIPS (h)). Samplers compared: MNGG (marg), MNGG (slice), TNGG (slice).]


Thank you!


Bibliography

Foti, N. J., Futoma, J., Rockmore, D., and Williamson, S. A. (2012). A unifying representation for a class of dependent random measures. Technical Report arXiv:1211.4753, Dartmouth College and CMU, USA.

Griffin, J. E., Kolossiatis, M., and Steel, M. F. J. (2013). Comparing distributions by using dependent normalized random-measure mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology).

James, L. F., Lijoi, A., and Pruenster, I. (2005). Bayesian inference via classes of normalized random measures. ICER Working Papers - Applied Mathematics Series 5-2005, ICER - International Centre for Economic Research.

Kingman, J. F. C. (1975). Random discrete distributions. Journal of the Royal Statistical Society, 37:1–22.

Lijoi, A., Nipoti, B., and Pruenster, I. (2012). Bayesian inference with dependent normalized completely random measures. Technical report.

Lin, D., Grimson, E., and Fisher, J. (2010). Construction of dependent Dirichlet processes based on Poisson processes. In NIPS.

Lin, D. H. and Fisher, J. (2012). Coupling nonparametric mixtures via latent Dirichlet processes. In NIPS.

MacEachern, S. (1999). Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science. American Statistical Association.

Nipoti, B. (2010). Transformations of dependent completely random measures. Technical report.

Pitman, J. (2003). Poisson-Kingman partitions. In Lecture Notes-Monograph Series, pages 1–34.

Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900.

Rao, V. and Teh, Y. W. (2009). Spatial normalized gamma processes. In Advances in Neural Information Processing Systems.

Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 36:45.