SLIDE 1

Posteriors, conjugacy, and exponential families for completely random measures

Tamara Broderick, Ashia C. Wilson, Michael I. Jordan

MIT, UC Berkeley, UC Berkeley

SLIDE 2

Background

Parametric exponential family conjugacy [Diaconis & Ylvisaker 1979]:
  • Likelihood: p(x|θ) = θ^x (1 + θ)^(−1), x ∈ {0, 1}, θ > 0
  • Prior: p(θ) ∝ θ^α (1 + θ)^(−α−β) = BetaPrime(θ|α, β), α > 0, β > 0
  • Posterior: p(θ|x) ∝ θ^(α+x) (1 + θ)^(−(α+x)−(β−x+1))

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process
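
A minimal numerical check of this conjugate pair (my own sketch, not part of the deck): multiply the BetaPrime prior by the odds-Bernoulli likelihood on a grid and compare against the updated parameters (α + x, β − x + 1) that the posterior formula implies.

```python
import numpy as np

def betaprime_unnorm(theta, a, b):
    # Slide's parameterization: p(theta) ∝ theta^a * (1 + theta)^(-a-b)
    return theta**a * (1.0 + theta)**(-a - b)

def odds_bernoulli(x, theta):
    # p(x | theta) = theta^x * (1 + theta)^(-1), x in {0, 1}, theta > 0
    return theta**x * (1.0 + theta)**(-1.0)

theta = np.linspace(1e-6, 200.0, 400_000)  # grid over theta > 0 (truncated)
a, b, x = 2.0, 3.0, 1

# Bayes' rule on the grid, normalized to sum to 1
post = betaprime_unnorm(theta, a, b) * odds_bernoulli(x, theta)
post /= post.sum()

# Conjugate update predicted by the slide: (a, b) -> (a + x, b - x + 1)
pred = betaprime_unnorm(theta, a + x, b - x + 1)
pred /= pred.sum()

assert np.allclose(post, pred)  # same density up to floating-point error
```

The assertion holds because both sides reduce to the same unnormalized function θ^(α+x) (1 + θ)^(−α−β−1).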

SLIDE 12

Background

  • Parametric exponential family conjugacy [Diaconis & Ylvisaker 1979]
  • Likelihood ➞ conjugate prior, straightforward inference
  • Integration ➞ addition

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process

SLIDE 14

Want: One framework

  • For Bayesian nonparametric models:
  • Likelihood ➞ conjugate prior, straightforward inference

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process

SLIDE 18

Clustering

[Figure: seven documents, each assigned to exactly one topic cluster — Arts, Econ, Sports, Health, Technology]

SLIDE 19

Feature allocation

[Figure: seven documents, each assigned to multiple topics — Arts, Econ, Sports, Health, Technology]

SLIDE 20

Indian buffet process (IBP) [Griffiths & Ghahramani 2006]

For n = 1, 2, ..., N:
  • 1. Data point n takes an existing feature k, which has occurred S_{n−1,k} times, with probability S_{n−1,k} / (β + n − 1)
  • 2. Number of new features for data point n: K⁺_n ∼ Poisson(γβ / (β + n − 1))

[Figure: binary feature matrix with rows n = 1, 2, ..., N and columns k = 1, 2, ...]
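
The two-step recipe above can be sketched as a sampler (my own sketch of the two-parameter IBP; the function name and defaults are illustrative):

```python
import numpy as np

def sample_ibp(N, gamma=3.0, beta=1.0, rng=None):
    """Two-parameter IBP [Griffiths & Ghahramani 2006]: returns a binary
    feature-assignment matrix Z of shape (N, K), K random."""
    rng = np.random.default_rng() if rng is None else rng
    rows = []    # per-data-point binary feature indicators
    counts = []  # counts[k] = S_{n-1,k}, times feature k has occurred so far
    for n in range(1, N + 1):
        # 1. Take each existing feature k with probability S_{n-1,k}/(beta+n-1)
        row = [int(rng.random() < c / (beta + n - 1)) for c in counts]
        counts = [c + z for c, z in zip(counts, row)]
        # 2. Draw K+_n ~ Poisson(gamma*beta/(beta+n-1)) brand-new features
        k_new = rng.poisson(gamma * beta / (beta + n - 1))
        row += [1] * k_new
        counts += [1] * k_new
        rows.append(row)
    Z = np.zeros((N, len(counts)), dtype=int)
    for n, row in enumerate(rows):
        Z[n, :len(row)] = row
    return Z

Z = sample_ibp(20, rng=np.random.default_rng(0))
print(Z.shape)
```

Because β enters both probabilities, larger β spreads features across more, rarer columns while γ controls the overall number of features.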

SLIDE 31

Beta process & Bernoulli process [Hjort 1990; Kim 1999; Thibaux & Jordan 2007]

For m = 1, 2, ...:
  • 1. Draw K⁺_m ∼ Poisson(γβ / (β + m − 1))
  • 2. For k = 1, ..., K⁺_m: draw a frequency of size θ_k ∼ Beta(1, β + m − 1)

[Figure: atoms θ_1, ..., θ_6 with their frequencies]
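
This size-biased construction can be sketched directly (my own sketch; the truncation level M and function names are mine, not the deck's):

```python
import numpy as np

def sample_bp_atoms(M, gamma=3.0, beta=1.0, rng=None):
    """Size-biased beta process atoms per the slide's recipe: round m adds
    K+_m ~ Poisson(gamma*beta/(beta+m-1)) atoms with frequencies
    theta_k ~ Beta(1, beta+m-1). Truncated after M rounds."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = []
    for m in range(1, M + 1):
        k_new = rng.poisson(gamma * beta / (beta + m - 1))
        thetas.extend(rng.beta(1.0, beta + m - 1, size=k_new))
    return np.array(thetas)

def sample_bernoulli_process(thetas, N, rng=None):
    """Bernoulli process draws: Z[n, k] ~ Bernoulli(theta_k), independently."""
    rng = np.random.default_rng() if rng is None else rng
    return (rng.random((N, thetas.size)) < thetas).astype(int)

rng = np.random.default_rng(1)
thetas = sample_bp_atoms(M=50, rng=rng)
Z = sample_bernoulli_process(thetas, N=10, rng=rng)
```

Later rounds draw from Beta(1, β + m − 1), so the added frequencies shrink toward 0: the familiar "rich get richer, new features get rarer" behavior behind the IBP marginal.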

SLIDE 45

Why are these useful?
  • Exchangeable (e.g. Gibbs sampling)
  • Finite but unbounded
  • (Countable) sequence of finite-dimensional distributions
  • Hierarchical models

How do we come up with these models?

[Figure: binary feature matrix with rows n = 1, 2, ..., N and columns k = 1, 2, ...]

SLIDE 51

One Framework [Broderick, Wilson, Jordan 2014]

Likelihood (e.g. Bernoulli) ➞
  • Conjugate prior (e.g. BP)
  • Marginal (e.g. IBP)
  • Size-biased atom sequence (e.g. BP stick-breaking)


slide-59
SLIDE 59

Example: odds Bernoulli

x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

10

slide-60
SLIDE 60

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

10

slide-61
SLIDE 61

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-62
SLIDE 62

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-63
SLIDE 63

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-64
SLIDE 64

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-65
SLIDE 65

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-66
SLIDE 66

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-67
SLIDE 67

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-68
SLIDE 68

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-69
SLIDE 69

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-70
SLIDE 70
  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms

Example: odds Bernoulli

θ x x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0 ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

SLIDE 71

Example: odds Bernoulli (posterior rate measure)

x ∈ {0, 1}, θ > 0: p(x|θ) = θ^x (1 + θ)^(−1)

Prior rate measure: ν(dθ) = γ θ^(α−1) (1 + θ)^(−α−β) dθ
After x_1 = 0: ν(dθ | x_1 = 0) = γ θ^(α−1) (1 + θ)^(−α−(β+1)) dθ
After x_{1:2} = 0: ν(dθ | x_{1:2} = 0) = γ θ^(α−1) (1 + θ)^(−α−(β+2)) dθ

[Figure: Poisson process atoms θ_1, θ_2, ... on R₊ as observations accrue]
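
The pattern (each x = 0 observation adds 1 to the β exponent) is easy to verify numerically (my own sketch; I write the rate measure in the beta prime form with (1 + θ), matching the BetaPrime prior earlier in the deck):

```python
import numpy as np

def nu(theta, alpha, beta, gamma):
    # Conjugate rate measure, beta prime form: gamma * theta^(alpha-1) * (1+theta)^(-alpha-beta)
    return gamma * theta**(alpha - 1.0) * (1.0 + theta)**(-alpha - beta)

theta = np.linspace(1e-3, 50.0, 100_000)
alpha, beta, gamma = 0.0, 2.0, 1.5  # alpha in (-1, 0]

p0 = (1.0 + theta)**(-1.0)  # odds-Bernoulli likelihood at x = 0

# One observation x_1 = 0: exponent beta -> beta + 1
assert np.allclose(nu(theta, alpha, beta, gamma) * p0,
                   nu(theta, alpha, beta + 1, gamma))
# Two observations x_{1:2} = 0: beta -> beta + 2
assert np.allclose(nu(theta, alpha, beta, gamma) * p0**2,
                   nu(theta, alpha, beta + 2, gamma))
```

Both assertions are exact algebraic identities, so they hold for any grid and any valid (α, β, γ).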

SLIDE 81

Size-biased atoms, beta prime process

For m = 1, 2, ...:
  • 1. Draw K⁺_m ∼ Poisson(γβ / (β + m − 1))
  • 2. For k = 1, ..., K⁺_m: draw a rate of size θ_k ∼ BetaPrime(1, β + m − 1)   (α = 0)

Marginal process derivation is similar.

[Figure: atoms θ_1, ..., θ_6 on R₊]
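
A sketch of this construction (mine, not the deck's): NumPy has no beta prime sampler, so I use the standard identity θ = X/(1 − X) with X ∼ Beta(a, b); the truncation level M is an illustrative choice.

```python
import numpy as np

def sample_betaprime(a, b, size, rng):
    # BetaPrime(a, b) via the transform X/(1-X), X ~ Beta(a, b)
    x = rng.beta(a, b, size=size)
    return x / (1.0 - x)

def sample_beta_prime_process(M, gamma=3.0, beta=1.0, rng=None):
    """Size-biased atoms for the (alpha = 0) beta prime process, per the slide:
    round m gives K+_m ~ Poisson(gamma*beta/(beta+m-1)) atoms with rates
    theta_k ~ BetaPrime(1, beta+m-1). Truncated after M rounds."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = []
    for m in range(1, M + 1):
        k_new = rng.poisson(gamma * beta / (beta + m - 1))
        thetas.extend(sample_betaprime(1.0, beta + m - 1, k_new, rng))
    return np.array(thetas)

thetas = sample_beta_prime_process(M=50, rng=np.random.default_rng(2))
```

Unlike the beta process frequencies, these rates live on all of R₊, which is what the odds-Bernoulli likelihood requires.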

SLIDE 84

One Framework [Broderick, Wilson, Jordan 2014]

Exponential family likelihood:

p(dx|θ) = κ(x) exp{⟨η(θ), φ(x)⟩ − A(θ)} dx

  • Conjugate prior (PPP rate measure):

ν(dθ) = γ exp{⟨ξ, η(θ)⟩ + λ[−A(θ)]} dθ + fixed atoms
f(dθ) = exp{⟨ξ_k, η(θ)⟩ + λ_k[−A(θ)] − B(ξ_k, λ_k)} dθ

  • Size-biased atom sequence:

K⁺_m ∼ Poisson( ∫_{x>0} γ · κ(0)^(m−1) · κ(x) · exp{B(ξ + (m − 1)φ(0) + φ(x), λ + m)} dx )
f(dθ) ∝ ∫_{x>0} exp{⟨ξ + (m − 1)φ(0) + φ(x), η(θ)⟩ + (λ + m)[−A(θ)]} dx

  • Marginal process: K⁺_n as above, and

p(x_n | x_{1:(n−1)}) = κ(x_n) exp{−B(ξ + Σ_{m=1}^{n−1} x_m, λ + n − 1) + B(ξ + Σ_{m=1}^{n−1} x_m + x_n, λ + n)}
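
To make the marginal formula concrete, here is my own instantiation for a single fixed atom under the odds-Bernoulli likelihood, where κ(x) = 1, φ(x) = x, η(θ) = log θ, A(θ) = log(1 + θ), and B(ξ, λ) becomes a log Beta function; the hyperparameter values are illustrative.

```python
from math import exp, lgamma

def B(xi, lam):
    # Log normalizer of exp{xi*log(theta) - lam*log(1+theta)}:
    # integral_0^inf theta^xi (1+theta)^(-lam) dtheta = Beta(xi+1, lam-xi-1)
    return lgamma(xi + 1) + lgamma(lam - xi - 1) - lgamma(lam)

def predictive(x_n, xs, xi=0.0, lam=2.0):
    # Slide's marginal with kappa = 1 and phi(x) = x:
    # p(x_n | x_{1:n-1}) = exp{-B(xi + s, lam + n - 1) + B(xi + s + x_n, lam + n)}
    n, s = len(xs) + 1, sum(xs)
    return exp(-B(xi + s, lam + n - 1) + B(xi + s + x_n, lam + n))

xs = [1, 0, 1]
p1, p0 = predictive(1, xs), predictive(0, xs)
assert abs(p1 + p0 - 1.0) < 1e-12
# The log-Beta differences telescope to the closed form (xi + s + 1)/(lam + n - 1)
assert abs(p1 - (0.0 + 2 + 1) / (2.0 + 4 - 1)) < 1e-12  # = 3/5
```

The telescoping shows the familiar "popularity / (β + n − 1)" predictive probability of the IBP falling out of the general B-function formula.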

SLIDE 92

Notes
  • To satisfy BNP desiderata, the likelihood must have a point mass at 0
  • The Poisson distribution is a direct result of the Poisson process
  • Much previous work on conjugacy is at a different level of a BNP hierarchy
  • Can be used with an arbitrary (i.e., discrete, continuous, or other) data likelihood

[Figure: documents assigned to topics — Arts, Econ, Sports, Health, Technology]

SLIDE 99

References

  • T Broderick, AC Wilson, and MI Jordan. Posteriors, conjugacy, and exponential families for completely random measures. Submitted, arXiv, 2014.
  • P Diaconis and D Ylvisaker. Conjugate priors for exponential families. Annals of Statistics, 1979.
  • T Griffiths and Z Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2006.
  • Y Kim. Nonparametric Bayesian estimators for counting processes. Annals of Statistics, 1999.
  • N Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 1990.
  • R Thibaux and MI Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.