SLIDE 1

Posteriors, conjugacy, and exponential families for completely random measures

Tamara Broderick, Ashia C. Wilson, Michael I. Jordan

MIT, UC Berkeley, UC Berkeley

SLIDE 2

Background

Parametric exponential family conjugacy [Diaconis & Ylvisaker 1979]:
  • Likelihood: p(x|θ) = θ^x (1 + θ)^(−1), x ∈ {0, 1}, θ > 0
  • Prior: p(θ) ∝ θ^α (1 + θ)^(−α−β) = BetaPrime(θ|α, β), α > 0, β > 0
  • Posterior: p(θ|x) ∝ θ^(α+x) (1 + θ)^(−(α+x)−(β−x+1))

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process
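
A minimal numerical check of this conjugate pair (my own sketch, not part of the deck): multiply the BetaPrime prior by the odds-Bernoulli likelihood on a grid and compare against the updated parameters (α + x, β − x + 1) that the posterior formula implies.

```python
import numpy as np

def betaprime_unnorm(theta, a, b):
    # Slide's parameterization: p(theta) ∝ theta^a * (1 + theta)^(-a-b)
    return theta**a * (1.0 + theta)**(-a - b)

def odds_bernoulli(x, theta):
    # p(x | theta) = theta^x * (1 + theta)^(-1), x in {0, 1}, theta > 0
    return theta**x * (1.0 + theta)**(-1.0)

theta = np.linspace(1e-6, 200.0, 400_000)  # grid over theta > 0 (truncated)
a, b, x = 2.0, 3.0, 1

# Bayes' rule on the grid, normalized to sum to 1
post = betaprime_unnorm(theta, a, b) * odds_bernoulli(x, theta)
post /= post.sum()

# Conjugate update predicted by the slide: (a, b) -> (a + x, b - x + 1)
pred = betaprime_unnorm(theta, a + x, b - x + 1)
pred /= pred.sum()

assert np.allclose(post, pred)  # same density up to floating-point error
```

The assertion holds because both sides reduce to the same unnormalized function θ^(α+x) (1 + θ)^(−α−β−1).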

SLIDE 12

Background

  • Parametric exponential family conjugacy [Diaconis & Ylvisaker 1979]
  • Likelihood ➞ conjugate prior, straightforward inference
  • Integration ➞ addition

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process

SLIDE 14

Want: One framework

  • For Bayesian nonparametric models:
  • Likelihood ➞ conjugate prior, straightforward inference

Models:
  • Beta process, Bernoulli process (IBP)
  • Gamma process, Poisson likelihood process (DP, CRP)
  • Beta process, negative binomial process

SLIDE 18

Clustering

[Figure: seven documents, each assigned to exactly one topic cluster — Arts, Econ, Sports, Health, Technology]

SLIDE 19

Feature allocation

[Figure: seven documents, each assigned to multiple topics — Arts, Econ, Sports, Health, Technology]

SLIDE 20

Indian buffet process (IBP) [Griffiths & Ghahramani 2006]

For n = 1, 2, ..., N:
  • 1. Data point n takes an existing feature k, which has occurred S_{n−1,k} times, with probability S_{n−1,k} / (β + n − 1)
  • 2. Number of new features for data point n: K⁺_n ∼ Poisson(γβ / (β + n − 1))

[Figure: binary feature matrix with rows n = 1, 2, ..., N and columns k = 1, 2, ...]
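
The two-step recipe above can be sketched as a sampler (my own sketch of the two-parameter IBP; the function name and defaults are illustrative):

```python
import numpy as np

def sample_ibp(N, gamma=3.0, beta=1.0, rng=None):
    """Two-parameter IBP [Griffiths & Ghahramani 2006]: returns a binary
    feature-assignment matrix Z of shape (N, K), K random."""
    rng = np.random.default_rng() if rng is None else rng
    rows = []    # per-data-point binary feature indicators
    counts = []  # counts[k] = S_{n-1,k}, times feature k has occurred so far
    for n in range(1, N + 1):
        # 1. Take each existing feature k with probability S_{n-1,k}/(beta+n-1)
        row = [int(rng.random() < c / (beta + n - 1)) for c in counts]
        counts = [c + z for c, z in zip(counts, row)]
        # 2. Draw K+_n ~ Poisson(gamma*beta/(beta+n-1)) brand-new features
        k_new = rng.poisson(gamma * beta / (beta + n - 1))
        row += [1] * k_new
        counts += [1] * k_new
        rows.append(row)
    Z = np.zeros((N, len(counts)), dtype=int)
    for n, row in enumerate(rows):
        Z[n, :len(row)] = row
    return Z

Z = sample_ibp(20, rng=np.random.default_rng(0))
print(Z.shape)
```

Because β enters both probabilities, larger β spreads features across more, rarer columns while γ controls the overall number of features.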

SLIDE 31

Beta process & Bernoulli process [Hjort 1990; Kim 1999; Thibaux & Jordan 2007]

For m = 1, 2, ...:
  • 1. Draw K⁺_m ∼ Poisson(γβ / (β + m − 1))
  • 2. For k = 1, ..., K⁺_m: draw a frequency of size θ_k ∼ Beta(1, β + m − 1)

[Figure: atoms θ_1, ..., θ_6 with their frequencies]
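
This size-biased construction can be sketched directly (my own sketch; the truncation level M and function names are mine, not the deck's):

```python
import numpy as np

def sample_bp_atoms(M, gamma=3.0, beta=1.0, rng=None):
    """Size-biased beta process atoms per the slide's recipe: round m adds
    K+_m ~ Poisson(gamma*beta/(beta+m-1)) atoms with frequencies
    theta_k ~ Beta(1, beta+m-1). Truncated after M rounds."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = []
    for m in range(1, M + 1):
        k_new = rng.poisson(gamma * beta / (beta + m - 1))
        thetas.extend(rng.beta(1.0, beta + m - 1, size=k_new))
    return np.array(thetas)

def sample_bernoulli_process(thetas, N, rng=None):
    """Bernoulli process draws: Z[n, k] ~ Bernoulli(theta_k), independently."""
    rng = np.random.default_rng() if rng is None else rng
    return (rng.random((N, thetas.size)) < thetas).astype(int)

rng = np.random.default_rng(1)
thetas = sample_bp_atoms(M=50, rng=rng)
Z = sample_bernoulli_process(thetas, N=10, rng=rng)
```

Later rounds draw from Beta(1, β + m − 1), so the added frequencies shrink toward 0: the familiar "rich get richer, new features get rarer" behavior behind the IBP marginal.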

SLIDE 45

Why are these useful?
  • Exchangeable (e.g. Gibbs sampling)
  • Finite but unbounded
  • (Countable) sequence of finite-dimensional distributions
  • Hierarchical models

How do we come up with these models?

[Figure: binary feature matrix with rows n = 1, 2, ..., N and columns k = 1, 2, ...]

SLIDE 51

One Framework [Broderick, Wilson, Jordan 2014]

Likelihood (e.g. Bernoulli) ➞
  • Conjugate prior (e.g. BP)
  • Marginal (e.g. IBP)
  • Size-biased atom sequence (e.g. BP stick-breaking)


slide-59
SLIDE 59

Example: odds Bernoulli

x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

10

slide-60
SLIDE 60

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

10

slide-61
SLIDE 61

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-62
SLIDE 62

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-63
SLIDE 63

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

slide-64
SLIDE 64

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-65
SLIDE 65

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-66
SLIDE 66

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-67
SLIDE 67

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-68
SLIDE 68

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-69
SLIDE 69

Example: odds Bernoulli

θ x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0

  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0 x

10

slide-70
SLIDE 70
  • Poisson process rate

measure

  • Marked Poisson

process rate measure

  • Conjugate prior:
  • Rate measure
  • Beta prime fixed

atoms

Example: odds Bernoulli

θ x x ∈ {0, 1} p(x|θ) = θx(1 + θ)−1 θ > 0 ν(dθ) ν(dθ)p(x|θ) ν(dθ) = γθα−1(1 − θ)−α−βdθ α ∈ (−1, 0], β > 0, γ > 0

10

SLIDE 71

Example: odds Bernoulli (posterior rate measure)

x ∈ {0, 1}, θ > 0: p(x|θ) = θ^x (1 + θ)^(−1)

Prior rate measure: ν(dθ) = γ θ^(α−1) (1 + θ)^(−α−β) dθ
After x_1 = 0: ν(dθ | x_1 = 0) = γ θ^(α−1) (1 + θ)^(−α−(β+1)) dθ
After x_{1:2} = 0: ν(dθ | x_{1:2} = 0) = γ θ^(α−1) (1 + θ)^(−α−(β+2)) dθ

[Figure: Poisson process atoms θ_1, θ_2, ... on R₊ as observations accrue]
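
The pattern (each x = 0 observation adds 1 to the β exponent) is easy to verify numerically (my own sketch; I write the rate measure in the beta prime form with (1 + θ), matching the BetaPrime prior earlier in the deck):

```python
import numpy as np

def nu(theta, alpha, beta, gamma):
    # Conjugate rate measure, beta prime form: gamma * theta^(alpha-1) * (1+theta)^(-alpha-beta)
    return gamma * theta**(alpha - 1.0) * (1.0 + theta)**(-alpha - beta)

theta = np.linspace(1e-3, 50.0, 100_000)
alpha, beta, gamma = 0.0, 2.0, 1.5  # alpha in (-1, 0]

p0 = (1.0 + theta)**(-1.0)  # odds-Bernoulli likelihood at x = 0

# One observation x_1 = 0: exponent beta -> beta + 1
assert np.allclose(nu(theta, alpha, beta, gamma) * p0,
                   nu(theta, alpha, beta + 1, gamma))
# Two observations x_{1:2} = 0: beta -> beta + 2
assert np.allclose(nu(theta, alpha, beta, gamma) * p0**2,
                   nu(theta, alpha, beta + 2, gamma))
```

Both assertions are exact algebraic identities, so they hold for any grid and any valid (α, β, γ).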

SLIDE 81

Size-biased atoms, beta prime process

For m = 1, 2, ...:
  • 1. Draw K⁺_m ∼ Poisson(γβ / (β + m − 1))
  • 2. For k = 1, ..., K⁺_m: draw a rate of size θ_k ∼ BetaPrime(1, β + m − 1)   (α = 0)

Marginal process derivation is similar.

[Figure: atoms θ_1, ..., θ_6 on R₊]
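
A sketch of this construction (mine, not the deck's): NumPy has no beta prime sampler, so I use the standard identity θ = X/(1 − X) with X ∼ Beta(a, b); the truncation level M is an illustrative choice.

```python
import numpy as np

def sample_betaprime(a, b, size, rng):
    # BetaPrime(a, b) via the transform X/(1-X), X ~ Beta(a, b)
    x = rng.beta(a, b, size=size)
    return x / (1.0 - x)

def sample_beta_prime_process(M, gamma=3.0, beta=1.0, rng=None):
    """Size-biased atoms for the (alpha = 0) beta prime process, per the slide:
    round m gives K+_m ~ Poisson(gamma*beta/(beta+m-1)) atoms with rates
    theta_k ~ BetaPrime(1, beta+m-1). Truncated after M rounds."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = []
    for m in range(1, M + 1):
        k_new = rng.poisson(gamma * beta / (beta + m - 1))
        thetas.extend(sample_betaprime(1.0, beta + m - 1, k_new, rng))
    return np.array(thetas)

thetas = sample_beta_prime_process(M=50, rng=np.random.default_rng(2))
```

Unlike the beta process frequencies, these rates live on all of R₊, which is what the odds-Bernoulli likelihood requires.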

SLIDE 84

One Framework [Broderick, Wilson, Jordan 2014]

Exponential family likelihood:

p(dx|θ) = κ(x) exp{⟨η(θ), φ(x)⟩ − A(θ)} dx

  • Conjugate prior (PPP rate measure):

ν(dθ) = γ exp{⟨ξ, η(θ)⟩ + λ[−A(θ)]} dθ + fixed atoms
f(dθ) = exp{⟨ξ_k, η(θ)⟩ + λ_k[−A(θ)] − B(ξ_k, λ_k)} dθ

  • Size-biased atom sequence:

K⁺_m ∼ Poisson( ∫_{x>0} γ · κ(0)^(m−1) · κ(x) · exp{B(ξ + (m − 1)φ(0) + φ(x), λ + m)} dx )
f(dθ) ∝ ∫_{x>0} exp{⟨ξ + (m − 1)φ(0) + φ(x), η(θ)⟩ + (λ + m)[−A(θ)]} dx

  • Marginal process: K⁺_n as above, and

p(x_n | x_{1:(n−1)}) = κ(x_n) exp{−B(ξ + Σ_{m=1}^{n−1} x_m, λ + n − 1) + B(ξ + Σ_{m=1}^{n−1} x_m + x_n, λ + n)}
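
To make the marginal formula concrete, here is my own instantiation for a single fixed atom under the odds-Bernoulli likelihood, where κ(x) = 1, φ(x) = x, η(θ) = log θ, A(θ) = log(1 + θ), and B(ξ, λ) becomes a log Beta function; the hyperparameter values are illustrative.

```python
from math import exp, lgamma

def B(xi, lam):
    # Log normalizer of exp{xi*log(theta) - lam*log(1+theta)}:
    # integral_0^inf theta^xi (1+theta)^(-lam) dtheta = Beta(xi+1, lam-xi-1)
    return lgamma(xi + 1) + lgamma(lam - xi - 1) - lgamma(lam)

def predictive(x_n, xs, xi=0.0, lam=2.0):
    # Slide's marginal with kappa = 1 and phi(x) = x:
    # p(x_n | x_{1:n-1}) = exp{-B(xi + s, lam + n - 1) + B(xi + s + x_n, lam + n)}
    n, s = len(xs) + 1, sum(xs)
    return exp(-B(xi + s, lam + n - 1) + B(xi + s + x_n, lam + n))

xs = [1, 0, 1]
p1, p0 = predictive(1, xs), predictive(0, xs)
assert abs(p1 + p0 - 1.0) < 1e-12
# The log-Beta differences telescope to the closed form (xi + s + 1)/(lam + n - 1)
assert abs(p1 - (0.0 + 2 + 1) / (2.0 + 4 - 1)) < 1e-12  # = 3/5
```

The telescoping shows the familiar "popularity / (β + n − 1)" predictive probability of the IBP falling out of the general B-function formula.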

SLIDE 92

Notes
  • To satisfy BNP desiderata, the likelihood must have a point mass at 0
  • The Poisson distribution is a direct result of the Poisson process
  • Much previous work on conjugacy is at a different level of a BNP hierarchy
  • Can be used with an arbitrary (i.e., discrete, continuous, or other) data likelihood

[Figure: documents assigned to topics — Arts, Econ, Sports, Health, Technology]

SLIDE 99

References

  • T Broderick, AC Wilson, and MI Jordan. Posteriors, conjugacy, and exponential families for completely random measures. Submitted, arXiv, 2014.
  • P Diaconis and D Ylvisaker. Conjugate priors for exponential families. Annals of Statistics, 1979.
  • T Griffiths and Z Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2006.
  • Y Kim. Nonparametric Bayesian estimators for counting processes. Annals of Statistics, 1999.
  • N Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 1990.
  • R Thibaux and MI Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.