
Implementing discrete approximations to continuous mixture distributions

Christian Röver, Department of Medical Statistics, University Medical Center Göttingen

December 5, 2014


SLIDE 1

Implementing discrete approximations to continuous mixture distributions

Christian Röver

Department of Medical Statistics, University Medical Center Göttingen

December 5, 2014

SLIDE 2

Overview

  • mixture distributions
  • meta analysis example
  • discrete ‘grid’ approximations
  • design strategy / algorithm
  • example application

SLIDE 3

Mixture distributions

mixture distribution:
  • a convex combination of “component” distributions
  • “a distribution whose parameters are random variables”

  • (“conditional”) distribution with density p(y|x)
  • “parameter” x follows a distribution p(x)
  • marginal distribution of y is p(y) = ∫ p(y|x) dp(x)
  • x discrete: p(y) = Σi p(y|xi) p(xi)  (see the sketch below)

ubiquitous in many applications:
  • Student-t distribution
  • negative binomial distribution
  • marginal distributions
  • . . .
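
To make the discrete case concrete, here is a minimal R sketch (the parameter values, weights and the Poisson conditional are made-up choices for illustration) evaluating p(y) = Σi p(y|xi) p(xi):

    # hypothetical discrete mixing distribution: three parameter values with weights
    x  <- c(1, 3, 6)          # possible parameter values x_i
    px <- c(0.5, 0.3, 0.2)    # mixing probabilities p(x_i), summing to 1
    # marginal p(y) = sum_i p(y|x_i) p(x_i), here with Poisson conditionals p(y|x)
    py <- function(y) sapply(y, function(yy) sum(dpois(yy, lambda = x) * px))
    py(0:5)          # marginal probabilities for y = 0, ..., 5
    sum(py(0:100))   # sums to (numerically) 1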

SLIDE 4

Meta analysis

Context: random-effects meta-analysis

[Forest plot: seven estimates (#1–#7), effect Θ on the horizontal axis]

have:
  • estimates yi
  • standard errors σi

want:
  • combined estimate Θ̂

SLIDE 8

Meta analysis

The random effects model

assume:  yi ∼ Normal(Θ, σi² + τ²)

SLIDE 13

Meta analysis

The random effects model

assume:  yi ∼ Normal(Θ, σi² + τ²)

ingredients:
  • Data: estimates yi, standard errors σi
  • Parameters: true parameter value Θ, heterogeneity τ

  • Θ ∈ R: of primary interest
  • τ ∈ R+: nuisance parameter, accounts for (potential) incompatibility

SLIDE 14

Meta analysis example

Motivation: background

[Figure: forest plot of the seven estimates, and the resulting marginal posterior density p(Θ)]

estimation: via marginal posterior distribution of parameter Θ

SLIDE 15

Meta analysis example

Motivation: two-parameter model & marginals

[Figure: joint posterior of (τ, Θ) with 50%/90%/95%/99% credible regions, and the marginal posterior densities p(τ) and p(Θ)]

two unknowns: joint & marginal posterior distributions

SLIDE 16

Meta analysis example

Motivation: two-parameter model, conditionals & marginals

here: easy to derive one of the marginals, p(τ|y), and the conditional posteriors p(Θ|τ, y):
  • p(τ|y) = . . .  (. . . a function of yi, σi, . . . )
  • p(Θ|τ, y) = Normal(µ = f1(τ), σ = f2(τ))

but main interest is in the other marginal, p(Θ|y):

    p(Θ|y) = ∫ p(Θ|τ, y) p(τ|y) dτ      (conditional × marginal)

i.e., p(Θ|y) is a mixture distribution
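
As a concrete illustration (not part of the original slides): assuming a uniform (improper) prior for Θ, the conditional posterior p(Θ|τ, y) is normal with precision-weighted moments; a minimal R sketch with made-up data:

    # hypothetical data: estimates and standard errors from 7 studies
    y     <- c(194, 171, 185, 155, 165, 178, 160)
    sigma <- c(21, 12, 25, 18, 10, 15, 22)
    # conditional posterior p(Theta|tau,y) = Normal(f1(tau), f2(tau)^2),
    # assuming a uniform prior on Theta (precision-weighted mean and s.d.):
    cond.moments <- function(tau) {
      w <- 1 / (sigma^2 + tau^2)            # inverse-variance weights
      c(mean = sum(w * y) / sum(w),         # f1(tau)
        sd   = sqrt(1 / sum(w)))            # f2(tau)
    }
    cond.moments(tau = 0)    # no heterogeneity (fixed-effect case)
    cond.moments(tau = 20)   # larger tau, wider conditional posterior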

SLIDE 17

Meta analysis example

Motivation: two-parameter model, conditionals & marginals

[Figure: joint posterior of (τ, Θ), marginal posterior p(τ), conditional posteriors p(Θ|τi), and marginal posterior p(Θ|y)]

SLIDE 18

Meta analysis example

Motivation: two-parameter model, conditionals & marginals

[Figure: as before, with the conditional mean and conditional mean ± sd of Θ given τ overlaid]

SLIDE 19

Meta analysis example

Motivation: two-parameter model, conditionals & marginals

[Figure: as before, highlighting τ1 and the corresponding conditional posterior p(Θ|τ = τ1)]

SLIDE 20

Meta analysis example

Motivation: two-parameter model, conditionals & marginals

[Figure: as before, with four values τ1, . . . , τ4 and the corresponding conditional posteriors p(Θ|τ = τi)]

SLIDE 22

Meta analysis example

Questions

approximating the continuous mixture through a discrete set of points in τ . . .
  • actual marginal: p(Θ) = ∫ p(Θ|τ) p(τ) dτ
  • approximation: p(Θ) ≈ Σi p(Θ|τi) πi  (sketched below)

Questions:
  • how to set up the discrete grid of points?
  • how well can we approximate?
  • do we have a handle on accuracy?
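
Continuing the sketch from above, such a grid approximation could be evaluated as follows in R; the grid values τi and weights πi used here are the ones quoted later on the binning example slide, and how to choose them is the topic of the following slides:

    # grid of tau values and associated weights (from the binning example below)
    tau.grid <- c(5, 15, 25, 35)
    pi.grid  <- c(0.34, 0.44, 0.15, 0.07)
    # approximate marginal posterior of Theta:
    # p(Theta|y) is approximated by sum_i pi_i * Normal(Theta; f1(tau_i), f2(tau_i))
    marg.approx <- function(theta) {
      comp <- sapply(tau.grid, function(t) {
        m <- cond.moments(t)                  # from the earlier sketch
        dnorm(theta, mean = m["mean"], sd = m["sd"])
      })
      as.vector(comp %*% pi.grid)
    }
    theta <- seq(130, 200, by = 1)
    plot(theta, marg.approx(theta), type = "l",
         xlab = "effect", ylab = "approximate marginal density")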

SLIDE 23

Meta analysis example

Motivation: discretizing a mixture

[Figure: joint posterior with τ1, . . . , τ4 marked, and the corresponding conditional posteriors p(Θ|τi)]

Note: the conditional distributions p(Θ|τ, y) are very different for τ1 and τ2, and rather similar for τ3 and τ4.
  • idea: may need fewer bins for larger τ values . . . ?
  • . . . bin spacing based on similarity / dissimilarity of the conditionals?

SLIDE 24

Discretizing mixture distributions

Terminology and examples

  • random variables X, Y
  • joint density p(x, y) = p(y|x) × p(x)
  • marginal density p(y) = ∫ p(y|x) p(x) dx
  • p(y): “mixture distribution”,  p(x): “mixing distribution”

Examples:
  • Y|λ ∼ Poisson(λ), λ ∼ Gamma(α, β)  ⇒  Y ∼ Negative Binomial
  • Y|p ∼ Binomial(p, N), p ∼ Beta(α, β)  ⇒  Y ∼ Beta-Binomial
  • Y|σ ∼ Normal(0, σ²), σ = √(ν/X), X ∼ χ²ν  ⇒  Y ∼ Student-t
  • . . .

SLIDE 25

Discretizing mixture distributions

Setting up a binning

need: a discretization of the mixing distribution p(x)
  • domain of X: R (or a subset)
  • define bin margins: x(1) < x(2) < . . . < x(k−1)
  • bins:
        Xi = {x : x ≤ x(1)}             if i = 1
        Xi = {x : x(i−1) < x ≤ x(i)}    if 1 < i < k
        Xi = {x : x(k−1) < x}           if i = k
  • reference points: x̃1, . . . , x̃k, where x̃i ∈ Xi
  • bin probabilities: πi = P(x(i−1) < X ≤ x(i)) = P(X ∈ Xi)
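
A small R sketch of such a binning; the mixing distribution (a χ² distribution with 5 degrees of freedom) and the bin margins and reference points are arbitrary choices for illustration:

    # illustrative mixing distribution: chi-squared with 5 degrees of freedom
    pmix <- function(x) pchisq(x, df = 5)     # CDF of the mixing distribution
    # assumed bin margins x_(1) < ... < x_(k-1) and one reference point per bin
    margins <- c(2, 5, 10)                    # k-1 = 3 margins, so k = 4 bins
    refpts  <- c(1, 3.5, 7, 13)               # reference points, one inside each bin
    # bin probabilities pi_i = P(x_(i-1) < X <= x_(i)); outer bins are open-ended
    upper  <- c(margins, Inf)
    lower  <- c(-Inf, margins)
    pi.bin <- pmix(upper) - pmix(lower)
    cbind(lower, upper, refpts, pi.bin)       # the pi.bin column sums to 1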
SLIDE 26

Discretizing mixture distributions

Setting up a binned mixture

  • actual distribution: p(x, y)
  • discrete approximation: q(x, y)
  • same marginal (mixing distribution): q(x) = p(x)
  • but “binned” conditionals: q(y|x) = p(y|x = x̃i) for x ∈ Xi

q is similar to p: instead of conditioning on the “exact” x, one conditions on the corresponding bin’s reference point x̃i

marginal:  q(y) = ∫ q(y|x) q(x) dx = Σi πi p(y|x̃i)

SLIDE 27

Discretizing mixture distributions

Setting up a binned mixture

in the previous example:
  • bin margins: τ(1) = 10, τ(2) = 20, τ(3) = 30
  • reference points: τ̃1 = 5, τ̃2 = 15, τ̃3 = 25, τ̃4 = 35
  • probabilities: π1 = 0.34, π2 = 0.44, π3 = 0.15, π4 = 0.07

[Figure: joint posterior with the original τ1, . . . , τ4, the bin margins τ(1), τ(2), τ(3) and reference points τ̃1, . . . , τ̃4, and the resulting marginal posterior p(Θ|y)]

SLIDE 28

Similarity / dissimilarity of distributions

Kullback-Leibler divergence

The Kullback-Leibler divergence of two distributions with density functions p and q is defined as

    DKL(p(θ) ‖ q(θ)) = ∫Θ log( p(θ) / q(θ) ) p(θ) dθ = Ep(θ)[ log( p(θ) / q(θ) ) ]

the KL-divergence
  • is always non-negative: DKL(p(θ) ‖ q(θ)) ≥ 0
  • is not symmetric: in general DKL(p(θ) ‖ q(θ)) ≠ DKL(q(θ) ‖ p(θ))
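
Where no closed form is available, the divergence can be approximated by numerical integration; a minimal R sketch (the two example densities are arbitrary):

    # numerical Kullback-Leibler divergence D_KL(p || q) of two density functions
    kl.divergence <- function(p, q, lower = -Inf, upper = Inf) {
      integrand <- function(theta) {
        px <- p(theta)
        ifelse(px > 0, px * log(px / q(theta)), 0)   # integrand is 0 where p = 0
      }
      integrate(integrand, lower = lower, upper = upper)$value
    }
    # example: two (arbitrary) normal densities
    p <- function(x) dnorm(x, mean = 0, sd = 1)
    q <- function(x) dnorm(x, mean = 1, sd = 2)
    kl.divergence(p, q)                        # D_KL(p || q)
    kl.divergence(q, p)                        # differs: not symmetric
    kl.divergence(p, q) + kl.divergence(q, p)  # symmetrized divergence D_s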
SLIDE 29

Similarity / dissimilarity of distributions

Symmetrized KL-divergence

The symmetrized KL-divergence of two distributions is defined as

    Ds(p(θ) ‖ q(θ)) = DKL(p(θ) ‖ q(θ)) + DKL(q(θ) ‖ p(θ))

the symmetrized KL-divergence
  • is obviously symmetric: Ds(p(θ) ‖ q(θ)) = Ds(q(θ) ‖ p(θ))
  • bounds the individual directed divergences:
        Ds(p(θ) ‖ q(θ)) ≥ max{ DKL(p(θ) ‖ q(θ)), DKL(q(θ) ‖ p(θ)) }
SLIDE 30

Divergence

Interpretation

How to interpret divergences?
  • heuristically: the expected log-ratio of densities . . .
  • relevant case here: p(x) ≈ q(x), so that

        log( p(x) / q(x) ) ≈ p(x)/q(x) − 1      (for p(x)/q(x) ≈ 1)

  • a divergence DKL(p(x) ‖ q(x)) = 0.01 then corresponds to an (expected) ≈ 1% difference in densities

SLIDE 31

Divergence

Interpretation

Divergence for two normal distributions:

    Ds( p(θ|µA, σA) ‖ p(θ|µB, σB) ) = (µA − µB)² (1/σA² + 1/σB²) / 2 + (σA² − σB²)² / (2 σA² σB²)

obvious special cases:
  • equal variances: σB = σA, µB = µA + c σA   ⇒   Ds(p ‖ q) = c²
  • equal means: µB = µA, σB = (1+c) σA   ⇒   Ds(p ‖ q) = c²(c+2)² / (2(c+1)²) ≈ 2c²  (for small c)
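
A direct R transcription of this closed-form expression (the function name is ours), with a quick check of the c² special case:

    # symmetrized KL divergence between Normal(muA, sdA^2) and Normal(muB, sdB^2)
    ds.normal <- function(muA, sdA, muB, sdB) {
      (muA - muB)^2 * (1/sdA^2 + 1/sdB^2) / 2 +
        (sdA^2 - sdB^2)^2 / (2 * sdA^2 * sdB^2)
    }
    # special case: equal variances, means cc standard deviations apart -> cc^2
    cc <- 1.7
    ds.normal(muA = 0, sdA = 2, muB = cc * 2, sdB = 2)   # equals cc^2 = 2.89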

SLIDE 32

Divergence

Bin-wise maximum divergence: definition

consider: the divergence between the reference point and other points within each bin

define:

    di = max_{x ∈ Xi} Ds( p(y|x) ‖ p(y|x̃i) ) = max_{x ∈ Xi} Ds( p(y|x) ‖ q(y|x) ),

the bin-wise maximum divergence: the “worst-case discrepancy” introduced within each bin
(note: symmetrized divergence Ds)
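
In the meta-analysis example the conditionals are normal, so di can be evaluated with the helper functions from the earlier sketches; the bin limits and reference point below are again illustrative:

    # divergence between the conditionals p(Theta|tau) and p(Theta|tau.ref),
    # using cond.moments() and ds.normal() from the sketches above
    ds.cond <- function(tau, tau.ref) {
      a <- cond.moments(tau); b <- cond.moments(tau.ref)
      ds.normal(a["mean"], a["sd"], b["mean"], b["sd"])
    }
    # bin-wise maximum divergence d_i for an illustrative bin (10, 20],
    # reference point 15, via a fine grid search over the bin:
    bin     <- c(10, 20)
    tau.ref <- 15
    tau.seq <- seq(bin[1], bin[2], length.out = 201)
    d.i     <- max(sapply(tau.seq, ds.cond, tau.ref = tau.ref))
    d.i                                       # "worst-case" divergence within the bin
    sapply(bin, ds.cond, tau.ref = tau.ref)   # divergences at the two bin margins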

SLIDE 33

Divergence

Bin-wise maximum divergence: example

[Figure: joint posterior with bin margins τ(1), τ(2), τ(3) and reference points τ̃1, . . . , τ̃4]

recall: the actual parameters of the conditionals p(y|x) (in black) vs. the parameters of q(y|x) assumed through the binning (in red)
SLIDE 37

Divergence

Bin-wise maximum divergence: example

[Figure: zoom into the second bin (τ(1), τ(2)] with reference point τ̃2: conditional densities p(Θ|τ), and the divergence Ds(p(Θ|τ) ‖ p(Θ|τ̃2)) as a function of τ, with its maximum marked]

determine maximum di for each bin i (usually at bin margin)

SLIDE 38

Bounding divergence

Idea

now consider the divergences of the true and approximate marginals p(y) and q(y) (not the conditionals!): what about Ds( p(y) ‖ q(y) )?

having the individual bin-wise divergences di, we can show:

    Ds( p(y) ‖ q(y) )  ≤  Σi πi di  ≤  maxi di

in other words: by bounding the bin-wise divergences (of the conditionals) we can bound the overall divergence (of the marginals)

SLIDE 47

Discretizing mixtures

Mixture setup algorithm

[Figure: divergence Ds(p(Θ|τ) ‖ p(Θ|τ̃i)) as a function of τ with threshold δ, reference points τ̃1, . . . , τ̃4, bin margins τ(1), . . . , τ(4), and the resulting 1st–4th bins]

1st reference point τ̃1 at zero, first margin τ(1) at 0.904, (. . . )
result: a binning with bounded divergence (≤ δ) per bin (when to stop?)

SLIDE 48

Discretizing mixtures

General algorithm (variations possible)

1. Specify δ > 0, 0 ≤ ε ≪ 1, and a starting reference point x̃1 (e.g. the minimum possible value, or the ε/2-quantile). Define ε1 ≥ 0 as ε1 := P(X ≤ x̃1). Set i = 1.

2. Set x⋆ = x̃1. Obviously, Ds( p(y|x̃1) ‖ p(y|x⋆) ) = 0. Now increase x⋆ as far as possible while ensuring that Ds( p(y|x̃1) ‖ p(y|x⋆) ) ≤ δ. Use this point as the first bin margin: x(1) = x⋆. Compute π1 = P(X ≤ x(1)). Set i = i + 1.

3. Increase x⋆ until Ds( p(y|x(i−1)) ‖ p(y|x⋆) ) = δ. Use this point as the next reference point: x̃i = x⋆.

4. Increase x⋆ again until Ds( p(y|x̃i) ‖ p(y|x⋆) ) = δ. Use this point as the next bin margin: x(i) = x⋆.

5. Compute the bin weight πi = P(x(i−1) < X ≤ x(i)).

6. If P(X > x(i)) > (ε − ε1), set i = i + 1 and proceed at step 3. Otherwise stop.
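
Below is a rough R sketch of this algorithm, not the author's reference implementation. It assumes normal conditionals p(y|x) with moments given by cond.mean(x) and cond.sd(x), a mixing CDF pmix(), and for simplicity restricts the search to a finite interval [x.min, x.max] covering all but a negligible upper tail; it reuses ds.normal() from the earlier sketch, and all names are our own:

    discretize <- function(cond.mean, cond.sd, pmix,
                           delta = 0.01, epsilon = 0.001, x.min, x.max) {
      # divergence between the conditionals at x and at reference point x.ref:
      ds <- function(x, x.ref)
        ds.normal(cond.mean(x), cond.sd(x), cond.mean(x.ref), cond.sd(x.ref))
      # smallest x > x.from at which Ds(p(y|x.ref) || p(y|x)) reaches delta:
      advance <- function(x.ref, x.from) {
        if (ds(x.max, x.ref) <= delta) return(x.max)
        uniroot(function(x) ds(x, x.ref) - delta,
                lower = x.from, upper = x.max)$root
      }
      refpts  <- x.min                          # step 1: first reference point
      eps1    <- pmix(x.min)
      margins <- advance(x.min, x.min)          # step 2: first bin margin
      while ((1 - pmix(tail(margins, 1))) > (epsilon - eps1) &&
             tail(margins, 1) < x.max) {
        new.ref <- advance(tail(margins, 1), tail(margins, 1))   # step 3
        new.mar <- advance(new.ref, new.ref)                     # step 4
        refpts  <- c(refpts, new.ref)
        margins <- c(margins, new.mar)
      }
      weights <- diff(c(0, pmix(margins)))      # step 5: bin probabilities
      list(reference = refpts, margins = margins, weights = weights)
    }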

SLIDE 50

Discretizing mixtures

General algorithm

  • remaining issue: the ignored ε > 0 tail probability (usually: problems at the domain’s margins) (is there a way to define a criterion “jointly”?)
  • only need to keep track of the reference points x̃i and probabilities πi
  • meta-analysis example: 35 reference (“support”) points required (δ = 0.01, ε = 0.001)

[Figure: joint posterior of (τ, Θ) with 50%/90%/95%/99% credible regions]

SLIDE 52

Discretizing mixtures

Example application: Student-t distribution

The Student-t distribution also arises as a mixture distribution:
  • draw X from a χ²ν distribution
  • calculate σ = √(ν/X)
  • draw Y|σ from a Normal(0, σ²) distribution
  • the marginal of Y is then Student-t (with ν degrees of freedom)

set:
  • aimed-for divergence: δ = 0.01
  • neglected tail probability: ε = 0.001
  • first reference point: x̃1 = the ε/2-quantile of the χ²ν distribution
  • iterate . . .
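
Using the discretize() sketch from the algorithm slide, this setup might look roughly as follows (the finite search range stands in for the open-ended iteration, and exact results will depend on these choices):

    nu <- 5    # degrees of freedom used in the example
    # mixing distribution X ~ chi^2_nu; conditionals Y|x ~ Normal(0, nu/x)
    grid <- discretize(cond.mean = function(x) 0 * x,
                       cond.sd   = function(x) sqrt(nu / x),
                       pmix      = function(x) pchisq(x, df = nu),
                       delta = 0.01, epsilon = 0.001,
                       x.min = qchisq(0.001 / 2, df = nu),    # eps/2-quantile
                       x.max = qchisq(1 - 1e-6,  df = nu))    # generous upper end
    length(grid$reference)    # number of support points obtained
    # resulting discrete-mixture approximation of the Student-t density:
    q.density <- function(y)
      sapply(y, function(yy)
        sum(grid$weights * dnorm(yy, mean = 0, sd = sqrt(nu / grid$reference))))
    curve(q.density(x), from = -4, to = 4, ylab = "density")
    curve(dt(x, df = nu), add = TRUE, lty = 2)    # exact Student-t for comparison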
SLIDE 54

Discretizing mixtures

Example application: Student-t distribution

[Figure: σ = √(ν/x) as a function of x, and the CDF P(X ≤ x) of the mixing distribution]

the algorithm yields 19 reference points x̃i (δ = 0.01, ε = 0.001)

SLIDE 59

Discretizing mixtures

Example application: Student-t distribution

[Figure: the discrete mixture Σi p(y|x̃i) × πi with the exact Student-t (ν = 5) density overlaid]

19 conditionals p(y|x̃i) → weighted conditionals p(y|x̃i) × πi → discrete mixture Σi p(y|x̃i) × πi

SLIDE 63

Discretizing mixtures

Example application: Student-t distribution

[Figure: density p(y), and the ratio q(y)/p(y) on the log scale, staying within the ±δ bounds]

how well do we do? → compute the divergences numerically:

    DKL(p(θ) ‖ q(θ)) = 0.000035,   DKL(q(θ) ‖ p(θ)) = 0.000013,   Ds(p(θ) ‖ q(θ)) = 0.000048
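
Such a numerical check can be done with the kl.divergence() helper from the earlier sketch, applied to the q.density() approximation of the Student-t example; the integration range is our own simplification, and the resulting numbers will depend on the grid actually obtained:

    # exact Student-t density for comparison (nu and q.density() from the sketches above)
    p.exact <- function(y) dt(y, df = nu)
    # restrict integration to a range that carries essentially all probability mass
    kl.pq <- kl.divergence(p.exact,  q.density, lower = -50, upper = 50)   # D_KL(p||q)
    kl.qp <- kl.divergence(q.density, p.exact,  lower = -50, upper = 50)   # D_KL(q||p)
    c(kl.pq, kl.qp, symmetrized = kl.pq + kl.qp)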
SLIDE 64

Conclusions

  • the discrete approximation allows one to compute density, quantiles, moments, . . .
  • the algorithm yields a quick-and-easy solution
  • need to specify the error budget in terms of divergence δ and tail probability ε
  • also makes sense for discrete marginals p(x)
  • other strategies possible, e.g. aiming not for the (bin-wise) maximum divergence di, but for a conditional expectation . . .
  • higher dimensions: should work in principle, probably tricky
  • random-effects meta-analysis implemented in the bmeta R package
  • methods paper in preparation
