

SLIDE 1

Bayesian inference for logistic models using Polya-gamma latent variables

Nicholas G. Polson, James G. Scott, Jesse Windle (Slides based on slides by James G. Scott)

1

SLIDE 2

Modeling binary data

      Age  AgeGroup  Race           Completed  InsuranceType  Location      PracticeType
515   21   18to26    Black          0          Military       Odenton       FamilyPractice
423   21   18to26    Black          0          PrivatePayer   Odenton       FamilyPractice
388   17   11to17    White          0          PrivatePayer   Odenton       Pediatric
6     11   11to17    Black          0          Medicaid       Odenton       Pediatric
1104  19   18to26    Black          0          Medicaid       Bayview       Pediatric
1412  19   18to26    Black          0          Medicaid       JohnsHopkins  OBGYN
1354  24   18to26    White          0          PrivatePayer   JohnsHopkins  OBGYN
318   18   18to26    Black          1          Military       Odenton       FamilyPractice
768   24   18to26    White          1          PrivatePayer   Odenton       OBGYN
29    13   11to17    Other/Unknown  0          PrivatePayer   Odenton       FamilyPractice
1173  14   11to17    Hispanic       0          PrivatePayer   Bayview       Pediatric
799   24   18to26    White          0          PrivatePayer   Odenton       OBGYN
633   24   18to26    White          1          PrivatePayer   WhiteMarsh    OBGYN
111   13   11to17    Other/Unknown  0          Medicaid       Odenton       Pediatric
69    15   11to17    Black          0          PrivatePayer   Odenton       FamilyPractice
559   12   11to17    Black          0          Military       Odenton       Pediatric
1289  26   18to26    White          1          HospitalBased  Bayview       OBGYN
1127  18   11to17    White          0          Medicaid       Bayview       Pediatric
1250  18   11to17    Black          0          PrivatePayer   Bayview       Pediatric
1098  15   11to17    White          1          Medicaid       Bayview       Pediatric
378   12   11to17    White          1          Military       Odenton       FamilyPractice
702   26   18to26    White          0          PrivatePayer   WhiteMarsh    OBGYN
....

Idea: p(y = 1 | x) = f(β⊤x)

2

SLIDE 3

Modeling binary data

(Same data table as on the previous slide.)

Idea: p(y = 1 | x) = f(β⊤x)

  • Non-Bayesian: logistic regression
  • Bayesian: probit regression

2
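For concreteness, the two standard choices of f map the linear predictor β⊤x into a probability (a minimal Python sketch; the function names are our own):

```python
import numpy as np
from scipy.stats import norm

def logistic_link(t):
    # f(t) = exp(t) / (1 + exp(t)), the logistic CDF
    return 1.0 / (1.0 + np.exp(-t))

def probit_link(t):
    # f(t) = Phi(t), the standard normal CDF
    return norm.cdf(t)

t = np.linspace(-4, 4, 9)
print(np.round(logistic_link(t), 3))
print(np.round(probit_link(t), 3))
```

Both links are monotone maps from the real line to (0, 1); the probit link has lighter tails.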

SLIDE 4

Why probit?

  • Simple auxiliary variable trick (Albert and Chib)

3

SLIDE 5
  • L(β) = ∏_{i=1}^N L_i(β) = ∏_{i=1}^N Φ(x_i⊤β)^{y_i} · {1 − Φ(x_i⊤β)}^{1−y_i}

  • L_i(β) = (1/√(2π)) ∫_{A_i} exp{−(z_i − x_i⊤β)²/2} dz_i

  • A_i = (−∞, 0) if y_i = 0 , and A_i = (0, ∞) if y_i = 1

4
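The equivalence of the two forms of L_i(β) can be checked numerically (a small SciPy sketch, not part of the slides; `li_integral` is our own name):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def li_integral(xb, y):
    # One observation's likelihood, written as the normal-density integral
    # over A_i = (0, inf) if y_i = 1, else (-inf, 0)
    lo, hi = (0.0, np.inf) if y == 1 else (-np.inf, 0.0)
    val, _ = quad(lambda z: norm.pdf(z, loc=xb), lo, hi)
    return val

xb = 0.7  # an arbitrary value of x_i' beta
print(li_integral(xb, 1), norm.cdf(xb))        # both equal Phi(0.7)
print(li_integral(xb, 0), 1.0 - norm.cdf(xb))  # both equal 1 - Phi(0.7)
```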
SLIDE 6

Auxiliary variables

  • p(β | Y) ∝ p(β) L(β) = p(β) ∏_{i=1}^N ∫_{A_i} φ(z_i; x_i⊤β, 1) dz_i

  •          = ∫_{R^N} I{z ∈ ∏_{i=1}^N A_i} p(β) ∏_{i=1}^N φ(z_i; x_i⊤β, 1) dz

  •          = ∫_{R^N} p(β, z | Y) dz

5
SLIDE 7

Auxiliary variables

  • Gibbs sampling: alternate between p(β | z, Y) (Gaussian) and p(z | β, Y) (truncated Gaussian)

5
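The alternating draws can be sketched as a small Gibbs sampler (a NumPy/SciPy sketch on synthetic data; the N(0, 100 I) prior, sample sizes, and all variable names are our own choices, not from the slides):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Synthetic probit data: y_i = 1 with probability Phi(x_i' beta)
N, p = 500, 2
X = rng.normal(size=(N, p))
beta_true = np.array([1.0, -0.5])
y = (rng.normal(size=N) < X @ beta_true).astype(int)

B_inv = np.eye(p) / 100.0            # prior precision, beta ~ N(0, 100 I)
V = np.linalg.inv(X.T @ X + B_inv)   # posterior covariance of beta | z
beta = np.zeros(p)
draws = []
for it in range(600):
    # z_i | beta, y: normal with mean x_i' beta, truncated to A_i
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)   # bounds after centering at mu
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=N, random_state=rng)
    # beta | z: Gaussian with mean V X'z, covariance V
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    if it >= 100:                         # discard burn-in
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
print(post_mean)  # roughly recovers beta_true
```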

SLIDE 8

Auxiliary variables

  • Gibbs sampling: alternate between p(β | z, Y) (Gaussian) and p(z | β, Y) (truncated Gaussian)
  • Similar ideas work for models with fancier likelihoods (binomial, negative binomial, etc.)

5

SLIDE 9

Auxiliary variable representation for logistic likelihood?

  • p(β | Y) ∝ p(β) · ∏_{i=1}^N {exp(x_i⊤β)}^{y_i} / {1 + exp(x_i⊤β)}  =?  ∫ p(β, z | Y) dz

6

SLIDE 10

Polya Gamma distribution

X =_D (1/(2π²)) ∑_{k=1}^∞ g_k / {(k − 1/2)² + c²/(4π²)} ,  g_k ∼ iid Ga(b, 1)

  • Then X ∼ PG(b, c)

7
SLIDE 11

Polya Gamma distribution

The key identity:

(e^ψ)^a / (1 + e^ψ)^b = 2^{−b} e^{κψ} ∫_0^∞ e^{−ωψ²/2} p(ω) dω ,  where κ = a − b/2 and ω ∼ PG(b, 0) with density p(ω)

7

SLIDE 12

Polya Gamma distribution

  • PG(1, 0) has Laplace transform cosh^{−1}(√(t/2))
  • PG(1, 0) is an infinite sum of exponentials

8
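Both bullets can be checked by simulation: truncating the infinite sum of exponentials gives approximate PG(1, 0) draws whose Laplace transform matches cosh^{−1}(√(t/2)) (our own sketch; the truncation level K is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pg_1_0(size, K=100):
    # Truncate the infinite sum of exponentials at K terms (an approximation)
    k = np.arange(1, K + 1)
    g = rng.exponential(size=(size, K))          # Ga(1, 1) = Exp(1) draws
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2 * np.pi ** 2)

x = sample_pg_1_0(50_000)
for t in (0.5, 1.0, 2.0):
    mc = np.exp(-t * x).mean()                   # Monte Carlo E[exp(-tX)]
    exact = 1.0 / np.cosh(np.sqrt(t / 2.0))      # cosh^{-1}(sqrt(t/2))
    print(t, round(mc, 4), round(exact, 4))
```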

SLIDE 13

Polya Gamma distribution

  • PG(b, 0) has Laplace transform cosh^{−b}(√(t/2))
  • PG(b, c): p(ω | b, c) ∝ exp(−c²ω/2) · p(ω | b, 0)
  • The Laplace transform is useful for deriving moments
  • PG can also be represented as an alternating-sign sum of inverse Gaussian densities (later)

8
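Since the Laplace transform gives moments in closed form, e.g. E[ω] = b/(2c) · tanh(c/2) for ω ∼ PG(b, c), a truncated-sum sampler can verify one such moment (our own sketch; K and the test values of b, c are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_pg(b, c, size, K=100):
    # Truncated version of the infinite-sum representation (approximate)
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=(size, K))        # Ga(b, 1) draws
    denom = (k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

# Mean implied by the Laplace transform: E[omega] = b/(2c) * tanh(c/2)
b, c = 2.0, 1.5                                  # arbitrary test values
w = sample_pg(b, c, 50_000)
print(w.mean(), b / (2 * c) * np.tanh(c / 2))    # the two should agree
```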

SLIDE 14

Model and Inference

  • y_i ∼ Binom(n_i, 1/(1 + exp(−x_i⊤β))) ,  x_i = (x_{i1}, …, x_{ip})
  • κ = (y_1 − n_1/2, …, y_N − n_N/2)
  • Prior: β ∼ N(b, B)

9
SLIDE 15

Model and Inference


  • (ω_i | β) ∼ PG(n_i, x_i⊤β)
  • (β | y, ω) ∼ N(m_ω, V_ω) , where

V_ω = (X⊤ΩX + B⁻¹)⁻¹
m_ω = V_ω(X⊤κ + B⁻¹b)
Ω = diag(ω_1, …, ω_N)

9
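These two conditionals give the full Gibbs sampler. A self-contained sketch on synthetic Bernoulli data, using a truncated version of the infinite-sum PG representation in place of the exact rejection sampler (production code, e.g. the BayesLogit package, uses the exact sampler; data, prior, and names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_pg(b, c, K=100):
    # Approximate PG(b, c_i) draws via the truncated infinite-sum
    # representation; exact samplers exist but are more involved.
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=(len(c), K))                  # Ga(b, 1) draws
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

# Synthetic Bernoulli-logit data (n_i = 1)
N, p = 400, 2
X = rng.normal(size=(N, p))
beta_true = np.array([1.5, -1.0])
y = (rng.uniform(size=N) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

kappa = y - 0.5                     # kappa_i = y_i - n_i/2 with n_i = 1
b0 = np.zeros(p)                    # prior mean b
B_inv = np.eye(p) / 100.0           # prior precision B^{-1}, B = 100 I
beta = np.zeros(p)
draws = []
for it in range(500):
    omega = sample_pg(1.0, X @ beta)                        # omega_i | beta
    V = np.linalg.inv((X * omega[:, None]).T @ X + B_inv)   # V_omega
    m = V @ (X.T @ kappa + B_inv @ b0)                      # m_omega
    beta = rng.multivariate_normal(m, V)                    # beta | y, omega
    if it >= 100:                                           # discard burn-in
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
print(post_mean)  # roughly recovers beta_true
```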

SLIDE 16

PG(b,0)

More Gaussian-ish as b increases

10

SLIDE 17

PG(1,c)

More like a point mass as c increases

11

SLIDE 18

How to sample from PG?

We do this using a simple, efficient rejection sampler.

The proposal uses exponential, uniform, and normal draws. Checking for acceptance costs roughly one inverse-Gaussian density evaluation. The acceptance probability is usually better than 0.9998 in practice, and is uniformly bounded below by 0.9992.

  • X =_D (1/(2π²)) ∑_{k=1}^∞ g_k / {(k − 1/2)² + c²/(4π²)}

12

SLIDE 19

Rejection sampler

13

SLIDE 20

Squeeze principle

14

SLIDE 21

Squeeze principle

15
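The squeeze principle can be illustrated on a toy target (not the PG sampler itself): sample N(0, 1) by rejection from a Laplace proposal, using the cheap polynomial bound 1 − x²/2 ≤ exp(−x²/2) to certify most acceptances without evaluating the target density. All names and constants here are our own:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_normal_squeeze(size):
    # Rejection-sample N(0, 1) from a Laplace proposal.
    # Envelope: exp(-x^2/2) <= M exp(-|x|) with M = exp(1/2).
    # Squeeze:  1 - x^2/2 <= exp(-x^2/2), a cheap lower bound that
    # certifies acceptance without evaluating the target density.
    M = np.exp(0.5)
    out, fast = [], 0
    while len(out) < size:
        x = rng.laplace(size=size)
        u = rng.uniform(size=size)
        lhs = u * M * np.exp(-np.abs(x))                 # u * envelope(x)
        fast += int((lhs <= 1.0 - x ** 2 / 2.0).sum())   # squeeze accepts
        # full test (applied to everything here, for brevity)
        out.extend(x[lhs <= np.exp(-x ** 2 / 2.0)])
    return np.array(out[:size]), fast

samples, fast = sample_normal_squeeze(100_000)
print(samples.mean(), samples.var(), fast)  # mean ~ 0, var ~ 1
```

In a real implementation the expensive density is evaluated only for proposals the squeeze cannot decide; here a large fraction of draws are settled by the squeeze alone.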

SLIDE 22

Squeeze∞

16

SLIDE 23

Squeeze∞

Implemented in BayesLogit R package

16

SLIDE 24

Some interesting extensions

  • "Efficient Data Augmentation in Dynamic Models for Binary and Count Data" by Windle et al.
    – Extends this idea to sequences of binary/count data, where the parameter dynamics β_t are Gaussian
    – Conditioned on ψ_i, can run an exact sampler for the β's (forward-filter, backward-sampling)
  • Neuroscience (e.g., spike counts)
  • Network data where connections vary with time
  • Sports data where the probability of a win drifts over time

17

SLIDE 25

Thanks!

18