Bayesian generalized linear models and an appropriate default prior

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su
Columbia University
14 August 2008


Logistic regression

[Figure: the curve y = logit⁻¹(x) for x from −6 to 6, rising from 0 to 1, with slope = 1/4 marked at the center.]
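A quick check of the slope annotation: the derivative of logit⁻¹(x) is logit⁻¹(x)(1 − logit⁻¹(x)), which equals 1/4 at x = 0. A minimal R sketch (the invlogit helper is defined here for illustration; arm also supplies one):

    # inverse logit: maps the real line to (0, 1)
    invlogit <- function(x) 1 / (1 + exp(-x))

    # slope at the center of the curve is p * (1 - p) = 0.25
    p0 <- invlogit(0)
    p0 * (1 - p0)                               # 0.25, the "divide by 4" rule

    # numerical check of the derivative at x = 0
    (invlogit(1e-6) - invlogit(-1e-6)) / 2e-6   # approximately 0.25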


A clean example

[Figure: binary data y plotted against x (roughly −10 to 20) with the fitted curve, estimated Pr(y = 1) = logit⁻¹(−1.40 + 0.33x); slope at the center ≈ 0.33/4.]


The problem of separation

[Figure: binary outcomes y against x from −6 to 6, with y = 0 for all low x and y = 1 for all high x; the fitted logistic curve degenerates to a step. Slope = infinity?]


Separation is no joke!

glm (vote ~ female + black + income, family=binomial(link="logit"))

              1960                   1968
              coef.est  coef.se      coef.est  coef.se
(Intercept)     -0.14     0.23         0.47      0.24
female           0.24     0.14        -0.01      0.15
black           -1.03     0.36        -3.64      0.59
income           0.03     0.06        -0.03      0.07

              1964                   1972
              coef.est  coef.se      coef.est  coef.se
(Intercept)     -1.15     0.22         0.67      0.18
female          -0.09     0.14        -0.25      0.12
black          -16.83   420.40        -2.63      0.27
income           0.19     0.06         0.09      0.05
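The 1964 fit shows the symptom: an absurd coefficient on black (−16.83) with an even more absurd standard error (420.40), which is what maximum likelihood produces when a predictor separates the outcome. A minimal sketch with simulated data (not the NES data used on the slide) reproduces the symptom:

    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- ifelse(x > 0, 1, 0)   # outcome perfectly separated by x

    fit_ml <- glm(y ~ x, family = binomial(link = "logit"))
    # glm typically warns: "fitted probabilities numerically 0 or 1 occurred"
    coef(summary(fit_ml))      # huge coefficient on x, enormous std. error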


bayesglm()

◮ Bayesian logistic regression
◮ In the arm (Applied Regression and Multilevel modeling) package
◮ Replaces glm(), estimates are more numerically and computationally stable
◮ Student-t prior distributions for regression coefs
◮ Use EM-like algorithm
◮ We went inside glm.fit to augment the iteratively weighted least squares step
◮ Default choices for tuning parameters (we’ll get back to this!)
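A minimal sketch of the drop-in usage (assumes the arm package is installed; the simulated data are only illustrative):

    library(arm)   # provides bayesglm() and display()

    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- ifelse(x > 0, 1, 0)   # perfectly separated, so glm() would blow up

    # same call as glm(), but with the default weakly informative t priors
    fit_bayes <- bayesglm(y ~ x, family = binomial(link = "logit"))
    display(fit_bayes)         # finite, stable coefficient estimates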


Regularization in action!


What else is out there?

◮ glm (maximum likelihood): fails under separation, gives noisy answers for sparse data
◮ Augment with prior “successes” and “failures”: doesn’t work well for multiple predictors
◮ brlr (Jeffreys-like prior distribution): computationally unstable
◮ brglm (improvement on brlr): doesn’t do enough smoothing
◮ BBR (Laplace prior distribution): OK, not quite as good as bayesglm
◮ Non-Bayesian machine learning algorithms: understate uncertainty in predictions


Information in prior distributions

◮ Informative prior dist
  ◮ A full generative model for the data
◮ Noninformative prior dist
  ◮ Let the data speak
  ◮ Goal: valid inference for any θ
◮ Weakly informative prior dist
  ◮ Purposely include less information than we actually have
  ◮ Goal: regularization, stabilization


Weakly informative priors for logistic regression coefficients

◮ Separation in logistic regression
◮ Some prior info: logistic regression coefs are almost always between −5 and 5:
  ◮ 5 on the logit scale takes you from 0.01 to 0.50, or from 0.50 to 0.99
  ◮ Smoking and lung cancer
◮ Independent Cauchy prior dists with center 0 and scale 2.5
◮ Rescale each predictor to have mean 0 and sd 1/2
◮ Fast implementation using EM; easy adaptation of glm
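In code, this recipe corresponds to the defaults in arm::bayesglm (argument names as in recent versions of arm; see ?bayesglm). The data frame dat with outcome y and predictors x1, x2 is hypothetical:

    library(arm)

    fit <- bayesglm(y ~ x1 + x2, family = binomial(link = "logit"), data = dat,
                    prior.mean  = 0,     # center of the prior for each coefficient
                    prior.scale = 2.5,   # default scale on the logit scale
                    prior.df    = 1,     # df = 1 makes the t prior a Cauchy
                    scaled      = TRUE)  # adjust prior scales for the spread of each input

    # why 5 is a generous bound on the logit scale:
    invlogit <- function(x) 1 / (1 + exp(-x))
    invlogit(0); invlogit(5)   # 0.50 and about 0.99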


Prior distributions

[Figure: prior density for a coefficient θ, plotted over −10 to 10.]


Another example

Dose     #deaths/#animals
−0.86    0/5
−0.30    1/5
−0.05    3/5
 0.73    5/5

◮ Slope of a logistic regression of Pr(death) on dose:
  ◮ Maximum likelihood est is 7.8 ± 4.9
  ◮ With weakly-informative prior: Bayes est is 4.4 ± 1.9
◮ Which is truly conservative?
◮ The sociology of shrinkage
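The data in the table are small enough to fit directly. A sketch reproducing both estimates (exact numbers depend on package versions; the slide reports ML 7.8 ± 4.9 and Bayes 4.4 ± 1.9):

    library(arm)

    dose   <- c(-0.86, -0.30, -0.05, 0.73)
    deaths <- c(0, 1, 3, 5)
    n      <- rep(5, 4)

    # maximum likelihood
    fit_ml <- glm(cbind(deaths, n - deaths) ~ dose, family = binomial(link = "logit"))
    display(fit_ml)      # slope roughly 7.8, se roughly 4.9

    # weakly informative Cauchy(0, 2.5) default prior
    fit_bayes <- bayesglm(cbind(deaths, n - deaths) ~ dose, family = binomial(link = "logit"))
    display(fit_bayes)   # slope pulled toward zero, roughly 4.4, se roughly 1.9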


Maximum likelihood and Bayesian estimates

[Figure: probability of death as a function of dose, with the fitted glm and bayesglm curves overlaid on the data.]


Conservatism of Bayesian inference

◮ Problems with maximum likelihood when data show separation:
  ◮ Coefficient estimate of −∞
  ◮ Estimated predictive probability of 0 for new cases
◮ Is this conservative?
◮ Not if evaluated by log score or predictive log-likelihood
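The log-score point can be seen in two lines: a model that assigns probability (numerically) 0 to an event that then happens incurs an essentially unbounded penalty, while a shrunk prediction pays only a modest one. A minimal sketch with made-up probabilities:

    # log predictive score for a single new case
    log_score <- function(p, y) y * log(p) + (1 - y) * log(1 - p)

    log_score(p = 1e-12, y = 1)   # about -27.6: the "conservative" ML prediction is ruinous
    log_score(p = 0.10,  y = 1)   # about -2.3: the shrunk Bayesian prediction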


Which one is conservative?

[Figure: the same plot of probability of death vs. dose with the glm and bayesglm fits, asking which curve is the conservative one.]


Prior as population distribution

◮ Consider many possible datasets
◮ The “true prior” is the distribution of β’s across these datasets
◮ Fit one dataset at a time
◮ A “weakly informative prior” has less information (wider variance) than the true prior
◮ Open question: How to formalize the tradeoffs from using different priors?


Evaluation using a corpus of datasets

◮ Compare classical glm to Bayesian estimates using various prior distributions
◮ Evaluate using 5-fold cross-validation and average predictive error
◮ The optimal prior distribution for β’s is (approx) Cauchy (0, 1)
◮ Our Cauchy (0, 2.5) prior distribution is weakly informative!
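A sketch of the evaluation loop for a single dataset (dat is a hypothetical data frame with binary outcome y; the corpus evaluation on the next slide repeats this over many datasets and prior scales):

    library(arm)

    # 5-fold cross-validated mean negative log test likelihood for one fitter
    cv_logloss <- function(fitter, dat, k = 5) {
      fold <- sample(rep(1:k, length.out = nrow(dat)))
      loss <- numeric(k)
      for (j in 1:k) {
        fit <- fitter(y ~ ., data = dat[fold != j, ], family = binomial("logit"))
        p   <- predict(fit, newdata = dat[fold == j, ], type = "response")
        yj  <- dat$y[fold == j]
        loss[j] <- -mean(yj * log(p) + (1 - yj) * log(1 - p))
      }
      mean(loss)
    }

    cv_logloss(glm, dat)        # classical maximum likelihood
    cv_logloss(bayesglm, dat)   # default Cauchy(0, 2.5) prior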


Expected predictive loss, avg over a corpus of datasets

[Figure: average −log test likelihood (roughly 0.29 to 0.33) vs. scale of prior (1 to 5), with curves for t priors with df = 0.5, 1.0, 2.0, 4.0, 8.0, the GLM value (1.79), and reference lines for BBR(l) and BBR(g).]


Priors for other regression models

◮ Probit
◮ Ordered logit/probit
◮ Poisson
◮ Linear regression with normal errors


Other examples of weakly informative priors

◮ Variance parameters
◮ Covariance matrices
◮ Population variation in a physiological model
◮ Mixture models
◮ Intentional underpooling in hierarchical models


Conclusions

◮ “Noninformative priors” are actually weakly informative
◮ “Weakly informative” is a more general and useful concept
◮ Regularization
  ◮ Better inferences
  ◮ Stability of computation (bayesglm)
◮ Why use weakly informative priors rather than informative priors?
  ◮ Conformity with statistical culture (“conservatism”)
  ◮ Labor-saving device
  ◮ Robustness


Weakly informative priors for variance parameter

◮ Basic hierarchical model
◮ Traditional inverse-gamma(0.001, 0.001) prior can be highly informative (in a bad way)!
◮ Noninformative uniform prior works better
◮ But if #groups is small (J = 2, 3, even 5), a weakly informative prior helps by shutting down huge values of τ
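One way to see the contrast is to simulate the group-level sd implied by each prior: the inverse-gamma(0.001, 0.001) prior on the variance spreads its draws over absurd orders of magnitude, while a half-Cauchy with a generous scale (the half-Cauchy(25) used for the 3-schools example below) stays flat over plausible values but tames the far tail. A minimal simulation sketch:

    set.seed(1)
    n_sims <- 10000

    # inverse-gamma(0.001, 0.001) prior on the variance tau^2
    tau2_ig <- 1 / rgamma(n_sims, shape = 0.001, rate = 0.001)
    tau_ig  <- sqrt(tau2_ig)

    # half-Cauchy(0, 25) prior on the sd tau
    tau_hc <- abs(rcauchy(n_sims, location = 0, scale = 25))

    quantile(tau_ig, c(0.25, 0.5, 0.75))   # wildly dispersed draws
    quantile(tau_hc, c(0.25, 0.5, 0.75))   # concentrated on plausible values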


Priors for variance parameter: J = 8 groups

[Figure, three panels (σα axis up to 30): posterior for σα in the 8-schools model given (1) a uniform prior on σα, (2) an inv-gamma(1, 1) prior on σα², and (3) an inv-gamma(.001, .001) prior on σα².]


Priors for variance parameter: J = 3 groups

[Figure, two panels (σα axis up to 200): posterior for σα in the 3-schools model given (1) a uniform prior on σα and (2) a half-Cauchy(25) prior on σα.]


Weakly informative priors for covariance matrices

◮ Inverse-Wishart has problems
◮ Correlations can be between 0 and 1
◮ Set up models so prior expectation of correlations is 0
◮ Goal: to be weakly informative about correlations and variances
◮ Scaled inverse-Wishart model uses redundant parameterization
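A sketch of the redundant parameterization for a d x d covariance matrix: draw an unscaled matrix Q from an inverse-Wishart, then multiply by free scale parameters ξ, so Σ = diag(ξ) Q diag(ξ). The degrees of freedom and the lognormal placeholder for ξ below are assumptions for illustration, not the talk's specific choices:

    set.seed(1)
    d <- 3

    # one inverse-Wishart(d + 1, I) draw: W ~ Wishart, Q = W^{-1}
    W <- rWishart(1, df = d + 1, Sigma = diag(d))[, , 1]
    Q <- solve(W)

    # redundant scale parameters (placeholder lognormal prior)
    xi <- exp(rnorm(d, mean = 0, sd = 1))

    # scaled inverse-Wishart draw of the covariance matrix
    Sigma <- diag(xi) %*% Q %*% diag(xi)

    cov2cor(Sigma)      # implied correlations come from Q alone
    sqrt(diag(Sigma))   # the xi scales are absorbed into the sds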

Gelman, Jakulin, Pittau, Su Bayesian generalized linear models and an appropriate default p


slide-106
SLIDE 106

Logistic regression Weakly informative priors Conclusions Conclusions Extra stuff

Weakly informative priors for population variation in a physiological model

◮ Pharmacokinetic parameters such as the “Michaelis-Menten coefficient”
◮ Wide uncertainty: prior guess for θ is 15, with a factor of 100 of uncertainty: log θ ∼ N(log(15), log(10)²)
◮ Population model: data on several people j, log θj ∼ N(log(15), log(10)²) ????
◮ Hierarchical prior distribution (see the R sketch below):
◮ log θj ∼ N(µ, σ²), σ ≈ log(2)
◮ µ ∼ N(log(15), log(10)²)
◮ Weakly informative
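Not from the slides: a minimal R sketch simulating from this hierarchical prior for a hypothetical group of J subjects.

J <- 10
mu    <- rnorm(1, mean = log(15), sd = log(10))   # population mean: guess 15, factor-of-100 uncertainty
sigma <- log(2)                                   # subjects assumed to vary by roughly a factor of 2
theta <- exp(rnorm(J, mean = mu, sd = sigma))     # per-subject Michaelis-Menten coefficients theta_j
round(theta, 2)

Under the non-hierarchical version, log θj ∼ N(log(15), log(10)²) independently for each j, the subjects would scatter over orders of magnitude, which is presumably what the question marks on the slide are flagging.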

Gelman, Jakulin, Pittau, Su Bayesian generalized linear models and an appropriate default p


slide-114
SLIDE 114

Logistic regression Weakly informative priors Conclusions Conclusions Extra stuff

Weakly informative priors for mixture models

◮ Well-known problem of fitting the mixture model likelihood
◮ The maximum likelihood fits are weird, with a single point taking half the mixture
◮ Bayes with flat prior is just as bad
◮ These solutions don’t “look” like mixtures
◮ There must be additional prior information, or, to put it another way, regularization
◮ Simple constraints, for example, a prior dist on the variance ratio (see the R sketch below)
◮ Weakly informative
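Not from the slides: a toy R sketch of the degeneracy and of one possible regularization. The lognormal prior on the standard-deviation ratio is an arbitrary illustrative choice, not the authors' recommendation.

set.seed(1)
y <- rnorm(100)
# Log-likelihood of a two-component normal mixture
loglik <- function(y, p, mu1, s1, mu2, s2) {
  sum(log(p * dnorm(y, mu1, s1) + (1 - p) * dnorm(y, mu2, s2)))
}
# Collapse component 1 onto the single data point y[1] and shrink its sd s1:
# the log-likelihood eventually grows without bound as s1 -> 0
s1 <- 10^seq(0, -40, by = -5)
unpenalized <- sapply(s1, function(s) loglik(y, 0.5, y[1], s, 0, 1))
# One simple regularization: add the log density of a lognormal prior on the
# sd ratio s1/s2 (here s2 = 1), which sends the collapsed solutions back down
penalized <- unpenalized + dlnorm(s1, meanlog = 0, sdlog = 1, log = TRUE)
cbind(s1, unpenalized, penalized)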

Gelman, Jakulin, Pittau, Su Bayesian generalized linear models and an appropriate default p


slide-122
SLIDE 122

Logistic regression Weakly informative priors Conclusions Conclusions Extra stuff

Intentional underpooling in hierarchical models

◮ Basic hierarchical model:
◮ Data yj on parameters θj
◮ Group-level model θj ∼ N(µ, τ²)
◮ No-pooling estimate θ̂j = yj
◮ Bayesian partial-pooling estimate E(θj|y)
◮ Weak Bayes estimate: same as Bayes, but replacing τ with 2τ (see the R sketch below)
◮ An example of the “incompatible Gibbs” algorithm
◮ Why would we do this??
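Not from the slides: a minimal R sketch using the standard normal-normal shrinkage formula E(θj | y) = (yj/σj² + µ/τ²) / (1/σj² + 1/τ²), with made-up data and plug-in values for µ and τ, to show how replacing τ by 2τ shrinks the estimates less.

partial_pool <- function(y, sigma, mu, tau) {
  (y / sigma^2 + mu / tau^2) / (1 / sigma^2 + 1 / tau^2)
}
y     <- c(28, -3, 7, 12)    # hypothetical group estimates y_j
sigma <- c(15, 16, 11, 18)   # hypothetical standard errors
mu    <- mean(y)             # plug-in group-level mean (illustration only)
tau   <- 5                   # plug-in group-level sd (illustration only)
cbind(no_pooling = y,
      bayes      = partial_pool(y, sigma, mu, tau),
      weak_bayes = partial_pool(y, sigma, mu, 2 * tau))   # tau -> 2*tau: less pooling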

Gelman, Jakulin, Pittau, Su Bayesian generalized linear models and an appropriate default p
