SLIDE 1

Midterm review

Dr. Jarad Niemi

Iowa State University

March 6, 2018


SLIDE 2

What we have covered

Chapters:

  • Probability and inference (Ch 1)
  • Single-parameter models (Ch 2)
  • Introduction to multiparameter models (Ch 3)
  • Asymptotics and connections to non-Bayesian approaches (Ch 4)
  • Hierarchical models (Ch 5)
  • Model checking (Ch 6)
  • Bayesian hypothesis tests (Sec 7.4)
  • Decision theory (Sec 9.1)
  • Stan

SLIDE 3

Probability and inference (Ch 1)

Three steps of Bayesian data analysis (Sec 1.1):

  • Set up a full probability model: p(y|θ) and p(θ)
  • Condition on observed data: p(θ|y)
  • Evaluate the fit of the model: p(y^rep|y)

Bayesian inference via Bayes' rule (Sec 1.3):

  • Parameter posteriors: p(θ|y) ∝ p(y|θ) p(θ)
  • Predictions: p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ
  • Model probabilities: p(M|y) ∝ p(y|M) p(M), where p(y|M) = ∫ p(y|θ, M) p(θ|M) dθ

Interpreting Bayesian probabilities (Sec 1.5):

  • Epistemic probability: my belief
  • Frequency probability: long-run percentage

Computation (Sec 1.9):

  • Inference via simulations (sketch below)
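
A minimal R sketch of the three steps for a binomial example (the data, sample size, and Be(1,1) prior are assumptions for illustration, not from the slides):

    y <- 7; n <- 10                    # toy data: y successes in n trials
    a <- 1; b <- 1                     # Be(1,1) prior on theta

    # 1. Full probability model: y|theta ~ Bin(n, theta), theta ~ Be(a, b)
    # 2. Condition on observed data; conjugacy gives theta|y ~ Be(a+y, b+n-y)
    theta <- rbeta(1e4, a + y, b + n - y)

    # 3. Evaluate the fit via replications from p(yrep|y)
    yrep <- rbinom(1e4, n, theta)
    summary(yrep)                      # compare replicated data to observed y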

SLIDE 4

Single-parameter models (Ch 2)

Priors:

  • Conjugate (Sec 2.4)
  • Default - Jeffreys (Sec 2.8)
  • Weakly informative (Sec 2.9)

Posteriors:

  • Compromise between data and prior (Sec 2.2)
  • Point estimation
  • Credible intervals (Sec 2.3)

Specific models:

  • Binomial (Sec 2.1–2.4)
  • Normal, unknown mean (Sec 2.5)
  • Normal, unknown variance (Sec 2.6)
  • Poisson (Sec 2.6)
  • Exponential (Sec 2.6)
  • Poisson with exposure (Sec 2.7)

SLIDE 5

Single-parameter models (Ch 2)

Additional comments:

  • Deriving posteriors using the kernel
  • Discrete priors are conjugate
  • Mixtures of conjugate priors are conjugate
  • Point estimation depends on the utility function:
      • Mean minimizes squared error
      • Median minimizes absolute error
      • Mode is obtained as a limit of minimizing a sequence of 0-1 errors
  • Credible intervals (sketch below):
      • One-tailed
      • Equal-tailed
      • Highest posterior density
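
A sketch of the point-estimation and interval facts above using posterior draws (the Be(8, 4) posterior is an assumed example):

    theta <- rbeta(1e5, 8, 4)            # draws from an assumed Be(8,4) posterior

    mean(theta)                          # minimizes expected squared error
    median(theta)                        # minimizes expected absolute error
    quantile(theta, c(0.025, 0.975))     # 95% equal-tailed credible interval
    quantile(theta, 0.95)                # one-tailed (upper bound) interval

    # Highest posterior density intervals need the density, not just quantiles;
    # one option is coda::HPDinterval(coda::as.mcmc(theta), prob = 0.95).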

SLIDE 6

Introduction to multiparameter models (Ch 3)

  • Joint posterior: p(θ1, …, θn|y) ∝ p(y|θ1, …, θn) p(θ1, …, θn)
  • Marginal posterior: p(θ1|y) = ∫ ⋯ ∫ p(θ1, …, θn|y) dθ2 ⋯ dθn
  • Conditional posteriors: p(θ2, …, θn|θ1, y) ∝ p(θ1, …, θn|y)
  • Posterior decomposition (sketch below), e.g. p(θ1, …, θn|y) = p(θ1|y) ∏_{i=2}^n p(θi|θ1:i−1, y), where 1:i−1 = 1, 2, …, i−1
  • Conditional independence, e.g. p(θi|θ1:i−1, y) = p(θi|θi−1, y)
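
With simulation these identities come for free: composition sampling implements the decomposition, and keeping only the component of interest implements marginalization. A sketch with a stand-in joint posterior (not from the slides):

    # Composition sampling: draw theta1 from its marginal, then theta2|theta1,
    # so the pairs are draws from p(theta1, theta2|y) = p(theta1|y) p(theta2|theta1, y)
    theta1 <- rnorm(1e4, 0, 1)           # stand-in marginal for theta1
    theta2 <- rnorm(1e4, theta1, 1)      # stand-in conditional for theta2

    # Marginalization: ignore the components you do not need
    quantile(theta1, c(0.025, 0.975))    # summary of the marginal p(theta1|y)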

SLIDE 7

Normal model

Normal model with default prior (Sec 3.2):

  yi iid∼ N(µ, σ²),  p(µ, σ²) ∝ 1/σ²

results in

  p(µ, σ²|y) = N(µ; ȳ, σ²/n) × Inv-χ²(σ²; n − 1, s²)

where s² = (1/(n−1)) ∑_{i=1}^n (yi − ȳ)².

Normal model with conjugate prior (Sec 3.3):

  yi iid∼ N(µ, σ²),  µ|σ² ∼ N(µ0, σ²/κ0),  σ² ∼ Inv-χ²(ν0, σ0²)

results in

  p(µ, σ²|y) = N(µ; (κ0µ0 + nȳ)/(κ0 + n), σ²/(κ0 + n)) × Inv-χ²(σ²; ν0 + n, σn²)

where σn² = [ν0σ0² + (n − 1)s² + (κ0n/(κ0 + n))(ȳ − µ0)²] / (ν0 + n).
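
The default-prior result is exactly a composition-sampling recipe: draw σ² from its marginal, then µ given σ². A sketch with toy data (assumed):

    y <- c(9.0, 8.5, 7.8, 9.4, 8.1)                  # toy data
    n <- length(y); ybar <- mean(y); s2 <- var(y); S <- 1e4

    sigma2 <- (n - 1) * s2 / rchisq(S, df = n - 1)   # sigma2|y ~ Inv-chi^2(n-1, s^2)
    mu     <- rnorm(S, ybar, sqrt(sigma2 / n))       # mu|sigma2,y ~ N(ybar, sigma2/n)

    quantile(mu, c(0.025, 0.975))                    # marginal interval for mu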

SLIDE 8

Data asymptotics (Ch 4)

Consider a model yi iid∼ p(y|θ0) for some true value θ0.

  • Posterior convergence: if A is a neighborhood of θ0, then Pr(θ ∈ A|y) → 1.
  • Point estimation: θ̂_Bayes → θ̂_MLE →p θ0
  • Limiting distribution: θ|y →d N(θ̂, (1/n) I(θ̂)⁻¹)
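
A quick visual check of the limiting-distribution claim for binomial data (sample size, data, and prior are assumptions): for large n the exact posterior and N(θ̂, (1/n) I(θ̂)⁻¹) nearly coincide.

    n <- 400; y <- 240
    theta_hat <- y / n                               # MLE
    info <- 1 / (theta_hat * (1 - theta_hat))        # per-observation Fisher information

    curve(dbeta(x, 1 + y, 1 + n - y), 0.5, 0.7,      # exact posterior under a Be(1,1) prior
          ylab = "density")
    curve(dnorm(x, theta_hat, sqrt(1 / (n * info))), # asymptotic normal approximation
          add = TRUE, lty = 2)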

SLIDE 9

Asymptotics - what can go wrong?

Not unique to Bayesian statistics:

  • Unidentified parameters
  • Number of parameters increases with sample size
  • Aliasing
  • Unbounded likelihoods
  • Tails of the distribution
  • True sampling distribution is not p(y|θ)

Unique to Bayesian statistics:

  • Improper posterior
  • Prior distributions that exclude the point of convergence
  • Convergence to the edge of the parameter space

SLIDE 10

Hierarchical models (Ch 5)

  • Hierarchical model: p(θ, φ|y) ∝ p(y|θ) p(θ|φ) p(φ)
  • Exchangeability (Sec 5.2): p(y1, …, yn) = p(yπ(1), …, yπ(n)) for any permutation π
  • Hierarchical binomial model (Sec 5.3): yi ind∼ Bin(ni, θi), θi iid∼ Be(α, β)
  • Hierarchical Poisson (with exposure) model: yi ind∼ Po(xiλi), λi iid∼ Ga(µβ, β)
  • Hierarchical normal model (Sec 5.4): yij ind∼ N(µj, σj²), µj iid∼ N(η, τ²), σj² iid∼ Ga(α, β)
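
For intuition about the hierarchical binomial model, here is a sketch of the conditional posterior θi|α, β, y ∼ Be(α + yi, β + ni − yi) with the hyperparameters held fixed; full inference on (α, β) requires e.g. the Stan program on the last slide. Data and hyperparameter values are assumed:

    y <- c(3, 5, 1, 8); n <- c(10, 12, 6, 20)        # toy group-level data
    alpha <- 2; beta <- 4                            # fixed for illustration

    theta <- sapply(seq_along(y), function(i)
      rbeta(1e4, alpha + y[i], beta + n[i] - y[i]))

    # Group estimates are shrunk from y/n toward alpha/(alpha+beta)
    cbind(raw = y / n, shrunk = colMeans(theta))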

SLIDE 11

Model checking (Ch 6)

  • Data replications: p(y^rep|y) = ∫ p(y^rep|θ) p(θ|y) dθ
  • Graphical posterior predictive checks (Sec 6.4)
  • Posterior predictive p-values (Sec 6.3): pB = P(T(y^rep, θ) ≥ T(y, θ) | y) for a test statistic T(y, θ)
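
A sketch of a posterior predictive p-value under the default-prior normal model with test statistic T(y) = max(y), which here depends on y only (toy data assumed):

    y <- c(9.0, 8.5, 7.8, 9.4, 8.1, 12.9)            # toy data with one large value
    n <- length(y); ybar <- mean(y); s2 <- var(y); S <- 1e4

    sigma2 <- (n - 1) * s2 / rchisq(S, df = n - 1)   # joint posterior draws, as before
    mu     <- rnorm(S, ybar, sqrt(sigma2 / n))

    Trep <- sapply(1:S, function(s) max(rnorm(n, mu[s], sqrt(sigma2[s]))))
    mean(Trep >= max(y))                             # pB; values near 0 or 1 flag misfit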

SLIDE 12

Hypothesis testing (Sec 7.4)

From a Bayesian perspective, hypotheses are either

  • simple, Hi: θ = θi, or
  • composite, Hi: θ ∈ (θi, θi+1].

When the hypotheses are all simple (or all composite), treat the problem as formal Bayesian parameter estimation. Treat a mix of simple and composite hypotheses with formal Bayesian tests, which

  • require prior probabilities for each hypothesis, p(Hi),
  • require priors for parameters in non-point hypotheses, p(θ|Hi), and
  • calculate posterior probabilities p(Hi|y), which depend on the marginal likelihood p(y|Hi).
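
A sketch of a formal Bayesian test that mixes a point null with a composite alternative for binomial data (the data, hypotheses, and equal prior probabilities are all assumptions for illustration):

    y <- 7; n <- 10

    # H0: theta = 0.5 (simple); H1: theta ~ Be(1,1) (composite); p(H0) = p(H1) = 1/2
    m0 <- dbinom(y, n, 0.5)                          # marginal likelihood p(y|H0)
    m1 <- choose(n, y) *                             # p(y|H1): beta-binomial marginal
          beta(1 + y, 1 + n - y) / beta(1, 1)

    m0 / (m0 + m1)                                   # posterior probability p(H0|y)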

SLIDE 13

Decision theory (Sec 9.1)

In order to make a decision, a utility (or loss) function, i.e. U(θ, δ) = −L(θ, δ), must be set. Then the optimal Bayesian decision is to maximize expected utility (or minimize expected loss), i.e.

  argmax_δ ∫ U(θ, δ) p(θ) dθ

where p(θ) represents your current state of belief, i.e. it could be a prior or a posterior depending on your perspective.
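
A brute-force sketch of the optimization over a grid of decisions using posterior draws (the Be(8, 4) posterior is assumed); it recovers the point-estimation facts from the single-parameter slides:

    theta <- rbeta(1e5, 8, 4)                        # draws representing p(theta)
    d <- seq(0, 1, by = 0.001)                       # candidate decisions

    exp_sq  <- sapply(d, function(di) mean((theta - di)^2))   # expected squared loss
    exp_abs <- sapply(d, function(di) mean(abs(theta - di)))  # expected absolute loss

    d[which.min(exp_sq)]                             # ~ posterior mean
    d[which.min(exp_abs)]                            # ~ posterior median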

SLIDE 14

Stan

model = "
data {
  int<lower=0> N;
  int<lower=0> n[N];
  int<lower=0> y[N];
  real s;
}
parameters {
  real<lower=0,upper=1> mu;   // prior mean of the theta[i]
  real<lower=0> eta;          // prior 'sample size' of the theta[i]
}
transformed parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  alpha = eta * mu;
  beta  = eta * (1 - mu);
}
model {
  mu ~ beta(20, 30);
  eta ~ lognormal(0, s);
  y ~ beta_binomial(n, alpha, beta);  // theta[i] marginalized out
}
generated quantities {
  real<lower=0,upper=1> theta[N];
  for (i in 1:N)
    theta[i] = beta_rng(alpha + y[i], beta + n[i] - y[i]);
}
"