SLIDE 1

An Introduction to Bayesian Inference and MCMC Methods for Capture-Recapture

Trinity River Restoration Program Workshop on Outmigration: Population Estimation October 6–8, 2009

SLIDE 2

An Introduction to Bayesian Inference

1

The Binomial Model
  Maximum Likelihood Estimation
  Bayesian Inference and the Posterior Density
  Summarizing the Posterior Density

2

MCMC Methods and the Binomial Model
  An Introduction to MCMC
  An Introduction to WinBUGS

3

Two-Stage Capture-Recapture Models
  The Simple-Petersen Model
  The Stratified-Petersen Model
  The Hierarchical-Petersen Model

4

Further Issues in Bayesian Statistics and MCMC
  Monitoring MCMC Convergence
  Model Selection and the DIC
  Goodness-of-Fit and Bayesian p-values

5

Bayesian Penalized Splines

SLIDE 3

An Introduction to Bayesian Inference

1

The Binomial Model
  Maximum Likelihood Estimation
  Bayesian Inference and the Posterior Density
  Summarizing the Posterior Density

SLIDE 4

An Introduction to Bayesian Inference

1

The Binomial Model
  Maximum Likelihood Estimation
  Bayesian Inference and the Posterior Density
  Summarizing the Posterior Density

SLIDE 5

Maximum Likelihood Estimation

The Binomial Distribution

Setup

◮ a population contains a fixed and known number of marked individuals (n)

Assumptions

◮ every individual has the same probability of being captured (p)
◮ individuals are captured independently

Probability Mass Function

The probability that m of n individuals are captured is:

P(m|p) = (n choose m) · p^m (1 − p)^(n−m)

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 5/60
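The mass function above is easy to evaluate directly. The deck's exercises use R and WinBUGS; the following is only an illustrative Python sketch of the same formula:

```python
from math import comb

def binom_pmf(m, n, p):
    """P(m | p): probability that exactly m of the n marked individuals are captured."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

# n = 30 marked fish with capture probability p = .8
print(binom_pmf(24, 30, 0.8))  # the most probable outcome, ~0.179
```

Summing the function over m = 0, . . . , n returns 1, which is a quick sanity check that it is a proper probability mass function.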

SLIDE 6

Maximum Likelihood Estimation

The Binomial Distribution

If n = 30 and p = .8:

[Figure: probability mass function of m for n = 30, p = .8; x-axis m, y-axis probability]

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 6/60

SLIDE 7

Maximum Likelihood Estimation

The Likelihood Function

Definition

The likelihood function is equal to the probability mass function of the observed data, viewed as a function of the parameter with the data held fixed. The likelihood function for the binomial experiment is:

L(p|m) = (n choose m) · p^m (1 − p)^(n−m)

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 7/60

SLIDE 8

Maximum Likelihood Estimation

The Likelihood Function

If n = 30 and m = 24:

[Figure: likelihood function of p; x-axis p (0–1), y-axis likelihood]

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 8/60

SLIDE 9

Maximum Likelihood Estimation

Maximum Likelihood Estimates

Definition

The maximum likelihood estimator is the value of the parameter which maximizes the likelihood function for the observed data. The maximum likelihood estimator of p for the binomial experiment is:

p̂ = m/n

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 9/60

SLIDE 10

Maximum Likelihood Estimation

Maximum Likelihood Estimates

If n = 30 and m = 24 then p̂ = 24/30 = .8:

[Figure: likelihood function of p with the maximum marked at p̂ = .8]

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 10/60

SLIDE 11

Maximum Likelihood Estimation

Measures of Uncertainty

Imagine that the same experiment could be repeated many times without changing the value of the parameter.

Definition 1

The standard error of the estimator is the standard deviation of the estimates computed from each of the resulting data sets.

Definition 2

A 95% confidence interval is a pair of values which, computed in the same manner for each data set, would bound the true value for at least 95% of the repetitions. The standard error for the capture probability is:

SE_p = √(p̂(1 − p̂)/n)

A 95% confidence interval has bounds: p̂ − 1.96 · SE_p and p̂ + 1.96 · SE_p.

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 11/60
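The standard error and Wald interval above reduce to a few lines of arithmetic. A minimal Python sketch (the deck itself works in R/WinBUGS), reproducing the numbers on the next slide:

```python
from math import sqrt

def wald_summary(m, n, z=1.96):
    """MLE, standard error, and 95% Wald confidence interval for the
    binomial capture probability p, based on m captures out of n."""
    p_hat = m / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, ci = wald_summary(24, 30)  # se ~ .07, ci ~ (.66, .94)
```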

SLIDE 12

Maximum Likelihood Estimation

Measures of Uncertainty

If n = 30 and m = 24 then:

◮ the standard error of p̂ is: SE_p = .07
◮ a 95% confidence interval for p̂ is: (.66, .94)

[Figure: likelihood function of p; x-axis p (0–1), y-axis likelihood]

Introduction to Bayesian Inference: The Binomial Model, Maximum Likelihood Estimation 12/60

SLIDE 13

An Introduction to Bayesian Inference

1

The Binomial Model
  Maximum Likelihood Estimation
  Bayesian Inference and the Posterior Density
  Summarizing the Posterior Density

SLIDE 14

Bayesian Inference and the Posterior Density

Combining Data from Multiple Experiments

Pilot Study

◮ Data: n = 20, m = 10
◮ Likelihood: (20 choose 10) · p^10 (1 − p)^10

Full Experiment

◮ Data: n = 30, m = 24
◮ Likelihood: (30 choose 24) · p^24 (1 − p)^6

Combined Analysis

◮ Likelihood: (20 choose 10) (30 choose 24) · p^34 (1 − p)^16
◮ Estimate: p̂ = 34/50 = .68

Introduction to Bayesian Inference: The Binomial Model, Bayesian Inference and the Posterior Density 14/60
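Because both likelihoods are binomial in the same p, multiplying them simply pools the successes and the trials, so the combined MLE uses the pooled counts. An illustrative Python sketch (the deck's own exercises are in R/WinBUGS):

```python
def pooled_mle(experiments):
    """MLE of a common capture probability p from independent binomial
    experiments given as (n, m) pairs; multiplying the likelihoods is
    equivalent to pooling the counts."""
    n_total = sum(n for n, m in experiments)
    m_total = sum(m for n, m in experiments)
    return m_total / n_total

# pilot study plus full experiment, as on the slide
p_hat = pooled_mle([(20, 10), (30, 24)])  # 34/50 = 0.68
```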

SLIDE 15

Bayesian Inference and the Posterior Density

Combining Data with Prior Beliefs I

Prior Beliefs

◮ Hypothetical Data: n = 20, m = 10
◮ Prior Density: (20 choose 10) · p^10 (1 − p)^10

Full Experiment

◮ Data: n = 30, m = 24
◮ Likelihood: (30 choose 24) · p^24 (1 − p)^6

Posterior Beliefs

◮ Posterior Density: (20 choose 10) (30 choose 24) · p^34 (1 − p)^16
◮ Estimate: p̂ = 34/50 = .68

Introduction to Bayesian Inference: The Binomial Model, Bayesian Inference and the Posterior Density 15/60

SLIDE 16

Bayesian Inference and the Posterior Density

Combining Data with Prior Beliefs I

[Figure: prior density, likelihood function, and posterior density plotted against p]

Introduction to Bayesian Inference: The Binomial Model, Bayesian Inference and the Posterior Density 16/60

SLIDE 17

Bayesian Inference and the Posterior Density

Combining Data with Prior Beliefs II

Prior Beliefs

◮ Hypothetical Data: n = 2, m = 1
◮ Prior Density: (2 choose 1) · p (1 − p)

Full Experiment

◮ Data: n = 30, m = 24
◮ Likelihood: (30 choose 24) · p^24 (1 − p)^6

Posterior Beliefs

◮ Posterior Density: (30 choose 24) (2 choose 1) · p^25 (1 − p)^7
◮ Estimate: p̂ = 25/32 = .78

Introduction to Bayesian Inference: The Binomial Model, Bayesian Inference and the Posterior Density 17/60

SLIDE 18

Bayesian Inference and the Posterior Density

Combining Data with Prior Beliefs II

[Figure: prior density, likelihood function, and posterior density plotted against p]

Introduction to Bayesian Inference: The Binomial Model, Bayesian Inference and the Posterior Density 18/60

SLIDE 19

An Introduction to Bayesian Inference

1

The Binomial Model
  Maximum Likelihood Estimation
  Bayesian Inference and the Posterior Density
  Summarizing the Posterior Density

SLIDE 20

Summarizing the Posterior Density

Fact

A Bayesian posterior density is a true probability density which can be used to make direct probability statements about a parameter.

Introduction to Bayesian Inference: The Binomial Model, Summarizing the Posterior Density 20/60

SLIDE 21

Summarizing the Posterior Density

Bayesian Point Estimates

Classical Point Estimates

◮ Maximum Likelihood Estimate

Bayesian Point Estimates

◮ Posterior Mode

Introduction to Bayesian Inference: The Binomial Model, Summarizing the Posterior Density 21/60

SLIDE 22

Summarizing the Posterior Density

Bayesian Point Estimates

Classical Point Estimates

◮ Maximum Likelihood Estimate

Bayesian Point Estimates

◮ Posterior Mode
◮ Posterior Mean
◮ Posterior Median

Introduction to Bayesian Inference: The Binomial Model, Summarizing the Posterior Density 21/60

SLIDE 23

Summarizing the Posterior Density

Bayesian Uncertainty Estimates

Classical Measures of Uncertainty

◮ Standard Error
◮ 95% Confidence Interval

Bayesian Measures of Uncertainty

◮ Posterior Standard Deviation:

The standard deviation of the posterior density.

◮ 95% Credible Interval:

Any interval which contains 95% of the posterior density.

Introduction to Bayesian Inference: The Binomial Model, Summarizing the Posterior Density 22/60
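For a one-parameter model, all of these posterior summaries can be computed by brute force on a grid. A Python sketch (illustrative only; the deck's exercises use R), applied to the posterior p^34 (1 − p)^16 from the earlier prior-beliefs example:

```python
import math

def grid_summaries(log_density, grid):
    """Posterior mode, mean, sd, median, and central 95% credible interval
    from an unnormalized log-density evaluated on a grid of points."""
    w = [math.exp(log_density(p)) for p in grid]
    total = sum(w)
    w = [x / total for x in w]                       # normalize the weights
    mean = sum(p * wi for p, wi in zip(grid, w))
    sd = math.sqrt(sum((p - mean) ** 2 * wi for p, wi in zip(grid, w)))
    mode = max(zip(w, grid))[1]                      # grid point of highest density
    cdf, acc = [], 0.0
    for wi in w:                                     # cumulative weights for quantiles
        acc += wi
        cdf.append(acc)
    q = lambda level: grid[next(i for i, c in enumerate(cdf) if c >= level)]
    return {"mode": mode, "mean": mean, "sd": sd,
            "median": q(0.5), "ci95": (q(0.025), q(0.975))}

grid = [i / 10000 for i in range(1, 10000)]
summ = grid_summaries(lambda p: 34 * math.log(p) + 16 * math.log(1 - p), grid)
```

Here the posterior is a Beta(35, 17) density, so the grid results can be checked against the exact mode 34/50 = .68 and mean 35/52.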

SLIDE 24

Exercises

  • 1. Bayesian inference for the binomial experiment

File: Intro to splines\Exercises\binomial 1.R This file contains code for plotting the prior density, likelihood function, and posterior density for the binomial model. Vary the values of n, m, and alpha to see how the shapes of these functions and the corresponding posterior summaries are affected.

Introduction to Bayesian Inference: The Binomial Model, Summarizing the Posterior Density 23/60

SLIDE 25

An Introduction to Bayesian Inference

2

MCMC Methods and the Binomial Model
  An Introduction to MCMC
  An Introduction to WinBUGS

SLIDE 26

An Introduction to Bayesian Inference

2

MCMC Methods and the Binomial Model
  An Introduction to MCMC
  An Introduction to WinBUGS

SLIDE 27

An Introduction to MCMC

Sampling from the Posterior

Concept

If the posterior density is too complicated, then we can estimate posterior quantities by generating a sample from the posterior density and computing sample statistics.

Introduction to Bayesian Inference: MCMC Methods and the Binomial Model, An Introduction to MCMC 26/60

SLIDE 28

An Introduction to MCMC

The Very Basics of Markov chain Monte Carlo

Definition

A Markov chain is a sequence of events such that the probabilities for one event depend only on the outcome of the previous event in the sequence.

Key Property

If we construct the Markov chain properly, then the probability density of the events can be made to match any probability density – including the posterior density.

Introduction to Bayesian Inference: MCMC Methods and the Binomial Model, An Introduction to MCMC 27/60

SLIDE 29

An Introduction to MCMC

The Very Basics of Markov chain Monte Carlo

Definition

A Markov chain is a sequence of events such that the probabilities for one event depend only on the outcome of the previous event in the sequence.

Key Property

If we construct the Markov chain properly, then the probability density of the events can be made to match any probability density – including the posterior density.

Implication

We can use a carefully constructed chain to generate a sample from any complicated posterior density.

Introduction to Bayesian Inference: MCMC Methods and the Binomial Model, An Introduction to MCMC 27/60
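One concrete way to construct such a chain is the random-walk Metropolis algorithm. This is not what WinBUGS does internally (it selects its own samplers), but a minimal Python sketch targeting the binomial posterior with a flat prior shows the idea:

```python
import math, random

def metropolis(log_post, start, n_iter, prop_sd=0.05, seed=42):
    """Random-walk Metropolis: a Markov chain constructed so that its
    stationary density matches the (unnormalized) posterior log_post."""
    rng = random.Random(seed)
    cur, cur_lp = start, log_post(start)
    chain = []
    for _ in range(n_iter):
        prop = cur + rng.gauss(0.0, prop_sd)          # propose a nearby value
        lp = log_post(prop) if 0.0 < prop < 1.0 else -math.inf
        if lp - cur_lp > math.log(rng.random()):      # accept with prob min(1, ratio)
            cur, cur_lp = prop, lp
        chain.append(cur)
    return chain

# binomial posterior with a flat Beta(1,1) prior: n = 30, m = 24 -> Beta(25, 7)
chain = metropolis(lambda p: 24 * math.log(p) + 6 * math.log(1 - p), 0.5, 20000)
post_mean = sum(chain[2000:]) / len(chain[2000:])  # exact posterior mean is 25/32
```

Discarding the first 2,000 iterations is a crude burn-in; the sample mean of the remainder approximates the exact posterior mean 25/32 = .78125.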

SLIDE 30

An Introduction to Bayesian Inference

2

MCMC Methods and the Binomial Model
  An Introduction to MCMC
  An Introduction to WinBUGS

SLIDE 31

An Introduction to WinBUGS

WinBUGS for the Binomial Experiment

Intro to splines\Exercises\binomial model winbugs.txt

## 1) Model definition
model binomial{
    ## Likelihood function
    m ~ dbin(p,n)
    ## Prior distribution
    p ~ dbeta(1,1)
}

## 2) Data list
list(n=30, m=24)

## 3) Initial values
list(p=.8)

Introduction to Bayesian Inference: MCMC Methods and the Binomial Model, An Introduction to WinBUGS 29/60

SLIDE 32

Exercises

  • 1. WinBUGS for the Binomial Experiment

Intro to splines\Exercises\binomial model winbugs.txt Use the provided code to implement the binomial model in WinBUGS. Change the parameters of the prior distribution for p (a and b) so that they are both equal to 1, and recompute the posterior summaries.

Introduction to Bayesian Inference: MCMC Methods and the Binomial Model, An Introduction to WinBUGS 30/60

SLIDE 33

An Introduction to Bayesian Inference

3

Two-Stage Capture-Recapture Models
  The Simple-Petersen Model
  The Stratified-Petersen Model
  The Hierarchical-Petersen Model

SLIDE 34

An Introduction to Bayesian Inference

3

Two-Stage Capture-Recapture Models
  The Simple-Petersen Model
  The Stratified-Petersen Model
  The Hierarchical-Petersen Model

SLIDE 35

The Simple-Petersen Model

Model Structure

Notation

◮ n/m = # of marked individuals alive/captured
◮ U/u = # of unmarked individuals alive/captured

Model

◮ Marked sample: m ∼ Binomial(n, p)
◮ Unmarked sample: u ∼ Binomial(U, p)

Prior Densities

◮ p: p ∼ Beta(a, b)
◮ U: log(U) ∝ 1 (Jeffreys prior)

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Simple-Petersen Model 33/60
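The intuition behind the model is visible in the simple plug-in estimates: estimate the capture probability from the marked sample, then scale up the unmarked catch. A Python sketch with hypothetical counts (the slide itself specifies no data; the full Bayesian fit lives in the WinBUGS file below):

```python
def petersen_estimates(n, m, u):
    """Plug-in estimates for the simple-Petersen model: p-hat from the
    marked sample, then U-hat = u / p-hat for the unmarked population."""
    p_hat = m / n        # capture probability estimated from marked fish
    U_hat = u / p_hat    # unmarked population size
    return p_hat, U_hat

# hypothetical counts: 30 marked released, 24 recaptured, 120 unmarked captured
p_hat, U_hat = petersen_estimates(30, 24, 120)  # p_hat = 0.8, U_hat = 150.0
```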

SLIDE 36

The Simple-Petersen Model

WinBUGS Implementation

Intro to splines\Exercises\cr winbugs.txt

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Simple-Petersen Model 34/60

SLIDE 37

An Introduction to Bayesian Inference

3

Two-Stage Capture-Recapture Models
  The Simple-Petersen Model
  The Stratified-Petersen Model
  The Hierarchical-Petersen Model

SLIDE 38

The Stratified-Petersen Model

Model Structure

Notation

◮ ni/mi = # of marked individuals alive/captured on day i
◮ Ui/ui = # of unmarked individuals alive/captured on day i

Model

◮ Marked sample: mi ∼ Binomial(ni, pi), i = 1, . . . , s
◮ Unmarked sample: ui ∼ Binomial(Ui, pi), i = 1, . . . , s

Prior Densities

◮ pi: pi ∼ Beta(a, b), i = 1, . . . , s
◮ Ui: log(Ui) ∝ 1, i = 1, . . . , s

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Stratified-Petersen Model 36/60

SLIDE 39

The Stratified-Petersen Model

WinBUGS Implementation

Intro to splines\Exercises\cr stratified winbugs.txt

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Stratified-Petersen Model 37/60

SLIDE 40

Exercises

  • 1. The Stratified-Petersen Model

Intro to splines\Exercises\cr stratified winbugs.txt Use the provided code to implement the stratified-Petersen model for the simulated data set and produce a boxplot for the values of p (if you didn't specify p in the sample monitor then you will need to do so and re-run the chain). Notice that the 95% credible intervals are much wider for some values of pi than for others. Why is this?

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Stratified-Petersen Model 38/60

SLIDE 41

An Introduction to Bayesian Inference

3

Two-Stage Capture-Recapture Models
  The Simple-Petersen Model
  The Stratified-Petersen Model
  The Hierarchical-Petersen Model

SLIDE 42

The Hierarchical-Petersen Model

Model Structure

Notation

◮ ni/mi = # of marked individuals alive/captured on day i
◮ Ui/ui = # of unmarked individuals alive/captured on day i

Model

◮ Marked sample: mi ∼ Binomial(ni, pi), i = 1, . . . , s
◮ Unmarked sample: ui ∼ Binomial(Ui, pi), i = 1, . . . , s
◮ Capture probabilities: log(pi/(1 − pi)) = η^p_i

Prior Densities

◮ η^p_i: η^p_i ∼ N(µ, τ²), i = 1, . . . , s
◮ µ, τ: µ ∼ N(0, 1000²), τ ∼ Γ⁻¹(.01, .01)
◮ Ui: log(Ui) ∝ 1, i = 1, . . . , s

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Hierarchical-Petersen Model 40/60

SLIDE 43

The Hierarchical-Petersen Model

WinBUGS Implementation

Intro to splines\Exercises\cr hierarchical winbugs.txt

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Hierarchical-Petersen Model 41/60

SLIDE 44

Exercises

  • 1. Bayesian inference for the hierarchical Petersen model

Intro to splines\Exercises\cr hierarchical 2 winbugs.txt The hierarchical model can be used even in the more extreme case in which no marked fish are released in one stratum or the number of recoveries is missing, so that there is no direct information about the capture probability. This file contains the code for fitting the hierarchical model to the simulated data, except that some of the values of ni have been replaced by the value NA, WinBUGS notation for missing data. Run the model and produce boxplots for U and p. Note that you will have to use the gen inits button in the Specification Tool window to generate initial values for the missing data after loading the initial values for p and U.

Introduction to Bayesian Inference: Two-Stage Capture-Recapture Models, The Hierarchical-Petersen Model 42/60

SLIDE 45

An Introduction to Bayesian Inference

4

Further Issues in Bayesian Statistics and MCMC
  Monitoring MCMC Convergence
  Model Selection and the DIC
  Goodness-of-Fit and Bayesian p-values

SLIDE 46

An Introduction to Bayesian Inference

4

Further Issues in Bayesian Statistics and MCMC
  Monitoring MCMC Convergence
  Model Selection and the DIC
  Goodness-of-Fit and Bayesian p-values

SLIDE 47

Monitoring Convergence

Traceplots

Definition

The traceplot for a Markov chain displays the generated values versus the iteration number. Traceplot for U1 from the hierarchical-Petersen model:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 45/60

SLIDE 48

Monitoring Convergence

Traceplots and Mixing

Poor Mixing Good Mixing

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 46/60

SLIDE 49

Monitoring Convergence

MC Error

Definition

The MC error is the amount of uncertainty in the posterior summaries due to approximation by a finite sample. Posterior summary for U1 after 10,000 iterations:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 47/60

SLIDE 50

Monitoring Convergence

MC Error

Definition

The MC error is the amount of uncertainty in the posterior summaries due to approximation by a finite sample. Posterior summary for U1 after 100,000 iterations:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 47/60
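WinBUGS reports the MC error in its summary table; one standard way such an error is estimated is the batch-means method, which splits the chain into blocks to allow for autocorrelation. A hedged Python sketch of the idea (not WinBUGS's exact algorithm):

```python
import random

def mc_error(chain, n_batches=50):
    """Batch-means estimate of the Monte Carlo error of a posterior mean.
    Batching allows for autocorrelation within the chain."""
    size = len(chain) // n_batches
    means = [sum(chain[i * size:(i + 1) * size]) / size for i in range(n_batches)]
    grand = sum(means) / n_batches
    var_between = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return (var_between / n_batches) ** 0.5

# for an independent sample the batch-means error matches sd / sqrt(N)
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(10000)]
err = mc_error(draws)  # roughly 0.01 for 10,000 independent N(0,1) draws
```

Quadrupling the sample size halves the MC error, which is why the summary for U1 tightens between 10,000 and 100,000 iterations.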

SLIDE 51

Monitoring Convergence

Thinning

Definition

A chain is thinned if only a subset of the generated values is stored and used to compute summary statistics.

Summary statistics for U[1] – 100,000 iterations:

Summary statistics for U[1] – 100,000 iterations thinned by 10:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 48/60

SLIDE 52

Monitoring Convergence

Burn-in Period

Definition

The burn-in period is the number of iterations necessary for the chain to converge to the posterior distribution.

Multiple Chains

The burn-in period can be assessed by running several chains with different starting values:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 49/60

SLIDE 53

Monitoring Convergence

The Brooks-Gelman-Rubin Diagnostic

Definition

The Brooks-Gelman-Rubin convergence diagnostic compares the posterior summaries computed separately from each chain with those computed from the pooled sample from all chains; at convergence these should agree.

Brooks-Gelman-Rubin diagnostic plot for µ after 100,000 iterations:

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 50/60
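A simplified, variance-based version of this idea is the Gelman-Rubin potential scale reduction factor R-hat (WinBUGS's plot uses the Brooks-Gelman width-based variant, so treat this Python sketch as an illustration of the principle rather than the exact statistic plotted):

```python
import random

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat): compares the between-chain
    and within-chain variances; values near 1 suggest convergence."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)      # between-chain
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m                   # within-chain
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

rng = random.Random(1)
same = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
apart = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0, 3, 6)]
# gelman_rubin(same) is close to 1; gelman_rubin(apart) is much larger
```

Three chains drawn from the same density give R-hat near 1, while chains stuck around different values give R-hat well above 1, flagging non-convergence.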

SLIDE 54

Exercises

  • 1. Bayesian inference for the hierarchical Petersen model: convergence diagnostics

Intro to splines\Exercises\cr hierarchical bgr winbugs.txt This file contains code to run three parallel chains for the hierarchical-Petersen model. Implement the model and then produce traceplots and compute the Brooks-Gelman-Rubin diagnostics. To initialize the model you will need to enter 3 in the num of chains dialogue and then load the three sets of initial values one at a time.

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Monitoring MCMC Convergence 51/60

SLIDE 55

An Introduction to Bayesian Inference

4

Further Issues in Bayesian Statistics and MCMC
  Monitoring MCMC Convergence
  Model Selection and the DIC
  Goodness-of-Fit and Bayesian p-values

SLIDE 56

Model Selection

The Principle of Parsimony

Concept

The most parsimonious model is the one that best explains the data with the fewest parameters.

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Model Selection and the DIC 53/60

SLIDE 57

Model Selection

The Deviance Information Criterion

Definition 1

The pD value for a model is an estimate of the effective number of parameters in the model – the number of unique and estimable parameters.

Definition 2

The Deviance Information Criterion (DIC) is a penalized form of the likelihood that accounts for the number of parameters in a model, as measured by pD. Smaller values are better.

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Model Selection and the DIC 54/60

SLIDE 58

An Introduction to Bayesian Inference

4

Further Issues in Bayesian Statistics and MCMC
  Monitoring MCMC Convergence
  Model Selection and the DIC
  Goodness-of-Fit and Bayesian p-values

SLIDE 59

Goodness-of-Fit

Posterior Prediction

Concept

If the model fits well then new data simulated from the model and the parameter values generated from the posterior should be similar to the observed data.

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Goodness-of-Fit and Bayesian p-values 56/60

SLIDE 60

Goodness-of-Fit

Bayesian p-value

Definition 1

A discrepancy measure is a function of both the data and the parameters that assesses the fit of some part of the model.

Example

D(u, U, p) = Σ_{i=1}^{n} (√ui − √(Ui pi))²

Definition 2

The Bayesian p-value is the proportion of times the discrepancy of the observed data is less than the discrepancy of the simulated data. Bayesian p-values near 0 indicate lack of fit.

Introduction to Bayesian Inference: Further Issues in Bayesian Statistics and MCMC, Goodness-of-Fit and Bayesian p-values 57/60
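Given a set of posterior draws, the p-value is computed by simulating replicate data from each draw and comparing discrepancies. An illustrative Python sketch using the slide's discrepancy measure; the stand-in "posterior draws" and the counts are hypothetical, not output from the capture-recapture fit:

```python
import math, random

def discrepancy(u, U, p):
    """The slide's Freeman-Tukey style measure: sum of (sqrt(u_i) - sqrt(U_i p_i))^2."""
    return sum((math.sqrt(ui) - math.sqrt(Ui * pi)) ** 2
               for ui, Ui, pi in zip(u, U, p))

def bayesian_p_value(u_obs, posterior_draws, seed=0):
    """Share of posterior draws whose simulated data fit worse than the
    observed data; values near 0 indicate lack of fit."""
    rng = random.Random(seed)
    hits = 0
    for U, p in posterior_draws:
        u_rep = [sum(rng.random() < pi for _ in range(Ui))   # a Binomial(Ui, pi) draw
                 for Ui, pi in zip(U, p)]
        hits += discrepancy(u_obs, U, p) < discrepancy(u_rep, U, p)
    return hits / len(posterior_draws)

draws = [([100, 100, 100], [0.5, 0.5, 0.5])] * 200   # stand-in posterior draws
ok = bayesian_p_value([55, 47, 52], draws)    # plausible data: moderate p-value
bad = bayesian_p_value([5, 95, 10], draws)    # implausible data: p-value near 0
```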

SLIDE 61

An Introduction to Bayesian Inference

5

Bayesian Penalized Splines

SLIDE 62

Bayesian Penalized Splines

Concept

We can control the smoothness of a B-spline by assigning a prior density to the differences in the coefficients. Specifically, we would like our prior to favour smoothness but allow for sharp changes if the data warrants.

Introduction to Bayesian Inference: Bayesian Penalized Splines, 59/60

SLIDE 63

Bayesian Penalized Splines

Model Structure

B-spline

yi = Σ_{k=1}^{K+D+1} bk Bk(xi) + εi

Error

εi ∼ N(0, σ²)

Hierarchical Prior Density for Spline Coefficients

Level 1: (bk − bk−1) ∼ N(bk−1 − bk−2, (1/λ)²)
Level 2: λ ∼ Γ(.05, .05)

The parameter λ plays the same role as the smoothing parameter:

◮ if λ is big then bk ≈ bk−1 and the spline is smooth,
◮ if λ is small then bk and bk−1 can be very different.

Introduction to Bayesian Inference: Bayesian Penalized Splines, 60/60
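The effect of λ can be seen by simulating coefficients from the Level 1 prior, under which each second difference bk − 2bk−1 + bk−2 is N(0, (1/λ)²). A Python sketch (the number of coefficients, seed, and λ values are arbitrary choices for illustration):

```python
import random

def simulate_coefficients(lam, k=20, seed=3):
    """Draw spline coefficients from the hierarchical prior: each second
    difference b_k - 2*b_{k-1} + b_{k-2} is N(0, (1/lam)^2)."""
    rng = random.Random(seed)
    b = [0.0, 0.0]
    for _ in range(k - 2):
        b.append(2 * b[-1] - b[-2] + rng.gauss(0.0, 1.0 / lam))
    return b

def roughness(b):
    """Sum of squared second differences: small values mean a smooth spline."""
    return sum((b[i] - 2 * b[i - 1] + b[i - 2]) ** 2 for i in range(2, len(b)))

smooth = simulate_coefficients(lam=100)   # big lambda: nearly linear sequence
wiggly = simulate_coefficients(lam=0.1)   # small lambda: sharp changes allowed
```

With a large λ the simulated coefficient sequence is nearly linear (roughness near zero), while a small λ permits the sharp changes the slide describes.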