 
Workshop 7.2b: Introduction to Bayesian models
Murray Logan, 07 Feb 2017
Section 1 Frequentist vs Bayesian
Frequentist
• $P(D|H)$
• long-run frequency
• simple analytical methods to solve roots
• conclusions pertain to data, not parameters or hypotheses
• compared to a theoretical distribution when the NULL is true
• probability of obtaining the observed data or MORE EXTREME data
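Frequentist
• P-value
  ◦ probability of obtaining data at least as extreme as those observed if the NULL is true (the basis for rejecting the NULL)
  ◦ NOT a measure of the magnitude of an effect or degree of significance!
  ◦ largely a measure of whether the sample size is large enough
• 95% CI
  ◦ NOT about the parameter; it is about the interval (95% of intervals constructed this way would contain the true value)
  ◦ does not tell you the range of values likely to contain the true mean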
Frequentist vs Bayesian
-------------------------------------------------------
              Frequentist          Bayesian
------------  -------------------  --------------------
Obs. data     One possible         Fixed, true
Parameters    Fixed, true          Random, distribution
Inferences    Data                 Parameters
Probability   Long-run frequency   Degree of belief
              $P(D|H)$             $P(H|D)$
-------------------------------------------------------
Frequentist vs Bayesian
[Figure: three scatterplots of y against x]
• n: 10, Slope: -0.1022, t: -2.3252, p: 0.0485
• n: 10, Slope: -10.2318, t: -2.2115, p: 0.0579
• n: 100, Slope: -10.4713, t: -6.6457, p: 1.7101362 × 10^-9
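The pattern in these panels follows from the arithmetic of the t statistic: for a slope $b$ with residual standard deviation $s$, $t = b / (s/\sqrt{S_{xx}})$ with $S_{xx} = \sum (x_i - \bar{x})^2$, and $S_{xx}$ grows with $n$. A minimal Python sketch, assuming evenly spaced x from 2 to 10 and a hypothetical slope of -10 with residual SD 40 (illustrative values, not taken from the figure):

```python
import math

def slope_t_statistic(slope, resid_sd, xs):
    """t statistic for a regression slope: slope / SE, with SE = resid_sd / sqrt(Sxx)."""
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return slope / (resid_sd / math.sqrt(sxx))

def evenly_spaced(lo, hi, n):
    """n equally spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

# Hypothetical effect: identical slope and residual SD at both sample sizes
slope, resid_sd = -10.0, 40.0
t_small = slope_t_statistic(slope, resid_sd, evenly_spaced(2, 10, 10))
t_large = slope_t_statistic(slope, resid_sd, evenly_spaced(2, 10, 100))
print(round(t_small, 2), round(t_large, 2), round(t_large / t_small, 2))
```

With the effect held fixed, t moves from about -2.0 at n = 10 to about -5.8 at n = 100, so the p-value shrinks because of sample size alone.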
Frequentist vs Bayesian
[Figure: scatterplots of y against x for Population A and Population B]
                      Population A   Population B
Percentage change     0.46           45.46
Prob. 5% decline      0              0.86
Section 2 Bayesian Statistics
Bayesian
Bayes rule
$$P(H|D) = \frac{P(D|H) \times P(H)}{P(D)}$$
$$\text{posterior belief (probability)} = \frac{\text{likelihood} \times \text{prior probability}}{\text{normalizing constant}}$$
The normalizing constant is required for a true probability: it turns a frequency distribution into a probability distribution.
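Bayes rule can be evaluated directly when the hypotheses form a small grid. A minimal Python sketch, assuming a hypothetical binomial example (7 successes in 10 trials, flat prior over 101 values of a proportion θ); the sum in the denominator is the normalizing constant $P(D)$ that turns likelihood × prior into a probability distribution:

```python
import math

def grid_posterior(successes, trials, n_grid=101):
    """Bayes rule on a grid: posterior = likelihood * prior / P(D)."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid  # flat prior over the grid
    likelihood = [math.comb(trials, successes) * theta ** successes
                  * (1 - theta) ** (trials - successes) for theta in thetas]
    unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # P(D): the normalizing constant
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior, evidence

thetas, posterior, evidence = grid_posterior(successes=7, trials=10)
print(sum(posterior), thetas[posterior.index(max(posterior))], evidence)
```

Grids become infeasible quickly: k grid points per parameter means k^p evaluations for p parameters, which is why $P(D)$ is intractable for most models.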
Estimation: OLS
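OLS chooses the intercept and slope that minimize the residual sum of squares; for a single predictor the solution is closed-form, $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. A minimal Python sketch (the workshop itself works in R), applied to the fertilizer data listed in the worked examples:

```python
def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1*x (minimizes the residual sum of squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Fertilizer data from the worked example (FERTILIZER, YIELD)
fertilizer = [25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
yield_ = [84, 80, 90, 154, 148, 169, 206, 244, 212, 248]
b0, b1 = ols_fit(fertilizer, yield_)
print(f"intercept = {b0:.4f}, slope = {b1:.5f}")
```

Running it prints `intercept = 51.9333, slope = 0.81139`.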
Estimation: Likelihood, $P(D|H)$
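Maximum likelihood chooses the parameters that make the observed data most probable, maximizing $P(D|H)$. For a normal linear model the ML intercept and slope coincide with OLS, and the ML estimate of σ divides the residual sum of squares by n. A minimal Python sketch on the fertilizer data from the worked examples, checking that moving the slope or σ away from the ML values lowers the log-likelihood (the size of each shift is arbitrary):

```python
import math

def gaussian_log_likelihood(b0, b1, sigma, x, y):
    """log P(D|H) for y_i ~ N(b0 + b1*x_i, sigma^2), summed over observations."""
    total = 0.0
    for xi, yi in zip(x, y):
        resid = yi - (b0 + b1 * xi)
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

# Fertilizer data from the worked example (FERTILIZER, YIELD)
fertilizer = [25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
yield_ = [84, 80, 90, 154, 148, 169, 206, 244, 212, 248]
n = len(fertilizer)

# Closed-form maximum-likelihood estimates (intercept and slope match OLS)
mean_x, mean_y = sum(fertilizer) / n, sum(yield_) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(fertilizer, yield_))
      / sum((x - mean_x) ** 2 for x in fertilizer))
b0 = mean_y - b1 * mean_x
rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(fertilizer, yield_))
sigma_ml = math.sqrt(rss / n)  # ML divides by n, not n - 2

ll_best = gaussian_log_likelihood(b0, b1, sigma_ml, fertilizer, yield_)
ll_slope_shifted = gaussian_log_likelihood(b0, b1 + 0.05, sigma_ml, fertilizer, yield_)
ll_sigma_shifted = gaussian_log_likelihood(b0, b1, 1.2 * sigma_ml, fertilizer, yield_)
print(round(ll_best, 3), round(ll_slope_shifted, 3), round(ll_sigma_shifted, 3))
```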
Bayesian
• conclusions pertain to hypotheses
• computationally robust (sample size, balance, collinearity)
• inferential flexibility: derive any number of inferences
Bayesian
• subjectivity?
• intractable:
$$P(H|D) = \frac{P(D|H) \times P(H)}{P(D)}$$
$P(D)$ is the probability of the data under all possible hypotheses, which usually cannot be computed directly
MCMC sampling: Markov Chain Monte Carlo sampling
• draw samples proportional to likelihood
• with two parameters and infinitely vague priors, the posterior is proportional to the likelihood alone (here, a multivariate normal likelihood)
• chain of samples
• 1000 samples
• 10,000 samples
• Aim: samples reflect the posterior frequency distribution
• samples are used to construct the posterior probability distribution
• the sharper the multidimensional features of the posterior, the more samples are needed
• the chain should have traversed the entire posterior
• the initial location should not influence the result
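These aims can be illustrated with a random-walk Metropolis sampler, a simple MCMC algorithm (the special case of Metropolis-Hastings with a symmetric proposal). A minimal Python sketch, assuming a hypothetical two-parameter posterior that is bivariate normal with correlation 0.8; two chains start in opposite corners, and after discarding 1000 warmup draws each should recover the same means (about 0) and correlation (about 0.8), i.e. the starting location no longer matters:

```python
import math
import random

RHO = 0.8  # hypothetical correlation between the two parameters

def log_target(x, y, rho=RHO):
    """Unnormalized log-density of a standard bivariate normal with correlation rho."""
    return -0.5 * (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)

def metropolis_chain(start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    x, y = start
    current = log_target(x, y)
    samples = []
    for _ in range(n_samples):
        px, py = x + rng.gauss(0, step), y + rng.gauss(0, step)
        proposed = log_target(px, py)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            x, y, current = px, py, proposed
        samples.append((x, y))  # a rejected proposal repeats the current point
    return samples

def summarize(samples):
    """Mean of each parameter and their sample correlation."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples)
    syy = sum((s[1] - my) ** 2 for s in samples)
    return mx, my, sxy / math.sqrt(sxx * syy)

# Two chains from very different starting points; the first 1000 draws are discarded
chain_a = metropolis_chain(start=(-8.0, 8.0), n_samples=10_000, seed=1)[1000:]
chain_b = metropolis_chain(start=(8.0, -8.0), n_samples=10_000, seed=2)[1000:]
print(summarize(chain_a))
print(summarize(chain_b))
```

Step size, chain length and seeds are arbitrary choices for illustration; samplers such as Stan adapt their tuning during warmup.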
MCMC diagnostics: Trace plots
MCMC diagnostics: Autocorrelation
• Summary statistics calculated from non-independent (autocorrelated) samples are biased
• Thinning factor = 1
• Thinning factor = 10
• Thinning factor = 10, n = 10,000
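Autocorrelation and the effect of thinning can be checked numerically. A minimal Python sketch, assuming an AR(1) series with coefficient 0.9 as a hypothetical stand-in for an autocorrelated chain: keeping every 10th of 100,000 draws leaves n = 10,000 samples whose lag-1 autocorrelation drops from about 0.9 to about 0.9^10 ≈ 0.35:

```python
import random

def ar1_chain(n, phi=0.9, seed=42):
    """Hypothetical stand-in for an autocorrelated MCMC chain: x_t = phi * x_{t-1} + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((c - mean) ** 2 for c in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag))
    return cov / var

chain = ar1_chain(100_000)
thinned = chain[::10]  # thinning factor = 10: keep every 10th draw
lag1_full = autocorrelation(chain, 1)
lag1_thinned = autocorrelation(thinned, 1)
print(len(thinned), round(lag1_full, 2), round(lag1_thinned, 2))
```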
MCMC diagnostics: Plot of Distributions
Sampler types Metropolis-Hastings http://twiecki.github.io/blog/2014/01/02/visualizing-mcmc/
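Metropolis-Hastings generalizes the random-walk acceptance rule to asymmetric proposals by multiplying the acceptance ratio by $q(x \mid x')/q(x' \mid x)$. A minimal Python sketch, assuming a hypothetical Gamma(shape 3, rate 1) target and a multiplicative log-normal proposal (which keeps draws positive); for this proposal the Hastings correction reduces to $x'/x$:

```python
import math
import random

def log_target(x, shape=3.0, rate=1.0):
    """Unnormalized log-density of a Gamma(shape, rate) distribution, x > 0."""
    return (shape - 1) * math.log(x) - rate * x

def metropolis_hastings(n_samples, start=1.0, scale=0.5, seed=0):
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Asymmetric proposal: multiplicative (log-normal) random walk keeps x > 0
        x_new = x * math.exp(rng.gauss(0, scale))
        # Hastings correction: q(x | x_new) / q(x_new | x) = x_new / x for this proposal
        log_ratio = (log_target(x_new) - log_target(x)
                     + math.log(x_new) - math.log(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
        samples.append(x)
    return samples

draws = metropolis_hastings(50_000)[5_000:]  # discard warmup
mean_draw = sum(draws) / len(draws)
var_draw = sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1)
print(round(mean_draw, 2), round(var_draw, 2))  # compare with Gamma(3, 1): mean 3, variance 3
```

Dropping the correction term would make the chain converge to the wrong distribution (here, one proportional to the target divided by x); with it, the draws have mean and variance close to 3.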
Sampler types Gibbs
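Gibbs sampling updates one parameter at a time by drawing it from its full conditional distribution given the others, so every draw is accepted. A minimal Python sketch, assuming a hypothetical standard bivariate normal target with correlation ρ = 0.8, whose conditionals are $x \mid y \sim \mathcal{N}(\rho y, 1-\rho^2)$ and $y \mid x \sim \mathcal{N}(\rho x, 1-\rho^2)$:

```python
import math
import random

def gibbs_bivariate_normal(n_samples, rho=0.8, start=(0.0, 0.0), seed=1):
    """Gibbs sampler: alternately draw each parameter from its full conditional."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1 - rho * rho)
    x, y = start
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y | x ~ N(rho * x, 1 - rho^2)
        samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(20_000)
n = len(draws)
mean_x = sum(d[0] for d in draws) / n
mean_y = sum(d[1] for d in draws) / n
cov = sum((d[0] - mean_x) * (d[1] - mean_y) for d in draws) / n
sd_x = math.sqrt(sum((d[0] - mean_x) ** 2 for d in draws) / n)
sd_y = math.sqrt(sum((d[1] - mean_y) ** 2 for d in draws) / n)
corr = cov / (sd_x * sd_y)
print(round(mean_x, 2), round(mean_y, 2), round(corr, 2))
```

Gibbs sampling needs conditionals that can be sampled directly, and it slows down when parameters are strongly correlated because each update moves parallel to one axis only.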
Sampler types NUTS
Sampling
• thinning
• burn-in (warmup)
• chains
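These three settings map onto simple operations on the stored draws: discard the first warmup draws, keep every k-th draw (thinning), and run several chains from dispersed starting values so they can be compared. A minimal Python sketch, assuming hypothetical AR(1) chains as stand-ins for sampler output and using the classic Gelman-Rubin $\hat{R}$ (values near 1 suggest the chains agree) as the comparison:

```python
import math
import random
import statistics

def hypothetical_chain(start, n, phi=0.9, seed=0):
    """Stand-in for one MCMC chain: AR(1) draws whose stationary distribution is N(0, 1)."""
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1 - phi * phi)
    x, out = start, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, innovation_sd)
        out.append(x)
    return out

def post_process(chains, warmup, thin):
    """Discard the first `warmup` draws of each chain, then keep every `thin`-th draw."""
    return [chain[warmup::thin] for chain in chains]

def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)

# Three chains started far apart (dispersed initial values)
chains = [hypothetical_chain(s, 5_000, seed=i) for i, s in enumerate([-20.0, 0.0, 20.0])]
rhat_early = gelman_rubin_rhat([c[:50] for c in chains])  # still influenced by the starts
kept = post_process(chains, warmup=500, thin=5)
rhat_kept = gelman_rubin_rhat(kept)
print(len(kept[0]), round(rhat_early, 2), round(rhat_kept, 3))
```

$\hat{R}$ on the first 50 draws is well above 1 because the chains still reflect their starting points; after warmup removal and thinning it is close to 1. Stan's current diagnostics use a split, rank-normalized variant of this statistic.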
Bayesian software (for R)
• MCMCpack
• WinBUGS (R2WinBUGS)
• JAGS (R2jags)
• Stan (rstan, brms)
BRMS
----------------------------------------------------
Extractor              Description
---------------------  ----------------------------
residuals()            Residuals
fitted()               Predicted values
predict()              Predict new responses
coef()                 Extract model coefficients
plot()                 Diagnostic plots
stanplot(,type=)       More diagnostic plots
marginal_effects()     Partial effects
logLik()               Extract log-likelihood
LOO() and WAIC()       Calculate WAIC and LOO
influence.measures()   Leverage, Cook's D
summary()              Model output
stancode()             Model passed to stan
standata()             Data list passed to stan
----------------------------------------------------
Section 3 Worked Examples
Worked Examples
> fert <- read.csv('../data/fertilizer.csv', strip.white=T)
> fert
   FERTILIZER YIELD
1          25    84
2          50    80
3          75    90
4         100   154
5         125   148
6         150   169
7         175   206
8         200   244
9         225   212
10        250   248
> head(fert)
  FERTILIZER YIELD
1         25    84
2         50    80
3         75    90
4        100   154
5        125   148
6        150   169
> summary(fert)
   FERTILIZER         YIELD
 Min.   : 25.00   Min.   : 80.0
 1st Qu.: 81.25   1st Qu.:104.5
 Median :137.50   Median :161.5
 Mean   :137.50   Mean   :163.5
 3rd Qu.:193.75   3rd Qu.:210.5
 Max.   :250.00   Max.   :248.0
> str(fert)
'data.frame':   10 obs. of  2 variables:
 $ FERTILIZER: int  25 50 75 100 125 150 175 200 225 250
 $ YIELD     : int  84 80 90 154 148 169 206 244 212 248
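Worked Examples
Question: is there a relationship between fertilizer concentration and grass yield?
Linear model:
Frequentist
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$
Bayesian
$$\begin{aligned}
y_i &\sim \mathcal{N}(\eta_i, \sigma^2)\\
\eta_i &= \beta_0 + \beta_1 x_i\\
\beta_0 &\sim \mathcal{N}(0, 1000)\\
\beta_1 &\sim \mathcal{N}(0, 1000)\\
\sigma^2 &\sim \mathrm{Cauchy}(0, 4)
\end{aligned}$$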
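A minimal Python sketch of fitting the Bayesian model above with a random-walk Metropolis sampler (the workshop fits it with R tools). Assumptions not stated on the slide: the second argument of each normal prior is read as a standard deviation; the Cauchy(0, 4) prior is applied as a half-Cauchy on σ rather than σ², as is common in Stan models; the sampler works on the intercept at mean fertilizer and on log σ to improve mixing (the map to β0 is linear with unit Jacobian, and log σ gets its Jacobian term), converting back to β0 and σ for every stored draw. Step sizes, iteration counts and the seed are arbitrary:

```python
import math
import random

# Fertilizer data from the worked example (FERTILIZER, YIELD)
fertilizer = [25, 50, 75, 100, 125, 150, 175, 200, 225, 250]
yield_ = [84, 80, 90, 154, 148, 169, 206, 244, 212, 248]
X_MEAN = sum(fertilizer) / len(fertilizer)

def normal_logpdf(v, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (v - mean) ** 2 / (2 * sd * sd)

def log_posterior(alpha, beta1, log_sigma):
    """Unnormalized log posterior; alpha is the intercept at mean fertilizer."""
    sigma = math.exp(log_sigma)
    beta0 = alpha - beta1 * X_MEAN  # intercept at FERTILIZER = 0
    lp = normal_logpdf(beta0, 0, 1000) + normal_logpdf(beta1, 0, 1000)  # assumed sd = 1000
    lp += math.log(2 / (math.pi * 4 * (1 + (sigma / 4) ** 2)))  # assumed half-Cauchy(0, 4) on sigma
    lp += log_sigma  # Jacobian for sampling on the log(sigma) scale
    for x, y in zip(fertilizer, yield_):
        lp += normal_logpdf(y, beta0 + beta1 * x, sigma)
    return lp

def sample_posterior(n_iter=40_000, warmup=5_000, seed=7):
    rng = random.Random(seed)
    state = [160.0, 0.5, math.log(20.0)]  # starting (alpha, beta1, log sigma)
    steps = [6.0, 0.09, 0.3]  # hypothetical proposal step sizes
    current = log_posterior(*state)
    draws = []
    for i in range(n_iter):
        proposal = [s + rng.gauss(0, h) for s, h in zip(state, steps)]
        proposed = log_posterior(*proposal)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            state, current = proposal, proposed
        if i >= warmup:
            alpha, beta1, log_sigma = state
            draws.append((alpha - beta1 * X_MEAN, beta1, math.exp(log_sigma)))
    return draws

draws = sample_posterior()
mean_b0 = sum(d[0] for d in draws) / len(draws)
mean_b1 = sum(d[1] for d in draws) / len(draws)
prob_positive = sum(d[1] > 0 for d in draws) / len(draws)
print(f"beta0 = {mean_b0:.2f}, beta1 = {mean_b1:.3f}, P(beta1 > 0) = {prob_positive:.3f}")
```

Because the posterior is held as draws, questions such as "is there a relationship?" become direct probability statements like P(β1 > 0).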
Recommend
More recommend