Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01
Lecture 22: Continued Introduction to Bayesian Inference and Use of the MCMC Algorithm for Inference
Jason Mezey (jgm45@cornell.edu)
May 4, 2016 (Th) 8:40-9:55

Announcements:
- See the last two lectures of last year.
Review: In a Bayesian framework, parameters are not fixed constants: they have a probability distribution associated with them that reflects our belief in the values that might be the true value of the parameter.

Since parameters have probability distributions, we can consider the joint distribution of the parameter AND a sample Y produced under a probability model: Pr(θ ∩ Y).

In Bayesian inference, we are interested in the probability that the parameter takes a certain value given a sample: Pr(θ|y). Using Bayes' theorem (and noting that Pr(y) is a constant, since y is the fixed observed sample) we can rewrite this as follows:

Pr(θ|y) = Pr(y|θ)Pr(θ) / Pr(y)

Pr(θ|y) ∝ Pr(y|θ)Pr(θ)
This relationship is the foundation of Bayesian inference, where we make two assumptions: 1. the probability distribution that generated the sample, 2. the probability distribution of the parameter. Note that the first term is just our likelihood (!!):

Pr(y|θ) = L(θ|y)

and the second term, Pr(θ), is the prior, which reflects our belief about the values the true parameter value may take. The left-hand side, Pr(θ|y), is the posterior probability distribution of the parameter given the sample:

Pr(θ|y) ∝ Pr(y|θ)Pr(θ)
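To make this proportionality concrete, here is a minimal sketch (not from the lecture; the sample values, prior parameters, and grid are all made up) that approximates a posterior numerically by multiplying a likelihood by a prior over a grid of parameter values:

```python
# Minimal sketch: approximate Pr(theta|y) ∝ Pr(y|theta)Pr(theta) on a grid.
# All numbers (sample, prior parameters, grid) are made up for illustration.
import numpy as np
from scipy import stats

y = np.array([1.2, 0.7, 1.9, 1.1])        # observed sample
grid = np.linspace(-3.0, 5.0, 2001)       # candidate parameter values (mu)

# log-likelihood for Y ~ N(mu, 1) at each grid point, plus a N(0, 2^2) log-prior
log_lik = np.array([stats.norm.logpdf(y, loc=mu, scale=1.0).sum() for mu in grid])
log_prior = stats.norm.logpdf(grid, loc=0.0, scale=2.0)

post = np.exp(log_lik + log_prior - (log_lik + log_prior).max())
post /= np.trapz(post, grid)              # normalize: now a proper density on the grid
```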
Example: assume we have a sample of n iid observations of a normally distributed random variable Y ∼ N(µ, σ²) (what is the form of the likelihood?). Assume σ² is known for convenience, and use a mathematically convenient prior for the mean: Pr(µ) ∼ N(κ, φ²). Then from Pr(θ|y) ∝ Pr(y|θ)Pr(θ):

Pr(µ|y) ∝ [ ∏ᵢ₌₁ⁿ (1/√(2πσ²)) e^(−(yᵢ−µ)²/(2σ²)) ] × (1/√(2πφ²)) e^(−(µ−κ)²/(2φ²))
Multiplying these together (the normal prior is conjugate to the normal likelihood), the posterior is also normal:

Pr(µ|y) ∼ N( (κ/φ² + Σᵢⁿ yᵢ/σ²) / (1/φ² + n/σ²), (1/φ² + n/σ²)⁻¹ )
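As a sanity check on this closed form, a short sketch (data and hyperparameters made up) that computes the posterior mean and variance:

```python
# Sketch: closed-form normal posterior for a normal likelihood with known
# variance and a normal prior on mu. Data and hyperparameters are made up.
import numpy as np

y = np.array([1.2, 0.7, 1.9, 1.1])
n, sigma2 = len(y), 1.0        # known sampling variance sigma^2
kappa, phi2 = 0.0, 4.0         # prior: mu ~ N(kappa, phi2)

post_var = 1.0 / (1.0 / phi2 + n / sigma2)                 # (1/phi^2 + n/sigma^2)^-1
post_mean = (kappa / phi2 + y.sum() / sigma2) * post_var   # precision-weighted mean
print(post_mean, post_var)     # ~1.153, ~0.235 for these made-up numbers
```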
Bayesian inference includes both estimation and hypothesis testing. For estimation, we construct estimators using the posterior probability distribution, for example:

θ̂ = mean(θ|y) = ∫ θ Pr(θ|y) dθ

θ̂ = median(θ|y)

Note that this differs from estimation in a Frequentist framework, since estimator construction is fundamentally different (!!). For hypothesis testing, there are fully Bayesian approaches, although in this class, and in many cases in practice, we will use a "pseudo-Bayesian" approach where we assess whether the credible interval (e.g. the 0.95 c.i.) of the posterior distribution overlaps the value of the parameter under the null hypothesis.

For our normal example, the posterior mean and median coincide:

µ̂ = median(µ|y) = mean(µ|y) = (κ/φ² + nȳ/σ²) / (1/φ² + n/σ²)
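A sketch of this pseudo-Bayesian test (continuing the made-up example above, whose posterior was approximately N(1.153, 0.235)): compute the 0.95 credible interval of the normal posterior and check whether it covers the null value:

```python
# Sketch: pseudo-Bayesian test via the 0.95 credible interval of a normal
# posterior (values carried over from the previous made-up example).
from scipy import stats

post_mean, post_var = 1.153, 0.235
ci_lo, ci_hi = stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=post_var ** 0.5)
null_value = 0.0                                # value of mu under H0
reject_H0 = not (ci_lo <= null_value <= ci_hi)  # reject if the c.i. misses the null
print((ci_lo, ci_hi), reject_H0)
```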
Our goal: Bayesian inference for the genetic model (note that we will focus on the linear regression model, but we can perform Bayesian inference for any GLM!). Recall the model with two genotype parameters:

Y = βµ + Xa·βa + Xd·βd + ε,  ε ∼ N(0, σ²ε)

or in matrix notation, with X = [1, Xa, Xd] and β = [βµ, βa, βd]ᵀ:

y = Xβ + ε,  ε ∼ multiN(0, I·σ²ε)

For the purposes of mapping, we are interested in the hypotheses:

H0: βa = 0 ∩ βd = 0
HA: βa ≠ 0 ∪ βd ≠ 0
To keep the example simple, let's assume a uniform (constant) distribution for the prior, with the parameters independent, so that the posterior is proportional to the likelihood, which has a multivariate normal form (!!):

Pr(βµ, βa, βd, σ²ε) = Pr(βµ)Pr(βa)Pr(βd)Pr(σ²ε)

Pr(βµ) = Pr(βa) = Pr(βd) = c,  Pr(σ²ε) = c

Pr(βµ, βa, βd, σ²ε | y) ∝ Pr(y | βµ, βa, βd, σ²ε)

Pr(θ|y) ∝ (σ²ε)^(−n/2) e^(−(y − Xβ)ᵀ(y − Xβ) / (2σ²ε))
Recall the model we are interested in is:

y = Xβ + ε,  ε ∼ multiN(0, I·σ²ε)

so that, with the uniform prior,

Pr(βµ, βa, βd, σ²ε | y) ∝ Pr(y | βµ, βa, βd, σ²ε) Pr(βµ, βa, βd, σ²ε) ∝ Pr(y | βµ, βa, βd, σ²ε)

The marginal posterior probability we need for the genetic hypothesis test is obtained by integrating out βµ and σ²ε:

Pr(βa, βd | y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} Pr(βµ, βa, βd, σ²ε | y) dβµ dσ²ε ∼ multi-t-distribution

With this marginal posterior, we can construct a credible interval for our genetic null hypothesis and test a marker for a phenotype association, and we can perform a GWAS by doing this for each marker (!!)
The parameters of this multivariate t-distribution have closed forms:

mean(Pr(βa, βd|y)) = [β̂a, β̂d]ᵀ = C⁻¹ [Xa, Xd]ᵀ y

cov = ( (y − [Xa, Xd][β̂a, β̂d]ᵀ)ᵀ (y − [Xa, Xd][β̂a, β̂d]ᵀ) / (n − 6) ) C⁻¹

C = | XaᵀXa  XaᵀXd |
    | XdᵀXa  XdᵀXd |

df(multi-t) = n − 4
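A sketch implementing these formulas directly (the genotypes, effect size, and sample size are simulated and made up; the n − 6 and n − 4 constants are taken from the slide formulas above):

```python
# Sketch: closed-form parameters of the multivariate-t marginal posterior
# Pr(beta_a, beta_d | y) under the uniform prior, on simulated (made-up) data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
Xa = rng.choice([-1.0, 0.0, 1.0], size=n)   # additive genotype coding
Xd = 1.0 - np.abs(Xa)                       # dominance coding
y = 0.5 * Xa + rng.normal(size=n)           # phenotype with a made-up beta_a = 0.5

X = np.column_stack([Xa, Xd])
C = X.T @ X                                  # C = [Xa, Xd]^T [Xa, Xd]
beta_hat = np.linalg.solve(C, X.T @ y)       # location: [beta_a_hat, beta_d_hat]
resid = y - X @ beta_hat
scale = (resid @ resid) / (n - 6) * np.linalg.inv(C)  # cov (scale) matrix, as on the slide
df = n - 4                                   # degrees of freedom of the multi-t
print(beta_hat, df)
```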
[Figure: plots of the marginal posterior Pr(βa, βd|y) over βa and βd, each with its 0.95 credible interval marked; when the 0.95 credible interval covers (βa, βd) = (0, 0) we cannot reject H0, and when it does not we reject H0.]
In this case, we have a simple closed form of the overall posterior. However, it is often the case that we want to put together more complex priors with our likelihood, or consider a more complicated likelihood equation (e.g. for a logistic regression!), where no simple closed form exists. We still need to determine the credible interval from the posterior (or marginal) probability distribution, so we need to determine the form of this distribution. We will use a Markov chain Monte Carlo (MCMC) algorithm for this purpose.
To introduce MCMC algorithms, we need to consider models from another branch of probability (remember, probability is a field much larger than the components that we use for statistics / inference!): stochastic processes. A stochastic process is a collection of random vectors (variables) with defined conditional relationships, often indexed by an ordered set t.

We will consider one model from this probability sub-field: Markov processes (or more specifically, Markov chains). A Markov chain is a set of random variables (more accurately, a set of random vectors), which we will index with t:

Xt, Xt+1, Xt+2, ..., Xt+k  or  Xt, Xt−1, Xt−2, ..., Xt−k

These satisfy the Markov property:

Pr(Xt | Xt−1, Xt−2, ..., Xt−k) = Pr(Xt | Xt−1)

While all of the random variables in the chain are in the same class of random variables (e.g. Bernoulli, normal, etc.), we allow the parameters of these random variables to be different, e.g. at time t and t+1.
Example: consider a Markov chain where each random variable in the chain has a Bernoulli distribution. We can consider the probability distribution of each Xt in the chain (since it is just a random vector with a probability distribution!). A stationary distribution of the chain is reached when these distributions are the same (= they do not change over time); Markov chains can have such stationary distributions. For example, with realizations of the chain such as:

1,0,...,1,1
0,1,...,1,1
0,0,...,0,0
0,1,...,0,0

the distributions of the individual random variables might be:

X1, X2, ..., X1001, X1002
X1 ∼ Bern(0.2), X2 ∼ Bern(0.45), ..., X1001 ∼ Bern(0.4), X1002 ∼ Bern(0.4)

where the chain has reached a stationary Bern(0.4) distribution by t = 1001.
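A sketch simulating such a chain (the transition probabilities are made up, chosen so that the stationary distribution is Bern(0.4) as in the example above):

```python
# Sketch: a Bernoulli (two-state) Markov chain with made-up transition
# probabilities whose stationary distribution is Bern(0.4): with
# Pr(1|0) = 0.2 and Pr(0|1) = 0.3, pi_1 = 0.2 / (0.2 + 0.3) = 0.4.
import numpy as np

rng = np.random.default_rng(1)
p_one = {0: 0.2, 1: 0.7}      # Pr(X_{t+1} = 1 | X_t = 0 or 1)

x, chain = 0, []
for t in range(20000):
    x = int(rng.random() < p_one[x])   # draw X_{t+1} given X_t
    chain.append(x)

print(np.mean(chain[1000:]))  # fraction of 1s after the early steps: ~0.4
```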
For the Markov chains we will consider, we can prove that the chain will evolve (more accurately, converge!) to a unique (!!) stationary distribution and will not leave this stationary distribution (where it is often possible to determine the parameters for the stationary distribution!). That is, after some number of steps k (which may be very large, e.g. infinite), we will reach a point where each following random variable is in the unique stationary distribution:

Pr(Xt+k) = Pr(Xt+k+1) = ...

The idea of MCMC is to construct a Markov chain that evolves to a unique stationary distribution that is exactly the posterior probability distribution that we are interested in (!!!). We will run the chain until we reach this stationary distribution, and then we will take a sample from this chain to determine (or more accurately, approximate) our posterior, e.g. for the posterior Pr(µ|y):

MCMC = Xt+k, Xt+k+1, Xt+k+2, ..., Xt+k+m
Sample = 0.1, −0.08, −1.4, ..., 0.5

From such a sample we can approximate the posterior and construct Bayesian estimators, e.g.:

θ̂ = median(Pr(θ|y)) ≈ median(θ[t+k], ..., θ[t+k+m])
The Metropolis-Hastings algorithm constructs such a chain as follows:

1. Begin with an arbitrary starting value θ[0] at t = 0 (or any subsequent iteration).
2. Propose a new value θ* by sampling from a jumping (proposal) distribution J(θ*|θ[t]).
3. Calculate the acceptance ratio:

r = [ Pr(θ*|y) J(θ[t]|θ*) ] / [ Pr(θ[t]|y) J(θ*|θ[t]) ]

4. Set θ[t+1] = θ* with probability Pr(θ[t+1] = θ* | θ[t]) = min(r, 1); otherwise set θ[t+1] = θ[t].

The early iterations constitute the 'burn-in' phase, after which the realizations of θ[t] start to behave as though they are sampled from the stationary distribution of the Metropolis-Hastings Markov chain (we will discuss how many iterations are necessary for a burn-in below).
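A sketch of these steps for the earlier normal-mean posterior (made-up data; a symmetric normal jumping distribution is used, so the J terms cancel in r, i.e. the Metropolis special case):

```python
# Sketch: Metropolis sampler targeting Pr(mu|y) for the earlier normal
# example (normal likelihood, known sigma^2, normal prior). Data, prior
# parameters, proposal scale, and iteration counts are all made up.
import numpy as np

rng = np.random.default_rng(2)
y = np.array([1.2, 0.7, 1.9, 1.1])
sigma2, kappa, phi2 = 1.0, 0.0, 4.0

def log_post(mu):
    # log Pr(y|mu) + log Pr(mu), up to an additive constant
    return -np.sum((y - mu) ** 2) / (2 * sigma2) - (mu - kappa) ** 2 / (2 * phi2)

theta, draws = 0.0, []                       # theta^[0]: arbitrary starting value
for t in range(20000):
    proposal = theta + rng.normal(scale=0.5)            # theta* ~ J(.|theta^[t])
    if np.log(rng.random()) < log_post(proposal) - log_post(theta):
        theta = proposal                                # accept with prob min(r, 1)
    draws.append(theta)                                 # else keep theta^[t]

burned = np.array(draws[2000:])              # discard the burn-in iterations
print(burned.mean(), np.median(burned))      # ~ closed-form posterior mean (~1.15)
```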
After the burn-in, we use the sampled values to approximate the posterior distribution and perform Bayesian inference. Note that all we need is our probability model (= the likelihood) and our prior (which we provide!); our goal is then to construct an MCMC chain whose stationary distribution is the posterior, which we will sample to get the posterior "histogram" and approximate the posterior distribution.

An important special case of the Metropolis-Hastings algorithm is the Gibbs sampler (requires no rejections!), which samples each parameter from the conditional posterior distributions (which requires you to derive these relationships = not always possible!). For our regression example, these conditionals are:
Pr(βµ | βa, βd, σ²ε, y)
Pr(βa | βµ, βd, σ²ε, y)
Pr(βd | βµ, βa, σ²ε, y)
Pr(σ²ε | βµ, βa, βd, y)

and each iteration of the sampler updates the full parameter vector:

θ[t] = [βµ, βa, βd, σ²ε][t]  →  θ[t+1] = [βµ, βa, βd, σ²ε][t+1]
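A sketch of a Gibbs-style sampler for this model under the flat priors above, with one simplification flagged clearly: the three β parameters are drawn as a single block from Pr(β | σ²ε, y) (a valid Gibbs variant) rather than one at a time. Under the flat priors, the full conditionals are β | σ²ε, y ∼ N((XᵀX)⁻¹Xᵀy, σ²ε(XᵀX)⁻¹) and σ²ε | β, y ∼ RSS/χ²(n−2):

```python
# Sketch: block Gibbs sampler for y = X beta + eps under flat priors.
# Genotypes, effect sizes, and iteration counts are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 200
Xa = rng.choice([-1.0, 0.0, 1.0], size=n)
Xd = 1.0 - np.abs(Xa)
X = np.column_stack([np.ones(n), Xa, Xd])        # columns: mu, a, d
y = 1.0 + 0.5 * Xa + rng.normal(size=n)          # made-up true effects

XtX_inv = np.linalg.inv(X.T @ X)
beta_ls = XtX_inv @ (X.T @ y)                    # least-squares estimate

sigma2, samples = 1.0, []
for t in range(5000):
    # beta^[t+1] ~ Pr(beta | sigma2^[t], y) = N(beta_ls, sigma2 * (X'X)^-1)
    beta = rng.multivariate_normal(beta_ls, sigma2 * XtX_inv)
    # sigma2^[t+1] ~ Pr(sigma2 | beta^[t+1], y) = RSS / chi-square(n - 2)
    rss = np.sum((y - X @ beta) ** 2)
    sigma2 = rss / rng.chisquare(n - 2)
    samples.append(np.append(beta, sigma2))

samples = np.array(samples[500:])                # discard burn-in
print(samples.mean(axis=0))                      # ~ posterior means of (mu, a, d, s2)
```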
A brief note on practical use: what made Bayesian data analysis practical is when computers increased in speed, and since then the use of MCMC approaches in genetic analysis has steadily increased. However, MCMC algorithms can be inefficient (they take a long time to converge, they do not sample modes of a complex posterior efficiently, etc.), so alternative approximation approaches are also used for genetic inference, e.g. variational Bayes.

Next lecture: the basics of classic quantitative genetics (additive genetic variance and heritability)!