  1. Hierarchical models. Dr. Jarad Niemi, Iowa State University, August 31, 2017.

  2. Normal hierarchical model. Let Y_ig ∼ N(θ_g, σ²) independently for i = 1, ..., n_g, g = 1, ..., G, and ∑_{g=1}^G n_g = n. Now consider the following model assumptions:
     θ_g ∼ N(μ, τ²)
     θ_g ∼ La(μ, τ)
     θ_g ∼ t_ν(μ, τ²)
     θ_g ∼ π δ_0 + (1 − π) N(μ, τ²)
     θ_g ∼ π δ_0 + (1 − π) t_ν(μ, τ²)
     To perform a Bayesian analysis, we need a prior on μ, τ², and (in the case of the discrete mixtures) π.

  3. Gibbs sampling: Normal hierarchical model. Consider the model
     Y_ig ∼ N(θ_g, σ²), θ_g ∼ N(μ, τ²), independently,
     where i = 1, ..., n_g, g = 1, ..., G, and n = ∑_{g=1}^G n_g, with prior distribution
     p(μ, σ², τ²) = p(μ) p(σ²) p(τ²) ∝ (1/σ²) Ca⁺(τ; 0, C).
     For background on why we are using these priors for the variances, see Gelman (2006), "Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)": https://projecteuclid.org/euclid.ba/1340371048

  4. Gibbs sampling: Multi-step Gibbs sampler for the normal hierarchical model. Here is a possible Gibbs sampler for this model:
     For g = 1, ..., G, sample θ_g ∼ p(θ_g | ···).
     Sample σ² ∼ p(σ² | ···).
     Sample μ ∼ p(μ | ···).
     Sample τ² ∼ p(τ² | ···).
     How many steps exist in this Gibbs sampler? G + 3? 4?

  5. Gibbs sampling: 2-step Gibbs sampler for the normal hierarchical model. Here is a 2-step Gibbs sampler:
     1. Sample θ = (θ_1, ..., θ_G) ∼ p(θ | ···).
     2. Sample μ, σ², τ² ∼ p(μ, σ², τ² | ···).
     There is stronger theoretical support for a 2-step Gibbs sampler; thus, if we can, it is prudent to construct one.

  6. Gibbs sampling: Sampling θ. The full conditional for θ is
     p(θ | ···) ∝ p(θ, μ, σ², τ² | y)
     ∝ p(y | θ, σ²) p(θ | μ, τ²) p(μ, σ², τ²)
     ∝ p(y | θ, σ²) p(θ | μ, τ²)
     = ∏_{g=1}^G p(y_g | θ_g, σ²) p(θ_g | μ, τ²),
     where y_g = (y_1g, ..., y_{n_g g}). We now know that the θ_g are conditionally independent of each other.

  7. Gibbs sampling: Sampling θ_g. The full conditional for θ_g is
     p(θ_g | ···) ∝ p(y_g | θ_g, σ²) p(θ_g | μ, τ²) = [∏_{i=1}^{n_g} N(y_ig; θ_g, σ²)] N(θ_g; μ, τ²).
     Notice that this does not include θ_{g′} for any g′ ≠ g. This is an alternative way to conclude that the θ_g are conditionally independent of each other. Thus
     θ_g | ··· ∼ N(μ_g, τ²_g), where
     τ²_g = [τ⁻² + n_g σ⁻²]⁻¹,
     μ_g = τ²_g [μ τ⁻² + ȳ_g n_g σ⁻²],
     ȳ_g = (1/n_g) ∑_{i=1}^{n_g} y_ig.
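As a sketch, the draw of θ_g from this full conditional takes only a few lines of Python (illustrative only; the function name and argument layout are my own, not from the slides):

```python
import numpy as np

def sample_theta_g(y_g, mu, sigma2, tau2, rng):
    """Draw theta_g ~ N(mu_g, tau2_g) from its full conditional.

    The conditional precision is the prior precision 1/tau2 plus the
    data precision n_g/sigma2; the conditional mean weights mu and
    ybar_g by those same precisions.
    """
    n_g = len(y_g)
    tau2_g = 1.0 / (1.0 / tau2 + n_g / sigma2)
    mu_g = tau2_g * (mu / tau2 + n_g * np.mean(y_g) / sigma2)
    return rng.normal(mu_g, np.sqrt(tau2_g))
```

With n_g = 3, ȳ_g = 2, μ = 0, and σ² = τ² = 1, the formulas give μ_g = 6/4 = 1.5 and τ²_g = 1/4, so repeated draws should average about 1.5.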

  8. Gibbs sampling: Sampling μ, σ², τ². The full conditional for μ, σ², τ² is
     p(μ, σ², τ² | ···) ∝ p(y | θ, σ²) p(θ | μ, τ²) p(μ) p(σ²) p(τ²)
     = [p(y | θ, σ²) p(σ²)] × [p(θ | μ, τ²) p(μ) p(τ²)].
     So we know that σ² is conditionally independent of μ and τ².

  9. Gibbs sampling: Sampling σ². Recall that y_ig ∼ N(θ_g, σ²) independently and p(σ²) ∝ 1/σ².
     Thus, we are in the scenario of normal data with a known mean and unknown variance, and the unknown variance has our default prior. Thus, we should know the full conditional is
     σ² | ··· ∼ IG(n/2, (1/2) ∑_{g=1}^G ∑_{i=1}^{n_g} (y_ig − θ_g)²).
     To derive the full conditional, use
     p(σ² | ···) ∝ [∏_{g=1}^G ∏_{i=1}^{n_g} (σ²)^{−1/2} exp(−(y_ig − θ_g)²/(2σ²))] (σ²)^{−1}
     = (σ²)^{−n/2−1} exp(−(1/(2σ²)) ∑_{g=1}^G ∑_{i=1}^{n_g} (y_ig − θ_g)²),
     which is the kernel of an IG(n/2, (1/2) ∑_{g=1}^G ∑_{i=1}^{n_g} (y_ig − θ_g)²).
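This conditional can be sketched by drawing a gamma variate and inverting it (a minimal illustration with my own function name; `group` maps each observation to its group index, an assumption about data layout not made explicit on the slides):

```python
import numpy as np

def sample_sigma2(y, theta, group, rng):
    """Draw sigma^2 ~ IG(n/2, SSE/2), SSE = sum of (y_ig - theta_g)^2.

    theta[group] aligns each observation with its group mean. An
    inverse-gamma draw is 1 over a gamma draw with the same shape and
    inverted scale.
    """
    resid = y - theta[group]
    n = len(y)
    sse = np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape=n / 2.0, scale=2.0 / sse)
```

For example, 100 observations with every residual equal to 1 give SSE = 100, so the draws follow IG(50, 50), whose mean is 50/49.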

  10. Gibbs sampling: Sampling μ, τ². Recall that θ_g ∼ N(μ, τ²) independently and p(μ, τ²) ∝ Ca⁺(τ; 0, C).
      This is a non-standard distribution, but it is extremely close to a normal model with unknown mean and variance under the standard non-informative prior p(μ, τ²) ∝ 1/τ² or the conjugate normal-inverse-gamma prior. Here are some options for sampling from this distribution:
      random-walk Metropolis (in 2 dimensions),
      independent Metropolis-Hastings using the posterior from the standard non-informative prior as the proposal, or
      rejection sampling using the posterior from the standard non-informative prior as the proposal.
      The posterior under the standard non-informative prior is
      τ² | ··· ∼ Inv-χ²(G − 1, s²_θ) and μ | τ², ··· ∼ N(θ̄, τ²/G),
      where θ̄ = (1/G) ∑_{g=1}^G θ_g and s²_θ = (1/(G−1)) ∑_{g=1}^G (θ_g − θ̄)². What is the MH ratio?

  11. Gibbs sampling: Summary. Markov chain Monte Carlo for the normal hierarchical model:
      1. Sample θ ∼ p(θ | ···):
         a. For g = 1, ..., G, sample θ_g ∼ N(μ_g, τ²_g).
      2. Sample μ, σ², τ²:
         a. Sample σ² ∼ IG(n/2, SSE/2).
         b. Sample μ, τ² using independent Metropolis-Hastings with the posterior from the standard non-informative prior as the proposal.
      What happens if θ_g ∼ La(μ, τ) or θ_g ∼ t_ν(μ, τ²)?
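The whole algorithm can be sketched as one function (an illustrative implementation under my own names, not the author's code). Since target and proposal share the likelihood, the MH ratio in step 2b reduces to w(τ²_prop)/w(τ²_cur) with w(τ²) = τ² Ca⁺(τ; 0, C), the ratio of the half-Cauchy prior to the proposal's implicit prior 1/τ²; because μ has a flat prior, μ | τ², θ ∼ N(θ̄, τ²/G) exactly and can be refreshed as a separate Gibbs step:

```python
import numpy as np
from scipy import stats

def gibbs_normal_hier(y, group, C=1.0, n_iter=2000, seed=0):
    """Two-step Gibbs sampler for the normal hierarchical model.

    Step 1 draws all theta_g from their independent normal full
    conditionals. Step 2 draws sigma2 ~ IG(n/2, SSE/2), updates tau2
    by independence MH (proposal: Inv-chi2(G-1, s2_theta)), then draws
    mu from its exact conditional N(thetabar, tau2/G).
    """
    rng = np.random.default_rng(seed)
    G = group.max() + 1
    n = len(y)
    n_g = np.bincount(group, minlength=G)
    ybar_g = np.bincount(group, weights=y, minlength=G) / n_g

    mu, sigma2, tau2 = 0.0, 1.0, 1.0
    out = {"theta": np.empty((n_iter, G)), "mu": np.empty(n_iter),
           "sigma2": np.empty(n_iter), "tau2": np.empty(n_iter)}
    for it in range(n_iter):
        # Step 1: theta_g | ... ~ N(mu_g, tau2_g), drawn jointly
        tau2_g = 1.0 / (1.0 / tau2 + n_g / sigma2)
        mu_g = tau2_g * (mu / tau2 + n_g * ybar_g / sigma2)
        theta = rng.normal(mu_g, np.sqrt(tau2_g))

        # Step 2a: sigma2 | ... ~ IG(n/2, SSE/2)
        sse = np.sum((y - theta[group]) ** 2)
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / sse)

        # Step 2b: tau2 by independence MH; accept with prob
        # min(1, w(tau2') / w(tau2)), w(t2) = t2 * HalfCauchy(sqrt(t2); 0, C)
        tbar, s2 = theta.mean(), theta.var(ddof=1)
        tau2_prop = (G - 1) * s2 / rng.chisquare(G - 1)
        def log_w(t2):
            return np.log(t2) + stats.halfcauchy.logpdf(np.sqrt(t2), scale=C)
        if np.log(rng.uniform()) < log_w(tau2_prop) - log_w(tau2):
            tau2 = tau2_prop
        # mu | tau2, theta ~ N(thetabar, tau2/G) exactly (flat prior on mu)
        mu = rng.normal(tbar, np.sqrt(tau2 / G))

        out["theta"][it], out["mu"][it] = theta, mu
        out["sigma2"][it], out["tau2"][it] = sigma2, tau2
    return out
```

On simulated data with within-group standard deviation 0.5, the posterior for σ² should concentrate near 0.25 and the posterior means of the θ_g should track the true group means.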

  12. Scale mixtures of normals. Recall that if θ | φ ∼ N(φ, V) and φ ∼ N(m, C), then θ ∼ N(m, V + C). This is called a location mixture. Now, if θ | φ ∼ N(m, Cφ) and we assume a mixing distribution for φ, we have a scale mixture. Since the top-level distributional assumption is normal, we refer to this as a scale mixture of normals.
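The location-mixture identity above is easy to check by simulation (a quick Monte Carlo sketch, not part of the slides; the constants are arbitrary):

```python
import numpy as np

# If theta | phi ~ N(phi, V) and phi ~ N(m, C), then marginally
# theta ~ N(m, V + C). Simulate the two-stage draw and compare
# the marginal moments to m and V + C.
rng = np.random.default_rng(0)
m, C, V = 1.0, 2.0, 3.0
phi = rng.normal(m, np.sqrt(C), size=200_000)   # phi ~ N(m, C)
theta = rng.normal(phi, np.sqrt(V))             # theta | phi ~ N(phi, V)
marg_mean, marg_var = theta.mean(), theta.var()
```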

  13. Scale mixtures of normals: t distribution. Let θ | φ ∼ N(m, φC) and φ ∼ IG(a, b). Then
      p(θ) = ∫ p(θ | φ) p(φ) dφ
      = ∫ (2πC)^{−1/2} φ^{−1/2} e^{−(θ−m)²/(2φC)} (bᵃ/Γ(a)) φ^{−(a+1)} e^{−b/φ} dφ
      = (2πC)^{−1/2} (bᵃ/Γ(a)) ∫ φ^{−(a+1/2+1)} e^{−[b+(θ−m)²/(2C)]/φ} dφ
      = (2πC)^{−1/2} (bᵃ/Γ(a)) Γ(a+1/2) / [b+(θ−m)²/(2C)]^{a+1/2}
      = [Γ([2a+1]/2) / (Γ(2a/2) √(2aπ·bC/a))] [1 + (1/(2a)) (θ−m)²/(bC/a)]^{−[2a+1]/2}.
      Thus θ ∼ t_{2a}(m, bC/a), i.e. θ has a t distribution with 2a degrees of freedom, location m, scale bC/a, and variance bC/(a−1).
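This scale-mixture identity can also be checked by simulation (a Monte Carlo sketch with arbitrary constants, not from the slides): draw φ ∼ IG(a, b), draw θ | φ ∼ N(m, φC), and compare the sample to the claimed t distribution with a Kolmogorov-Smirnov statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, C, a, b = 0.0, 1.5, 3.0, 2.0
# IG(a, b) draws: 1 over Gamma(shape=a, scale=1/b) draws
phi = 1.0 / rng.gamma(a, 1.0 / b, size=400_000)
theta = rng.normal(m, np.sqrt(phi * C))          # theta | phi ~ N(m, phi*C)
# Claimed marginal: t with df = 2a, location m, scale sqrt(bC/a)
df, scale = 2 * a, np.sqrt(b * C / a)
ks = stats.kstest(theta, "t", args=(df, m, scale)).statistic
```

If the identity holds, the KS statistic should be on the order of 1/√n, i.e. a few thousandths here.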

  14. Scale mixtures of normals: Hierarchical t distribution. Let m = μ, C = 1, a = ν/2, and b = ντ²/2, i.e.
      θ | φ ∼ N(μ, φ) and φ ∼ IG(ν/2, ντ²/2).
      Then we have θ ∼ t_ν(μ, τ²), i.e. a t distribution with ν degrees of freedom, location μ, and scale τ². Notice that the parameterization has a redundancy between C and b/a, i.e. we could have chosen C = τ², a = ν/2, and b = ν/2 and we would have obtained the same marginal distribution for θ.

  15. Scale mixtures of normals: Laplace distribution. Let θ | φ ∼ N(m, φC²) and φ ∼ Exp(1/(2b²)), where E[φ] = 2b² and Var[φ] = 4b⁴. Then, by an extension of equation (4) in Park and Casella (2008), we have
      p(θ) = (1/(2Cb)) e^{−|θ−m|/(Cb)}.
      This is the pdf for a Laplace (double exponential) distribution with location m and scale Cb, which we write θ ∼ La(m, Cb), with E[θ] = m and Var[θ] = 2[Cb]² = 2C²b².
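The same kind of Monte Carlo check works for the Laplace mixture (a sketch with arbitrary constants, not from the slides; note numpy's exponential is parameterized by its mean, so E[φ] = 2b² maps directly):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, C, b = 0.5, 2.0, 0.75
phi = rng.exponential(2.0 * b ** 2, size=400_000)  # E[phi] = 2 b^2
theta = rng.normal(m, np.sqrt(phi) * C)            # theta | phi ~ N(m, phi*C^2)
# Claimed marginal: Laplace with location m and scale C*b
ks = stats.kstest(theta, "laplace", args=(m, C * b)).statistic
```

As before, a KS statistic of a few thousandths is consistent with the claimed marginal.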

  16. Scale mixtures of normals: Hierarchical Laplace distribution. Let m = μ, C = 1, and b = τ, i.e.
      θ | φ ∼ N(μ, φ) and φ ∼ Exp(1/(2τ²)).
      Then we have θ ∼ La(μ, τ), i.e. a Laplace distribution with location μ and scale τ. Notice that the parameterization has a redundancy between C and b, i.e. we could have chosen C = τ and b = 1 and we would have obtained the same marginal distribution for θ.
