SLIDE 1

Hierarchical models

Dr. Jarad Niemi

Iowa State University

August 31, 2017

SLIDE 2

Normal hierarchical model

Let $Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ for $i = 1, \ldots, n_g$, $g = 1, \ldots, G$, and $\sum_{g=1}^G n_g = n$. Now consider the following model assumptions:

  • $\theta_g \stackrel{ind}{\sim} N(\mu, \tau^2)$
  • $\theta_g \stackrel{ind}{\sim} La(\mu, \tau)$
  • $\theta_g \stackrel{ind}{\sim} t_\nu(\mu, \tau^2)$
  • $\theta_g \stackrel{ind}{\sim} \pi\delta_0 + (1-\pi)N(\mu, \tau^2)$
  • $\theta_g \stackrel{ind}{\sim} \pi\delta_0 + (1-\pi)t_\nu(\mu, \tau^2)$

To perform a Bayesian analysis, we need a prior on $\mu$, $\tau^2$, and (in the case of the discrete mixture) $\pi$.
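A quick way to get a feel for the base model is to simulate from it. Below is a minimal R sketch (the number of groups, the group sizes, and the parameter values are arbitrary choices for illustration) that draws group means from the normal assumption and then observations around them.

```r
# Simulate from the normal hierarchical model:
#   theta_g ~ N(mu, tau^2),  Y_ig | theta_g ~ N(theta_g, sigma^2)
set.seed(20170831)
G     <- 10                      # number of groups (arbitrary)
ng    <- rep(5, G)               # observations per group (arbitrary)
mu    <- 0; tau <- 1; sigma <- 0.5
theta <- rnorm(G, mu, tau)       # group-level means
d <- data.frame(
  group = rep(1:G, times = ng),
  y     = rnorm(sum(ng), mean = rep(theta, times = ng), sd = sigma)
)
aggregate(y ~ group, data = d, FUN = mean)   # compare group means to theta
```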

SLIDE 3

Gibbs sampling

Normal hierarchical model

Consider the model
\[ Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2), \qquad \theta_g \stackrel{ind}{\sim} N(\mu, \tau^2), \]
where $i = 1, \ldots, n_g$, $g = 1, \ldots, G$, and $n = \sum_{g=1}^G n_g$, with prior distribution
\[ p(\mu, \sigma^2, \tau^2) = p(\mu)\,p(\sigma^2)\,p(\tau^2) \propto \frac{1}{\sigma^2}\, Ca^+(\tau; 0, C). \]

For background on why we are using these priors for the variances, see Gelman (2006) https://projecteuclid.org/euclid.ba/1340371048: “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)”.

SLIDE 4

Gibbs sampling Multi-step

Gibbs sampler for normal hierarchical model

Here is a possible Gibbs sampler for this model:

  • For $g = 1, \ldots, G$, sample $\theta_g \sim p(\theta_g|\cdots)$.
  • Sample $\sigma^2 \sim p(\sigma^2|\cdots)$.
  • Sample $\mu \sim p(\mu|\cdots)$.
  • Sample $\tau^2 \sim p(\tau^2|\cdots)$.

How many steps exist in this Gibbs sampler? $G+3$? 4?

SLIDE 5

Gibbs sampling 2-Step

2-Step Gibbs sampler for normal hierarchical model

Here is a 2-step Gibbs sampler:

  • 1. Sample $\theta = (\theta_1, \ldots, \theta_G) \sim p(\theta|\cdots)$.
  • 2. Sample $\mu, \sigma^2, \tau^2 \sim p(\mu, \sigma^2, \tau^2|\cdots)$.

There is stronger theoretical support for a 2-step Gibbs sampler; thus, if we can, it is prudent to construct one.

SLIDE 6

Gibbs sampling Sampling θ

Sampling θ

The full conditional for $\theta$ is
\[ p(\theta|\cdots) \propto p(\theta, \mu, \sigma^2, \tau^2|y) \propto p(y|\theta, \sigma^2)\,p(\theta|\mu, \tau^2)\,p(\mu, \sigma^2, \tau^2) \propto p(y|\theta, \sigma^2)\,p(\theta|\mu, \tau^2) = \prod_{g=1}^G p(y_g|\theta_g, \sigma^2)\,p(\theta_g|\mu, \tau^2) \]
where $y_g = (y_{1g}, \ldots, y_{n_g g})$. We now know that the $\theta_g$ are conditionally independent of each other.

SLIDE 7

Gibbs sampling Sampling θg

Sampling θg

The full conditional for $\theta_g$ is
\[ p(\theta_g|\cdots) \propto p(y_g|\theta_g, \sigma^2)\,p(\theta_g|\mu, \tau^2) = \prod_{i=1}^{n_g} N(y_{ig}; \theta_g, \sigma^2)\, N(\theta_g; \mu, \tau^2). \]
Notice that this does not include $\theta_{g'}$ for any $g' \ne g$. This is an alternative way to conclude that the $\theta_g$ are conditionally independent of each other. Thus
\[ \theta_g|\cdots \stackrel{ind}{\sim} N(\mu_g, \tau_g^2) \]
where
\[ \tau_g^2 = \left[\tau^{-2} + n_g\sigma^{-2}\right]^{-1}, \qquad \mu_g = \tau_g^2\left[\mu\tau^{-2} + \bar{y}_g\, n_g\sigma^{-2}\right], \qquad \bar{y}_g = \frac{1}{n_g}\sum_{i=1}^{n_g} y_{ig}. \]
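As a sanity check on these formulas, here is a minimal R sketch of the $\theta$ update; the arguments `ybar`, `ng`, `mu`, `tau2`, and `sigma2` (names of my choosing) hold the group sample means, group sizes, and current parameter values.

```r
# Draw each theta_g from its full conditional N(mu_g, tau_g^2) where
#   tau_g^2 = [tau^-2 + n_g/sigma^2]^-1 and mu_g = tau_g^2 [mu/tau^2 + ybar_g n_g/sigma^2]
sample_theta <- function(ybar, ng, mu, tau2, sigma2) {
  tau2g <- 1 / (1 / tau2 + ng / sigma2)              # full conditional variances
  mug   <- tau2g * (mu / tau2 + ybar * ng / sigma2)  # full conditional means
  rnorm(length(ybar), mug, sqrt(tau2g))              # vectorized over groups
}
```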

SLIDE 8

Gibbs sampling Sampling µ, σ2, τ2

Sampling µ, σ2, τ 2

The full conditional for $\mu, \sigma^2, \tau^2$ is
\[ p(\mu, \sigma^2, \tau^2|\cdots) \propto p(y|\theta, \sigma^2)\,p(\theta|\mu, \tau^2)\,p(\mu)\,p(\sigma^2)\,p(\tau^2) = \left[p(y|\theta, \sigma^2)\,p(\sigma^2)\right]\left[p(\theta|\mu, \tau^2)\,p(\mu)\,p(\tau^2)\right]. \]
So we know that, conditional on $\theta$ and $y$, $\sigma^2$ is independent of $(\mu, \tau^2)$.

SLIDE 9

Gibbs sampling Sampling σ2

Sampling σ2

Recall that $y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ and $p(\sigma^2) \propto 1/\sigma^2$. Thus, we are in the scenario of normal data with a known mean and unknown variance, where the unknown variance has our default prior. Thus, we should know the full conditional is
\[ \sigma^2|\cdots \sim IG\left(\frac{n}{2},\ \frac{1}{2}\sum_{g=1}^G\sum_{i=1}^{n_g}(y_{ig}-\theta_g)^2\right). \]
To derive the full conditional, use
\[ p(\sigma^2|\cdots) \propto \left[\prod_{g=1}^G\prod_{i=1}^{n_g}(\sigma^2)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(y_{ig}-\theta_g)^2\right)\right]\frac{1}{\sigma^2} = (\sigma^2)^{-n/2-1}\exp\left(-\frac{1}{2}\,\frac{\sum_{g=1}^G\sum_{i=1}^{n_g}(y_{ig}-\theta_g)^2}{\sigma^2}\right) \]
which is the kernel of an $IG\left(\frac{n}{2},\ \frac{1}{2}\sum_{g=1}^G\sum_{i=1}^{n_g}(y_{ig}-\theta_g)^2\right)$.
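In R, an inverse gamma draw can be taken as the reciprocal of a gamma draw; a minimal sketch of this $\sigma^2$ step, assuming `y`, `group` (coded $1,\ldots,G$), and the current `theta` vector are available:

```r
# sigma^2 | ... ~ IG(n/2, SSE/2) where SSE = sum_g sum_i (y_ig - theta_g)^2
sample_sigma2 <- function(y, group, theta) {
  SSE <- sum((y - theta[group])^2)
  1 / rgamma(1, shape = length(y) / 2, rate = SSE / 2)  # 1/Gamma(a, rate b) ~ IG(a, b)
}
```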

SLIDE 10

Sampling µ, τ2

Sampling µ, τ 2

Recall that $\theta_g \stackrel{ind}{\sim} N(\mu, \tau^2)$ and $p(\mu, \tau^2) \propto Ca^+(\tau; 0, C)$. This is a non-standard distribution, but it is extremely close to a normal model with unknown mean and variance under the standard non-informative prior $p(\mu, \tau^2) \propto 1/\tau^2$ or the conjugate normal-inverse-gamma prior. Here are some options for sampling from this distribution:

  • random-walk Metropolis (in 2 dimensions),
  • independent Metropolis-Hastings using the posterior from the standard non-informative prior as the proposal, or
  • rejection sampling using the posterior from the standard non-informative prior as the proposal.

The posterior under the standard non-informative prior is
\[ \tau^2|\cdots \sim \mbox{Inv-}\chi^2(G-1, s_\theta^2) \qquad \mbox{and} \qquad \mu|\tau^2, \cdots \sim N(\bar\theta, \tau^2/G) \]
where $\bar\theta = \frac{1}{G}\sum_{g=1}^G \theta_g$ and $s_\theta^2 = \frac{1}{G-1}\sum_{g=1}^G(\theta_g - \bar\theta)^2$. What is the MH ratio?
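A minimal R sketch of drawing from this standard non-informative posterior, which can serve as the independence proposal (`theta` is the current vector of group means); a scaled-inverse-$\chi^2(\nu, s^2)$ draw is $\nu s^2$ divided by a $\chi^2_\nu$ draw. The MH acceptance ratio then only involves the two priors (half-Cauchy versus non-informative) evaluated at the proposed and current values, since the likelihood pieces in the target and the proposal cancel.

```r
# Proposal from the posterior under p(mu, tau^2) propto 1/tau^2:
#   tau^2 ~ Inv-chi^2(G - 1, s_theta^2),  mu | tau^2 ~ N(theta_bar, tau^2 / G)
propose_mu_tau2 <- function(theta) {
  G         <- length(theta)
  theta_bar <- mean(theta)
  s2        <- var(theta)                        # (1/(G-1)) sum (theta_g - theta_bar)^2
  tau2      <- (G - 1) * s2 / rchisq(1, G - 1)   # scaled inverse chi-square draw
  mu        <- rnorm(1, theta_bar, sqrt(tau2 / G))
  c(mu = mu, tau2 = tau2)
}
```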

SLIDE 11

Sampling µ, τ2 Summary

Markov chain Monte Carlo for normal hierarchical model

  • 1. Sample $\theta \sim p(\theta|\cdots)$:
  • a. For $g = 1, \ldots, G$, sample $\theta_g \sim N(\mu_g, \tau_g^2)$.
  • 2. Sample $\mu, \sigma^2, \tau^2$:
  • a. Sample $\sigma^2 \sim IG(n/2, SSE/2)$.
  • b. Sample $\mu, \tau^2$ using independent Metropolis-Hastings with the posterior from the standard non-informative prior as the proposal.

What happens if $\theta_g \stackrel{ind}{\sim} La(\mu, \tau)$ or $\theta_g \stackrel{ind}{\sim} t_\nu(\mu, \tau^2)$?
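Before turning to the Laplace and $t$ alternatives, here is a rough R sketch of the full sampler for the normal model. It assumes groups are coded $1, \ldots, G$ and uses the helper functions sketched on the earlier slides (`sample_theta`, `sample_sigma2`, `propose_mu_tau2`, names of my choosing); in the acceptance ratio only the half-Cauchy prior on $\tau$ plus the $\tau^2 \to \tau$ Jacobian survive, since the likelihood and the non-informative $1/\tau^2$ pieces cancel between target and proposal.

```r
# Metropolis-within-Gibbs sketch for the normal hierarchical model;
# C is the scale of the half-Cauchy prior on tau.
gibbs_normal <- function(y, group, n_iter, C = 1) {
  ng    <- as.numeric(table(group))
  ybar  <- as.numeric(tapply(y, group, mean))
  theta <- ybar; mu <- mean(y); tau2 <- var(ybar); sigma2 <- var(y)   # initial values
  draws <- matrix(NA, n_iter, 3, dimnames = list(NULL, c("mu", "sigma2", "tau2")))
  for (it in seq_len(n_iter)) {
    theta  <- sample_theta(ybar, ng, mu, tau2, sigma2)   # step 1
    sigma2 <- sample_sigma2(y, group, theta)             # step 2a
    prop   <- propose_mu_tau2(theta)                     # step 2b: independence MH
    log_r  <- dcauchy(sqrt(prop["tau2"]), 0, C, log = TRUE) + 0.5 * log(prop["tau2"]) -
              dcauchy(sqrt(tau2),         0, C, log = TRUE) - 0.5 * log(tau2)
    if (log(runif(1)) < log_r) { mu <- unname(prop["mu"]); tau2 <- unname(prop["tau2"]) }
    draws[it, ] <- c(mu, sigma2, tau2)
  }
  draws
}
```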

SLIDE 12

Scale mixtures of normals

Scale mixtures of normals

Recall that if θ|φ ∼ N(φ, V ) and φ ∼ N(m, C) then θ ∼ N(m, V + C). This is called a location mixture. Now, if θ|φ ∼ N(m, Cφ) and we assume a mixing distribution for φ, we have a scale mixture. Since the top level distributional assumption is normal, we refer to this as a scale mixture of normals.

SLIDE 13

Scale mixtures of normals t distribution

t distribution

Let $\theta|\phi \sim N(m, \phi C)$ and $\phi \sim IG(a, b)$, then
\[
\begin{array}{rl}
p(\theta) &= \int p(\theta|\phi)\,p(\phi)\,d\phi \\
&= (2\pi C)^{-1/2}\,\frac{b^a}{\Gamma(a)}\int \phi^{-1/2}e^{-(\theta-m)^2/2\phi C}\,\phi^{-(a+1)}e^{-b/\phi}\,d\phi \\
&= (2\pi C)^{-1/2}\,\frac{b^a}{\Gamma(a)}\int \phi^{-(a+1/2+1)}e^{-[b+(\theta-m)^2/2C]/\phi}\,d\phi \\
&= (2\pi C)^{-1/2}\,\frac{b^a}{\Gamma(a)}\,\frac{\Gamma(a+1/2)}{[b+(\theta-m)^2/2C]^{a+1/2}} \\
&= \frac{\Gamma([2a+1]/2)}{\Gamma(2a/2)\sqrt{2a\pi\,bC/a}}\left[1 + \frac{1}{2a}\,\frac{(\theta-m)^2}{bC/a}\right]^{-[2a+1]/2}
\end{array}
\]
Thus $\theta \sim t_{2a}(m, bC/a)$, i.e. $\theta$ has a $t$ distribution with $2a$ degrees of freedom, location $m$, scale $bC/a$, and variance $\frac{bC}{a-1}$.
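This identity is easy to check by simulation; here is a small R sketch (the values of $a$, $b$, $m$, and $C$ are arbitrary) comparing draws from the mixture representation with the stated $t$ distribution.

```r
# Check: theta | phi ~ N(m, phi*C), phi ~ IG(a, b)  =>  theta ~ t_{2a}(m, b*C/a)
set.seed(1)
a <- 3; b <- 2; m <- 1; C <- 1.5
phi   <- 1 / rgamma(1e5, shape = a, rate = b)        # IG(a, b) draws
theta <- rnorm(1e5, mean = m, sd = sqrt(phi * C))    # scale-mixture draws
c(mixture_var = var(theta), formula_var = b * C / (a - 1))   # should roughly agree
qs <- c(0.1, 0.5, 0.9)                               # compare quantiles
rbind(mixture = quantile(theta, qs),
      t_dist  = m + sqrt(b * C / a) * qt(qs, df = 2 * a))
```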

SLIDE 14

Scale mixtures of normals t distribution

Hierarchical t distribution

Let $m = \mu$, $C = 1$, $a = \nu/2$, and $b = \nu\tau^2/2$, i.e. $\theta|\phi \sim N(\mu, \phi)$ and $\phi \sim IG(\nu/2, \nu\tau^2/2)$. Then we have $\theta \sim t_\nu(\mu, \tau^2)$, i.e. a $t$ distribution with $\nu$ degrees of freedom, location $\mu$, and scale $\tau^2$. Notice that the parameterization has a redundancy between $C$ and $b$ (with $a$ fixed, only the product $bC$ matters): we could have chosen $C = \tau^2$, $a = \nu/2$, and $b = \nu/2$ and we would have obtained the same marginal distribution for $\theta$.

SLIDE 15

Scale mixtures of normals t distribution

Laplace distribution

Let $\theta|\phi \sim N(m, \phi C^2)$ and $\phi \sim Exp(1/2b^2)$, so that $E[\phi] = 2b^2$ and $Var[\phi] = 4b^4$. Then, by an extension of equation (4) in Park and Casella (2008), we have
\[ p(\theta) = \frac{1}{2Cb}\,e^{-\frac{|\theta-m|}{Cb}}. \]
This is the pdf of a Laplace (double exponential) distribution with location $m$ and scale $Cb$, which we write $\theta \sim La(m, Cb)$; it has $E[\theta] = m$ and $Var[\theta] = 2[Cb]^2 = 2C^2b^2$.
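The same kind of simulation check works here; a brief R sketch (with arbitrary $m$, $C$, and $b$) confirming that the exponential mixing distribution produces Laplace behavior with variance $2C^2b^2$ and mean absolute deviation $Cb$.

```r
# Check: theta | phi ~ N(m, phi*C^2), phi ~ Exp(rate = 1/(2 b^2))  =>  theta ~ La(m, C*b)
set.seed(2)
m <- 0; C <- 2; b <- 0.7
phi   <- rexp(1e5, rate = 1 / (2 * b^2))            # E[phi] = 2 b^2
theta <- rnorm(1e5, mean = m, sd = sqrt(phi) * C)
c(mixture_var = var(theta), formula_var = 2 * C^2 * b^2)    # should roughly agree
c(mixture_mad = mean(abs(theta - m)), laplace_mad = C * b)  # E|theta - m| equals the scale
```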

SLIDE 16

Scale mixtures of normals t distribution

Hierarchical Laplace distribution

Let $m = \mu$, $C = 1$, and $b = \tau$, i.e. $\theta|\phi \sim N(\mu, \phi)$ and $\phi \sim Exp(1/2\tau^2)$. Then we have $\theta \sim La(\mu, \tau)$, i.e. a Laplace distribution with location $\mu$ and scale $\tau$. Notice that the parameterization has a redundancy between $C$ and $b$, i.e. we could have chosen $C = \tau$ and $b = 1$ and we would have obtained the same marginal distribution for $\theta$.

SLIDE 17

Normal hierarchical model

Normal hierarchical model

Recall our hierarchical model $Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ for $g = 1, \ldots, G$ and $i = 1, \ldots, n_g$. Now consider the following model assumptions:

  • $\theta_g|\phi_g \stackrel{ind}{\sim} N(\mu, \phi_g)$, $\phi_g = \tau^2 \implies \theta_g \stackrel{ind}{\sim} N(\mu, \tau^2)$
  • $\theta_g|\phi_g \stackrel{ind}{\sim} N(\mu, \phi_g)$, $\phi_g \stackrel{ind}{\sim} Exp(1/2\tau^2) \implies \theta_g \stackrel{ind}{\sim} La(\mu, \tau)$
  • $\theta_g|\phi_g \stackrel{ind}{\sim} N(\mu, \phi_g)$, $\phi_g \stackrel{ind}{\sim} IG(\nu/2, \nu\tau^2/2) \implies \theta_g \stackrel{ind}{\sim} t_\nu(\mu, \tau^2)$

For simplicity, let's assume $\sigma^2 \sim IG(a, b)$, $\mu \sim N(m, C)$, and $\tau \sim Ca^+(0, c)$, and that $\sigma^2$, $\mu$, and $\tau$ are a priori independent.

SLIDE 18

MCMC

Gibbs sampling

The following Gibbs sampler will converge to the posterior p(θ, σ, µ, φ, τ|y):

  • 1. Independently, sample θg ∼ p(θg| · · · ).
  • 2. Sample µ, σ, τ ∼ p(µ, σ, τ| · · · ):
  • a. Sample µ ∼ p(µ| · · · ).
  • b. Sample σ ∼ p(σ| · · · ).
  • c. Sample τ ∼ p(τ| · · · ).
  • 3. Independently, sample φg ∼ p(φg| · · · ).

Steps 1, 2a, and 2b will be the same for all models, but steps 2c and 3 will be different.

SLIDE 19

MCMC θ

Sample θ

$Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ and $\theta_g \sim N(\mu, \phi_g)$, so
\[ p(\theta|\cdots) \propto \prod_{g=1}^G\prod_{i=1}^{n_g} e^{-(y_{ig}-\theta_g)^2/2\sigma^2}\,\prod_{g=1}^G e^{-(\theta_g-\mu)^2/2\phi_g} \propto \prod_{g=1}^G\left[\prod_{i=1}^{n_g} e^{-(y_{ig}-\theta_g)^2/2\sigma^2}\right] e^{-(\theta_g-\mu)^2/2\phi_g}. \]
Thus the $\theta_g$ are conditionally independent given everything else. It should be obvious that
\[ \theta_g|\cdots \sim N\left(\left[\frac{1}{\phi_g} + \frac{n_g}{\sigma^2}\right]^{-1}\left[\frac{\mu}{\phi_g} + \frac{n_g}{\sigma^2}\bar y_g\right],\ \left[\frac{1}{\phi_g} + \frac{n_g}{\sigma^2}\right]^{-1}\right) \]
where $\bar y_g = \sum_{i=1}^{n_g} y_{ig}/n_g$.

SLIDE 20

MCMC µ

Sample µ

$\theta_g \stackrel{ind}{\sim} N(\mu, \phi_g)$ and $\mu \sim N(m, C)$. Immediately, we should know that $\mu|\cdots \sim N(m', C')$ with
\[ C' = \left[\frac{1}{C} + \sum_{g=1}^G \frac{1}{\phi_g}\right]^{-1}, \qquad m' = C'\left[\frac{1}{C}m + \sum_{g=1}^G \frac{1}{\phi_g}\theta_g\right]. \]
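A minimal R sketch of this conjugate update, assuming `theta` and `phi` are the current vectors of group means and group-level variances and `m` and `Cprior` (names of my choosing) are the prior mean and variance.

```r
# mu | ... ~ N(m', C') with C' = [1/C + sum_g 1/phi_g]^-1 and
#   m' = C' [m/C + sum_g theta_g/phi_g]
sample_mu <- function(theta, phi, m, Cprior) {
  Cp <- 1 / (1 / Cprior + sum(1 / phi))
  mp <- Cp * (m / Cprior + sum(theta / phi))
  rnorm(1, mp, sqrt(Cp))
}
```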

SLIDE 21

MCMC σ

Sample σ2

$Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ and $\sigma^2 \sim IG(a, b)$. This is just a normal data model with an unknown variance that has the conjugate prior. The only difficulty is that we have several groups here, but very quickly you should be able to determine that $\sigma^2|\cdots \sim IG(a', b')$ where
\[ a' = a + \frac{1}{2}\sum_{g=1}^G n_g = a + \frac{n}{2}, \qquad b' = b + \frac{1}{2}\sum_{g=1}^G\sum_{i=1}^{n_g}(y_{ig}-\theta_g)^2. \]

SLIDE 22

MCMC Distributional assumption for θg

Distributional assumption for θg

$Y_{ig} \stackrel{ind}{\sim} N(\theta_g, \sigma^2)$ and $\theta_g \stackrel{ind}{\sim} N(\mu, \phi_g)$ with one of

  • $\phi_g = \tau^2$,
  • $\phi_g \stackrel{ind}{\sim} Exp(1/2\tau^2)$, or
  • $\phi_g \stackrel{ind}{\sim} IG(\nu/2, \nu\tau^2/2)$.

The steps that are left are 1) sample $\phi$ and 2) sample $\tau^2$.

SLIDE 23

MCMC φ

Sample φ for normal model

For the normal model, $\phi_g = \tau^2$, so we will address this when we sample $\tau$.

SLIDE 24

MCMC φ

Sample φ for Laplace model

For the Laplace model, $\theta_g \stackrel{ind}{\sim} N(\mu, \phi_g)$ and $\phi_g \stackrel{ind}{\sim} Exp(1/2\tau^2)$, so the full conditional is
\[ p(\phi|\cdots) \propto \prod_{g=1}^G N(\theta_g; \mu, \phi_g)\,Exp(\phi_g; 1/2\tau^2). \]
So the individual $\phi_g$ are conditionally independent with
\[ p(\phi_g|\cdots) \propto N(\theta_g; \mu, \phi_g)\,Exp(\phi_g; 1/2\tau^2) \propto \phi_g^{-1/2}\,e^{-(\theta_g-\mu)^2/2\phi_g}\,e^{-\phi_g/2\tau^2}. \]
If we perform the transformation $\eta_g = 1/\phi_g$, we have
\[ p(\eta_g|\cdots) \propto \eta_g^{-3/2}\,e^{-\frac{(\theta_g-\mu)^2}{2}\eta_g - \frac{1}{2\tau^2\eta_g}} \]
which is the kernel of an inverse Gaussian distribution with mean $\sqrt{1/[\tau^2(\theta_g-\mu)^2]}$ and scale $1/\tau^2$, where the parameterization is such that the variance is $\mu^3/\lambda$ (different from the mgcv::rig parameterization).
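A minimal R sketch of this $\phi$ update for the Laplace model, drawing $\eta_g = 1/\phi_g$ from the inverse Gaussian full conditional; I use `statmod::rinvgauss`, whose mean/shape parameterization has variance $\mbox{mean}^3/\mbox{shape}$, matching the parameterization described above.

```r
# phi_g update for the Laplace model:
#   eta_g = 1/phi_g ~ Inverse-Gaussian(mean = 1/(tau*|theta_g - mu|), shape = 1/tau^2)
library(statmod)
sample_phi_laplace <- function(theta, mu, tau) {
  eta <- rinvgauss(length(theta),
                   mean  = 1 / (tau * abs(theta - mu)),
                   shape = 1 / tau^2)
  1 / eta                                  # return phi_g = 1/eta_g
}
```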

SLIDE 25

MCMC φ

Sample φ for t model

For the $t$ model, $\theta_g \stackrel{ind}{\sim} N(\mu, \phi_g)$ and $\phi_g \stackrel{ind}{\sim} IG(\nu/2, \nu\tau^2/2)$, so we have
\[ \phi_g|\cdots \stackrel{ind}{\sim} IG\left(\frac{\nu+1}{2},\ \frac{\nu\tau^2 + (\theta_g-\mu)^2}{2}\right) \]
since this is just $G$ independent normal data models with a known mean and independent conjugate inverse gamma priors on the variance.

SLIDE 26

MCMC τ

Sample τ for normal model

Let $\theta_g \stackrel{ind}{\sim} N(\mu, \tau^2)$ and $\tau \sim Ca^+(0, c)$, so the full conditional is
\[ p(\eta|\cdots) \propto \eta^{-G/2}\,e^{-\sum_{g=1}^G(\theta_g-\mu)^2/2\eta}\left[1 + \eta/c^2\right]^{-1}\eta^{-1/2} \]
where we performed the transformation $\eta = \tau^2$ on the prior. Let's use Metropolis-Hastings with proposal distribution
\[ \eta^* \sim IG\left(\frac{G-1}{2},\ \frac{\sum_{g=1}^G(\theta_g-\mu)^2}{2}\right) \]
and acceptance probability $\min\{1, \rho\}$ where
\[ \rho = \frac{\left[1 + \eta^*/c^2\right]^{-1}}{\left[1 + \eta^{(i)}/c^2\right]^{-1}} = \frac{1 + \eta^{(i)}/c^2}{1 + \eta^*/c^2} \]
and $\eta^{(i)}$ and $\eta^*$ are the current and proposed values, respectively. Then we calculate $\tau = \sqrt{\eta}$.
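A minimal R sketch of this Metropolis-Hastings step for the normal model, where `theta` and `mu` are current values, `c` is the half-Cauchy scale, and `eta_cur` is the current value of $\tau^2$.

```r
# MH update for eta = tau^2 in the normal model; the acceptance ratio
# reduces to the ratio of half-Cauchy prior terms.
sample_eta_normal <- function(eta_cur, theta, mu, c) {
  G        <- length(theta)
  S        <- sum((theta - mu)^2)
  eta_star <- 1 / rgamma(1, shape = (G - 1) / 2, rate = S / 2)   # IG((G-1)/2, S/2) proposal
  log_rho  <- log1p(eta_cur / c^2) - log1p(eta_star / c^2)
  if (log(runif(1)) < log_rho) eta_star else eta_cur             # tau = sqrt(returned value)
}
```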

SLIDE 27

MCMC τ

Sample τ for Laplace model

Let $\phi_g \sim Exp(1/2\tau^2)$ and $\tau \sim Ca^+(0, c)$, so the full conditional is
\[ p(\eta|\cdots) \propto \eta^{-G}\,e^{-\sum_{g=1}^G\phi_g/2\eta}\left[1 + \eta/c^2\right]^{-1}\eta^{-1/2}. \]
Let's use Metropolis-Hastings with proposal distribution
\[ \eta^* \sim IG\left(G - \frac{1}{2},\ \frac{\sum_{g=1}^G\phi_g}{2}\right) \]
and acceptance probability $\min\{1, \rho\}$ where again
\[ \rho = \frac{1 + \eta^{(i)}/c^2}{1 + \eta^*/c^2}. \]
Then we calculate $\tau = \sqrt{\eta}$.

SLIDE 28

MCMC τ

Sample τ for t model

Let $\phi_g \sim IG(\nu/2, \nu\tau^2/2)$ and $\tau \sim Ca^+(0, c)$, so the full conditional is
\[ p(\eta|\cdots) \propto \eta^{G\nu/2}\,e^{-\frac{\nu\eta}{2}\sum_{g=1}^G\frac{1}{\phi_g}}\left[1 + \eta/c^2\right]^{-1}\eta^{-1/2}. \]
Let's use Metropolis-Hastings with proposal distribution
\[ \eta^* \sim Ga\left(\frac{G\nu+1}{2},\ \frac{\nu}{2}\sum_{g=1}^G\frac{1}{\phi_g}\right) \]
and acceptance probability $\min\{1, \rho\}$ where again
\[ \rho = \frac{1 + \eta^{(i)}/c^2}{1 + \eta^*/c^2}. \]
Then we calculate $\tau = \sqrt{\eta}$.

SLIDE 29

MCMC Point-mass distributions

Dealing with point-mass distributions

We would also like to consider models with
\[ \theta_g \stackrel{ind}{\sim} \pi\delta_0 + (1-\pi)N(\mu, \phi_g) \]
where $\phi_g = \tau^2$ corresponds to a normal and $\phi_g \stackrel{ind}{\sim} IG(\nu/2, \nu\tau^2/2)$ corresponds to a $t$ distribution for the non-zero $\theta_g$. Similar to the previous models, the $\theta_g$ are conditionally independent. To sample $\theta_g$, we calculate
\[ \pi' = \frac{\pi\prod_{i=1}^{n_g}N(y_{ig}; 0, \sigma^2)}{\pi\prod_{i=1}^{n_g}N(y_{ig}; 0, \sigma^2) + (1-\pi)\prod_{i=1}^{n_g}N(y_{ig}; \mu, \phi_g+\sigma^2)}, \]
\[ \phi_g' = \left[\frac{1}{\phi_g} + \frac{n_g}{\sigma^2}\right]^{-1}, \qquad \mu_g' = \phi_g'\left[\frac{\mu}{\phi_g} + \frac{n_g}{\sigma^2}\bar y_g\right]. \]
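Putting this together, $\theta_g$ is set to zero with probability $\pi'$ and otherwise drawn from $N(\mu_g', \phi_g')$. A minimal R sketch for a single group, assuming `yg` holds that group's observations and using the mixture weights exactly as written above:

```r
# theta_g update under the point-mass mixture prior
sample_theta_pointmass <- function(yg, pi0, mu, phig, sigma2) {
  ng  <- length(yg)
  # log weights of the zero and non-zero components
  lw0 <- log(pi0)     + sum(dnorm(yg, 0,  sqrt(sigma2),        log = TRUE))
  lw1 <- log(1 - pi0) + sum(dnorm(yg, mu, sqrt(phig + sigma2), log = TRUE))
  pi_prime <- 1 / (1 + exp(lw1 - lw0))          # pi' computed on the log scale
  if (runif(1) < pi_prime) return(0)            # point mass at zero
  phi_prime <- 1 / (1 / phig + ng / sigma2)
  mu_prime  <- phi_prime * (mu / phig + ng * mean(yg) / sigma2)
  rnorm(1, mu_prime, sqrt(phi_prime))
}
```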

SLIDE 30

MCMC Point-mass distributions

Dealing with point-mass distributions (cont.)

Let $\theta_g \stackrel{ind}{\sim} \pi\delta_0 + (1-\pi)f(\theta_g)$ and $\pi \sim Beta(s, f)$. The full conditional for $\pi$ is
\[ \pi|\cdots \sim Beta\left(s + \sum_{g=1}^G I(\theta_g = 0),\ f + \sum_{g=1}^G I(\theta_g \ne 0)\right) \]
and $\mu$ and $\phi_g$ get updated using only those $\theta_g$ that are non-zero.

SLIDE 31

MCMC Point-mass distributions

Dealing with point-mass distributions (cont.)

Updating $\mu$, $\phi_g$, and $\tau$ will be exactly the same as it was before EXCEPT you will only use $g$ such that $\theta_g \ne 0$. So $\mu|\cdots \sim N(m', C')$ with
\[ C' = \left[\frac{1}{C} + \sum_{g:\theta_g\ne 0}\frac{1}{\phi_g}\right]^{-1}, \qquad m' = C'\left[\frac{1}{C}m + \sum_{g:\theta_g\ne 0}\frac{1}{\phi_g}\theta_g\right], \]
$\tau$ will still require a MH step, but the proposal should only depend on $\{g : \theta_g \ne 0\}$, and $\phi_g$ will be exactly as before when $\theta_g \ne 0$; but when $\theta_g = 0$, it actually doesn't matter what distribution you are sampling from (Carlin and Chib 1995).
