Complexity and optimization of the Gibbs Sampler for multilevel linear models

Complexity and optimization of the Gibbs Sampler for multilevel linear models
Giacomo Zanella, joint work with Omiros Papaspiliopoulos and Gareth Roberts
Department of Decision Sciences, BIDSA and IGIER, Bocconi University
AUEB, 3rd May 2018
Talk sections: Introduction; Multigrid decomposition - Nested; Crossed Random Effects; Simulations; Conclusion and future work

Context: Bayesian multilevel models
• Complex models built by combining simpler, local distributions
• Extremely powerful and successful paradigm: flexibility, interpretability, borrowing of information, ... [1]
• Naturally lend themselves to Gibbs Sampling schemes, where you update a subset of variables conditional on the others
Figure: Hierarchical structure induced by a multilevel model
[1] Gelman & Hill (2006) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Complexity & optimization of MCMC for multilevel models
Aim: improve theoretical understanding and methodological guidance for MCMC on multilevel models.
This talk:
• consider the Gibbs Sampler and multilevel Gaussian models
• explore the interaction between model structure and the algorithm's behavior
• provide quantitative theory with methodological implications, e.g.
  1. complexity statements
  2. guidance on optimal implementations
NB: a large literature on MCMC theory deals with generic target distributions; here we consider structured data.

Overview of the talk
1. Introduction
2. Nested linear models
   • Introduce multigrid decomposition
   • Hierarchical ordering
   • Reparametrizations
3. Crossed effect models
   • Multigrid analysis
   • Recovering scalability
   • Effect of sparsity
4. Conclusions and future work
Figure: Nested effects models
Figure: Crossed effects models

Nested linear models
3-level nested model:
Likelihood: $y_{ijk} \mid \mu, \mathbf{a}, \mathbf{b} \sim N(\mu + a_i + b_{ij}, \tau_e^{-1})$, with $i \in [I]$, $j \in [J]$, $k \in [K]$
Prior: $b_{ij} \overset{iid}{\sim} N(0, \tau_b^{-1})$, $a_i \overset{iid}{\sim} N(0, \tau_a^{-1})$, $p(\mu) \propto 1$.
Standard Gibbs Sampler for $(\mu, \mathbf{a}, \mathbf{b}) \mid y$:
1. Sample $\mu \sim p(\mu \mid \mathbf{a}, \mathbf{b}, y)$
2. Sample $a_i \sim p(a_i \mid \mu, \mathbf{b}, y)$ for all $i$
3. Sample $b_{ij} \sim p(b_{ij} \mid \mu, \mathbf{a}, y)$ for all $i, j$
Question: what is the computational complexity of GS?
NB: we are considering the fixed-variance scenario. Typically the variance parameters are given a prior distribution and GS is embedded in a scheme that also updates them.
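The three sampling steps above can be sketched in plain Python. This is a minimal illustration, assuming a balanced design stored as a nested list `y[i][j][k]` and fixed precisions. The function name `gibbs_nested` and its arguments are hypothetical, and the full conditionals are the standard Gaussian ones implied by the stated likelihood and priors rather than formulas taken from the slides.

```python
import random
from math import sqrt


def gibbs_nested(y, tau_e, tau_a, tau_b, n_iter, seed=0):
    """Gibbs sampler for y[i][j][k] ~ N(mu + a[i] + b[i][j], 1/tau_e) with
    a[i] ~ N(0, 1/tau_a), b[i][j] ~ N(0, 1/tau_b), a flat prior on mu and
    fixed precisions. Returns the list of sampled (mu, a, b) states."""
    rng = random.Random(seed)
    I, J, K = len(y), len(y[0]), len(y[0][0])
    # Sufficient statistics: s[i][j] = sum over k of y[i][j][k].
    s = [[sum(y[i][j]) for j in range(J)] for i in range(I)]
    mu, a, b = 0.0, [0.0] * I, [[0.0] * J for _ in range(I)]
    draws = []
    for _ in range(n_iter):
        # 1. mu | a, b, y ~ N(mean residual, 1 / (I*J*K*tau_e))
        resid = sum(s[i][j] - K * (a[i] + b[i][j])
                    for i in range(I) for j in range(J))
        mu = rng.gauss(resid / (I * J * K), 1 / sqrt(I * J * K * tau_e))
        # 2. a_i | mu, b, y has precision tau_a + J*K*tau_e
        for i in range(I):
            prec = tau_a + J * K * tau_e
            resid = sum(s[i][j] - K * (mu + b[i][j]) for j in range(J))
            a[i] = rng.gauss(tau_e * resid / prec, 1 / sqrt(prec))
        # 3. b_ij | mu, a, y has precision tau_b + K*tau_e
        for i in range(I):
            for j in range(J):
                prec = tau_b + K * tau_e
                resid = s[i][j] - K * (mu + a[i])
                b[i][j] = rng.gauss(tau_e * resid / prec, 1 / sqrt(prec))
        draws.append((mu, list(a), [row[:] for row in b]))
    return draws
```

Each sweep touches every observation summary once, so the per-iteration cost is linear in the number of parameters plus a one-off pass over the data to build `s`.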

Complexity of MCMC
For iterative sampling algorithms like MCMC,
$\mathrm{Cost}_{alg} = \mathrm{Cost}_{iter} \cdot T_{mix}$
$\mathrm{Cost}_{iter}$ is typically easy to compute. For Gibbs, often $\mathrm{Cost}_{iter} = O(N)$.
Technically challenging part: quantify $T_{mix}$.
We seek algorithms with good scalability, e.g. $\mathrm{Cost}_{alg} = O(N)$.

Approach and main technical tool
There are different notions of $T_{mix}$. In this talk, we will consider the following.
Definition: the rate of convergence of a Markov chain $X_1, X_2, \dots$ is the smallest number $\rho$ such that
$\| \mathcal{L}(X_t \mid X_0 = x) - \pi \| \le C(x)\, \rho^t$
The rate of convergence can be interpreted in terms of convergence time as
$T_{mix} = \frac{1}{1 - \rho}$
Intuition: $T_{mix} \approx$ number of iterations needed to get each iid sample.
Example: $\rho = 0.999 \Rightarrow T_{mix} \approx 1000$

Gaussian Gibbs Samplers
Many proofs of $\rho < 1$ (i.e. geometric ergodicity) exist under mild assumptions. However, computing $\rho$ exactly (or even bounding it) is very difficult in practice!
An important exception is given by Gaussian autoregressions. A Gibbs Sampler targeting $N(0, \Sigma)$ becomes a simple AR(1) process
$X_t = B X_{t-1} + \text{noise}$
where $B$ is an explicit function of $\Sigma$. In this context, the Gibbs Sampler rate of convergence coincides with the largest eigenvalue modulus (spectral radius) of $B$, $\rho(B)$. [2] [3]
Issue in practice: the high dimensionality of $B$, whose dimension equals the number of parameters $p$.
[2] Amit (1996) Convergence properties of the Gibbs Sampler for perturbations of Gaussians. Ann. Statist.
[3] Roberts & Sahu (1997) Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. JRSS-B.
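To make "B is an explicit function of Σ" concrete, here is a small sketch under stated assumptions: a deterministic-scan Gibbs sampler that updates one coordinate at a time, in index order, for a target with precision matrix $Q = \Sigma^{-1}$. Splitting $Q = L + D + U$ into strictly lower, diagonal and strictly upper parts, the autoregression matrix is the Gauss-Seidel iteration matrix $B = -(D + L)^{-1} U$. The helper name `gibbs_ar1_matrix` is hypothetical.

```python
def gibbs_ar1_matrix(Q):
    """Return B with E[X_t | X_{t-1}] = B X_{t-1} for a deterministic-scan,
    single-site Gibbs sampler targeting N(0, Q^{-1}).

    With Q = L + D + U (strictly lower, diagonal, strictly upper parts),
    B = -(D + L)^{-1} U, the Gauss-Seidel iteration matrix."""
    p = len(Q)
    B = [[0.0] * p for _ in range(p)]
    for col in range(p):
        # Solve (D + L) b = -U[:, col] by forward substitution.
        for i in range(p):
            rhs = -Q[i][col] if col > i else 0.0
            rhs -= sum(Q[i][k] * B[k][col] for k in range(i))
            B[i][col] = rhs / Q[i][i]
    return B


# Bivariate Gaussian with unit variances and correlation r (illustrative value).
r = 0.9
c = 1.0 / (1.0 - r * r)
Q = [[c, -c * r], [-c * r, c]]  # precision matrix of [[1, r], [r, 1]]
B = gibbs_ar1_matrix(Q)
print(B)
```

In this two-dimensional case `B` is upper triangular with entries `[[0, r], [0, r**2]]` up to rounding, so its eigenvalues are 0 and $r^2$: the rate is $r^2$ and $T_{mix} = 1/(1 - r^2)$. For larger $p$ the same construction applies, but the spectral radius generally has to be computed numerically.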

Back to nested models
Model: $y_{ijk} \mid \mu, \mathbf{a}, \mathbf{b} \sim N(\mu + a_i + b_{ij}, \tau_e^{-1})$
MCMC: the Markov chain $((\mu, \mathbf{a}, \mathbf{b})(t))_{t=0}^{\infty}$ induced by the Gibbs Sampler is a Gaussian autoregression. However, it is high-dimensional ($1 + I + IJ$).
Basic idea: find a decomposition of $(\mu, \mathbf{a}, \mathbf{b})(t)$ into simpler, lower-dimensional chains that allow direct analysis.

Multigrid decomposition
Map $(\mu, \mathbf{a}, \mathbf{b}) \mapsto (\delta^{(0)}, \delta^{(1)}, \delta^{(2)})$ by
1. decomposing $(\mu, \mathbf{a}, \mathbf{b})$ into residuals at different levels of granularity:
   $b_{ij} = \bar{b} + (\bar{b}_i - \bar{b}) + (b_{ij} - \bar{b}_i) = \delta^{(0)}b + \delta^{(1)}b_i + \delta^{(2)}b_{ij}$
   $a_i = \bar{a} + (a_i - \bar{a}) = \delta^{(0)}a + \delta^{(1)}a_i$
   $\mu = \mu = \delta^{(0)}\mu$
   where $\bar{a} = \frac{1}{I}\sum_i a_i$, $\bar{b} = \frac{1}{IJ}\sum_{ij} b_{ij}$ and $\bar{b}_i = \frac{1}{J}\sum_j b_{ij}$.
2. re-arranging terms and considering
   $\delta^{(0)} = (\delta^{(0)}\mu, \delta^{(0)}a, \delta^{(0)}b) \in \mathbb{R}^3$
   $\delta^{(1)} = (\delta^{(1)}a_i, \delta^{(1)}b_i)_i \in \mathbb{R}^{2I}$
   $\delta^{(2)} = (\delta^{(2)}b_{ij})_{ij} \in \mathbb{R}^{IJ}$
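The map above is just a set of group means and residuals. A minimal sketch follows, assuming a balanced design with `a` a list of length $I$ and `b` a nested list of shape $I \times J$; the function name `multigrid_decompose` is hypothetical.

```python
def multigrid_decompose(mu, a, b):
    """Split (mu, a, b) into residuals at three levels of granularity."""
    I, J = len(b), len(b[0])
    a_bar = sum(a) / I
    b_bar_i = [sum(row) / J for row in b]   # group means of b
    b_bar = sum(b_bar_i) / I                # grand mean of b (balanced design)
    delta0 = (mu, a_bar, b_bar)                                          # in R^3
    delta1 = [(a[i] - a_bar, b_bar_i[i] - b_bar) for i in range(I)]      # in R^{2I}
    delta2 = [[b[i][j] - b_bar_i[i] for j in range(J)] for i in range(I)]  # in R^{IJ}
    return delta0, delta1, delta2


# Illustrative values only.
d0, d1, d2 = multigrid_decompose(1.0, [0.5, -0.5], [[1.0, 2.0], [3.0, 6.0]])
print(d0, d1, d2)
```

The original variables are recovered by summing levels, e.g. `b[i][j] == d0[2] + d1[i][1] + d2[i][j]`, and the residuals at levels 1 and 2 sum to zero within their groups.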

Theorem (Multigrid decomposition of GS)
Let $((\mu, \mathbf{a}, \mathbf{b})(t))_{t=0}^{\infty}$ be the Markov chain generated by the Gibbs Sampler. Then $\delta^{(0)}(t)$, $\delta^{(1)}(t)$ and $\delta^{(2)}(t)$ are three independent Markov chains.
Corollary: the mixing time of GS is
$T_{gibbs} = \max\{ T(\delta^{(0)}), T(\delta^{(1)}), T(\delta^{(2)}) \}$

Target decomposition ≠ MCMC decomposition
Toy example: $(x, y)$ bivariate Gaussian with correlation $\rho$. Then:
• $x$ and $z = y - \rho x$ are independent random variables under the target, but
• the stochastic processes $x(t)$ and $z(t)$ induced by the Gibbs Sampler are not independent Markov chains.
Figure: Cross-correlation between $x(t)$ and $z(t)$ as a function of lag (lags from −20 to 20)
For crossed (and nested) random effect models, the multigrid decomposition for MCMC has to do with model structure.
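The toy example can be reproduced with a short simulation. This is a sketch under assumptions not stated on the slide: unit variances, the update order "x given y, then y given x", and a lag convention in which `cross_corr(u, v, lag)` compares `u[t + lag]` with `v[t]`. The correlation is called `r` here to avoid a clash with the convergence rate $\rho$, and all function names are hypothetical.

```python
import random
from math import sqrt


def gibbs_bivariate(r, n_iter, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation r.
    Returns the paths of x(t) and z(t) = y(t) - r * x(t)."""
    rng = random.Random(seed)
    s = sqrt(1 - r * r)          # conditional standard deviation
    x, y = 0.0, 0.0
    xs, zs = [], []
    for _ in range(n_iter):
        x = rng.gauss(r * y, s)  # x | y
        y = rng.gauss(r * x, s)  # y | x
        xs.append(x)
        zs.append(y - r * x)
    return xs, zs


def cross_corr(u, v, lag):
    """Sample correlation between u[t + lag] and v[t], for lag >= 0."""
    n = len(u) - lag
    uu, vv = u[lag:], v[:n]
    mean_u, mean_v = sum(uu) / n, sum(vv) / n
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(uu, vv)) / n
    var_u = sum((p - mean_u) ** 2 for p in uu) / n
    var_v = sum((q - mean_v) ** 2 for q in vv) / n
    return cov / sqrt(var_u * var_v)
```

For example, `xs, zs = gibbs_bivariate(0.5, 100_000)` followed by `cross_corr(xs, zs, 0)` and `cross_corr(xs, zs, 1)` should give a value near zero at lag 0 and a clearly positive value at lag 1. Under the assumed update order, $z(t)$ is exactly the noise injected in the $y$ update, which feeds into $x(t+1)$; a short calculation gives a lag-1 cross-correlation of $r\sqrt{1 - r^2}$.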

Multigrid decomposition - Nested model case
Theorem (Hierarchical ordering of mixing times)
$T(\delta^{(0)}) \ge T(\delta^{(1)}) \ge T(\delta^{(2)})$
⇒ convergence behavior of GS is monotonic with granularity (coarsest = slowest)
Corollary
$T_{gibbs} = T(\delta^{(0)}) = 1 + JK \frac{\tau_e}{\min\{\tau_a, J\tau_b\}}$
Therefore $\mathrm{Cost}_{gibbs} = O(JK \cdot N)$
⇒ mixing deteriorates as model/data size increases, and the total cost is super-linear!
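The corollary can be checked against the coarsest chain directly. The sketch below rests on a derivation that is not on the slides: averaging the Gaussian full-conditional means over groups, in the update order $\mu$, $\mathbf{a}$, $\mathbf{b}$, suggests that the pair $(\bar{a}, \bar{b})$ evolves in expectation through a lower-triangular matrix with diagonal entries $\alpha = JK\tau_e / (\tau_a + JK\tau_e)$ and $\beta = K\tau_e / (\tau_b + K\tau_e)$, so the rate of $\delta^{(0)}$ would be $\max\{\alpha, \beta\}$. Function names and parameter values are hypothetical.

```python
def nested_gibbs_mixing_time(J, K, tau_e, tau_a, tau_b):
    """T_gibbs = 1 + J*K*tau_e / min(tau_a, J*tau_b), as in the corollary."""
    return 1 + J * K * tau_e / min(tau_a, J * tau_b)


def coarse_chain_rate(J, K, tau_e, tau_a, tau_b):
    """Rate of the delta^(0) chain, read off the diagonal of its
    (lower-triangular) mean recursion; a derivation assumed here."""
    alpha = J * K * tau_e / (tau_a + J * K * tau_e)
    beta = K * tau_e / (tau_b + K * tau_e)
    return max(alpha, beta)


# Illustrative values: doubling J*K roughly doubles the mixing time.
for J, K in [(10, 5), (20, 5), (20, 10)]:
    T = nested_gibbs_mixing_time(J, K, tau_e=1.0, tau_a=2.0, tau_b=0.5)
    rho = coarse_chain_rate(J, K, tau_e=1.0, tau_a=2.0, tau_b=0.5)
    print(J, K, T, 1 / (1 - rho))
```

The two printed columns agree because $1/(1-\alpha) = 1 + JK\tau_e/\tau_a$ and $1/(1-\beta) = 1 + JK\tau_e/(J\tau_b)$, whose maximum is the expression in the corollary.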
