

SLIDE 1

Exploiting sparsity in model matrices

Douglas Bates and Martin Maechler

Department of Statistics, University of Wisconsin – Madison, U.S.A.
Seminar für Statistik, ETH Zurich, Switzerland
(bates|maechler)@R-project.org (R-Core)

DSC2009, Copenhagen, July 14, 2009

SLIDE 2

Outline

1. Model matrices with many columns
2. Applications to linear mixed models
3. The penalized least squares problem
4. Using the sparse Cholesky for mixed models
5. Evaluating the likelihood
6. More general model forms
7. Who's the best liked prof at ETH?


SLIDE 10

In case it all starts to blur

• Model matrices with many columns typically have some degree of sparsity.
• Regularization is important in combination with many columns.
• Updating a factorization is much more efficient than recomputing it when optimizing a regularization parameter.
• The complexity of the factorization depends on the order of the columns. The parameterization (contrasts) and the order of terms need to be considered carefully.
• Mixed-effects models are regularization problems that benefit greatly from sparse matrix methods; the methods are implicit in lme4.
• A new function, sparse.model.matrix(), is available in the Matrix package for non-implicit sparse model matrix construction.
• These slides are available at http://Matrix.R-forge.R-project.org/slides/.

SLIDE 11

Model matrices and sparsity

In statistical models the effects of the covariates on the response are often expressed, directly or indirectly, through model matrices. A common idiom in a model-fitting function with a formula argument is a call to model.frame() followed by a call to model.matrix().

Many users are frustrated that R does not transparently handle very large model matrices, failing to realize that a naive decomposition of an n × p dense model matrix requires O(np²) flops. Large values of p are thus particularly problematic.

Frequently, large values of p are a consequence of incorporating factors with a large number of levels in the model. A factor with k levels generates at least k − 1 columns, as do any interactions with such a factor. The model matrix columns are generated from the indicator columns for the factor, which are very sparse. The greater the number of levels, the sparser the indicators become.
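The degree of sparsity is easy to quantify: build the model matrix for a many-level factor in sparse form and compute the proportion of nonzeros. A minimal sketch (the factor f and all sizes are made up; sparse.model.matrix() is the Matrix function introduced on a later slide):

library(Matrix)
set.seed(3)
f  <- factor(sample(500, 10000, replace = TRUE))  # a factor with 500 levels
Xs <- sparse.model.matrix(~ f)                    # 10000 x 500 sparse model matrix
dim(Xs)
nnzero(Xs) / prod(dim(Xs))  # proportion of nonzeros: about 0.4% here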

SLIDE 12

Sparse model matrices and regularization

As stated at useR!2009, large, sparse model matrices usually require some amount of regularization for computationally feasible evaluation of coefficients and fitted values.

Frequently the regularization parameter(s) are chosen to optimize a criterion, requiring evaluation of the criterion for many different trial values of the regularization parameter(s). Usually these repeated evaluations require decomposing a matrix with constant structure (including the positions of the non-zeros) and varying numeric values. The sparse Cholesky factorization is ideally suited to problems requiring many decompositions of a matrix with constant structure and varying numeric values.

SLIDE 13

The sparse Cholesky factorization

The Matrix package for R provides sparse matrix methods, including the sparse Cholesky, by interfacing to Tim Davis's CHOLMOD library of C functions. This C library provides separate functions for the symbolic factorization, which includes determining a fill-reducing permutation, and for the numeric factorization.

The symbolic factorization determines the positions of the non-zeros in the result. The numeric factorization simply evaluates the numeric values; it is generally much faster than the symbolic factorization.

There are many beautiful mathematical results associated with sparse matrix operations. See Tim Davis's 2007 SIAM book for some of these results.
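The split between the two phases is what makes repeated refactorization cheap: Cholesky() performs the symbolic analysis once and update() reuses it. A minimal sketch with illustrative matrices (not the lme4 internals):

library(Matrix)
set.seed(1)
Z  <- rsparsematrix(100, 30, density = 0.05)
A1 <- crossprod(Z) + Diagonal(30)   # sparse, symmetric, positive definite
L  <- Cholesky(A1, LDL = FALSE)     # symbolic analysis + numeric factorization
## new numeric values, identical nonzero pattern: numeric phase only
A2 <- crossprod(Z %*% Diagonal(x = runif(30, 0.5, 2))) + Diagonal(30)
L  <- update(L, A2)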

SLIDE 14

Variations of the sparse Cholesky

In the Matrix package we use the formulation from the CHOLMOD C library. Sparse matrices may be entered in the triplet representation, but operations are usually performed on the compressed-column representation (the CsparseMatrix class).

If A is a positive-definite symmetric sparse matrix, the sparse Cholesky factorization consists of a permutation matrix P and a lower triangular matrix L such that

\[ L L^\mathsf{T} = P A P^\mathsf{T}. \]

Note that L is the left factor (statisticians often consider the right factor, R = Lᵀ). The permutation P is stored (as a vector) within the factorization.

There are two variations: the LDL factorization, where the left-hand side is L D Lᵀ (L unit lower triangular; D diagonal), and a supernodal L Lᵀ decomposition, which is a sparse/dense hybrid that collapses columns with similar structure into a "supernode" of the graph representation.
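The relation LLᵀ = PAPᵀ can be checked directly from the slots of a computed factor. A small sketch; the coercion to "sparseMatrix" and the perm slot are used later in these slides, everything else here is illustrative:

library(Matrix)
A  <- crossprod(rsparsematrix(50, 20, density = 0.1)) + Diagonal(20)
Lf <- Cholesky(A, LDL = FALSE)
L  <- as(Lf, "sparseMatrix")       # the lower-triangular factor
P  <- Lf@perm + 1L                 # the permutation is stored 0-based
max(abs(tcrossprod(L) - A[P, P]))  # effectively zero: L L' equals P A P'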


SLIDE 16

Definition of linear mixed models

A linear mixed model consists of two random variables: the n-dimensional response, Y, and the q-dimensional random effects, B. We observe the value, y, of Y; we do not observe the value of B. The probability model defines one conditional and one unconditional distribution,

\[ (Y \mid B = b) \sim \mathcal{N}\!\left(Z b + X\beta,\; \sigma^2 I_n\right), \qquad B \sim \mathcal{N}(0, \Sigma_\theta), \]

which depend on the parameters β, θ and σ. Although the dimension of Σθ can be huge, the dimension of the variance-component parameter vector, θ, is usually very small.

The model specification determines the n × q model matrix Z (generated from indicator columns and typically very sparse), the n × p model matrix X, and the way in which θ generates Σθ.

SLIDE 17

Properties of Σθ; generating it

Because it is a variance-covariance matrix, the q × q Σθ must be symmetric and positive semi-definite, which means, in effect, that it has a "square root": there must be another matrix that, when multiplied by its transpose, gives Σθ.

We never really form Σθ; we always work with the relative covariance factor, Λθ, defined so that

\[ \Sigma_\theta = \sigma^2 \Lambda_\theta \Lambda_\theta^\mathsf{T}, \]

where σ² is the same variance parameter as in (Y | B = b).

We also work with a q-dimensional "spherical" or "unit" random-effects vector, U, such that

\[ U \sim \mathcal{N}\!\left(0, \sigma^2 I_q\right), \qquad B = \Lambda_\theta U \;\Rightarrow\; \operatorname{Var}(B) = \sigma^2 \Lambda_\theta \Lambda_\theta^\mathsf{T} = \Sigma_\theta. \]

The linear predictor expression becomes

\[ Z b + X\beta = Z \Lambda_\theta u + X\beta = U_\theta u + X\beta, \qquad \text{where } U_\theta = Z \Lambda_\theta. \]

SLIDE 18

The conditional mode, ũθ,β

Although the probability model is defined from (Y | U = u), we observe y, not u (or b), so we want to work with the other conditional distribution, (U | Y = y). The joint distribution of Y and U is Gaussian with density

\[
f_{Y,U}(y, u) = f_{Y|U}(y \mid u)\, f_U(u)
= \frac{\exp\!\left(-\frac{1}{2\sigma^2}\lVert y - X\beta - U_\theta u \rVert^2\right)}{(2\pi\sigma^2)^{n/2}}
\cdot \frac{\exp\!\left(-\frac{1}{2\sigma^2}\lVert u \rVert^2\right)}{(2\pi\sigma^2)^{q/2}}
= \frac{\exp\!\left(-\left[\lVert y - X\beta - U_\theta u \rVert^2 + \lVert u \rVert^2\right]/(2\sigma^2)\right)}{(2\pi\sigma^2)^{(n+q)/2}}.
\]

The mode, ũθ,β, of the conditional distribution (U | Y = y) (also the mean in this case) is

\[ \tilde{u}_{\theta,\beta} = \arg\min_u \left[ \lVert y - X\beta - U_\theta u \rVert^2 + \lVert u \rVert^2 \right]. \]


SLIDE 20

Minimizing a penalized sum of squared residuals

An expression like ‖y − Xβ − Uθu‖² + ‖u‖² is called a penalized sum of squared residuals because ‖y − Xβ − Uθu‖² is a sum of squared residuals and ‖u‖² is a penalty on the size of the vector u.

Determining ũθ,β as the minimizer of this expression is a penalized least squares (PLS) problem. In this case it is a penalized linear least squares problem that we can solve directly (i.e. without iterating).

One way to determine the solution is to rephrase it as a linear least squares problem for an extended residual vector,

\[
\tilde{u}_{\theta,\beta} = \arg\min_u \left\lVert \begin{bmatrix} y - X\beta \\ 0 \end{bmatrix} - \begin{bmatrix} U_\theta \\ I_q \end{bmatrix} u \right\rVert^2.
\]

This is sometimes called a pseudo-data approach because we create the effect of the penalty term, ‖u‖², by adding "pseudo-observations" to y and to the predictor.
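A small dense sketch of the pseudo-data approach; the names U, resp and u_tilde are illustrative (U stands in for Uθ, resp for y − Xβ):

set.seed(4)
q <- 5; n <- 40
U    <- matrix(rnorm(n * q), n, q)
resp <- rnorm(n)
## append q zero pseudo-observations and an identity block to the predictor
u_tilde <- qr.solve(rbind(U, diag(q)), c(resp, rep(0, q)))
## the same solution from the normal equations (U'U + I) u = U'(y - X beta)
u_check <- drop(solve(crossprod(U) + diag(q), crossprod(U, resp)))
all.equal(u_tilde, u_check, check.attributes = FALSE)  # TRUE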

SLIDE 21

Solving the linear PLS problem

The conditional mean satisfies the equations

\[ \left(U_\theta^\mathsf{T} U_\theta + I_q\right) \tilde{u}_{\theta,\beta} = U_\theta^\mathsf{T} (y - X\beta). \]

This would be interesting but not very important were it not for the fact that we actually can solve that system for ũθ,β even when its dimension, q, is very, very large.

Recall that Uθ = ZΛθ. Because Z is generated from indicator columns for the grouping factors, it is sparse, and so is Uθ. The fill-reducing permutation, P, and the structure of the Cholesky factor, L, are determined from U_{θ(0)}, where θ(0) is the starting value. For subsequent values of θ the update of the factor Lθ satisfying

\[ L_\theta L_\theta^\mathsf{T} = P \left(U_\theta^\mathsf{T} U_\theta + I_q\right) P^\mathsf{T} \]

is direct from Uθ. (One of the CHOLMOD functions performs the update, including virtually adding a multiple of the identity, from the sparse, rectangular Uθ.) From Lθ we solve for ũθ,β.
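The update pattern for new θ values can be sketched directly, mirroring the Cholesky(..., Imult = 1) and update(L, Ut, mult = 1) calls used on the following slides; Zt here is a random stand-in for the transposed model matrix, not data from the slides:

library(Matrix)
set.seed(2)
Zt <- rsparsematrix(30, 200, density = 0.05)            # q = 30, n = 200
Ut <- 0.5 * Zt                                          # t(U_theta) for theta = 0.5
L  <- Cholesky(tcrossprod(Ut), LDL = FALSE, Imult = 1)  # factors U'U + I, with analysis
Ut <- 1.7 * Zt                                          # new theta, same nonzero pattern
L  <- update(L, Ut, mult = 1)                           # numeric refresh from rectangular Ut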

SLIDE 22

What do we mean by "large" nowadays?

Harold Doran recently fit a linear mixed model to the annual achievement test results for the last 4 years in one of the United States. There were n = 5,212,017 observations on a total of n₁ = 1,876,788 students and n₂ = 47,480 teachers.

The models had simple, scalar random effects for student and for teacher, resulting in q = 1,924,268 (i.e. nearly 2 million!). There were a total of p = 29 fixed-effects parameters.

At present Harold needs to fit the model to a subset and only evaluate the conditional means for all the students and teachers, but we should be able to get around that limitation and actually fit the model to all these responses and random effects. I don't know of other software that can be used to fit a model this large.

SLIDE 23

Size of the decomposition for this large model

Memory usage in such a model is dominated by the Cholesky factor, L(θ). In this case the x slot is itself over 1 GB in size and the i slot is over 0.5 GB. These are close to an inherent limit on atomic R objects: the range of an index into an atomic object cannot exceed 2³¹ (= 2,147,483,648).

> str(L)

Formal class 'dCHMsimpl' [package "Matrix"] with 10 slots
  ..@ x       : num [1:174396181] 1.71 2.16 1.4 1.32 2.29 ...
  ..@ p       : int [1:1924269] 0 2 4 5 7 9 10 12 14 15 ...
  ..@ i       : int [1:174396181] 0 2 1 2 2 3 5 4 5 5 ...
  ..@ nz      : int [1:1924268] 2 2 1 2 2 1 2 2 1 2 ...
  ..@ nxt     : int [1:1924270] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ prv     : int [1:1924270] 1924269 0 1 2 3 4 5 6 7 8 ...
  ..@ colcount: int [1:1924268] 2 2 1 2 2 1 2 2 1 2 ...
  ..@ perm    : int [1:1924268] 1922843 1886519 134451 1921046 1893309 183471 ...
  ..@ type    : int [1:4] 2 1 0 1
  ..@ Dim     : int [1:2] 1924268 1924268
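A quick arithmetic check of the quoted sizes, assuming the usual 8-byte doubles and 4-byte integers:

174396181 * 8 / 2^30  # x slot (doubles):  about 1.3 GiB
174396181 * 4 / 2^30  # i slot (integers): about 0.65 GiB
2^31                  # index limit for atomic R objects: 2147483648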


SLIDE 25

Applications to models with simple, scalar random effects

For a model with simple, scalar random-effects terms only, the matrix Σθ is block-diagonal in k blocks, and the i-th block is σᵢ² Iₙᵢ, where nᵢ is the number of levels in the i-th grouping factor. The matrix Λθ is also block-diagonal, the i-th block being θᵢ Iₙᵢ, where θᵢ = σᵢ/σ.

Given the grouping factors for the model and a value of θ, we produce Uθ, then Lθ, using Cholesky() the first time, then update(). To avoid recalculating, we assign

  flist    a list of the grouping factors
  nlev     the number of levels in each factor
  Zt       the transpose of the model matrix Z
  theta    the current value of θ
  Lambda   the current Λθ
  Ut       the transpose of Uθ = ZΛθ

SLIDE 26

Cholesky factor for the Penicillin model

> flist <- subset(Penicillin, select = c(plate, sample))
> Zt <- do.call(rBind, lapply(flist, as, "sparseMatrix"))
> (nlev <- sapply(flist, function(f) length(levels(factor(f)))))

 plate sample 
    24      6 

> theta <- c(1.2, 2.1)
> Lambda <- Diagonal(x = rep.int(theta, nlev))
> Ut <- crossprod(Lambda, Zt)
> str(L <- Cholesky(tcrossprod(Ut), LDL = FALSE, Imult = 1))

Formal class 'dCHMsimpl' [package "Matrix"] with 10 slots
  ..@ x       : num [1:189] 3.105 0.812 0.812 0.812 0.812 ...
  ..@ p       : int [1:31] 0 7 14 21 28 35 42 49 56 63 ...
  ..@ i       : int [1:189] 0 24 25 26 27 28 29 1 24 25 ...
  ..@ nz      : int [1:30] 7 7 7 7 7 7 7 7 7 7 ...
  ..@ nxt     : int [1:32] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ prv     : int [1:32] 31 0 1 2 3 4 5 6 7 8 ...
  ..@ colcount: int [1:30] 7 7 7 7 7 7 7 7 7 7 ...
  ..@ perm    : int [1:30] 23 22 21 20 19 18 17 16 15 14 ...
  ..@ type    : int [1:4] 2 1 0 1
  ..@ Dim     : int [1:2] 30 30

SLIDE 27

Images of UᵀU + I and L

[Figure: image() plots of UᵀU + I (left) and L (right) for the Penicillin model, both 30 × 30.]

Note that there are nonzeros in the lower right of L in positions that are zero in the lower triangle of UᵀU + I. This is described as "fill-in".

SLIDE 28

Reversing the order of the grouping factors

To show the effect of a fill-reducing permutation, we reverse the order of the factors and calculate the Cholesky factor with and without a fill-reducing permutation. We evaluate nnzero() (the number of nonzeros) for L, from the original factor order, and for Lnoperm and Lperm, the reversed factor order without and with permutation:

> Zt <- do.call(rBind, lapply(flist[2:1], as, "sparseMatrix"))
> Lambda <- Diagonal(x = rep.int(theta[2:1], nlev[2:1]))
> Ut <- crossprod(Lambda, Zt)
> Lnoperm <- Cholesky(tcrossprod(Ut), perm = FALSE, LDL = FALSE,
+     Imult = 1)
> Lperm <- Cholesky(tcrossprod(Ut), LDL = FALSE, Imult = 1)
> sapply(lapply(list(L, Lnoperm, Lperm), as, "sparseMatrix"),
+     nnzero)

[1] 189 450 204

SLIDE 29

Images of the reversed factor decompositions

[Figure: image() plots of Lnoperm (left) and Lperm (right) for the reversed factor order.]

Without permutation, we get the worst possible fill-in. With a fill-reducing permutation we get much less fill-in, but still not as little as with the original factor order. This is why the permutation is called "fill-reducing", not "fill-minimizing". Getting the fill-minimizing permutation in the general case is a very hard problem.

SLIDE 30

Cholesky factor for the Pastes data

For the special case of nested grouping factors, such as in the Pastes data, there is no fill-in, regardless of the permutation. A permutation is nevertheless evaluated, but it is a "post-ordering" that puts the nonzeros near the diagonal.

> Zt <- do.call(rBind, lapply(flist <- subset(Pastes,
+     , c(sample, batch)), as, "sparseMatrix"))
> nlev <- sapply(flist, function(f) length(levels(factor(f))))
> theta <- c(0.4, 0.5)
> Lambda <- Diagonal(x = rep.int(theta, nlev))
> Ut <- crossprod(Lambda, Zt)
> L <- Cholesky(tcrossprod(Ut), LDL = FALSE, Imult = 1)
> str(L@perm)

 int [1:40] 2 1 0 30 5 4 3 31 8 7 ...

SLIDE 31

Image of the factor for the Pastes data

[Figure: image() plots of UᵀU + I (left) and L (right) for the Pastes data, both 40 × 40; L shows no fill-in.]


SLIDE 33

Evaluating the likelihood for mixed models

From the joint density, f_{Y,U}(y, u), we obtain the likelihood

\[ L(\theta, \beta, \sigma^2 \mid y) = \int f_{Y,U}(y, u)\, du. \]

The function being integrated is an unnormalized Gaussian density, so its integral can be determined from the value at the mode and the variance-covariance matrix. The Cholesky factor, Lθ, allows evaluation of ũθ,β from

\[ P^\mathsf{T} L_\theta L_\theta^\mathsf{T} P\, \tilde{u}_{\theta,\beta} = U_\theta^\mathsf{T} (y - X\beta). \]

The exponent of f_{Y,U}(y, u) can now be written

\[ \lVert y - X\beta - U_\theta u \rVert^2 + \lVert u \rVert^2 = r^2(\theta, \beta) + \lVert L_\theta^\mathsf{T} P (u - \tilde{u}_{\theta,\beta}) \rVert^2, \]

where r²(θ, β) = ‖y − Xβ − Uθ ũθ,β‖² + ‖ũθ,β‖². The first term does not depend on u, and

\[ \int \frac{\exp\!\left(-\lVert L_\theta^\mathsf{T} P (u - \tilde{u}_{\theta,\beta}) \rVert^2 / (2\sigma^2)\right)}{(2\pi\sigma^2)^{q/2}}\, du = \frac{1}{|L_\theta|}. \]

SLIDE 34

Evaluating the likelihood (cont'd)

As is often the case, it is easiest to write the log-likelihood. On the deviance scale (negative twice the log-likelihood), ℓ(θ, β, σ | y) = log L(θ, β, σ | y) becomes

\[ -2\ell(\theta, \beta, \sigma \mid y) = n \log(2\pi\sigma^2) + \frac{r^2(\theta, \beta)}{\sigma^2} + \log(|L_\theta|^2). \]

We wish to minimize the deviance. Its dependence on σ is straightforward: given values of the other parameters, we can evaluate the conditional estimate

\[ \widehat{\sigma^2}(\theta, \beta) = \frac{r^2(\theta, \beta)}{n}, \]

producing the profiled deviance

\[ -2\tilde{\ell}(\theta, \beta \mid y) = \log(|L_\theta|^2) + n \left[ 1 + \log\!\left( \frac{2\pi r^2(\theta, \beta)}{n} \right) \right]. \]

However, an even greater simplification is possible because the deviance depends on β only through r²(θ, β).
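The conditional estimate of σ² is worth making explicit: it follows by setting the derivative of the deviance with respect to σ² to zero,

\[
\frac{\partial}{\partial \sigma^2} \left[ n \log(2\pi\sigma^2) + \frac{r^2(\theta, \beta)}{\sigma^2} \right]
= \frac{n}{\sigma^2} - \frac{r^2(\theta, \beta)}{\sigma^4} = 0
\quad\Longrightarrow\quad
\widehat{\sigma^2}(\theta, \beta) = \frac{r^2(\theta, \beta)}{n}.
\]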

SLIDE 35

Profiling the deviance with respect to β

Because the deviance depends on β only through r²(θ, β), we can obtain the conditional estimate, β̂θ, by extending the PLS problem to

\[ r^2(\theta) = \min_{u, \beta} \left[ \lVert y - X\beta - U_\theta u \rVert^2 + \lVert u \rVert^2 \right], \]

with the solution satisfying the equations

\[
\begin{bmatrix} U_\theta^\mathsf{T} U_\theta + I_q & U_\theta^\mathsf{T} X \\ X^\mathsf{T} U_\theta & X^\mathsf{T} X \end{bmatrix}
\begin{bmatrix} \tilde{u}_\theta \\ \widehat{\beta}_\theta \end{bmatrix}
=
\begin{bmatrix} U_\theta^\mathsf{T} y \\ X^\mathsf{T} y \end{bmatrix}.
\]

The profiled deviance, which is a function of θ only, is

\[ -2\tilde{\ell}(\theta) = \log(|L_\theta|^2) + n \left[ 1 + \log\!\left( \frac{2\pi r^2(\theta)}{n} \right) \right]. \]

SLIDE 36

Solving the extended PLS problem

For brevity we will no longer show the dependence of matrices and vectors on the parameter θ. As before, we use the sparse Cholesky decomposition, with L and P satisfying LLᵀ = P(UᵀU + I)Pᵀ, and cᵤ, the solution to Lcᵤ = PUᵀy.

We extend the decomposition with the q × p matrix R_{ZX}, the upper triangular p × p matrix R_X, and the p-vector c_β satisfying

\[ L R_{ZX} = P U^\mathsf{T} X, \qquad R_X^\mathsf{T} R_X = X^\mathsf{T} X - R_{ZX}^\mathsf{T} R_{ZX}, \qquad R_X^\mathsf{T} c_\beta = X^\mathsf{T} y - R_{ZX}^\mathsf{T} c_u, \]

so that

\[
\begin{bmatrix} P^\mathsf{T} L & 0 \\ R_{ZX}^\mathsf{T} & R_X^\mathsf{T} \end{bmatrix}
\begin{bmatrix} L^\mathsf{T} P & R_{ZX} \\ 0 & R_X \end{bmatrix}
=
\begin{bmatrix} U^\mathsf{T} U + I & U^\mathsf{T} X \\ X^\mathsf{T} U & X^\mathsf{T} X \end{bmatrix}.
\]

SLIDE 37

Solving the extended PLS problem (cont'd)

Finally we solve

\[ R_X \widehat{\beta}_\theta = c_\beta, \qquad L^\mathsf{T} P\, \tilde{u}_{\theta,\beta} = c_u - R_{ZX} \widehat{\beta}_\theta. \]

The profiled REML criterion can also be expressed simply. The criterion is

\[ L_R(\theta, \sigma^2 \mid y) = \int L(\theta, \beta, \sigma^2 \mid y)\, d\beta. \]

The same change-of-variable technique that evaluates the integral w.r.t. u as 1/abs(|L|) produces 1/abs(|R_X|) here and removes (2πσ²)^{p/2} from the denominator. On the deviance scale, the profiled REML criterion is

\[ -2\tilde{\ell}_R(\theta) = \log(|L|^2) + \log(|R_X|^2) + (n - p) \left[ 1 + \log\!\left( \frac{2\pi r^2(\theta)}{n - p} \right) \right]. \]

These calculations can be expressed in a few lines of R code. Assume that the environment of setPars() contains y, X, Zt, REML, L, nlev and XtX (XᵀX).

SLIDE 38

Code for evaluating the profiled deviance

setPars <- function(theta) {
    stopifnot(is.numeric(theta), length(theta) == length(nlev))
    Ut <- crossprod(Diagonal(x = rep.int(theta, nlev)), Zt)
    L <- update(L, Ut, mult = 1)
    cu <- solve(L, solve(L, Ut %*% y, sys = "P"), sys = "L")
    RZX <- solve(L, solve(L, Ut %*% X, sys = "P"), sys = "L")
    RX <- chol(XtX - crossprod(RZX))
    cb <- solve(t(RX), crossprod(X, y) - crossprod(RZX, cu))
    beta <- solve(RX, cb)
    u <- solve(L, solve(L, cu - RZX %*% beta, sys = "Lt"), sys = "Pt")
    fitted <- as.vector(crossprod(Ut, u) + X %*% beta)
    prss <- sum(c(y - fitted, as.vector(u))^2)  # penalized RSS
    n <- length(fitted); p <- ncol(RX)
    if (REML)
        return(determinant(L)$mod + 2 * determinant(RX)$mod +
               (n - p) * (1 + log(2 * pi * prss/(n - p))))
    determinant(L)$mod + n * (1 + log(2 * pi * prss/n))
}

SLIDE 39

How lmer works

Essentially, lmer takes its arguments and creates an environment containing the model matrices, the response and the Cholesky factor. The optimization of the profiled deviance or the profiled REML criterion happens within this environment.

The creation of Λθ is somewhat more complex for models with vector-valued random effects, but not excessively so.

Some care is taken to avoid allocating storage for large objects during each function evaluation. Many of the objects created in profDev are updated in place within lmer. Once the optimizer, nlminb(), has converged, some additional information for the summary is calculated.


SLIDE 41

Nonlinear and generalized linear mixed models

In a nonlinear mixed model (NMM) the conditional distribution, (Y | U = u), is Gaussian, but its mean depends on the linear predictor, Uθu + Xβ, through a nonlinear model function. The conditional mode, ũθ,β, is the solution to a penalized nonlinear least squares problem. The Laplace approximation to the profiled deviance is

\[ -2\tilde{\ell}(\theta, \beta \mid y) = \log(|L_{\theta,\beta}|^2) + n \left[ 1 + \log\!\left( \frac{2\pi r^2(\theta, \beta)}{n} \right) \right], \]

where, as before, r²(θ, β) is the minimum penalized residual sum of squares. The matrix U_{θ,β} determining L_{θ,β} is dμ/duᵀ.

For generalized linear mixed models (GLMMs) the penalized least squares problem determining ũθ,β is replaced by a penalized iteratively reweighted least squares (PIRLS) problem. All of these PLS problems require updating the decomposition for revised numeric values.
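A PIRLS step for a GLMM can be sketched in a few lines. The following is a minimal illustration for a Bernoulli response with a logit link, holding θ and β fixed; the function name pirls and all variable names are illustrative assumptions, not the lme4 internals:

library(Matrix)
## Find the conditional mode u-tilde by Newton steps on the Bernoulli
## log-likelihood minus ||u||^2/2; each step is a penalized weighted
## least squares solve with the factor of U'WU + I refreshed in place.
pirls <- function(Ut, X, y, beta, tol = 1e-8, maxit = 25) {
    u <- numeric(nrow(Ut))
    for (it in seq_len(maxit)) {
        eta <- as.vector(crossprod(Ut, u) + X %*% beta)  # linear predictor
        mu  <- plogis(eta)                               # inverse logit link
        w   <- mu * (1 - mu)                             # GLM weights
        ## factor U'WU + I from the row-scaled, rectangular Ut
        L <- Cholesky(tcrossprod(Ut %*% Diagonal(x = sqrt(w))),
                      LDL = FALSE, Imult = 1)
        ## Newton increment: (U'WU + I) delta = U'(y - mu) - u
        delta <- as.vector(solve(L, Ut %*% (y - mu) - u))
        u <- u + delta
        if (sqrt(sum(delta^2)) < tol * (1 + sqrt(sum(u^2)))) break
    }
    u
}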


SLIDE 43

Who's the best liked prof at ETH?

• A private donation for encouraging excellent teaching at ETH.
• The student union of ETH Zurich organizes a survey to award the prizes: best lecturer of ETH, and of each of the 15 departments.
• Smart web interface for the survey: each student sees the names of his/her professors from the last 4 semesters and all the lectures that applied.
• Ratings are in {1, 2, 3, 4, 5}.
• High response rate.

SLIDE 44

Who's the best prof — data

> str(d.eth)

'data.frame':   73421 obs. of  7 variables:
 $ s      : Factor w/ 2972 levels "1","2","3","4",..: 1 1 1 1 2 2 3 3 3 3 ...
 $ d      : Factor w/ 1128 levels "1","6","7","8",..: 525 560 832 1068 62 406 ...
 $ studage: Ord.factor w/ 4 levels "2"<"4"<"6"<"8": 1 1 1 1 1 1 1 1 1 1 ...
 $ lectage: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 1 1 1 1 1 ...
 $ service: Factor w/ 2 levels "0","1": 1 2 1 2 1 1 2 1 1 1 ...
 $ dept   : Factor w/ 15 levels "5","12","6","11",..: 15 5 15 12 2 2 14 3 3 ...
 $ y      : int  5 2 5 3 2 4 4 5 5 4 ...

SLIDE 45

Modelling the ETH teacher ratings

Model: the rating depends on

• the student (s) (rating subjectively),
• the teacher (d), the main interest,
• the department (dept) [[obfuscated]],
• "service" lecture or "own department student" (service: 0/1),
• the semester of the student at the time of rating (studage ∈ {2, 4, 6, 8}),
• how many semesters back the lecture took place (lectage).

Main question: who's the best prof? Hence, for "political" reasons, we want d as a fixed effect.

SLIDE 46

Model for ETH teacher ratings

Want d ("teacher ID", 1128 levels) as a fixed effect. Further fixed effects: studage (4 levels), lectage (6 levels), maybe service (2 levels) and dept (15 levels). Use the new sparse.model.matrix() (available in the next version of Matrix):

> X <- sparse.model.matrix(~d + dept * service + studage +
+     lectage, data = d.eth)
> dim(X)

[1] 73421  1165

> object.size(X)/(ncol(X) * nrow(X) * 8)

[1] 0.021

> str(X)

Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:867068] 0 1 2 3 4 5 6 7 8 9 ...
  ..@ p       : int [1:1166] 0 73421 73452 73485 73544 73574 73613 73641 73720 ...
  ..@ Dim     : int [1:2] 73421 1165
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:73421] "1" "2" "3" "4" ...
  .. ..$ : chr [1:1165] "(Intercept)" "d6" "d7" "d8" ...
  ..@ x       : num [1:867068] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()

The size ratio shows that the sparse X occupies only about 2% of the storage that a dense n × p matrix of doubles would require.

SLIDE 47

Large fixed effect (ETH teacher)

Now, in y = Xβ + Zb + ε, we have X of size n × 1165 and Z roughly n × 5000, with n = 73,421. Using the "developmental" lmer2(..., sparseX = TRUE), with a sparse X (fixed effects) in addition to the sparse Z (random effects):

> fm0 <- lmer2(y ~ d + dept * service + studage + lectage +
+     (1 | s), data = d.eth, sparseX = TRUE)

Error ... Cholmod error 'not positive definite' at file:../Cholesky/......

Indeed, the fixed-effects X is rank-deficient, whereas making d a random effect regularizes the problem.

SLIDE 48

14 columns would have to be eliminated to render XᵀX non-singular:

> if (!exists("SVD.XX")) SVD.XX <- svd(crossprod(X), 0,
+     0)$d
> table(SVD.XX < 1e-05 * median(SVD.XX))

FALSE  TRUE 
 1151    14 

> plot(SVD.XX, log = "y")

[Figure: the 1165 values of SVD.XX on a log scale (roughly 1e-15 to 1e+05); the 14 near-zero values are clearly separated from the rest.]

SLIDE 49

Finding: do not use dept, as it is entirely explained by d. Then the corresponding X has full rank (p = 1137):

> fm1 <- lmer2(y ~ d + service + studage + lectage + (1 |
+     s), data = d.eth, sparseX = TRUE)
> ## now call the minimizer -- here one-dimensional:
> str(optimize(fm1@setPars, c(0, 5)))

List of 2
 $ minimum  : num 0.278
 $ objective: num 235716

> beta.fix <- get("beta", envir = lme4a:::env(fm1))  # to keep ..
> ls.str(lme4a:::env(fm1))

beta :  Named num [1:1137] 3.807 -1.102 0.164 -1.272 -0.36 ...
contrasts :  NULL
control :  list()
data : 'data.frame':  73421 obs. of  7 variables:
 $ s      : Factor w/ 2972 levels "1","2","3","4",..: 1 1 1 1 2 2 3 3 3 3 ...
 $ d      : Factor w/ 1128 levels "1","6","7","8",..: 525 560 832 1068 62 406 ...
 $ studage: Ord.factor w/ 4 levels "2"<"4"<"6"<"8": 1 1 1 1 1 1 1 1 1 1 ...
 $ lectage: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 1 1 1 1 1 ...
 $ service: Factor w/ 2 levels "0","1": 1 2 1 2 1 1 2 1 1 1 ...
 $ dept   : Factor w/ 15 levels "5","12","6","11",..: 15 5 15 12 2 2 14 3 3 ...
 $ y      : int  5 2 5 3 2 4 4 5 5 4 ...

SLIDE 50

'd' as random effect: Regularization

We rather prefer some bias in order to reduce variance: make d a random effect. Advantage: we can then easily use dept as a fixed effect:

> fm2 <- lmer(y ~ service + studage + lectage + (1 | d) +
+     (1 | s), data = d.eth)
> fm2D <- lmer(y ~ dept + service + studage + lectage +
+     (1 | d) + (1 | s), data = d.eth)
> beta.Fix <- beta.fix["(Intercept)"] + c(0, beta.fix[2:1128])
> b.random <- fixef(fm2)["(Intercept)"] + ranef(fm2)$d[,
+     1]
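The shrinkage plot on the next slide can be reproduced along these lines (a sketch; the variable names follow the code above, the plotting details are assumptions):

plot(beta.Fix, b.random,
     xlab = "Fixed-effects estimate",
     ylab = "Conditional mean of random effects")
abline(0, 1, lty = 2)                                # no-shrinkage reference line
abline(h = fixef(fm2)["(Intercept)"], col = "gray")  # complete shrinkage to the mean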

SLIDE 51

Shrinkage of the random effects relative to fixed effects

[Figure: "shrinkage of random effects": the conditional means of the random effects for each teacher plotted against the corresponding fixed-effects estimates (horizontal axis roughly 2.0 to 4.5, vertical axis roughly 2 to 4); the random-effects values are pulled toward the overall mean.]