The Matrix-F Prior for Estimating and Testing Covariance Matrices - PowerPoint PPT Presentation

SLIDE 1

The Matrix-F Prior for Estimating and Testing Covariance Matrices

Joris Mulder & Luis R. Pericchi

Department of Methodology & Statistics Tilburg University, the Netherlands CWI talk 2018, Amsterdam, 5-4-18

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 1 / 44

SLIDE 2

Outline

1. Problems with inverse gamma priors
2. Introducing the univariate F and matrix-F prior
3. The matrix-F prior in regularized regression
4. The matrix-F prior for testing covariance matrices
   - Testing a precise hypothesis
   - Testing inequality constrained hypotheses
5. The matrix-F prior for modeling random effects covariance matrices
6. Summary


SLIDES 4-7

Problems with inverse gamma priors

Modeling variance components

The inverse gamma prior is the default choice for modeling variance components, σ² ∼ IG(α, β), with prior shape parameter α and prior scale parameter β.

The inverse gamma prior is conjugate for the variance of a normal population.

Default choice: α = β = ε > 0, with ε small, e.g., .001.

The inverse gamma prior is a proper neighboring prior of the popular Jeffreys prior σ⁻². Let

p^N(σ² | x) ∝ σ⁻² f(x | σ²)
p(σ² | x) ∝ IG(σ²; ε, ε) f(x | σ²);

then p(σ² | x) → p^N(σ² | x) as ε → 0.
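The neighboring-prior limit can be checked numerically. A minimal Python sketch, assuming a zero-mean normal sample so that both posteriors are inverse gamma in closed form, compares the IG(ε, ε) posterior mean with the Jeffreys posterior mean as ε shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=50)   # zero-mean normal data, true variance 4
n = len(x)
s = 0.5 * np.sum(x**2)              # half the sum of squares

# Posterior under sigma^2 ~ IG(eps, eps) is IG(eps + n/2, eps + s)
def post_mean(eps):
    return (eps + s) / (eps + n / 2 - 1)

# Posterior under the Jeffreys prior sigma^-2 is IG(n/2, s)
jeffreys_mean = s / (n / 2 - 1)

print(post_mean(1e-6) - jeffreys_mean)  # vanishes as eps -> 0
```

For any fixed data set the two posterior means agree in the limit, which is the sense in which IG(ε, ε) neighbors the Jeffreys prior.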

SLIDES 8-9

Problems with inverse gamma priors

Problems with the inverse gamma prior

Surprisingly, the inverse gamma prior can be unduly informative as a prior for the random effects variance in a hierarchical model:

i-th observation in group j: y_ij ∼ N(µ_j, σ²)
random mean of group j: µ_j ∼ N(µ, τ²).

The 8 schools example of Gelman (2006) showed the effect of the inverse gamma prior on τ².

[Figure: posteriors on τ for the 8 schools data under a uniform prior on τ, an inverse-gamma(1, 1) prior on τ², and an inverse-gamma(.001, .001) prior on τ².]


SLIDES 11-12

Introducing the univariate F and matrix-F prior

The F prior

The issue with the inverse gamma prior can be resolved by mixing its scale parameter with a gamma distribution. This results in a univariate F prior:

F(σ²; ν, δ, b) = ∫ IG(σ²; δ/2, ψ²) × G(ψ²; ν/2, b⁻¹) dψ²,

with degrees of freedom parameters ν and δ, and scale parameter b.

Mixing a hyperparameter with another distribution is a way to robustify a prior. Example: the Student t prior is known to be more robust than a normal prior for regression analysis. The Student t prior is obtained by mixing the variance of a normal prior:

t(β; µ, γ, ν) = ∫ N(β; µ, σ²) IG(σ²; ν/2, γ/2) dσ².

SLIDES 13-14

Introducing the univariate F and matrix-F prior

The F prior

Setting ν = 1, the standard deviation has a half-t distribution:

p(σ | ν = 1, δ, b) = [2 Γ((δ+1)/2)] / [Γ(δ/2) √(bπ)] × (1 + σ²/b)^(−(δ+1)/2).

The F prior results in more desirable behavior than the inverse gamma prior for school data (Gelman, 2006).

[Figure: posteriors on τ for the 3 schools data under a uniform prior on τ and under an F(1, 1, 25) prior on τ².]

SLIDES 15-18

Introducing the univariate F and matrix-F prior

The matrix-F prior

In a multivariate setting, the inverse Wishart prior is the default choice for a k × k covariance matrix.

The inverse Wishart prior is a matrix generalization of the inverse gamma prior, and thus has similar issues.

We propose to robustify the inverse Wishart by mixing the scale matrix with a Wishart distribution:

F(Σ; ν, δ, S) = ∫ IW(Σ; δ + k − 1, Ψ) × W(Ψ; ν, S) dΨ,

where ν controls the behavior of |Σ| near the origin, δ controls the behavior in the tails of |Σ|, and S is a scale matrix.

Setting S = I_k yields the standard matrix-F distribution (Dawid, 1981).

SLIDES 19-21

Introducing the univariate F and matrix-F prior

Properties of the matrix-F distribution

Reciprocity: Σ ∼ F(ν, δ, S) ⇒ Σ⁻¹ ∼ F(δ + k − 1, ν − k + 1, S⁻¹).

Invariance under marginalization: Σ ∼ F(ν, δ, S) ⇒ Σ₁₁ ∼ F(ν, δ, S₁₁).

Implementation in a Gibbs sampler: the matrix-F prior can easily be implemented using a parameter expansion:

Σ ∼ F(ν, δ, S) ⇔ Σ | Ψ ∼ IW(δ + k − 1, Ψ), Ψ ∼ W(ν, S).

Then Ψ | Σ ∼ W(ν + δ + k − 1, (S⁻¹ + Σ⁻¹)⁻¹).

SLIDES 22-23

Introducing the univariate F and matrix-F prior

Properties of the matrix-F distribution

Implementation in R (rwish from MCMCpack; SS denotes the sample sum-of-squares matrix):

Draw Σ under an inverse Wishart prior:

    Sigma <- solve(rwish(v = n + k, S = solve(SS + B0)))

Draw Σ under a matrix-F prior (one Gibbs cycle, using the conditionals from the parameter expansion):

    SigmaInv <- rwish(v = n + delta + k - 1, S = solve(SS + Psi))
    Psi <- rwish(v = nu + delta + k - 1, S = solve(SigmaInv + B0Inv))

Setting hyperparameters: a minimally informative default prior can be obtained by setting ν = k, δ = 1, and B equal to a "prior guess", or by using an empirical Bayes prior scale (Kass & Natarajan, 2006).
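The same two-step update can be sketched in Python with scipy. This is a hedged translation of the R lines above: SS, B0, the loop length, and the toy data are illustrative placeholders, and scipy's df/scale convention is assumed to match the slides' inverse Wishart parameterization:

```python
import numpy as np
from scipy.stats import invwishart, wishart

rng = np.random.default_rng(42)
k, n = 3, 50
nu, delta = k, 1          # default hyperparameters suggested on the slides
B0 = np.eye(k)            # prior scale ("prior guess"), illustrative

# Toy data and its sum-of-squares matrix SS (placeholder for real data)
X = rng.standard_normal((n, k))
SS = X.T @ X

Psi = np.eye(k)           # initialize the expansion parameter
for _ in range(100):
    # Sigma | Psi, X ~ IW(n + delta + k - 1, SS + Psi)
    Sigma = invwishart.rvs(df=n + delta + k - 1, scale=SS + Psi,
                           random_state=rng)
    # Psi | Sigma ~ W(nu + delta + k - 1, (B0^-1 + Sigma^-1)^-1)
    Psi = wishart.rvs(df=nu + delta + k - 1,
                      scale=np.linalg.inv(np.linalg.inv(B0) + np.linalg.inv(Sigma)),
                      random_state=rng)

print(np.linalg.eigvalsh(Sigma))  # draws stay symmetric positive definite
```

Because both conditionals are standard Wishart/inverse-Wishart draws, the expansion adds essentially no cost over the plain inverse Wishart sampler.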


SLIDES 25-31

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

A common problem in regression analysis is detecting true large effects in the case of many predictors (p ≫ n).

The lasso estimate is a popular solution for this problem.

A proper horseshoe prior for Bayesian regularized regression performs better in certain scenarios (Carvalho et al., 2010).

[Figure: animation of the likelihood and the resulting shrunken posterior for θ.]

SLIDES 32-36

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

When predictors are grouped, e.g., when several dummy variables model a categorical predictor, it may be preferable to select either all predictors belonging to a certain group or none.

The grouped lasso is a popular solution for such grouped predictors.

A horseshoe-type prior can be constructed using the matrix-F distribution, resulting in similar selection behavior:

p(θ) = ∫ N(θ; 0, Σ) × F(Σ; k, 1, B) dΣ.

Thicker tails than a Cauchy distribution:

p(θ) = ∫ C(θ; 0, Ψ) × W(Ψ; k, B) dΨ.

Pole at θ = 0, because p(θ) → +∞ as θ → 0.
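For k = 1 the mixture p(θ) = ∫ N(θ; 0, σ²) F(σ²; 1, 1, b) dσ² can be simulated directly. A minimal sketch illustrating the pole at θ = 0 and the heavy tails; the thresholds 0.05 and 20 are arbitrary illustration choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, b = 500_000, 1.0

# sigma^2 ~ F(nu=1, delta=1, b): a gamma-mixed inverse gamma
psi2 = rng.gamma(shape=0.5, scale=b, size=n)
sigma2 = psi2 / rng.gamma(shape=0.5, scale=1.0, size=n)
theta = rng.normal(0.0, np.sqrt(sigma2))

# Pole at 0: much more mass near the origin than a standard normal
near0_ratio = np.mean(np.abs(theta) < 0.05) / (2 * stats.norm.cdf(0.05) - 1)
print(near0_ratio)  # noticeably larger than 1

# Heavy tails: non-negligible mass far out, unlike a normal
print(np.mean(np.abs(theta) > 20))
```

Both features, infinite density at zero and polynomial tails, are exactly what gives horseshoe-type priors their strong shrinkage of noise together with weak shrinkage of large effects.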

SLIDES 37-41

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

[Figure: contour plots over (θ1, θ2). Dashed contour = likelihood contour; solid contour = posterior contour.]


SLIDES 43-45

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Consider the following hypothesis test of a covariance matrix: H0 : Σ = Σ0 vs H1 : Σ ≠ Σ0, for multivariate normal data x_i ∼ N(µ, Σ).

Bayesian hypothesis tests can be conducted using the marginal likelihoods:

m0(X) = ∫ p(X | µ, Σ0) p0(µ) dµ
m1(X) = ∫∫ p(X | µ, Σ) p1(µ, Σ) dµ dΣ.

The test is performed using the Bayes factor: B01 = m0(X) / m1(X).

Problem: how to choose the priors p0 and p1?

SLIDES 46-48

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Default Bayes factors, such as the intrinsic Bayes factor (Berger & Pericchi, 1996) or the fractional Bayes factor (O'Hagan, 1995), avoid the choice of a prior by updating a noninformative improper prior with a minimal subset of the data to obtain a posterior prior; the remaining data are used for hypothesis testing.

In certain situations, such default Bayes factors behave as actual Bayes factors based on so-called intrinsic priors as n → ∞.

A proper intrinsic prior can be used to compute an "objective" Bayes factor without needing to formulate a subjective prior or to split the data between prior specification and hypothesis testing.

SLIDES 49-52

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

An intrinsic prior can be found via (Berger & Pericchi, 2004)

p^I_1(µ, Σ) = ∫ p^N_1(µ, Σ | X(ℓ)) p^N_0(X(ℓ)) dX(ℓ),

where

p^N_0(X(ℓ)) = ∫ p(X(ℓ) | µ, Σ0) p^N_0(µ) dµ,

p^N_1(µ, Σ | X(ℓ)) = p(X(ℓ) | µ, Σ) p^N_1(µ, Σ) / ∫∫ p(X(ℓ) | µ, Σ) p^N_1(µ, Σ) dµ dΣ.

SLIDE 53

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Theorem. When testing H0 : Σ = Σ0 versus H1 : Σ ≠ Σ0 using iid k-variate data x_i ∼ N(µ, Σ), for i = 1, ..., n, the intrinsic prior under H1 is given by

π^I_1(µ, Σ) = F(Σ; k, 1, Σ0),

based on the noninformative improper priors π^N_1(µ, Σ) = |Σ|^(−(k+1)/2) and π^N_0(µ) = 1, and a minimal training sample of size m = k + 1. This is also the case when µ is known.

Proposition. The Bayes factor of H0 : Σ = Σ0 versus H1 : Σ ≠ Σ0 based on the intrinsic prior is consistent.

SLIDES 54-56

The matrix-F prior for testing covariance matrices: Testing inequality constrained hypotheses

The matrix-F prior for testing covariance matrices (2)

Consider the following hypothesis test of a covariance matrix: H1 : σ1 < ... < σk vs H2 : σ1 > ... > σk vs H3 : neither H1 nor H2, for multivariate normal data x_i ∼ N(µ, Σ).

Let Hu : "Σ is pos. def.", and define the truncated prior

p2(Σ) = pu(Σ) × I(σ1 > ... > σk) × Pr(σ1 > ... > σk | Hu)⁻¹.

[Figure: contours of the unconstrained prior pu and the truncated prior p2 over (σ²1, σ²2).]

The Bayes factor is given by:

B2u = Pr(σ1 > ... > σk | Hu, X) / Pr(σ1 > ... > σk | Hu).
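Both probabilities in B2u can be estimated by Monte Carlo: the denominator from prior draws, the numerator from posterior draws. A minimal sketch of the denominator under an exchangeable unconstrained prior; the df value is an illustrative choice, and by exchangeability of the diagonal the ordering probability equals 1/k!:

```python
import math
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)
k, n_draws = 3, 100_000

# Draws from an exchangeable unconstrained prior on Sigma (illustrative df)
draws = invwishart.rvs(df=k + 1, scale=np.eye(k), size=n_draws,
                       random_state=rng)
var = np.diagonal(draws, axis1=1, axis2=2)

# Denominator of B2u: prior probability of the ordering, = 1/k! by symmetry
p_prior = np.mean((var[:, 0] > var[:, 1]) & (var[:, 1] > var[:, 2]))
print(p_prior, 1 / math.factorial(k))

# The numerator would use posterior draws of Sigma in the same way:
# B2u = p_posterior / p_prior
```

For non-exchangeable priors the denominator has no closed form, but the same Monte Carlo estimate still applies.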

SLIDES 57-59

The matrix-F prior for testing covariance matrices: Testing inequality constrained hypotheses

The matrix-F prior for testing covariance matrices (2)

H1 : σ1 < ... < σk vs H2 : σ1 > ... > σk vs H3 : neither H1 nor H2.

As unconstrained priors we considered

1. Σ ∼ F(3, 1, I3);
2. Σ ∼ IW(3, I3).

We fixed n = 20 and let S = diag(1, s, s²), while s → ∞.

[Figure: log Bayes factors log(B12) and log(B13), and the posterior probability Pr(σ²1 < σ²2 < σ²3 | Y, Hu), as functions of log(s²), for the F prior, the IW prior, and the Jeffreys prior.]


SLIDES 61-63

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (1)

Kass and Natarajan (2006) considered the following hierarchical Poisson regression model:

y_i | b_i, x_i ∼ Poisson(µ_{x,b,i})
µ_{x,b,i} = exp{β0 + β1 log(x_i + 10) + β2 x_i + b_i}
b_i ∼ N(0, σ²), for i = 1, ..., 18.

Population values: β0 = 2.203, β1 = .311, β2 = −.001, and σ² = .04.

Classical risk and noncoverage of the 95% CIs were determined.

SLIDE 64

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (1)

Hierarchical Poisson regression model:

                 IW(1, R*)     πus           F(1, 1, R*)   F(1, 1, 10³)  (σ²)^(−1/2)
Risk
  β              .01 ± .00     .01 ± .00     .11 ± .00     .10 ± .00     .11 ± .00
  σ²             .12 ± .00     .62 ± .02     .23 ± .01     .28 ± .01     .27 ± .01
Noncoverage
  β0             .056 ± .007   .070 ± .008   .064 ± .007   .047 ± .007   .048 ± .008
  β1             .059 ± .007   .067 ± .008   .065 ± .007   .048 ± .007   .049 ± .007
  β2             .060 ± .007   .075 ± .008   .053 ± .007   .058 ± .007   .051 ± .007
  σ²             .007 ± .003   .037 ± .006   .048 ± .007   .050 ± .007   .045 ± .007

IW(1, R*) is the default (empirical Bayes) conjugate prior of Kass & Natarajan (2006); πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999).

slide-67
SLIDE 67

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Natarajan and Kass (1999) considered the following hierarchical logistic regression model:

logit(µ_ij^b) = β0 + β1tj + β2xi + β3xitj + bi0 + bi1tj, bi ∼ N(0, Σ), for n = 30, tj = j − 4, j = 1, . . . , 7.

Population values: β = (−.625, .25, −.25, .125)′ and Σ = diag(.5, .25). Classical risk and noncoverage of the 95%-CIs were determined.

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 37 / 44
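The data-generating process of this design can likewise be sketched in Python; the binary covariate xi is an assumption of ours, since the slides do not specify it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Population values from Natarajan & Kass (1999), as quoted on the slide.
beta = np.array([-0.625, 0.25, -0.25, 0.125])   # (beta0, beta1, beta2, beta3)
Sigma = np.diag([0.5, 0.25])                    # random-effects covariance

n, J = 30, 7
t = np.arange(1, J + 1) - 4                     # t_j = j - 4, j = 1, ..., 7
x = rng.binomial(1, 0.5, size=n)                # hypothetical binary covariate x_i
b = rng.multivariate_normal(np.zeros(2), Sigma, size=n)  # b_i ~ N(0, Sigma)

# logit(mu_ij) = beta0 + beta1 t_j + beta2 x_i + beta3 x_i t_j + b_i0 + b_i1 t_j
eta = (beta[0] + beta[1] * t[None, :] + beta[2] * x[:, None]
       + beta[3] * x[:, None] * t[None, :]
       + b[:, [0]] + b[:, [1]] * t[None, :])
mu = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit
y = rng.binomial(1, mu)                         # y_ij | b_i ~ Bernoulli(mu_ij)
print(y.shape)
```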

slide-68
SLIDE 68

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the random effects covariance matrix Σ.

                              Noncoverage            Interval width
Prior            Risk         σ1²    σ12    σ2²      σ1²    σ12    σ2²
F(Σ; 2, 2, R∗)   3.32 ± .18   .034   .045   .043     2.11   1.07   .90
πus              3.10 ± .19   .035   .029   .041     2.12   1.05   .88
HW-prior         7.64 ± .50   .070   .009   .110     2.89   1.08   1.28

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 38 / 44

slide-69
SLIDE 69

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the fixed effects β.

                              Noncoverage                  Interval width
Prior            Risk         β0     β1     β2     β3      β0     β1     β2     β3
F(Σ; 2, 2, R∗)   .44 ± .01    .052   .048   .055   .045    1.33   .81    1.89   1.15
πus              .46 ± .02    .033   .058   .044   .045    1.44   .83    2.12   1.19
HW-prior         .51 ± .02    .061   .046   .055   .044    1.45   .91    2.05   1.28

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 39 / 44

slide-70
SLIDE 70

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the random effects bi.

                 Risk                       Noncoverage     Interval width
Prior            b0           b1            b0     b1       b0     b1
F(Σ; 2, 2, R∗)   11.65 ± .13  4.67 ± .05    .058   .057     2.54   1.60
πus              11.51 ± .12  4.51 ± .05    .045   .048     2.67   1.63
HW-prior         12.46 ± .17  5.20 ± .08    .049   .046     2.80   1.77

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 40 / 44

slide-71
SLIDE 71

Summary

Outline

1

Problems with inverse gamma priors

2

Introducing the univariate F and matrix-F prior

3

The matrix-F prior in regularized regression

4

The matrix-F prior for testing covariance matrices Testing a precise hypothesis Testing inequality constrained hypotheses

5

The matrix-F prior for modeling random effects covariance matrices

6

Summary

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 41 / 44

slide-72
SLIDE 72

Summary

Summary

• The F distribution can “safely” be used as a prior for the random effects covariance matrix.
• The matrix-F prior is competitive in terms of risk and coverage rates in generalized linear mixed models.
• The matrix-F prior can straightforwardly be implemented in a Gibbs sampler.
• A minimally informative matrix-F prior can easily be specified based on a prior guess or an empirical Bayes scale matrix.
• The matrix-F prior can be used to construct multivariate horseshoe-type priors for estimating sparse signals.
• The matrix-F prior serves as an intrinsic prior when testing a covariance matrix of multivariate normal data.
• The matrix-F prior yields satisfactory selection behavior for testing inequality constrained hypotheses.
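The Gibbs-sampler point rests on the fact that the univariate F prior is a gamma mixture of inverse gamma distributions: σ2 | ψ ∼ IG(δ/2, ψ) with ψ ∼ G(ν/2, rate 1/b) gives σ2 ∼ F(ν, δ, b) marginally. A minimal sketch of drawing from this prior via the mixture representation (the function name and the spot-check are ours):

```python
import numpy as np

def sample_f_prior(nu, delta, b, size, rng):
    """Draw sigma^2 ~ F(nu, delta, b) via its scale-mixture form:
    psi ~ Gamma(nu/2, rate=1/b), then sigma^2 | psi ~ IG(delta/2, psi)."""
    psi = rng.gamma(shape=nu / 2, scale=b, size=size)   # Gamma(nu/2, rate 1/b)
    # If X ~ IG(a, psi) then 1/X ~ Gamma(a, rate=psi), so X = psi / Gamma(a, 1).
    sigma2 = psi / rng.gamma(shape=delta / 2, scale=1.0, size=size)
    return sigma2

rng = np.random.default_rng(3)
draws = sample_f_prior(nu=2, delta=2, b=1.0, size=100_000, rng=rng)

# Spot-check: for nu = delta = 2, b = 1 the density is proportional to
# 1/(1 + sigma^2)^2, whose median is exactly 1.
print(np.median(draws))
```

In a Gibbs sampler the same two conditionals are simply alternated given the data, which is what makes the implementation straightforward.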

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 42 / 44

slide-73
SLIDE 73

Summary

References

Berger, J. O. & Pericchi, L. R. (2004). Training samples in objective Bayesian model selection. The Annals of Statistics, 32, 841–869.

Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.

Dawid, A. P. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68, 265–274.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534.

Mulder, J. & Pericchi, L. R. (in press). The matrix-variate F prior for estimating and testing covariance matrices. Bayesian Analysis.

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 43 / 44

slide-74
SLIDE 74

Summary

Thank you!

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 44 / 44