The Matrix-F Prior for Estimating and Testing Covariance Matrices - PowerPoint PPT Presentation

SLIDE 1

The Matrix-F Prior for Estimating and Testing Covariance Matrices

Joris Mulder & Luis R. Pericchi

Department of Methodology & Statistics Tilburg University, the Netherlands CWI talk 2018, Amsterdam, 5-4-18

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 1 / 44

SLIDE 2

Outline

1. Problems with inverse gamma priors
2. Introducing the univariate F and matrix-F prior
3. The matrix-F prior in regularized regression
4. The matrix-F prior for testing covariance matrices
   - Testing a precise hypothesis
   - Testing inequality constrained hypotheses
5. The matrix-F prior for modeling random effects covariance matrices
6. Summary


SLIDES 4-7

Problems with inverse gamma priors

Modeling variance components

The inverse gamma prior is the default choice for modeling variance components, σ² ∼ IG(α, β), with prior shape parameter α and prior scale parameter β.

The inverse gamma prior is conjugate for the variance of a normal population.

Default choice: α = β = ε > 0, with ε small, e.g., .001.

The inverse gamma prior is a proper neighboring prior of the popular Jeffreys prior σ⁻². Let

p^N(σ² | x) ∝ σ⁻² f(x | σ²)
p(σ² | x) ∝ IG(σ²; ε, ε) f(x | σ²);

then p(σ² | x) → p^N(σ² | x) as ε → 0.
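The neighboring-prior limit can be checked numerically. A minimal Python sketch, assuming a zero-mean normal sample so that both posteriors are inverse gamma in closed form, compares the IG(ε, ε) posterior mean with the Jeffreys posterior mean as ε shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=50)   # zero-mean normal data, true variance 4
n = len(x)
s = 0.5 * np.sum(x**2)              # half the sum of squares

# Posterior under sigma^2 ~ IG(eps, eps) is IG(eps + n/2, eps + s)
def post_mean(eps):
    return (eps + s) / (eps + n / 2 - 1)

# Posterior under the Jeffreys prior sigma^-2 is IG(n/2, s)
jeffreys_mean = s / (n / 2 - 1)

print(post_mean(1e-6) - jeffreys_mean)  # vanishes as eps -> 0
```

For any fixed data set the two posterior means agree in the limit, which is the sense in which IG(ε, ε) neighbors the Jeffreys prior.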

SLIDES 8-9

Problems with inverse gamma priors

Problems with the inverse gamma prior

Surprisingly, the inverse gamma prior can be unduly informative as a prior for the random effects variance in a hierarchical model:

i-th observation in group j: y_ij ∼ N(µ_j, σ²)
random mean of group j: µ_j ∼ N(µ, τ²).

The 8 schools example of Gelman (2006) showed the effect of the inverse gamma prior on τ².

[Figure: posteriors on τ for the 8 schools data under a uniform prior on τ, an inverse-gamma(1, 1) prior on τ², and an inverse-gamma(.001, .001) prior on τ².]


SLIDES 11-12

Introducing the univariate F and matrix-F prior

The F prior

The issue with the inverse gamma prior can be resolved by mixing its scale parameter with a gamma distribution. This results in a univariate F prior:

F(σ²; ν, δ, b) = ∫ IG(σ²; δ/2, ψ²) × G(ψ²; ν/2, b⁻¹) dψ²,

with degrees of freedom parameters ν and δ, and scale parameter b.

Mixing a hyperparameter with another distribution is a way to robustify a prior. Example: the Student t prior is known to be more robust than a normal prior for regression analysis. The Student t prior is obtained by mixing the variance of a normal prior:

t(β; µ, γ, ν) = ∫ N(β; µ, σ²) IG(σ²; ν/2, γ/2) dσ².

SLIDES 13-14

Introducing the univariate F and matrix-F prior

The F prior

Setting ν = 1, the standard deviation has a half-t distribution:

p(σ | ν = 1, δ, b) = [2 Γ((δ+1)/2)] / [Γ(δ/2) √(bπ)] × (1 + σ²/b)^(−(δ+1)/2).

The F prior results in more desirable behavior than the inverse gamma prior for school data (Gelman, 2006).

[Figure: posteriors on τ for the 3 schools data under a uniform prior on τ and under an F(1, 1, 25) prior on τ².]

SLIDES 15-18

Introducing the univariate F and matrix-F prior

The matrix-F prior

In a multivariate setting, the inverse Wishart prior is the default choice for a k × k covariance matrix.

The inverse Wishart prior is a matrix generalization of the inverse gamma prior, and thus has similar issues.

We propose to robustify the inverse Wishart by mixing the scale matrix with a Wishart distribution:

F(Σ; ν, δ, S) = ∫ IW(Σ; δ + k − 1, Ψ) × W(Ψ; ν, S) dΨ,

where ν controls the behavior of |Σ| near the origin, δ controls the behavior in the tails of |Σ|, and S is a scale matrix.

Setting S = I_k yields the standard matrix-F distribution (Dawid, 1981).

SLIDES 19-21

Introducing the univariate F and matrix-F prior

Properties of the matrix-F distribution

Reciprocity: Σ ∼ F(ν, δ, S) ⇒ Σ⁻¹ ∼ F(δ + k − 1, ν − k + 1, S⁻¹).

Invariance under marginalization: Σ ∼ F(ν, δ, S) ⇒ Σ₁₁ ∼ F(ν, δ, S₁₁).

Implementation in a Gibbs sampler: the matrix-F prior can easily be implemented using a parameter expansion:

Σ ∼ F(ν, δ, S) ⇔ Σ | Ψ ∼ IW(δ + k − 1, Ψ), Ψ ∼ W(ν, S).

Then Ψ | Σ ∼ W(ν + δ + k − 1, (S⁻¹ + Σ⁻¹)⁻¹).

SLIDES 22-23

Introducing the univariate F and matrix-F prior

Properties of the matrix-F distribution

Implementation in R (rwish from MCMCpack; SS denotes the sample sum-of-squares matrix):

Draw Σ under an inverse Wishart prior:

    Sigma <- solve(rwish(v = n + k, S = solve(SS + B0)))

Draw Σ under a matrix-F prior (one Gibbs cycle, using the conditionals from the parameter expansion):

    SigmaInv <- rwish(v = n + delta + k - 1, S = solve(SS + Psi))
    Psi <- rwish(v = nu + delta + k - 1, S = solve(SigmaInv + B0Inv))

Setting hyperparameters: a minimally informative default prior can be obtained by setting ν = k, δ = 1, and B equal to a "prior guess", or by using an empirical Bayes prior scale (Kass & Natarajan, 2006).
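The same two-step update can be sketched in Python with scipy. This is a hedged translation of the R lines above: SS, B0, the loop length, and the toy data are illustrative placeholders, and scipy's df/scale convention is assumed to match the slides' inverse Wishart parameterization:

```python
import numpy as np
from scipy.stats import invwishart, wishart

rng = np.random.default_rng(42)
k, n = 3, 50
nu, delta = k, 1          # default hyperparameters suggested on the slides
B0 = np.eye(k)            # prior scale ("prior guess"), illustrative

# Toy data and its sum-of-squares matrix SS (placeholder for real data)
X = rng.standard_normal((n, k))
SS = X.T @ X

Psi = np.eye(k)           # initialize the expansion parameter
for _ in range(100):
    # Sigma | Psi, X ~ IW(n + delta + k - 1, SS + Psi)
    Sigma = invwishart.rvs(df=n + delta + k - 1, scale=SS + Psi,
                           random_state=rng)
    # Psi | Sigma ~ W(nu + delta + k - 1, (B0^-1 + Sigma^-1)^-1)
    Psi = wishart.rvs(df=nu + delta + k - 1,
                      scale=np.linalg.inv(np.linalg.inv(B0) + np.linalg.inv(Sigma)),
                      random_state=rng)

print(np.linalg.eigvalsh(Sigma))  # draws stay symmetric positive definite
```

Because both conditionals are standard Wishart/inverse-Wishart draws, the expansion adds essentially no cost over the plain inverse Wishart sampler.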


SLIDES 25-31

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

A common problem in regression analysis is detecting true large effects in the case of many predictors (p ≫ n).

The lasso estimate is a popular solution for this problem.

A proper horseshoe prior for Bayesian regularized regression performs better in certain scenarios (Carvalho et al., 2010).

[Figure: animation of the likelihood and the resulting shrunken posterior for θ.]

SLIDES 32-36

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

When predictors are grouped, e.g., when several dummy variables model a categorical predictor, it may be preferable to select either all predictors belonging to a certain group or none.

The grouped lasso is a popular solution for such grouped predictors.

A horseshoe-type prior can be constructed using the matrix-F distribution, resulting in similar selection behavior:

p(θ) = ∫ N(θ; 0, Σ) × F(Σ; k, 1, B) dΣ.

Thicker tails than a Cauchy distribution:

p(θ) = ∫ C(θ; 0, Ψ) × W(Ψ; k, B) dΨ.

Pole at θ = 0, because p(θ) → +∞ as θ → 0.
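For k = 1 the mixture p(θ) = ∫ N(θ; 0, σ²) F(σ²; 1, 1, b) dσ² can be simulated directly. A minimal sketch illustrating the pole at θ = 0 and the heavy tails; the thresholds 0.05 and 20 are arbitrary illustration choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, b = 500_000, 1.0

# sigma^2 ~ F(nu=1, delta=1, b): a gamma-mixed inverse gamma
psi2 = rng.gamma(shape=0.5, scale=b, size=n)
sigma2 = psi2 / rng.gamma(shape=0.5, scale=1.0, size=n)
theta = rng.normal(0.0, np.sqrt(sigma2))

# Pole at 0: much more mass near the origin than a standard normal
near0_ratio = np.mean(np.abs(theta) < 0.05) / (2 * stats.norm.cdf(0.05) - 1)
print(near0_ratio)  # noticeably larger than 1

# Heavy tails: non-negligible mass far out, unlike a normal
print(np.mean(np.abs(theta) > 20))
```

Both features, infinite density at zero and polynomial tails, are exactly what gives horseshoe-type priors their strong shrinkage of noise together with weak shrinkage of large effects.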

SLIDES 37-41

The matrix-F prior in regularized regression

The matrix-F distribution in regularized regression

[Figure: contour plots over (θ1, θ2). Dashed contour = likelihood contour; solid contour = posterior contour.]


SLIDES 43-45

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Consider the following hypothesis test of a covariance matrix: H0 : Σ = Σ0 vs H1 : Σ ≠ Σ0, for multivariate normal data x_i ∼ N(µ, Σ).

Bayesian hypothesis tests can be conducted using the marginal likelihoods:

m0(X) = ∫ p(X | µ, Σ0) p0(µ) dµ
m1(X) = ∫∫ p(X | µ, Σ) p1(µ, Σ) dµ dΣ.

The test is performed using the Bayes factor: B01 = m0(X) / m1(X).

Problem: how to choose the priors p0 and p1?

SLIDES 46-48

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Default Bayes factors, such as the intrinsic Bayes factor (Berger & Pericchi, 1996) or the fractional Bayes factor (O'Hagan, 1995), avoid the choice of a prior by updating a noninformative improper prior with a minimal subset of the data to obtain a posterior prior; the remaining data are used for hypothesis testing.

In certain situations, such default Bayes factors behave as actual Bayes factors based on so-called intrinsic priors as n → ∞.

A proper intrinsic prior can be used to compute an "objective" Bayes factor without needing to formulate a subjective prior or to split the data between prior specification and hypothesis testing.

SLIDES 49-52

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

An intrinsic prior can be found via (Berger & Pericchi, 2004)

p^I_1(µ, Σ) = ∫ p^N_1(µ, Σ | X(ℓ)) p^N_0(X(ℓ)) dX(ℓ),

where

p^N_0(X(ℓ)) = ∫ p(X(ℓ) | µ, Σ0) p^N_0(µ) dµ,

p^N_1(µ, Σ | X(ℓ)) = p(X(ℓ) | µ, Σ) p^N_1(µ, Σ) / ∫∫ p(X(ℓ) | µ, Σ) p^N_1(µ, Σ) dµ dΣ.

SLIDE 53

The matrix-F prior for testing covariance matrices: Testing a precise hypothesis

The matrix-F prior for testing covariance matrices (1)

Theorem. When testing H0 : Σ = Σ0 versus H1 : Σ ≠ Σ0 using iid k-variate data x_i ∼ N(µ, Σ), for i = 1, ..., n, the intrinsic prior under H1 is given by

π^I_1(µ, Σ) = F(Σ; k, 1, Σ0),

based on the noninformative improper priors π^N_1(µ, Σ) = |Σ|^(−(k+1)/2) and π^N_0(µ) = 1, and a minimal training sample of size m = k + 1. This is also the case when µ is known.

Proposition. The Bayes factor of H0 : Σ = Σ0 versus H1 : Σ ≠ Σ0 based on the intrinsic prior is consistent.

SLIDES 54-56

The matrix-F prior for testing covariance matrices: Testing inequality constrained hypotheses

The matrix-F prior for testing covariance matrices (2)

Consider the following hypothesis test of a covariance matrix: H1 : σ1 < ... < σk vs H2 : σ1 > ... > σk vs H3 : neither H1 nor H2, for multivariate normal data x_i ∼ N(µ, Σ).

Let Hu : "Σ is pos. def.", and define the truncated prior

p2(Σ) = pu(Σ) × I(σ1 > ... > σk) × Pr(σ1 > ... > σk | Hu)⁻¹.

[Figure: contours of the unconstrained prior pu and the truncated prior p2 over (σ²1, σ²2).]

The Bayes factor is given by:

B2u = Pr(σ1 > ... > σk | Hu, X) / Pr(σ1 > ... > σk | Hu).
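Both probabilities in B2u can be estimated by Monte Carlo: the denominator from prior draws, the numerator from posterior draws. A minimal sketch of the denominator under an exchangeable unconstrained prior; the df value is an illustrative choice, and by exchangeability of the diagonal the ordering probability equals 1/k!:

```python
import math
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)
k, n_draws = 3, 100_000

# Draws from an exchangeable unconstrained prior on Sigma (illustrative df)
draws = invwishart.rvs(df=k + 1, scale=np.eye(k), size=n_draws,
                       random_state=rng)
var = np.diagonal(draws, axis1=1, axis2=2)

# Denominator of B2u: prior probability of the ordering, = 1/k! by symmetry
p_prior = np.mean((var[:, 0] > var[:, 1]) & (var[:, 1] > var[:, 2]))
print(p_prior, 1 / math.factorial(k))

# The numerator would use posterior draws of Sigma in the same way:
# B2u = p_posterior / p_prior
```

For non-exchangeable priors the denominator has no closed form, but the same Monte Carlo estimate still applies.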

SLIDES 57-59

The matrix-F prior for testing covariance matrices: Testing inequality constrained hypotheses

The matrix-F prior for testing covariance matrices (2)

H1 : σ1 < ... < σk vs H2 : σ1 > ... > σk vs H3 : neither H1 nor H2.

As unconstrained priors we considered

1. Σ ∼ F(3, 1, I3);
2. Σ ∼ IW(3, I3).

We fixed n = 20 and let S = diag(1, s, s²), while s → ∞.

[Figure: log Bayes factors log(B12) and log(B13), and the posterior probability Pr(σ²1 < σ²2 < σ²3 | Y, Hu), as functions of log(s²), for the F prior, the IW prior, and the Jeffreys prior.]


SLIDES 61-63

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (1)

Kass and Natarajan (2006) considered the following hierarchical Poisson regression model:

y_i | b_i, x_i ∼ Poisson(µ_{x,b,i})
µ_{x,b,i} = exp{β0 + β1 log(x_i + 10) + β2 x_i + b_i}
b_i ∼ N(0, σ²), for i = 1, ..., 18.

Population values: β0 = 2.203, β1 = .311, β2 = −.001, and σ² = .04.

Classical risk and noncoverage of the 95% CIs were determined.

SLIDE 64

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (1)

Hierarchical Poisson regression model:

                 IW(1, R*)     πus           F(1, 1, R*)   F(1, 1, 10³)  (σ²)^(−1/2)
Risk
  β              .01 ± .00     .01 ± .00     .11 ± .00     .10 ± .00     .11 ± .00
  σ²             .12 ± .00     .62 ± .02     .23 ± .01     .28 ± .01     .27 ± .01
Noncoverage
  β0             .056 ± .007   .070 ± .008   .064 ± .007   .047 ± .007   .048 ± .008
  β1             .059 ± .007   .067 ± .008   .065 ± .007   .048 ± .007   .049 ± .007
  β2             .060 ± .007   .075 ± .008   .053 ± .007   .058 ± .007   .051 ± .007
  σ²             .007 ± .003   .037 ± .006   .048 ± .007   .050 ± .007   .045 ± .007

IW(1, R*) is the default (empirical Bayes) conjugate prior of Kass & Natarajan (2006); πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999).

slide-67
SLIDE 67

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Natarajan and Kass (1999) considered the following hierarchical logistic regression model:

logit(µ_ij^b) = β0 + β1tj + β2xi + β3xitj + bi0 + bi1tj, bi ∼ N(0, Σ), for n = 30, tj = j − 4, j = 1, . . . , 7.

Population values: β = (−.625, .25, −.25, .125)′ and Σ = diag(.5, .25). Classical risk and noncoverage of the 95%-CIs were determined.

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 37 / 44
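The data-generating process of this design can likewise be sketched in Python; the binary covariate xi is an assumption of ours, since the slides do not specify it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Population values from Natarajan & Kass (1999), as quoted on the slide.
beta = np.array([-0.625, 0.25, -0.25, 0.125])   # (beta0, beta1, beta2, beta3)
Sigma = np.diag([0.5, 0.25])                    # random-effects covariance

n, J = 30, 7
t = np.arange(1, J + 1) - 4                     # t_j = j - 4, j = 1, ..., 7
x = rng.binomial(1, 0.5, size=n)                # hypothetical binary covariate x_i
b = rng.multivariate_normal(np.zeros(2), Sigma, size=n)  # b_i ~ N(0, Sigma)

# logit(mu_ij) = beta0 + beta1 t_j + beta2 x_i + beta3 x_i t_j + b_i0 + b_i1 t_j
eta = (beta[0] + beta[1] * t[None, :] + beta[2] * x[:, None]
       + beta[3] * x[:, None] * t[None, :]
       + b[:, [0]] + b[:, [1]] * t[None, :])
mu = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit
y = rng.binomial(1, mu)                         # y_ij | b_i ~ Bernoulli(mu_ij)
print(y.shape)
```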

slide-68
SLIDE 68

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the random effects covariance matrix Σ.

                              Noncoverage            Interval width
Prior            Risk         σ1²    σ12    σ2²      σ1²    σ12    σ2²
F(Σ; 2, 2, R∗)   3.32 ± .18   .034   .045   .043     2.11   1.07   .90
πus              3.10 ± .19   .035   .029   .041     2.12   1.05   .88
HW-prior         7.64 ± .50   .070   .009   .110     2.89   1.08   1.28

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 38 / 44

slide-69
SLIDE 69

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the fixed effects β.

                              Noncoverage                  Interval width
Prior            Risk         β0     β1     β2     β3      β0     β1     β2     β3
F(Σ; 2, 2, R∗)   .44 ± .01    .052   .048   .055   .045    1.33   .81    1.89   1.15
πus              .46 ± .02    .033   .058   .044   .045    1.44   .83    2.12   1.19
HW-prior         .51 ± .02    .061   .046   .055   .044    1.45   .91    2.05   1.28

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 39 / 44

slide-70
SLIDE 70

The matrix-F prior for modeling random effects covariance matrices

The matrix-F prior for estimating hierarchical models (2)

Hierarchical logistic regression model. Results for the random effects bi.

                 Risk                       Noncoverage     Interval width
Prior            b0           b1            b0     b1       b0     b1
F(Σ; 2, 2, R∗)   11.65 ± .13  4.67 ± .05    .058   .057     2.54   1.60
πus              11.51 ± .12  4.51 ± .05    .045   .048     2.67   1.63
HW-prior         12.46 ± .17  5.20 ± .08    .049   .046     2.80   1.77

πus is the approximate uniform shrinkage prior of Natarajan & Kass (1999). The HW-prior is the marginally noninformative prior of Huang and Wand (2013).

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 40 / 44

slide-71
SLIDE 71

Summary

Outline

1

Problems with inverse gamma priors

2

Introducing the univariate F and matrix-F prior

3

The matrix-F prior in regularized regression

4

The matrix-F prior for testing covariance matrices Testing a precise hypothesis Testing inequality constrained hypotheses

5

The matrix-F prior for modeling random effects covariance matrices

6

Summary

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 41 / 44

slide-72
SLIDE 72

Summary

Summary

• The F distribution can “safely” be used as a prior for the random effects covariance matrix.
• The matrix-F prior is competitive in terms of risk and coverage rates in generalized linear mixed models.
• The matrix-F prior can straightforwardly be implemented in a Gibbs sampler.
• A minimally informative matrix-F prior can easily be specified based on a prior guess or an empirical Bayes scale matrix.
• The matrix-F prior can be used to construct multivariate horseshoe-type priors for estimating sparse signals.
• The matrix-F prior serves as an intrinsic prior when testing a covariance matrix of multivariate normal data.
• The matrix-F prior yields satisfactory selection behavior for testing inequality constrained hypotheses.
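The Gibbs-sampler point rests on the fact that the univariate F prior is a gamma mixture of inverse gamma distributions: σ2 | ψ ∼ IG(δ/2, ψ) with ψ ∼ G(ν/2, rate 1/b) gives σ2 ∼ F(ν, δ, b) marginally. A minimal sketch of drawing from this prior via the mixture representation (the function name and the spot-check are ours):

```python
import numpy as np

def sample_f_prior(nu, delta, b, size, rng):
    """Draw sigma^2 ~ F(nu, delta, b) via its scale-mixture form:
    psi ~ Gamma(nu/2, rate=1/b), then sigma^2 | psi ~ IG(delta/2, psi)."""
    psi = rng.gamma(shape=nu / 2, scale=b, size=size)   # Gamma(nu/2, rate 1/b)
    # If X ~ IG(a, psi) then 1/X ~ Gamma(a, rate=psi), so X = psi / Gamma(a, 1).
    sigma2 = psi / rng.gamma(shape=delta / 2, scale=1.0, size=size)
    return sigma2

rng = np.random.default_rng(3)
draws = sample_f_prior(nu=2, delta=2, b=1.0, size=100_000, rng=rng)

# Spot-check: for nu = delta = 2, b = 1 the density is proportional to
# 1/(1 + sigma^2)^2, whose median is exactly 1.
print(np.median(draws))
```

In a Gibbs sampler the same two conditionals are simply alternated given the data, which is what makes the implementation straightforward.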

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 42 / 44

slide-73
SLIDE 73

Summary

References

Berger, J. O. & Pericchi, L. R. (2004). Training samples in objective Bayesian model selection. The Annals of Statistics, 32, 841–869.

Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.

Dawid, A. P. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68, 265–274.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534.

Mulder, J. & Pericchi, L. R. (in press). The matrix-variate F prior for estimating and testing covariance matrices. Bayesian Analysis.

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 43 / 44

slide-74
SLIDE 74

Summary

Thank you!

Mulder (Tilburg University) The Matrix-F Prior CWI, Amsterdam 44 / 44