SLIDE 1

High-Dimensional Multivariate Bayesian Linear Regression with Shrinkage Priors

Ray Bai

Department of Statistics, University of Florida Joint work with Dr. Malay Ghosh

March 20, 2018

Ray Bai (University of Florida) MBSP March 20, 2018 1 / 48

SLIDE 2

Overview

1. Overview of High-Dimensional Multivariate Linear Regression
2. Multivariate Bayesian Model with Shrinkage Priors (MBSP)
3. Posterior Consistency of MBSP: Low-Dimensional Case; Ultrahigh-Dimensional Case
4. Implementation of the MBSP Model
5. Simulation Study
6. Yeast Cell Cycle Data Analysis

SLIDE 3

Simultaneous Prediction and Estimation

There are many scenarios where we would want to simultaneously predict q continuous response variables y1, ..., yq:

Longitudinal data: the q response variables represent measurements at q consecutive time points, e.g.
- mRNA levels at different time points
- children's heights at different ages of development
- CD4 cell counts over time for HIV/AIDS patients

The data have a group structure: the q response variables represent a "group." In genomics, for example, genes within the same pathway often act together in regulating a biological system.

SLIDE 4

Multivariate Linear Regression

Consider the multivariate linear regression model,

Y = XB + E,

where Y = (y1, ..., yq) is an n × q response matrix of n samples and q response variables, X is an n × p matrix of n samples and p covariates, B ∈ R^(p×q) is the coefficient matrix, and E = (ε1, ..., εn)^T is an n × q noise matrix with εi i.i.d. ∼ Nq(0, Σ), i = 1, ..., n. Throughout, we assume that X is centered, so there is no intercept term.
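The generative model above can be sketched in a few lines of NumPy; the dimensions, the sparsity pattern of B, and the choice of Σ below are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Minimal sketch of simulating from Y = X B + E, where the rows of E are
# drawn i.i.d. from N_q(0, Sigma).  All specific numbers are illustrative.
rng = np.random.default_rng(0)
n, p, q = 50, 10, 3

X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                       # center X: no intercept term

B = np.zeros((p, q))
B[:3, :] = rng.uniform(1.0, 2.0, (3, q))  # only 3 active rows (row-sparse B)

Sigma = 0.5 * np.eye(q) + 0.5             # equicorrelated noise covariance
L = np.linalg.cholesky(Sigma)
E = rng.standard_normal((n, q)) @ L.T     # rows are i.i.d. N_q(0, Sigma)

Y = X @ B + E
print(Y.shape)  # (50, 3)
```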

SLIDE 5

Multivariate Linear Regression

For the multivariate linear regression model, Y_(n×q) = X_(n×p) B_(p×q) + E_(n×q), where E = (ε1, ..., εn)^T with εi i.i.d. ∼ Nq(0, Σ), i = 1, ..., n, the matrix Σ represents the covariance structure of the q response variables. We wish to estimate the coefficient matrix B. Model selection from the p covariates is also often desired; this can be done using multivariate generalizations of AIC, BIC, or Mallows' Cp.

SLIDE 6

Multivariate Linear Regression

For the multivariate linear regression model, the usual maximum likelihood estimator (MLE) is the ordinary least squares estimator,

B̂ = (X^T X)^(−1) X^T Y.

The MLE is unique only if p ≤ n. It is well known that the MLE is an inconsistent estimator of B if p/n → c for some c > 0. Variable selection using AIC, BIC, or Mallows' Cp is infeasible for large p, since it requires searching over a model space of 2^p models.
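As a quick sketch of the estimator above (simulated data, illustrative dimensions), `np.linalg.lstsq` computes the least squares solution and agrees with the closed form (X^T X)^(−1) X^T Y whenever X has full column rank:

```python
import numpy as np

# Hedged sketch: the OLS/MLE estimate.  lstsq is preferred over forming the
# inverse explicitly and also handles the rank-deficient p > n case, where
# the solution is no longer unique.
rng = np.random.default_rng(1)
n, p, q = 50, 10, 3
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))
Y = X @ B + 0.1 * rng.standard_normal((n, q))

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # p x q estimate
print(np.allclose(B_hat, np.linalg.inv(X.T @ X) @ X.T @ Y))  # True when p <= n
```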

SLIDE 7

High-Dimensional Multivariate Linear Regression

To handle cases where p is large (including the p > n regime), frequentists typically use penalized regression (e.g. Li et al. (2015), Vincent and Hansen (2014), Wilms and Croux (2017)):

min_B ||Y − XB||_F^2 + λ ∑_(i=1)^p ||b_i||_2,

where b_i represents the ith row of B and λ > 0 is a tuning parameter. The group lasso penalty, || · ||_2, shrinks entire rows of B to exactly 0, leading to a row-sparse estimate of B and facilitating variable selection from the p predictors. An adaptive group lasso penalty can be used to avoid overshrinkage of b_i, i = 1, ..., p.
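A minimal sketch of this penalized criterion, assuming a plain proximal gradient scheme rather than the algorithms of the cited papers; the key ingredient is the row-wise soft-thresholding (proximal) operator, which is what sets entire rows of B to exactly 0:

```python
import numpy as np

def group_soft_threshold(B, t):
    """Row-wise proximal operator of t * sum_i ||b_i||_2: shrinks every row
    of B toward 0 and sets rows with norm <= t exactly to 0."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * B

def group_lasso(X, Y, lam, n_iter=500):
    """Illustrative proximal gradient descent for
    min_B ||Y - X B||_F^2 + lam * sum_i ||b_i||_2 (not a cited implementation)."""
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ B - Y)
        B = group_soft_threshold(B - step * grad, step * lam)
    return B
```

Rows whose signal is weaker than the threshold come out exactly zero, which is the variable-selection effect described above.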

SLIDE 8

Bayesian High-Dimensional Multivariate Linear Regression

The Bayesian approach is to put a prior distribution on B, π(B). That is, given the model, Y = XB + E and data (X, Y), we have π(B|Y) ∝ f (Y|X, B)π(B). Inference can be conducted through the posterior, π(B|Y).

SLIDE 9

Bayesian High-Dimensional Multivariate Linear Regression

To achieve sparsity and variable selection, a common approach is to place spike-and-slab priors on the rows of B (e.g. Brown et al. (1998), Liquet et al. (2017)):

b_i^T i.i.d. ∼ (1 − w) δ_{0} + w Nq(0, τ² V), i = 1, ..., p.

Here δ_{0} represents a point mass at 0 ∈ R^q, w ∈ (0, 1) is the mixing weight, and V is a q × q symmetric positive definite matrix. τ² can be treated as a tuning parameter, or a prior can be placed on τ². A prior (usually a Beta prior) can also be placed on w so that the model adapts to the underlying sparsity.
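Drawing the rows of B from this spike-and-slab prior is straightforward; the mixing weight (written `w` here), τ, and V below are illustrative choices:

```python
import numpy as np

# Hedged sketch: rows of B drawn from (1 - w) * delta_0 + w * N_q(0, tau^2 V).
rng = np.random.default_rng(2)
p, q = 200, 5
w, tau = 0.1, 1.0
V = np.eye(q)

spike = rng.random(p) >= w                 # True -> row is exactly 0
L = np.linalg.cholesky(tau**2 * V)
B = rng.standard_normal((p, q)) @ L.T      # slab draws
B[spike] = 0.0

print(np.mean(np.all(B == 0.0, axis=1)))   # roughly 1 - w = 0.9
```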

SLIDE 10

Bayesian High-Dimensional Multivariate Linear Regression

For the spike-and-slab approach,

b_i^T i.i.d. ∼ (1 − w) δ_{0} + w Nq(0, τ² V), i = 1, ..., p,
τ² ∼ µ(τ²), w ∼ Beta(a, b),

taking the posterior median gives a point estimate of B with some rows equal to 0^T, thus recovering a sparse estimate of B and facilitating variable selection. Due to the point mass at 0, however, posterior computation under this model can be very slow for large p.

SLIDE 11

Bayesian High-Dimensional Multivariate Linear Regression

Due to the computational inefficiency of discontinuous priors, it is often desirable to put a continuous prior on the parameters of interest. For the multivariate linear regression model, Y = XB + E, our aim is to estimate B. This requires putting a prior density on a p × q matrix. A popular continuous prior to place on B is the matrix-normal prior.

SLIDE 12

The Matrix-Normal Prior

Definition

A random matrix X is said to have the matrix-normal density if X has the density function (on the space R^(a×b))

f(X) = (2π)^(−ab/2) |U|^(−b/2) |V|^(−a/2) exp{ −(1/2) tr[U^(−1)(X − M) V^(−1)(X − M)^T] },

where M ∈ R^(a×b), and U and V are positive definite matrices of dimension a × a and b × b respectively. If X is distributed as a matrix-normal distribution with the pdf above, we write X ∼ MN_(a×b)(M, U, V).
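Sampling from MN_(a×b)(M, U, V) reduces to scaling an i.i.d. Gaussian matrix: if Z has i.i.d. N(0, 1) entries and U = AA^T, V = CC^T, then M + AZC^T ∼ MN_(a×b)(M, U, V). A sketch with illustrative dimensions:

```python
import numpy as np

# Hedged sketch: draw X ~ MN_{a x b}(M, U, V) via X = M + A Z C^T,
# where U = A A^T (row covariance) and V = C C^T (column covariance).
def rmatnorm(M, U, V, rng):
    A = np.linalg.cholesky(U)
    C = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)   # i.i.d. N(0, 1) entries
    return M + A @ Z @ C.T

rng = np.random.default_rng(3)
a, b = 4, 2
M = np.zeros((a, b))
U = np.eye(a)
V = np.array([[1.0, 0.5], [0.5, 1.0]])
X = rmatnorm(M, U, V, rng)
print(X.shape)  # (4, 2)
```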

SLIDE 13

Multivariate Bayesian Model with Shrinkage Priors (MBSP)

By adding an additional layer in the Bayesian hierarchy, we can obtain a row-sparse estimate of B, which also facilitates variable selection from the p variables. Our model is specified as follows:

Y | X, B, Σ ∼ MN_(n×q)(XB, I_n, Σ),
B | ξ1, ..., ξp, Σ ∼ MN_(p×q)(O, τ diag(ξ1, ..., ξp), Σ),
ξi ind∼ π(ξi), i = 1, ..., p,

where τ > 0 is a tuning parameter, and π(ξi) is a polynomial-tailed prior density of the form π(ξi) = K (ξi)^(−a−1) L(ξi), where K > 0 is the constant of proportionality, a is a positive real number, and L is a positive, measurable, non-constant, slowly varying function over (0, ∞).
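One way to sample B from this prior hierarchy, using the horseshoe-type local prior π(ξ) ∝ ξ^(−1/2)(1 + ξ)^(−1) as a concrete polynomial-tailed choice (it is the density of the square of a standard half-Cauchy variable); τ, Σ, and the dimensions are illustrative:

```python
import numpy as np

# Hedged sketch of a draw from the MBSP prior with horseshoe-type xi.
rng = np.random.default_rng(4)
p, q, tau = 100, 3, 0.01
Sigma = np.eye(q)

lam = np.abs(rng.standard_cauchy(p))   # half-Cauchy(0, 1)
xi = lam**2                            # xi has density prop. to xi^(-1/2)(1+xi)^(-1)

# B | xi, Sigma ~ MN(O, tau * diag(xi), Sigma): row i is N_q(0, tau * xi_i * Sigma)
Ls = np.linalg.cholesky(Sigma)
B = np.sqrt(tau * xi)[:, None] * (rng.standard_normal((p, q)) @ Ls.T)
print(B.shape)  # (100, 3)
```

Most rows land near zero while a few local scales ξi are huge, which is exactly the global-local shrinkage behavior the polynomial tail provides.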

SLIDE 14

Examples of Polynomial-Tailed Priors

Prior       | π(ξi)/C                                                  | L(ξi)
Student's t | ξi^(−a−1) exp(−a/ξi)                                     | exp(−a/ξi)
Horseshoe   | ξi^(−1/2) (1 + ξi)^(−1)                                  | ξi^a / (1 + ξi)
Horseshoe+  | ξi^(−1/2) (ξi − 1)^(−1) log(ξi)                          | ξi^a (ξi − 1)^(−1) log(ξi)
NEG         | (1 + ξi)^(−1−a)                                          | {ξi/(1 + ξi)}^(a+1)
TPBN        | ξi^(u−1) (1 + ξi)^(−a−u)                                 | {ξi/(1 + ξi)}^(a+u)
GDP         | ∫_0^∞ (λ²/2) exp(−λ²ξi/2) λ^(2a−1) exp(−ηλ) dλ           | ∫_0^∞ t^a exp(−t − η√(2t/ξi)) dt
HIB         | ξi^(u−1) (1 + ξi)^(−(a+u)) exp{s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^(−1) | {ξi/(1 + ξi)}^(a+u) exp{s/(1 + ξi)} {φ² + (1 − φ²)/(1 + ξi)}^(−1)

Table: Polynomial-tailed priors, their respective prior densities π(ξi) up to a normalizing

SLIDE 15

Sparse Estimation of B: Examples

If π(ξj) ind∼ Inverse-Gamma(αj, γj/2), then the marginal density for B, π(B), under the MBSP model is proportional to

∏_(j=1)^p ( ||b_j (τΣ)^(−1/2)||_2^2 + γj )^(−(αj + q/2)),

which corresponds to a multivariate t-distribution. Here b_j denotes the jth row of B.

SLIDE 16

Sparse Estimation of B: Examples

If π(ξj) ∝ ξj^(q/2−1) (1 + ξj)^(−1), then the joint density π(B, ξ1, ..., ξp) under the MBSP model is proportional to

∏_(j=1)^p ξj^(−1) (1 + ξj)^(−1) exp{ −(1/(2ξj)) ||b_j (τΣ)^(−1/2)||_2^2 },

and integrating out the ξj's gives a multivariate horseshoe density function.

SLIDE 17

Notation

For any two sequences of positive real numbers {a_n} and {b_n} with b_n ≠ 0:

a_n = O(b_n) if |a_n/b_n| ≤ M for all n, for some positive real number M independent of n.
a_n = o(b_n) if lim_(n→∞) a_n/b_n = 0. In particular, a_n = o(1) if lim_(n→∞) a_n = 0.

For a vector v ∈ R^n, ||v||_2 := √(∑_(i=1)^n v_i²) denotes the ℓ2 norm.

For a matrix A ∈ R^(a×b) with entries a_ij, ||A||_F := √(tr(A^T A)) = √(∑_(i=1)^a ∑_(j=1)^b a_ij²) denotes the Frobenius norm of A.

For a symmetric matrix A, we denote its minimum and maximum eigenvalues by λ_min(A) and λ_max(A) respectively.
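A two-line check of the Frobenius norm identities above:

```python
import numpy as np

# The Frobenius norm equals sqrt(tr(A^T A)) and the root of the sum of
# squared entries; NumPy's "fro" norm agrees with both.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro_trace = np.sqrt(np.trace(A.T @ A))
fro_sum = np.sqrt((A**2).sum())
print(fro_trace, np.linalg.norm(A, "fro"))  # both sqrt(30)
```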

SLIDE 18

Posterior Consistency

Suppose that the data are generated from the true model, Yn = Xn B0 + En, where Yn := (Yn,1, ..., Yn,q) and En ∼ MN_(n×q)(O, I_n, Σ). Letting P0 denote the probability measure underlying the true model above, we define the following notion of posterior consistency:

Definition

(Strong posterior consistency) The sequence of posterior distributions of Bn under the prior πn(Bn) is said to be strongly consistent under the true model if, for any ε > 0, Πn(||Bn − B0||_F > ε | Yn) → 0 a.s. P0 as n → ∞.

SLIDE 19

Sufficient Conditions for Posterior Consistency

For our theoretical analysis, we assume that q < n is fixed and Σ is known. In practice, Σ is often unknown; it can then be handled by placing an Inverse-Wishart prior on Σ, or by obtaining a separate estimate Σ̂ (e.g. the MLE) and plugging Σ̂ into our model as an empirical Bayes estimate. Theory is developed separately for:
pn = o(n) (low-dimensional setting)
pn ≥ O(n) (ultrahigh-dimensional setting)

SLIDE 20

Regularity Conditions for the Low-Dimensional Case

(A1) pn = o(n) and pn ≤ n for all n ≥ 1.
(A2) There exist constants c1, c2 so that 0 < c1 < lim inf_(n→∞) λ_min(Xn^T Xn / n) ≤ lim sup_(n→∞) λ_max(Xn^T Xn / n) < c2 < ∞.
(A3) There exist constants d1 and d2 so that 0 < d1 < λ_min(Σ) ≤ λ_max(Σ) < d2 < ∞.

SLIDE 21

Sufficient Conditions for Posterior Consistency When p = o(n)

Theorem

Assume that conditions (A1)-(A3) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0, Πn(Bn : ||Bn − B0||_F > ε | Yn) → 0 P0-a.s. as n → ∞, provided that

Πn( Bn : ||Bn − B0||_F < ∆/n^(ρ/2) ) > exp(−kn)

for all 0 < ∆ < ε² c1 d1^(1/2) / (48 c2^(1/2) d2) and 0 < k < ε² c1/(32 d2) − 3∆ c2^(1/2)/(2 d1^(1/2)), where ρ > 0.

This theorem applies to any prior on Bn: provided the prior satisfies the above concentration condition and pn = o(n), the posterior is strongly consistent.

SLIDE 22

The MBSP Model

Recall the MBSP model:

Y | X, B, Σ ∼ MN_(n×q)(XB, I_n, Σ),
B | ξ1, ..., ξ_pn, Σ ∼ MN_(pn×q)(O, τ_n diag(ξ1, ..., ξ_pn), Σ),
ξi ind∼ π(ξi), i = 1, ..., pn,

where τ_n > 0 and π(ξi) is a polynomial-tailed density of the form π(ξi) = K (ξi)^(−a−1) L(ξi). To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), the scale τ_n, and the true unknown coefficient matrix B0.

SLIDE 23

Additional Assumptions under the MBSP Model

(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_(t→∞) L(t) ∈ (0, ∞). That is, there exists c0 > 0 such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.
(ii) There exists M > 0 so that sup_(j,k) |b0_jk| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded above in absolute value.
(iii) 0 < τ_n < 1 for all n, and τ_n = o(1/(pn n^ρ)) for some ρ > 0.

SLIDE 24

Posterior Consistency of MBSP (low-dimensional case)

Theorem

Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξ_pn. Provided that Assumptions (A1)-(A3) and (i)-(iii) hold, our model achieves strong posterior consistency; that is, for any ε > 0, Πn(Bn : ||Bn − B0||_F > ε | Yn) → 0 P0-a.s. as n → ∞.

SLIDE 25

Ultrahigh-Dimensional Case

We have shown that the MBSP model achieves posterior consistency under mild conditions if pn = o(n). What if pn > n and pn ≥ O(n)? It turns out that with some additional regularity conditions on the model size and the design matrix, we can achieve posterior consistency in this ultrahigh-dimensional setting!

SLIDE 26

Regularity Conditions for the Ultrahigh-dimensional Case

(B1) pn > n for all n ≥ 1, and log(pn) = O(n^d) for some 0 < d < 1.
(B2) The rank of Xn is n.
(B3) Let J ⊂ {1, ..., pn} denote a set of indices with |J| ≤ n, and let X_J denote the submatrix of Xn containing the columns with indices in J. For any such set J, there exists a finite constant c1 > 0 so that lim inf_(n→∞) λ_min(X_J^T X_J / n) ≥ c1.
(B4) There is a finite constant c2 > 0 so that lim sup_(n→∞) λ_max(Xn^T Xn / n) ≤ c2 < ∞.
(B5) There exist constants d1 and d2 so that 0 < d1 < λ_min(Σ) ≤ λ_max(Σ) < d2 < ∞.
(B6) The true model S* ⊂ {1, ..., pn} is nonempty for all n, and s* = |S*| = o(n/log(pn)).

SLIDE 27

Sufficient Conditions for Posterior Consistency When log p = o(n)

Theorem

Assume that conditions (B1)-(B6) hold. Then the posterior of Bn under any prior πn(Bn) is strongly consistent; that is, for any ε > 0, Πn(Bn : ||Bn − B0||_F > ε | Yn) → 0 P0-a.s. as n → ∞, provided that

Πn( Bn : ||Bn − B0||_F < ∆/n^(ρ/2) ) > exp(−kn)

for all 0 < ∆ < ε² c1 d1^(1/2) / (48 c2^(1/2) d2) and 0 < k < ε² c1/(32 d2) − 3∆ c2^(1/2)/(2 d1^(1/2)), where ρ > 0.

This theorem applies to any prior on Bn: provided the prior satisfies the above concentration condition and log pn = o(n), the posterior is strongly consistent.

SLIDE 28

The MBSP Model

Recall the MBSP model:

Y | X, B, Σ ∼ MN_(n×q)(XB, I_n, Σ),
B | ξ1, ..., ξ_pn, Σ ∼ MN_(pn×q)(O, τ_n diag(ξ1, ..., ξ_pn), Σ),
ξi ind∼ π(ξi), i = 1, ..., pn,

where τ_n > 0 and π(ξi) is a polynomial-tailed density of the form π(ξi) = K (ξi)^(−a−1) L(ξi). To achieve posterior consistency, we require mild conditions on the slowly varying component L(·), the scale τ_n, and the true unknown coefficient matrix B0.

SLIDE 29

Additional Assumptions under the MBSP Model

(i) For the slowly varying function L(t) in the priors for ξi, 1 ≤ i ≤ pn, lim_(t→∞) L(t) ∈ (0, ∞). That is, there exists c0 > 0 such that L(t) ≥ c0 for all t ≥ t0, for some t0 which depends on both L and c0.
(ii) There exists M > 0 so that sup_(j,k) |b0_jk| ≤ M < ∞ for all n, i.e. the maximum entry of B0 is uniformly bounded above in absolute value.
(iii) 0 < τ_n < 1 for all n, and τ_n = o(1/(pn n^ρ)) for some ρ > 0.

Note that these are the same conditions as in the low-dimensional setting! The same rate for τn works for both low-dimensional and high-dimensional cases.

SLIDE 30

Posterior Consistency of MBSP (ultrahigh-dimensional case)

Theorem

Suppose that we have the MBSP model with polynomial-tailed priors for ξ1, ..., ξ_pn. Provided that Assumptions (B1)-(B6) and (i)-(iii) hold, our model achieves strong posterior consistency; that is, for any ε > 0, Πn(Bn : ||Bn − B0||_F > ε | Yn) → 0 P0-a.s. as n → ∞.

SLIDE 31

Three Parameter Beta Normal (TPBN) Family

A random variable y is said to follow the three parameter beta density, denoted TPB(u, a, τ), if

π(y) = [Γ(u + a)/(Γ(u)Γ(a))] τ^a y^(a−1) (1 − y)^(u−1) {1 − (1 − τ)y}^(−(u+a)).

In univariate regression, a global-local shrinkage prior of the form

βi | τ, ξi ind∼ N(0, τ ξi), i = 1, ..., n,
π(ξi) = [Γ(u + a)/(Γ(u)Γ(a))] ξi^(u−1) (1 + ξi)^(−(u+a)), i = 1, ..., n,

may therefore be represented alternatively as

βi | νi ind∼ N(0, νi^(−1) − 1), νi ind∼ TPB(u, a, τ).
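As a sanity check that the TPB expression above is a proper density, a midpoint-rule integration over (0, 1) with illustrative parameter values u = 2, a = 3, τ = 0.5 (chosen so the density stays bounded at the endpoints):

```python
import numpy as np
from math import gamma

# Hedged numerical check: the TPB(u, a, tau) density should integrate to 1.
def tpb_pdf(y, u, a, tau):
    const = gamma(u + a) / (gamma(u) * gamma(a))
    return const * tau**a * y**(a - 1) * (1 - y)**(u - 1) \
        * (1 - (1 - tau) * y)**(-(u + a))

u, a, tau = 2.0, 3.0, 0.5
y = (np.arange(200_000) + 0.5) / 200_000    # midpoint grid on (0, 1)
total = tpb_pdf(y, u, a, tau).mean()        # midpoint-rule integral
print(round(float(total), 4))  # ≈ 1.0
```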

SLIDE 32

Three Parameter Beta Normal (TPBN) Family

After integrating out νi in βi | νi ind∼ N(0, νi^(−1) − 1), νi ind∼ TPB(u, a, τ), the marginal prior for βi is said to belong to the three parameter beta normal (TPBN) family. Special cases of the TPBN family include:
the horseshoe prior (u = 0.5, a = 0.5),
the Strawderman-Berger prior (u = 1, a = 0.5),
the normal-exponential-gamma (NEG) prior (u = 1, a > 0).

SLIDE 33

Three Parameter Beta Normal (TPBN) Model

By Proposition 1 of Armagan et al. (2011), the TPBN prior can also be written as a hierarchical mixture of two Gamma distributions,

βi | ψi ∼ N(0, ψi), ψi | ζi ∼ G(u, ζi), ζi ∼ G(a, τ),

where ψi = τ ξi. Using the TPBN family as our chosen prior and placing a conjugate prior on Σ, we can construct a specific variant of the MBSP model which we call the MBSP-TPBN model.
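The Gamma-Gamma hierarchy is easy to simulate; the sketch below assumes the G(shape, rate) parameterization (an assumption, since the slide does not state it) and uses the horseshoe case u = a = 0.5:

```python
import numpy as np

# Hedged sketch of the Gamma-Gamma representation of the TPBN prior:
#   zeta_i ~ G(a, tau),  psi_i | zeta_i ~ G(u, zeta_i),  beta_i ~ N(0, psi_i).
rng = np.random.default_rng(5)
n, u, a, tau = 10_000, 0.5, 0.5, 1.0        # u = a = 0.5: horseshoe case

zeta = rng.gamma(shape=a, scale=1.0 / tau, size=n)  # rate tau -> scale 1/tau
psi = rng.gamma(shape=u, scale=1.0 / zeta)          # rate zeta_i
beta = rng.normal(0.0, np.sqrt(psi))

# Heavy tails plus a spike at zero are the horseshoe signature.
print(np.median(np.abs(beta)) < np.mean(np.abs(beta)))  # True: heavy tails
```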

SLIDE 34

MBSP-TPBN Model

Reparametrizing ψi = τ ξi, i = 1, ..., p, we have:

Y | X, B, Σ ∼ MN_(n×q)(XB, I_n, Σ),
B | ψ1, ..., ψp, Σ ∼ MN_(p×q)(O, diag(ψ1, ..., ψp), Σ),
ψi | ζi ind∼ G(u, ζi), i = 1, ..., p,
ζi i.i.d. ∼ G(a, τ), i = 1, ..., p,
Σ ∼ IW(d, kI_q).

The MBSP-TPBN model admits a Gibbs sampler.
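The slide does not spell out the Gibbs updates, but the conjugate core of a sweep is the full conditional of B: with prior B | ψ, Σ ∼ MN(O, diag(ψ), Σ) and likelihood Y ∼ MN(XB, I_n, Σ), standard matrix-normal conjugacy gives B | Y, ψ, Σ ∼ MN(A^(−1)X^T Y, A^(−1), Σ) with A = X^T X + diag(ψ)^(−1). A hedged sketch of that single update (the ψ, ζ, Σ updates are omitted):

```python
import numpy as np

# One conjugate update from a Gibbs sweep for MBSP-TPBN (illustrative sketch).
def draw_B(X, Y, psi, Sigma, rng):
    p, q = X.shape[1], Y.shape[1]
    A = X.T @ X + np.diag(1.0 / psi)
    A_inv = np.linalg.inv(A)
    M = A_inv @ X.T @ Y                    # posterior mean
    Lr = np.linalg.cholesky(A_inv)         # row covariance factor
    Lc = np.linalg.cholesky(Sigma)         # column covariance factor
    return M + Lr @ rng.standard_normal((p, q)) @ Lc.T

rng = np.random.default_rng(6)
X = rng.standard_normal((30, 8))
Y = rng.standard_normal((30, 2))
B_draw = draw_B(X, Y, np.full(8, 0.1), np.eye(2), rng)
print(B_draw.shape)  # (8, 2)
```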

SLIDE 35

Variable Selection

Although the MBSP and MBSP-TPBN models produce robust estimates of B, they do not produce exact zeros. To use the MBSP model for variable selection, we recommend examining the 95% credible intervals for each entry b_ij in row i and column j. If the credible intervals for every entry in row i, 1 ≤ i ≤ p, contain zero, then we classify predictor i as irrelevant. If at least one credible interval in row i does not contain zero, then we classify predictor i as active.
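Given posterior draws of B (shape: draws × p × q), the selection rule described above can be sketched as:

```python
import numpy as np

# Hedged sketch: predictor i is active if at least one entry in row i of B
# has a 95% posterior credible interval that excludes zero.
def active_predictors(samples, level=0.95):
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(samples, alpha, axis=0)        # (p, q) lower bounds
    hi = np.quantile(samples, 1.0 - alpha, axis=0)  # (p, q) upper bounds
    excludes_zero = (lo > 0.0) | (hi < 0.0)         # CI does not contain 0
    return excludes_zero.any(axis=1)                # (p,) active flags

# Toy posterior: row 0 centered at 3 (active), row 1 centered at 0 (inactive)
rng = np.random.default_rng(7)
samples = rng.normal(0.0, 1.0, (4000, 2, 3))
samples[:, 0, 0] += 3.0
print(active_predictors(samples))  # [ True False]
```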

SLIDE 36

Simulation Study

For our simulation study, we implement the MBSP-TPBN model with the horseshoe prior (a = u = 0.5), one of the most popular polynomial-tailed priors. We also set:
τ = 1/(p √n log n),
d = 3,
k = the variance of the residuals Y − XB^(0), where B^(0) is the initial guess in the Gibbs sampler (taken as a ridge estimator).

SLIDE 37

Simulation Study

Our primary interest is in the p > n case. We consider three simulation settings with varying levels of sparsity:
Experiment 1 (p > n): n = 50, p = 200, q = 5; 20 of the predictors are randomly picked as active (sparse model).
Experiment 2 (p > n): n = 60, p = 100, q = 6; 40 of the predictors are randomly picked as active (dense model).
Experiment 3 (p ≫ n): n = 100, p = 500, q = 3; 10 of the predictors are randomly picked as active (ultra-sparse model).

SLIDE 38

Simulation Study Metrics

As our point estimate for B, we take the posterior median B̂ = (B̂_ij)_(p×q). We also perform variable selection by inspecting the 95% credible intervals. We compute the following metrics, averaged across 100 replications:

MSE_est = 100 × ||B̂ − B||_F^2 / (pq),
MSE_pred = 100 × ||XB̂ − XB||_F^2 / (nq),
FDR = FP/(TP + FP), FNR = FN/(TN + FN), MP = (FP + FN)/(pq),

where FP, TP, FN, and TN denote the number of false positives, true positives, false negatives, and true negatives respectively.
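The metrics above can be sketched as a small helper; the `truth`/`selected` boolean masks over the p predictors in the demo are illustrative inputs:

```python
import numpy as np

def metrics(B_hat, B, X, truth, selected):
    """Hedged sketch of the evaluation metrics defined above."""
    p, q = B.shape
    n = X.shape[0]
    tp = int(np.sum(selected & truth))
    fp = int(np.sum(selected & ~truth))
    fn = int(np.sum(~selected & truth))
    tn = int(np.sum(~selected & ~truth))
    return {
        "MSE_est": 100 * np.linalg.norm(B_hat - B, "fro") ** 2 / (p * q),
        "MSE_pred": 100 * np.linalg.norm(X @ (B_hat - B), "fro") ** 2 / (n * q),
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
        "FNR": fn / (tn + fn) if tn + fn else 0.0,
        "MP": (fp + fn) / (p * q),
    }

# Tiny worked example: tp = fp = fn = tn = 1.
truth = np.array([True, True, False, False])
selected = np.array([True, False, True, False])
m = metrics(np.zeros((4, 2)), np.zeros((4, 2)), np.ones((3, 4)), truth, selected)
print(m["FDR"], m["FNR"])  # 0.5 0.5
```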

SLIDE 39

Simulation Study

Experiment 1 (n = 50, p = 200, q = 5; 20 active predictors)
Method   | MSE_est | MSE_pred | FDR    | FNR    | MP
MBSP     | 1.36    | 117.52   | 0.0117 | 0.0013 | —
MBGL-SS  | 57.25   | 694.81   | 0.858  | 0.02   | 0.619
LSGL     | 8.65    | 169.30   | 0.788  | 0.374  | —
SRRR     | 17.46   | 161.70   | 0.698  | 0.307  | —

Experiment 2 (n = 60, p = 100, q = 6; 40 active predictors)
Method   | MSE_est | MSE_pred | FDR    | FNR    | MP
MBSP     | 10.969  | 172.84   | 0.0249 | 0.0107 | —
MBGL-SS  | 204.33  | 318.80   | 0.505  | 0.1265 | 0.415
LSGL     | 44.635  | 188.81   | 0.544  | 0.479  | —
SRRR     | 242.67  | 193.64   | 0.594  | 0.587  | —

Experiment 3 (n = 100, p = 500, q = 3; 10 active predictors)
Method   | MSE_est | MSE_pred | FDR    | FNR    | MP
MBSP     | 0.185   | 64.14    | 0.048  | 0.0011 | —
MBGL-SS  | 1.327   | 155.51   | 0.483  | 0.0005 | 0.092
LSGL     | 0.2305  | 72.894   | 0.849  | 0.117  | —
SRRR     | 0.9841  | 49.428   | 0.688  | 0.104  | —

Table: Simulation results for MBSP-TPBN, compared with three other methods, averaged

SLIDE 40

Yeast Cell Cycle Data Analysis

Transcription factors (TFs) are sequence-specific DNA-binding proteins which regulate the transcription of genes from DNA to mRNA by binding to specific DNA sequences. We want to know which TFs are significant. In this yeast cell cycle data set (first studied by Chun and Keles (2010)):
mRNA levels are measured at 18 time points, seven minutes apart (covering a duration of 119 minutes).
The 542 × 18 response matrix Y consists of 542 cell-cycle-regulated genes from an α-factor arrest method, with columns corresponding to the mRNA levels at the 18 distinct time points.
The 542 × 106 design matrix X consists of the binding information of a total of 106 TFs.
We fit the MBSP model to this data set, assess its predictive performance using 5-fold cross-validation, and perform variable selection from the 106 TFs.
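The 5-fold cross-validation used for the MSPE can be sketched generically; `ridge_fit` below is an illustrative placeholder for the actual MBSP fit, not the model from the talk:

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Placeholder multivariate fit (plain ridge), standing in for MBSP."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def cv_mspe(X, Y, fit=ridge_fit, n_folds=5, seed=0):
    """5-fold cross-validated mean squared prediction error."""
    n, q = Y.shape
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    sse = 0.0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        B_hat = fit(X[train], Y[train])
        sse += np.sum((Y[test] - X[test] @ B_hat) ** 2)
    return sse / (n * q)
```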

SLIDE 41

Yeast Cell Cycle Data Analysis

Method   | Number of Proteins Selected | MSPE
MBSP     | 10 | 18.491
MBGL-SS  | 7  | 20.093
LSGL     | 4  | 22.819
SRRR     | 44 | 18.204

Table: Results for the analysis of the yeast cell cycle data set. The MSPE has been scaled by a factor of 100.

Notably, all four models selected the three TFs ACE2, SWI5, and SWI6 as significant. The SRRR method has the lowest MSPE, but it recovers a non-parsimonious model. In contrast, MBSP has good predictive performance and recovers a parsimonious model.

SLIDE 42

Yeast Cell Cycle Data Analysis

[Figure: Estimates and 95% credible bands over time (0-120 minutes) for four of the 10 TFs deemed significant by the MBSP-TPBN model; panels: ACE2, HIR1, NDD1, SWI6. The x-axis indicates time (minutes) and the y-axis indicates the estimated coefficients (roughly −0.6 to 0.6).]

SLIDE 43

Summary of MBSP Model

We have introduced a new Bayesian approach, the Multivariate Bayesian model with Shrinkage Priors (MBSP), for the multivariate linear regression model Y = XB + E.
Our model produces a row-sparse estimate of the p × q matrix B, allowing for sparse estimation and variable selection from the p variables.
Our model can consistently estimate B even when p ≫ n and p grows at a nearly exponential rate with n (i.e. p = O(e^(n^d)), 0 < d < 1).
A wide variety of polynomial-tailed shrinkage priors may be used, so our model and our theoretical results are quite general.
We illustrated the practical application of our model with the three parameter beta normal family (MBSP-TPBN), using the horseshoe prior as a special case.

SLIDE 44

Future Work

Open problems:
Theoretical investigation of MBSP (and Bayesian multivariate regression models in general) when q → ∞ and when Σ is treated as unknown.
Moving beyond consistency: deriving the contraction rate of the MBSP posterior around B0.
Applying polynomial-tailed priors to reduced rank regression and partial least squares regression.

SLIDE 45

Pre-print of Paper

A pre-print of the paper for this presentation is available at: https://arxiv.org/abs/1711.07635. It has been accepted pending minor revision at the Journal of Multivariate Analysis.

SLIDE 46

References

Armagan, A., Clyde, M., and Dunson, D.B. (2011). "Generalized Beta Mixtures of Gaussians." Advances in Neural Information Processing Systems 24, 523-531.
Armagan, A., Dunson, D.B., Lee, J., Bajwa, W., and Strawn, N. (2013). "Posterior Consistency in Linear Models Under Shrinkage Priors." Biometrika, 100(4): 1011-1018.
Brown, P.J., Vannucci, M., and Fearn, T. (1998). "Multivariate Bayesian Variable Selection and Prediction." Journal of the Royal Statistical Society: Series B, 60(3): 627-641.
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010). "The Horseshoe Estimator for Sparse Signals." Biometrika, 97(2): 465-480.

SLIDE 47

References

Chen, L. and Huang, J.Z. (2012). "Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection." Journal of the American Statistical Association, 107(500): 1533-1545.
Li, Y., Nan, B., and Zhu, J. (2015). "Multivariate Sparse Group Lasso for the Multivariate Multiple Linear Regression with an Arbitrary Group Structure." Biometrics, 71(2): 354-363.
Liquet, B., Mengersen, K., Pettitt, A.N., and Sutton, M. (2017). "Bayesian Variable Selection Regression of Multivariate Responses for Group Data." Bayesian Analysis, 12(4): 1039-1067.
Tang, X., Xu, X., Ghosh, M., and Ghosh, P. (2017). "Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors." Sankhya A.

SLIDE 48

Questions?
