

SLIDE 1

Joint variable and rank selection for parsimonious estimation of high dimensional matrices

Florentina Bunea, Department of Statistical Science, Cornell University. High-dimensional Problems in Statistics Workshop, ETH, September 2011.

SLIDE 2

1. Framework and motivation

2. Joint Rank and Row Selection (JRRS) Methods
   The construction of the one-step JRRS estimator
   Row and rank sparsity oracle inequalities via one-step JRRS
   One-step JRRS to select the best estimator from a finite list

3. Two-step JRRS estimators
   Rank Constrained Group Lasso (RCGL)
   Adaptive RCGL for joint row and rank selection
   Row and rank sparsity oracle inequalities via two-step JRRS

4. Numerical performance and examples

5. Summary

SLIDE 3

A rank and row sparse model

  • Model: Y = XA + E, with E a noise matrix.
  • Data: m × n matrix Y and m × p matrix X.
  • Target: p × n matrix A ↔ pn unknown parameters.
  • Rank of A is r ≤ n ∧ p; number of non-zero rows of A is |J| ≤ p.
  • Row and rank sparse target ↔ r(|J| + n − r) free parameters.
  • Full rank + all rows + large n and p = hopeless, if m is small.
  • Low rank + small |J| = HOPE, if m is small.
  • Estimate A under joint rank and row constraints (a simulation sketch of such a model is given below).
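A minimal simulation sketch of a row and rank sparse model of this kind, for concreteness. The sizes, the Gaussian design, and the factorized construction of A below are illustrative assumptions, not the simulation design of the talk.

```python
import numpy as np

# Illustrative sizes only; the target A has rank r and |J| non-zero rows.
m, p, n, r, J_size = 30, 100, 10, 2, 15
rng = np.random.default_rng(0)

X = rng.normal(size=(m, p))                    # design matrix
J = rng.choice(p, size=J_size, replace=False)  # indices of the non-zero rows of A

# Rank-r, row-sparse target: the non-zero block of A is a product of thin factors.
A = np.zeros((p, n))
A[J] = rng.normal(size=(J_size, r)) @ rng.normal(size=(r, n))

E = rng.normal(size=(m, n))                    # iid Gaussian noise, sigma^2 = 1
Y = X @ A + E

# Free parameters of the target, r(|J| + n - r), versus the nominal p*n.
print(r * (J_size + n - r), "vs", p * n)
```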

SLIDE 4

Why a rank and row sparse Y = XA + E?

  • Multivariate response regression

Measure n response variables for m subjects: Yi ∈ R^n, 1 ≤ i ≤ m. Measure p predictor variables for m subjects: Xi ∈ R^p, 1 ≤ i ≤ m.
  • No (rank/row) constraints on A ⇐⇒ n separate univariate regressions.
  • Zero rows in A ⇐⇒ not all predictors enter the model.
  • Low rank of A ⇐⇒ only a few orthogonal scores are relevant.

Goal: estimation tailored to row and rank sparsity. Use only a subset of the predictors to construct a few scores, with high predictive power, under JOINT rank and row restrictions on A.

SLIDE 5

Why a row and rank sparse Y = XA + E? (Continued)

  • Supervised row and rank sparse PCA.
  • Provides framework for row and rank sparse PCA and CCA.
  • Building block in functional data analysis (with predictors).

Y = matrix of discretized trajectories for n subjects; X = matrix of basis functions evaluated at discrete data points + possibly other predictors of interest.

  • Building block in multiple time series analysis.

(Macro-economics and forecasting.) Y = matrix of n time series observed over m time periods (n types of interest rates); X = Y in the past + other predictive time series (other potentially connected macro-economic factors).

SLIDE 6

A historical perspective on sparse Y = XA + E

Rank Sparse Models

  • Reduced-Rank Regression: Y = XA + E, rank(A) = k known.

Asymptotic results as m → ∞: Anderson (1951, 1999, 2002); Rao (1979); Reinsel and Velu (1998); Izenman (1975, 2008).

  • Low rank approximations: Y = XA + E, rank(A) = r unknown.

Adaptive estimation + finite sample theoretical analysis, valid for any m, n, p and any r. Rank Selection Criterion (RSC): Bunea, She and Wegkamp (2011). Nuclear Norm Penalized (NNP) estimators: Candès and Plan, Candès and Tao (2009+); Rohde and Tsybakov (2011); Negahban and Wainwright (2011); Koltchinskii, Lounici and Tsybakov (2011).

SLIDE 7

A historical perspective on sparse Y = XA + E (continued)

Row-Sparse Models

  • Predictor Xj not in the model ⇐⇒ the j-th row of A is zero.

  • Individual variable selection in multivariate response regression
  • Group selection in univariate response regression.

Popular method: the Group Lasso. Yuan and Lin (2006); Lounici, Pontil, Tsybakov and van de Geer (2011).

No rank and row sparse models; no adaptive methods tailored to both.

SLIDE 8

Joint rank and row selection: JRRS

  • We develop new criteria for joint rank and predictor selection.
  • r ≤ n ∧ |J|; rank(X) = q ≤ m ∧ p; |J| ≤ p; r and J unknown.
  • Optimal risk rates, achievable adaptively by the G-Lasso, RSC/NNP, and (to show) JRRS:
      G-Lasso: |J| n, in row-sparse models.
      RSC or NNP: (p + n) r, in rank-sparse models.
      JRRS: (|J| + n) r, in rank and row sparse models.
  • The JRRS rates are never worse, and typically much better.

SLIDE 9

A penalized least squares estimator

  • Y is an m × n matrix; X is an m × p matrix.
  • ‖M‖²_F is the sum of the squared entries of M ∈ M_{p×n}.
  • A candidate model B ∈ M_{p×n} has number of parameters
    (n + |J(B)| − rank(B)) rank(B) ≤ (n + |J(B)|) rank(B).

The one-step JRRS estimator

    Â = argmin_{B ∈ M_{p×n}} { ‖Y − XB‖²_F + c σ² (2n + |J(B)|) rank(B) }.

  • Generalizes to multivariate response models the AIC/Cp-type criteria developed for univariate response (a sketch of evaluating the criterion follows below).
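A minimal sketch of evaluating this penalized criterion for a given candidate B; the function name, the constant c = 2, and the use of numpy's rank routine are illustrative assumptions, not the talk's tuned choices.

```python
import numpy as np

def jrrs_criterion(Y, X, B, sigma2, c=2.0):
    # One-step JRRS objective: squared Frobenius residual plus
    # c * sigma^2 * (2n + |J(B)|) * rank(B); c = 2.0 is an arbitrary illustrative constant.
    resid = np.linalg.norm(Y - X @ B, ord="fro") ** 2
    J_B = int(np.sum(np.any(B != 0, axis=1)))   # number of non-zero rows of B
    rank_B = np.linalg.matrix_rank(B)
    n = Y.shape[1]
    return resid + c * sigma2 * (2 * n + J_B) * rank_B
```

Minimizing this over all of M_{p×n} is the intractable part; the finite-list and two-step variants described later address that.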

SLIDE 10

More on the one-step JRRS penalty

  • B ∈ M_{p×n} with |J(B)| non-zero rows.
  • JRRS penalty: pen(B) ∝ σ² (n + |J(B)|) rank(B).
  • B ∈ M_{p×n} (ignoring non-zero rows), rank(X) = q.
  • RSC penalty: pen(B) ∝ σ² (n + q) rank(B).
  • Squared "error level" in the full model: E d₁²(PE) ≈ σ² (n + q), for E with iid sub-Gaussian entries and P = X(X′X)⁻X′.
  • JRRS generalizes RSC to allow for variable selection.
  • To reduce rank and select variables, work with E d₁²(P_{J(B)} E) ≈ σ² (n + |J(B)|) (a small Monte Carlo sketch follows below).
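A small Monte Carlo sketch of the "error level" heuristic: d₁(PE) concentrates around σ(√n + √q), so E d₁²(PE) is of order σ²(n + q). The sizes and the number of replications below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n, sigma = 200, 30, 15, 1.0
X = rng.normal(size=(m, p))
P = X @ np.linalg.pinv(X)                  # projection onto the column space of X
q = np.linalg.matrix_rank(X)

# Largest squared singular value of PE over Monte Carlo draws of the noise E.
d1_sq = [np.linalg.norm(P @ rng.normal(scale=sigma, size=(m, n)), ord=2) ** 2
         for _ in range(200)]
print(np.mean(d1_sq), "~", sigma**2 * (np.sqrt(n) + np.sqrt(q)) ** 2, "i.e. of order sigma^2 (n + q)")
```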

SLIDE 11

Oracle-type bounds for the risk of the one-step JRRS

  • rank(A) = r; the non-zero rows of A have indices in J(A) = J.

Adaptation to row and rank sparsity via one-step JRRS. For all A and X,

    E ‖XA − XÂ‖²_F ≲ inf_B { ‖XA − XB‖²_F + σ² (n + |J(B)|) r(B) } ≲ σ² {n + |J|} r.

  • RHS = the best bias-variance trade-off across B.
    Â is adaptive: it mimics the behavior of an optimal estimator computed knowing r and J. Minimax rate, under suitable conditions.
  • Bound valid for any m, n, p.

SLIDE 12

Select the best from a finite list

  • If p > 20, JRRS estimation over all B becomes computationally intractable.
  • B = {B1, . . . , BL} = a finite (large) collection of (random) matrices with different sparsity patterns; may depend on the data X and Y.

Optimal selection from a finite list via JRRS. For all A and X,

    E ‖XA − XÂ‖²_F ≲ inf_{1 ≤ j ≤ L} { ‖XA − XBj‖²_F + σ² (n + |J(Bj)|) r(Bj) },

where

    Â = argmin_{B ∈ B} { ‖Y − XB‖²_F + c σ² (2n + |J(B)|) rank(B) }.
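A minimal sketch of the finite-list selection step, reusing the hypothetical jrrs_criterion helper from the earlier sketch; candidate names and the list structure are illustrative.

```python
import numpy as np

def select_from_list(Y, X, B_list, sigma2):
    # Evaluate the one-step JRRS criterion on each candidate and keep the minimizer.
    scores = [jrrs_criterion(Y, X, B, sigma2) for B in B_list]
    return B_list[int(np.argmin(scores))]
```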

SLIDE 13

Rank Constrained Group Lasso: main building block

  • One-step JRRS penalty pen(B) ∝ (n + |J(B)|) rank(B).
    J(B) forces complete enumeration; for large p that's a problem!
  • Idea: use the convex relaxation ‖B‖_{2,1} = Σ_{j=1}^p ‖bj‖₂.
  • Set λ_k ∝ σ √k d₁(X), for each k.
  • B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

B̂_k is a Rank Constrained G-Lasso (RCGL). Other "group" penalties are possible (a heuristic optimization sketch follows below).
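To make the RCGL objective concrete, here is a naive heuristic that alternates a proximal (row soft-thresholding) gradient step for the ‖·‖_{2,1} penalty with a hard rank-k truncation. This is only a sketch under simple assumptions: it is not the efficient algorithm of Bunea, She and Wegkamp (2011), and the rank constraint makes the problem non-convex, so convergence is not guaranteed.

```python
import numpy as np

def rcgl_heuristic(Y, X, k, lam, n_iter=500, step=None):
    # Heuristic for:  minimize ||Y - XB||_F^2 + lam * ||B||_{2,1}  s.t. rank(B) <= k.
    m, p = X.shape
    n = Y.shape[1]
    if step is None:
        step = 1.0 / (2 * np.linalg.norm(X, ord=2) ** 2)   # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, n))
    for _ in range(n_iter):
        G = B - step * (2 * X.T @ (X @ B - Y))              # gradient step on the squared loss
        # Row-wise soft thresholding: prox of step * lam * ||.||_{2,1}.
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        shrink = np.maximum(1 - step * lam / np.maximum(norms, 1e-12), 0.0)
        B = shrink * G
        # Project onto the rank-k set by truncated SVD.
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = (U[:, :k] * s[:k]) @ Vt[:k]
    return B
```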

SLIDE 14

B̂_k = argmin_{rank(B) ≤ k} { ‖Y − XB‖²_F + λ_k ‖B‖_{2,1} }.

  • For k = n ∧ p, the estimator B̂_k is the G-Lasso.
  • For λ = 0, the estimator B̂_k is a reduced-rank estimator.
  • Otherwise, B̂_k is a synthesis of the two; a new algorithm is needed.
    Efficient algorithm: Bunea, She and Wegkamp (2011).
  • Works in high dimensions.

SLIDE 15

Two-step JRRS: Method 1

Method 1
Step 1. Use the Rank Selection Criterion (RSC) to estimate r consistently by r̂.
Step 2. Compute the Rank Constrained G-Lasso estimator B̂_k with k = r̂ to obtain the final estimator B̃ = B̂_{r̂}.

Major practical advantage: easy tuning, backed up by theory.

  • For Step 1: the same tuning parameter of RSC gives the best MSE and the correct rank. Can use CV safely; other alternatives exist.
  • For Step 2: we want the best MSE; CV is safe.

(A sketch of this two-step pipeline follows below.)
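A hedged sketch of Method 1. The thresholding rule used for the RSC step below (count singular values of PY above a multiple of σ(√n + √q)) and the constant c are illustrative assumptions; rcgl_heuristic is the sketch from the previous section, not the paper's algorithm.

```python
import numpy as np

def rsc_rank(Y, X, sigma, c=2.0):
    # Count singular values of PY above a threshold of order sigma*(sqrt(n)+sqrt(q)).
    P = X @ np.linalg.pinv(X)              # projection onto the column space of X
    q = np.linalg.matrix_rank(X)
    n = Y.shape[1]
    d = np.linalg.svd(P @ Y, compute_uv=False)
    return int(np.sum(d >= c * sigma * (np.sqrt(n) + np.sqrt(q))))

def two_step_jrrs_method1(Y, X, sigma, lam):
    # Step 1: estimate the rank by the RSC sketch; Step 2: RCGL at that rank.
    r_hat = max(rsc_rank(Y, X, sigma), 1)
    return rcgl_heuristic(Y, X, k=r_hat, lam=lam)
```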

SLIDE 16

Two-step JRRS: Method 2

Method 2
Step 1. Pre-specify a grid of values Λ for λ. Use RCGL to construct B = { B̂_{k,λ} : k ∈ {1, . . . , q}, λ ∈ Λ }.
Step 2. Compute

    B̂ = argmin_{B ∈ B} { ‖Y − XB‖²_F + pen(B) },   with pen(B) ∝ σ² (n + |J(B)|) rank(B).

  • Requires a 2-D grid search: more computationally involved than Method 1 (a sketch follows below).
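A sketch of the 2-D grid search in Method 2, again reusing the hypothetical rcgl_heuristic from the earlier sketch; the penalty constant c and the zero-row tolerance are illustrative assumptions.

```python
import itertools
import numpy as np

def two_step_jrrs_method2(Y, X, sigma2, lambdas, c=2.0):
    # Build RCGL candidates over the (k, lambda) grid, then pick the candidate
    # minimizing the penalized criterion with pen(B) ~ sigma^2 (n + |J(B)|) rank(B).
    n = Y.shape[1]
    q = np.linalg.matrix_rank(X)
    best, best_val = None, np.inf
    for k, lam in itertools.product(range(1, q + 1), lambdas):
        B = rcgl_heuristic(Y, X, k=k, lam=lam)
        J_B = int(np.sum(np.any(np.abs(B) > 1e-8, axis=1)))     # non-zero rows of B
        val = (np.linalg.norm(Y - X @ B, ord="fro") ** 2
               + c * sigma2 * (n + J_B) * np.linalg.matrix_rank(B))
        if val < best_val:
            best, best_val = B, val
    return best
```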

SLIDE 17

Oracle-type bounds for the risk of the two-step JRRS

  • Method 1 (RSC + RCGL) → B̃; Method 2 (RCGL + AIC-M) → B̂.

Adaptation to row and rank sparsity via two-step JRRS. For all A and for X satisfying Assumption 1,

    E ‖XA − XB̂‖²_F ≲ inf_B { ‖XA − XB‖²_F + σ² (n + |J(B)|) r(B) } ≲ σ² {n + |J(A)|} r(A).

If, in addition, d_r(XA) > 2√2 σ (√n + √q), the same inequality holds for B̃.

  • RHS = the best bias-variance trade-off across all matrices B.
    B̃ and B̂ are adaptive: they mimic the behavior of an optimal estimator computed knowing r(A) and J(A).
  • Bound valid for any m, n, p; computationally efficient.

SLIDE 18

Mild conditions on the design matrix

Assumption 1. There exist a set J ⊂ {1, . . . , p} and a number δ_J > 0 such that

    (1/m) ‖XB‖²_F ≥ δ_J Σ_{j∈J} ‖bj‖²₂,   for all B = [b1 ··· bp]ᵀ ∈ R^{p×n}.

  • Essentially requires only that a sub-matrix of X′X has a non-zero smallest eigenvalue.
    A mild condition (an illustrative check follows below).
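An illustrative check of Assumption 1 in the easiest case: for matrices B whose non-zero rows all lie in J, the smallest eigenvalue of the sub-Gram matrix (1/m) X_J′X_J is a valid δ_J. The full assumption allows arbitrary B, so this sketch only probes that restricted case.

```python
import numpy as np

def delta_J_restricted(X, J):
    # For B supported on the row set J:
    #   (1/m) ||XB||_F^2 = (1/m) ||X_J B_J||_F^2 >= lambda_min((1/m) X_J' X_J) * sum_{j in J} ||b_j||_2^2,
    # so the smallest eigenvalue of the sub-Gram matrix is a valid delta_J in that case.
    m = X.shape[0]
    XJ = X[:, list(J)]
    eigvals = np.linalg.eigvalsh(XJ.T @ XJ / m)
    return float(eigvals[0])
```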

SLIDE 19

Large p - small m numerical performance comparison

  • m = 30, |J| = 15, p = 100, n = 10, r = 2, σ² = 1.
  • Performance comparison between: rank and row reduction via RSC→RCGL and G-LASSO→RSC, only row reduction via G-LASSO, and only rank reduction via RSC.
  • All optimally tuned on a very large independent set.

    Method         MSE
    RSC→RCGL       363
    G-LASSO→RSC    402
    G-LASSO        511
    RSC           1905
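For concreteness, a hedged sketch of how an MSE column like the one above could be estimated for the two-step Method 1 sketch. The sizes match the slide, but the data-generating details and the tuning value lam are arbitrary (not optimally tuned), so the numbers will not reproduce the table; two_step_jrrs_method1 is the hypothetical helper from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, n, r, J_size = 30, 100, 10, 2, 15
X = rng.normal(size=(m, p))
J = rng.choice(p, size=J_size, replace=False)
A = np.zeros((p, n))
A[J] = rng.normal(size=(J_size, r)) @ rng.normal(size=(r, n))

# Monte Carlo estimate of MSE = E ||XA - X B_hat||_F^2 over fresh noise draws.
mses = []
for rep in range(20):
    Y = X @ A + rng.normal(size=(m, n))
    B_hat = two_step_jrrs_method1(Y, X, sigma=1.0, lam=1.0)   # lam = 1.0 is arbitrary, not tuned
    mses.append(np.linalg.norm(X @ A - X @ B_hat, ord="fro") ** 2)
print(sum(mses) / len(mses))
```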

SLIDE 20

Large m - small p numerical performance comparison

  • m = 100, |J| = 15, p = 25, n = 25, r = 5, σ² = 1.
  • Performance comparison between: rank and row reduction via RSC→RCGL and G-LASSO→RSC, only row reduction via G-LASSO, and only rank reduction via RSC.
  • All optimally tuned on a very large independent set.

    Method         MSE
    RSC→RCGL       8.1
    G-LASSO→RSC    8.1
    RSC           11.5
    G-LASSO       17.7

SLIDE 21

A study of the effect of HIV infection on human cognitive abilities
  • HIV-Neuroimaging laboratory at Brown University, PI R. Cohen.
  • m = 62 HIV+ patients, also infected with Hepatitis C and with a history of drug abuse.

  • n = 13 neuro-cognitive indices (NCIs) from five domains: attention/working memory, speed of information processing, psychomotor abilities, executive function, and learning and memory.

  • p = 234 predictors: (a) clinical and demographic predictors and (b) brain volumetric and diffusion tensor imaging (DTI) derived measures of several white-matter regions of interest, such as fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity, along with all volumetrics × DTI interactions.

SLIDE 22

RSC and JRRS: two rank-1 models

  • Both methods: one new predictive score S.
  • Left = RSC; MSE = 193; S = a linear combination of the p = 234 predictors.
  • Right = JRRS; MSE = 138; S = a linear combination of the |J| = 10 selected predictors.
[Figure: bar plots of the weights assigned to the original predictors (x-axis: original predictors, including HIV_stage, demographic variables, DTI-derived measures, regional volumetrics, and their interactions; y-axis: weights). Left panel: RSC score. Right panel: JRRS score.]

SLIDE 23

  • JRRS selected rank 1 and only 10 predictors.
  • Education is one of them, confirming past findings.
  • The fractional anisotropy at the corpus callosum stands out, in terms of predictive power, among the very many DTI-derived measures.
  • A new finding in the lab, and its first quantitative confirmation.

SLIDE 24

Summary

Method                         | Adaptation to RR-sparsity | Assumptions on X and/or A                            | Restrictions on p
One-step JRRS (AIC-M)          | Yes                       | None                                                 | p ≤ 20
Two-step JRRS1 (RSC → RCGL)    | Yes                       | Restricted Eigenvalue; d_r(XA) > "noise level"       | None
Two-step JRRS2 (RCGL → AIC-M)  | Yes                       | Restricted Eigenvalue                                | None
GL → RSC                       | Yes                       | Mutual coherence et al.; min_j ‖aj‖₂ > noise level   | None

  • RSC → RCGL is easy to tune in practice, backed up by theory. Best!
  • RCGL → AIC-M tuning requires a search over a 2-D grid. Second best!
  • GL → RSC: (1) most restrictive theoretical assumptions; (2) requires tuning for consistent group selection, an open problem!

SLIDE 25

Summary: Our contribution

Jointly rank and row-sparse models and their estimation

1. Introduced jointly rank and row sparse models.

2. Offered new procedures tailored to the new class of models.

3. Showed that the one-step JRRS is a theoretically optimal adaptive procedure: finite sample oracle inequalities for E ‖XA − XÂ‖²_F, for all A and X.

4. Introduced the computationally efficient two-step JRRS.

5. The two-step JRRS estimators satisfy finite sample oracle inequalities under minimal conditions on X.

6. Guaranteed small E ‖XA − XÂ‖²_F if A has low rank and few non-zero rows. The analysis is valid for all m, n, p, rank r and J; in particular, r and |J| can grow with m and n.

SLIDE 26

Bibliography and acknowledgment

Talk based on

  • Florentina Bunea, Yiyuan She and Marten Wegkamp. Joint variable and rank selection for parsimonious estimation of high dimensional matrices. Cornell University Technical Report, 2011.

  • Florentina Bunea, Yiyuan She and Marten Wegkamp. Optimal selection of reduced rank estimators of high-dimensional matrices. Annals of Statistics, Vol. 39, 2011.

  • Research partially supported by NSF-DMS 1007444.
