Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size


SLIDE 1

Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size

Adel Javanmard

Stanford University / UC Berkeley

Based on joint work with

Andrea Montanari

January 2015

Adel Javanmard (Stanford- UC Berkeley) Hypothesis Testing Jan 2015 1 / 34

SLIDE 2

Outline

1. Problem definition

2. Debiasing approach

3. Hypothesis testing under nearly optimal sample size
SLIDE 3

Problem definition

SLIDE 4

Linear model

We focus on linear models:

Y = Xθ0 + W,   Y ∈ R^n (response), X ∈ R^(n×p) (design matrix), θ0 ∈ R^p (parameters).

The noise vector W has independent entries with

E(Wᵢ) = 0,   E(Wᵢ²) = σ²,   E(|Wᵢ|^(2+κ)) < ∞ for some κ > 0.
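The linear model above is easy to simulate; the following minimal numpy sketch generates one instance (the sizes, sparsity pattern, and noise level are illustrative choices of ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s0, sigma = 100, 300, 5, 0.5

# Design matrix with i.i.d. standard normal entries.
X = rng.standard_normal((n, p))

# s0-sparse parameter vector: only the first s0 coordinates are nonzero.
theta0 = np.zeros(p)
theta0[:s0] = 1.0

# Noise with E(W_i) = 0 and E(W_i^2) = sigma^2.
W = sigma * rng.standard_normal(n)

Y = X @ theta0 + W
```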
SLIDE 5

Problem

Confidence intervals: for each i ∈ {1,...,p}, find θ̲ᵢ, θ̄ᵢ ∈ R such that

P( θ0,i ∈ [θ̲ᵢ, θ̄ᵢ] ) ≥ 1 − α.

We would like |θ̄ᵢ − θ̲ᵢ| to be as small as possible.

Hypothesis testing:

H0,i : θ0,i = 0,   HA,i : θ0,i ≠ 0.
SLIDE 6

LASSO

θ̂ ≡ argmin over θ ∈ R^p of  { (1/(2n)) ‖y − Xθ‖₂² + λ‖θ‖₁ }.

[Tibshirani 1996; Chen, Donoho 1996]

What is the distribution of θ̂?

Debiasing approach (the LASSO is biased towards small ℓ1 norm):

θ̂  →(debiasing)→  θ̂^d

We characterize the distribution of θ̂^d.
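For concreteness, the LASSO objective above can be minimized with plain proximal gradient descent (ISTA). This is a minimal numpy sketch of ours, not the solver used in the talk; the step size, regularization level, and iteration count are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    # Minimize (1/(2n)) ||y - X theta||_2^2 + lam * ||theta||_1.
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1/L for the smooth part
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, lam * step)
    return theta

# Tiny demo: sparse recovery on simulated data.
rng = np.random.default_rng(1)
n, p = 60, 100
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = 2.0
y = X @ theta0 + 0.1 * rng.standard_normal(n)
theta_hat = lasso_ista(X, y, lam=0.1)
```

On this well-conditioned toy problem the three largest entries of `theta_hat` sit on the true support, illustrating why one would then want to characterize the estimator's distribution.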
SLIDE 9

Debiasing approach

SLIDE 10

Classical setting (n ≫ p)

We know everything about the least-squares estimator:

θ̂^LS = (1/n) Σ̂⁻¹ XᵀY,

where Σ̂ ≡ (XᵀX)/n is the empirical covariance.

Confidence intervals:

[θ̲ᵢ, θ̄ᵢ] = [ θ̂^LS_i − c_α Δᵢ,  θ̂^LS_i + c_α Δᵢ ],   Δᵢ ≡ σ √( (Σ̂⁻¹)ᵢᵢ / n ).
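The classical interval above can be checked empirically. A hedged numpy sketch of ours: we use c_α ≈ 1.96 for α = 0.05 and verify, by Monte Carlo over the noise, that the interval for one coordinate covers the truth at roughly the nominal rate:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, alpha = 200, 5, 1.0, 0.05
c_alpha = 1.96  # approx. Phi^{-1}(1 - alpha/2) for alpha = 0.05

theta0 = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n           # empirical covariance (X^T X)/n
Sigma_hat_inv = np.linalg.inv(Sigma_hat)

trials, covered = 500, 0
for _ in range(trials):
    Y = X @ theta0 + sigma * rng.standard_normal(n)
    theta_ls = Sigma_hat_inv @ (X.T @ Y) / n
    delta = sigma * np.sqrt(Sigma_hat_inv[0, 0] / n)   # Delta_i for i = 0
    lo, hi = theta_ls[0] - c_alpha * delta, theta_ls[0] + c_alpha * delta
    covered += (lo <= theta0[0] <= hi)

coverage = covered / trials  # should be close to 1 - alpha = 0.95
```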
SLIDE 12

High-dimensional setting (n < p)

θ̂^LS = (1/n) Σ̂⁻¹ XᵀY

Problem in high dimension: Σ̂ is not invertible!

Take your favorite M ∈ R^(p×p):

θ̂* = (1/n) M XᵀY = (1/n) M XᵀX θ0 + (1/n) M XᵀW
    = θ0 + (MΣ̂ − I)θ0  [bias]  + (1/n) M XᵀW  [Gaussian error]
SLIDE 14

Debiased estimator

θ̂* = θ0 + (MΣ̂ − I)θ0  [bias]  + (1/n) M XᵀW  [Gaussian error]

Let us (try to) subtract the bias:

θ̂^u = θ̂* − (MΣ̂ − I) θ̂^Lasso

Debiased estimator (writing θ̂ = θ̂^Lasso):

θ̂^d ≡ θ̂ + (1/n) M Xᵀ(Y − Xθ̂)
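The debiased estimator above is one line of linear algebra. A minimal sketch, with a sanity check we added ourselves: in the well-conditioned case n > p, taking M = Σ̂⁻¹ returns exactly the least-squares estimator, whatever pilot estimate θ̂ is fed in:

```python
import numpy as np

def debias(theta_hat, X, y, M):
    # theta_d = theta_hat + (1/n) M X^T (y - X theta_hat)
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

rng = np.random.default_rng(3)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat)          # possible only because n > p here

theta_pilot = rng.standard_normal(p)  # an arbitrary pilot estimate
theta_d = debias(theta_pilot, X, y, M)
theta_ls = M @ X.T @ y / n            # least-squares estimator
```

The identity holds because M Σ̂ = I makes the pilot's contribution cancel exactly; in high dimension M Σ̂ ≈ I only approximately, which is where the bias term comes from.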
SLIDE 17

Debiased estimator: Choosing M?

θ̂^d ≡ θ̂ + (1/n) M Xᵀ(y − Xθ̂)

Low-dimensional projection estimator (LDPE): start with a linear estimator, debias by a nonlinear estimator; M is constructed via nodewise LASSO on X. [C-H. Zhang, S. S. Zhang]

Approximate inverse of Σ̂: nodewise LASSO on X (under a row-sparsity assumption on Σ⁻¹). [S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure]
SLIDE 19

Debiased estimator: Choosing M?

Our approach: optimize the two objectives (bias and variance of θ̂^d). [A. Javanmard, A. Montanari]

√n(θ̂^d − θ0) = √n(MΣ̂ − I)(θ0 − θ̂)  [bias ↓]  + Z,   Z|X ∼ N(0, σ²MΣ̂Mᵀ)  [noise covariance],

with Σ̂ = (XᵀX)/n.
SLIDE 20

Debiased estimator: Choosing M?

Our approach: find M by solving an optimization problem: [A. Javanmard, A. Montanari]

minimize over M:  max over 1 ≤ i ≤ p of (MΣ̂Mᵀ)ᵢᵢ
subject to:  |MΣ̂ − I|∞ ≤ ξ

Writing mᵢ for the i-th row of M, the problem decouples over the rows:

minimize over mᵢ:  mᵢᵀ Σ̂ mᵢ
subject to:  ‖Σ̂mᵢ − eᵢ‖∞ ≤ ξ

The optimization can be decoupled and solved in parallel.
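Solving each row problem above requires a quadratic-programming solver, but its structure is easy to inspect in code. A hedged numpy sketch of ours (not the talk's solver): when Σ̂ is invertible, the candidate row mᵢ = Σ̂⁻¹eᵢ is feasible for every ξ ≥ 0 and attains objective value (Σ̂⁻¹)ᵢᵢ:

```python
import numpy as np

def row_objective(m_i, Sigma_hat):
    # Variance proxy m_i^T Sigma_hat m_i for one row of M.
    return m_i @ Sigma_hat @ m_i

def row_feasible(m_i, Sigma_hat, i, xi):
    # Constraint ||Sigma_hat m_i - e_i||_inf <= xi.
    e_i = np.zeros(Sigma_hat.shape[0])
    e_i[i] = 1.0
    return np.max(np.abs(Sigma_hat @ m_i - e_i)) <= xi

rng = np.random.default_rng(4)
n, p = 200, 5                      # well-conditioned toy case, n > p
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n
Sigma_inv = np.linalg.inv(Sigma_hat)

i = 2
m_i = Sigma_inv[:, i]              # candidate row: Sigma_hat m_i = e_i exactly
```

In the regime n < p of interest, Σ̂⁻¹ does not exist and ξ > 0 buys the needed slack; that trade-off between the constraint ξ and the variance objective is exactly what the program negotiates.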
SLIDE 22

Main theorems

Theorem [Javanmard, Montanari 2013] (Deterministic designs)

Let X be any deterministic design that satisfies the compatibility condition for the set S = supp(θ0) (|S| ≤ s0), with constant φ0. Further define the coherence parameter

μ* ≡ min over M ∈ R^(p×p) of |MΣ̂ − I|∞.

Let K ≡ max over i ∈ [p] of Σ̂ᵢᵢ. Then, letting λ = cσ√(log p / n), we have

√n(θ̂^d − θ0) = Z + Δ,   Z ∼ N(0, σ²MΣ̂Mᵀ),

P( ‖Δ‖∞ ≥ (4cμ*σ s0 / φ0²) √(log p) ) ≤ 2 p^(−c0),   c0 = c²/(32K) − 1.

Remark:

μ* ≤ (1/n) max over i ≠ j of |⟨Xeᵢ, Xeⱼ⟩|.
SLIDE 24

Main theorems

Theorem [Javanmard, Montanari 2013] (Random designs)

Let Σ be such that σmin(Σ) ≥ Cmin > 0, σmax(Σ) ≤ Cmax < ∞, and max over i ∈ [p] of Σᵢᵢ ≤ 1. Assume XΣ^(−1/2) has independent subgaussian rows with mean zero and subgaussian norm K. Letting λ = cσ√(log p / n), we have

√n(θ̂^d − θ0) = Z + Δ,   Z|X ∼ N(0, σ²MΣ̂Mᵀ),

P( ‖Δ‖∞ ≥ (16cσ/Cmin) · s0 log p / √n ) ≤ 4e^(−c1 n) + 4p^(−c2),

for some explicit constants c1 = C(K), c2 = C(c, K, Cmin, Cmax).

Remark on sample size: if n/(s0 log p)² → ∞, then ‖Δ‖∞ = oP(1).
SLIDE 26

Consequences

Confidence intervals for single parameters:

lim as n → ∞ of P( θ0,i ∈ [θ̲ᵢ, θ̄ᵢ] ) ≥ 1 − α,

|θ̄ᵢ − θ̲ᵢ| ≤ (2 + o(1)) c_α √( σ²(Σ⁻¹)ᵢᵢ / n )   (n < p).

Compare with least squares (n > p):

|θ̄ᵢ − θ̲ᵢ| ≤ 2 c_α √( σ²(Σ̂⁻¹)ᵢᵢ / n ).

Remark: no need for an irrepresentability or θmin condition (common assumptions for support recovery).

Hypothesis testing: minimax optimal statistical power.
SLIDE 30

Framework

Hypothesis testing:

H0,i : θ0,i = 0,   HA,i : θ0,i ≠ 0.

Two-sided p-values:

Pᵢ = 2( 1 − Φ( |θ̂^d_i| / τ ) ),

with Φ(·) the cdf of the standard normal.

Decision rule:

T_{i,X}(y) = 1 if Pᵢ ≤ α (reject the null hypothesis H0,i), and 0 otherwise (accept the null hypothesis).
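The p-values and decision rule above need nothing beyond the standard normal cdf, available in the Python standard library via the error function. A minimal sketch (function names are ours):

```python
import math

def Phi(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(theta_d_i, tau):
    # Two-sided p-value: P_i = 2 (1 - Phi(|theta_d_i| / tau)).
    return 2.0 * (1.0 - Phi(abs(theta_d_i) / tau))

def decision(theta_d_i, tau, alpha):
    # T_{i,X}(y): 1 = reject H_{0,i}, 0 = accept.
    return 1 if p_value(theta_d_i, tau) <= alpha else 0
```

For example, at α = 0.05 and τ = 1 a coordinate with θ̂^d_i = 3 is rejected (p ≈ 0.003), while θ̂^d_i = 1 is not (p ≈ 0.32).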
SLIDE 31

Theorem [Javanmard, Montanari, 2013]

Consider designs with subgaussian rows and let S ≡ supp(θ0). Assume that s0 ≡ |S| = o(√n / log p). Then, for any fixed sequence of integers i = i(n) with i ∉ S, we have

lim as n → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) ≤ α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

liminf as p → ∞ of [ 1 / (1 − β_{i,n}(α, μ)) ] · P_{θ0}( T_{i,X}(y) = 1 ) ≥ 1,

1 − β_{i,n}(α, μ) ≡ G( α, √n μ / (σ √((Σ⁻¹)ᵢᵢ)) ),

with G(α, u) given by ...

Minimax optimal power over the family of s0-sparse vectors θ0.
SLIDE 33

Related work on bias correction

Ridge projection and bias correction [P. Bühlmann]: the (remaining) bias is not negligible; conservative tests.

Low-dimensional projection estimator (LDPE) [C-H. Zhang, S. S. Zhang]: initial projection based on nodewise LASSO on X; bias correction via LASSO.
SLIDE 34

Hypothesis testing under nearly optimal sample size

SLIDE 35

Smaller sample size

Estimation, prediction: n ≳ s0 log p. [Candès, Tao 2007; Bickel et al. 2009]

Hypothesis testing, confidence intervals: n ≳ (s0 log p)².
- [This talk]
- Bias-corrected ridge regression [P. Bühlmann]
- LDPE [C-H. Zhang, S. S. Zhang]
- Desparsified LASSO [S. van de Geer et al.]

Can we match the optimal sample size, n ≳ s0 log p?
SLIDE 37

Theorem [Javanmard, Montanari 2013]

Consider designs with subgaussian rows and assume n ≳ s0 (log p)². Then

limsup as p → ∞ of [ 1/(p − s0) ] Σ over i ∈ Sᶜ of P_{θ0}( T_{i,X}(y) = 1 ) ≤ α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

liminf as p → ∞ of [ 1 / (1 − β*_n(α, θ0)) ] · (1/s0) Σ over i ∈ S of P_{θ0}( T_{i,X}(y) = 1 ) ≥ 1,

where

1 − β*_n(α, θ0) ≡ (1/s0) Σ over i ∈ S of G( α, √n |θ0,i| / (σ √((Σ⁻¹)ᵢᵢ)) ).

Controls the average type I error. Minimax optimal average power.
SLIDE 39

High-level idea of the proof

Recall

√n(θ̂^d − θ0) = Z + Δ  [bias],   ‖Δ‖∞ = OP( s0 log p / √n )   ⟹   need n ≳ (s0 log p)².

To ensure average performance, we do not need to control ‖Δ‖∞.
SLIDE 41

High-level idea of the proof

A new norm:

‖Δ‖_(∞,k) ≡ max over A ⊂ [p], |A| ≥ k of ‖Δ_A‖₂ / √|A|.

Properties of ‖·‖_(∞,k):
- Non-increasing in k.
- As k gets smaller, it gives tighter control on the individual entries of Δ.

Lemma

‖Δ‖_(∞, c s0) = O( √(s0/n) · log p ).
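The maximum over A in the norm above can be computed exactly: for each size m ≥ k, the maximizing set collects the m largest |Δᵢ|, so it suffices to scan prefix means of the sorted squared entries. A small pure-Python sketch of ours:

```python
import math

def norm_inf_k(delta, k):
    # ||delta||_(inf,k) = max over A, |A| >= k, of ||delta_A||_2 / sqrt(|A|).
    # For each size m, the best A takes the m largest |delta_i|, so we scan
    # prefix averages of the squared entries sorted in decreasing order.
    sq = sorted((d * d for d in delta), reverse=True)
    best, total = 0.0, 0.0
    for m, s in enumerate(sq, start=1):
        total += s
        if m >= k:
            best = max(best, math.sqrt(total / m))
    return best

delta = [3.0, -1.0, 0.5, 0.0, 2.0]
```

Note that k = 1 recovers ‖Δ‖∞ (the prefix average is largest at m = 1), and the value can only shrink as k grows, matching the two properties listed above.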
SLIDE 44

A few steps of the proof

Any set with |A| ≥ c s0 can be partitioned as

A = A1 ∪ A2 ∪ ... ∪ A_L

with the Aᵢ disjoint and c s0 ≤ |Aᵢ| ≤ 2c s0. (So, WLOG, we can assume c s0 ≤ |A| ≤ 2c s0.)

Taking M = Σ⁻¹,

Δ ≡ √n (Σ⁻¹Σ̂ − I)(θ̂ − θ0).

Let T = supp(θ0) ∪ supp(θ̂). We have

‖Δ_A‖₂ ≤ √n ‖(Σ⁻¹Σ̂ − I)_{A,T}‖₂ ‖(θ̂ − θ0)_T‖₂.
SLIDE 49

A few steps of the proof (cont’d)

Applying tail bounds, we get

sup over |A| ≤ c s0, |T| ≤ c′s0 of ‖(Σ⁻¹Σ̂ − I)_{A,T}‖₂ = O( √(s0 log p / n) ).

We also know that

‖θ̂ − θ0‖₂ = O( √(s0 log p / n) ).

Combining the bounds, we get

‖Δ_A‖₂ / √|A| ≤ O( √(s0/n) · log p )   ⟹   n ≳ s0 (log p)² suffices.
SLIDE 51

Standard Gaussian design

Suppose Xᵢⱼ ∼ N(0,1) independently.

θ̂^d = θ̂ + (1/n) Xᵀ(y − Xθ̂)

SDL test [J., Montanari 2013]:

θ̂^d = θ̂ + (d/n) Xᵀ(y − Xθ̂),   d = ( 1 − ‖θ̂‖₀/n )⁻¹.

Based on the analysis of Approximate Message Passing (AMP). [M. Bayati, D. Donoho, A. Maleki, A. Montanari]
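The SDL correction above is a one-line change to the debiased estimator: M is the identity and the residual term is inflated by d. A minimal numpy sketch of ours, with the easy sanity check that a zero pilot estimate gives d = 1 and reduces θ̂^d to Xᵀy/n:

```python
import numpy as np

def sdl_debias(theta_hat, X, y):
    # SDL-style debiasing for standard Gaussian designs:
    #   theta_d = theta_hat + (d/n) X^T (y - X theta_hat),
    #   d = (1 - ||theta_hat||_0 / n)^{-1}.
    n = X.shape[0]
    d = 1.0 / (1.0 - np.count_nonzero(theta_hat) / n)
    return theta_hat + (d / n) * (X.T @ (y - X @ theta_hat))

rng = np.random.default_rng(5)
n, p = 40, 60
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# With a zero pilot estimate, d = 1 and theta_d reduces to X^T y / n.
theta_d = sdl_debias(np.zeros(p), X, y)
```

Since ‖θ̂‖₀ = O(s0) for the LASSO pilot, d = 1 + O(s0/n): a small but, as the histograms below-stated in the talk show, consequential correction when n is only a few multiples of s0.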
SLIDE 54

Exact asymptotic characterization

θ̂^d ≡ θ̂ + (d/n) Xᵀ(y − Xθ̂)

Theorem [M. Bayati, A. Montanari 2012]

Consider the standard Gaussian setting where n/p → δ, s0/p → ε, and nσ² → σ²∞. If δ ≳ ε log(1/ε), then, on finite-dimensional marginals,

θ̂^d = θ0 + τZ,   Z ∼ N(0, I_{p×p}),

with τ, d given by ...

δ ≳ ε log(1/ε)   ⟺   n ≳ s0 log(p/s0)
SLIDE 55

Effect of factor d

d = ( 1 − ‖θ̂‖₀/n )⁻¹ = 1 + O(s0/n)

[Figure: histograms of v = (θ̂^d − θ0)/τ, with and without the factor d, for n = 3 s0 (ε = 0.2, δ = 0.6) and p = 3000.]

[Figure: histograms of v = (θ̂^d − θ0)/τ, with and without the factor d, for n = 30 s0 (ε = 0.02, δ = 0.6) and p = 3000.]
SLIDE 58

Exact asymptotic characterization

[Figure: empirical kurtosis of v = (θ̂^d − θ0)/τ, with and without the normalization factor d, for n = 3 s0 and n = 30 s0.]
SLIDE 59

Theorem [Javanmard, Montanari, 2013]

Consider the setting where n/p → δ, s0/p → ε, and δ ≳ ε log(1/ε). Then, for i ∉ S we have

lim as p → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) = α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

lim as p → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) ≥ G( α, μ/τ ),

with

G(α, u) ≡ 2 − Φ( Φ⁻¹(1 − α/2) + u ) − Φ( Φ⁻¹(1 − α/2) − u ).
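The power curve G above is a two-line computation with the standard library's normal distribution. A minimal sketch (function name is ours):

```python
from statistics import NormalDist

def G(alpha, u):
    # G(alpha, u) = 2 - Phi(Phi^{-1}(1 - alpha/2) + u)
    #                 - Phi(Phi^{-1}(1 - alpha/2) - u)
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - nd.cdf(z + u) - nd.cdf(z - u)
```

At u = 0 this gives G(α, 0) = α (the test has exact size α under the null), and G increases towards 1 as the signal-to-noise ratio u = μ/τ grows.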
SLIDE 60

Summary

Random designs, n ≳ s0 (log p)²:
- guarantees on average type I error and power
- requires a good estimate of the precision matrix (can be done, e.g., under a sparsity assumption)

n ≳ s0 log p?

Standard Gaussian designs: n ≳ s0 log(p/s0).
SLIDE 61

[1] A. Javanmard and A. Montanari, Confidence Intervals and Hypothesis Testing for High-Dimensional Regression. JMLR, 2014.

[2] A. Javanmard and A. Montanari, Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression. Allerton, 2013.

[3] A. Javanmard and A. Montanari, Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory. IEEE Transactions on Information Theory, 2013.

SLIDE 62

Thanks!
