Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size


SLIDE 1

Hypothesis Testing for High-Dimensional Regression: Nearly Optimal Sample Size

Adel Javanmard

Stanford University / UC Berkeley

Based on joint work with

Andrea Montanari

January 2015

Adel Javanmard (Stanford- UC Berkeley) Hypothesis Testing Jan 2015 1 / 34

SLIDE 2

Outline

1. Problem definition

2. Debiasing approach

3. Hypothesis testing under nearly optimal sample size
SLIDE 3

Problem definition

SLIDE 4

Linear model

We focus on linear models:

Y = Xθ0 + W,   Y ∈ R^n (response), X ∈ R^(n×p) (design matrix), θ0 ∈ R^p (parameters).

The noise vector W has independent entries with

E(Wᵢ) = 0,   E(Wᵢ²) = σ²,   E(|Wᵢ|^(2+κ)) < ∞ for some κ > 0.
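The linear model above is easy to simulate; the following minimal numpy sketch generates one instance (the sizes, sparsity pattern, and noise level are illustrative choices of ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s0, sigma = 100, 300, 5, 0.5

# Design matrix with i.i.d. standard normal entries.
X = rng.standard_normal((n, p))

# s0-sparse parameter vector: only the first s0 coordinates are nonzero.
theta0 = np.zeros(p)
theta0[:s0] = 1.0

# Noise with E(W_i) = 0 and E(W_i^2) = sigma^2.
W = sigma * rng.standard_normal(n)

Y = X @ theta0 + W
```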
SLIDE 5

Problem

Confidence intervals: for each i ∈ {1,...,p}, find θ̲ᵢ, θ̄ᵢ ∈ R such that

P( θ0,i ∈ [θ̲ᵢ, θ̄ᵢ] ) ≥ 1 − α.

We would like |θ̄ᵢ − θ̲ᵢ| to be as small as possible.

Hypothesis testing:

H0,i : θ0,i = 0,   HA,i : θ0,i ≠ 0.
SLIDE 6

LASSO

θ̂ ≡ argmin over θ ∈ R^p of  { (1/(2n)) ‖y − Xθ‖₂² + λ‖θ‖₁ }.

[Tibshirani 1996; Chen, Donoho 1996]

What is the distribution of θ̂?

Debiasing approach (the LASSO is biased towards small ℓ1 norm):

θ̂  →(debiasing)→  θ̂^d

We characterize the distribution of θ̂^d.
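For concreteness, the LASSO objective above can be minimized with plain proximal gradient descent (ISTA). This is a minimal numpy sketch of ours, not the solver used in the talk; the step size, regularization level, and iteration count are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    # Minimize (1/(2n)) ||y - X theta||_2^2 + lam * ||theta||_1.
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1/L for the smooth part
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, lam * step)
    return theta

# Tiny demo: sparse recovery on simulated data.
rng = np.random.default_rng(1)
n, p = 60, 100
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = 2.0
y = X @ theta0 + 0.1 * rng.standard_normal(n)
theta_hat = lasso_ista(X, y, lam=0.1)
```

On this well-conditioned toy problem the three largest entries of `theta_hat` sit on the true support, illustrating why one would then want to characterize the estimator's distribution.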
SLIDE 9

Debiasing approach

SLIDE 10

Classical setting (n ≫ p)

We know everything about the least-squares estimator:

θ̂^LS = (1/n) Σ̂⁻¹ XᵀY,

where Σ̂ ≡ (XᵀX)/n is the empirical covariance.

Confidence intervals:

[θ̲ᵢ, θ̄ᵢ] = [ θ̂^LS_i − c_α Δᵢ,  θ̂^LS_i + c_α Δᵢ ],   Δᵢ ≡ σ √( (Σ̂⁻¹)ᵢᵢ / n ).
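The classical interval above can be checked empirically. A hedged numpy sketch of ours: we use c_α ≈ 1.96 for α = 0.05 and verify, by Monte Carlo over the noise, that the interval for one coordinate covers the truth at roughly the nominal rate:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, alpha = 200, 5, 1.0, 0.05
c_alpha = 1.96  # approx. Phi^{-1}(1 - alpha/2) for alpha = 0.05

theta0 = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n           # empirical covariance (X^T X)/n
Sigma_hat_inv = np.linalg.inv(Sigma_hat)

trials, covered = 500, 0
for _ in range(trials):
    Y = X @ theta0 + sigma * rng.standard_normal(n)
    theta_ls = Sigma_hat_inv @ (X.T @ Y) / n
    delta = sigma * np.sqrt(Sigma_hat_inv[0, 0] / n)   # Delta_i for i = 0
    lo, hi = theta_ls[0] - c_alpha * delta, theta_ls[0] + c_alpha * delta
    covered += (lo <= theta0[0] <= hi)

coverage = covered / trials  # should be close to 1 - alpha = 0.95
```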
SLIDE 12

High-dimensional setting (n < p)

θ̂^LS = (1/n) Σ̂⁻¹ XᵀY

Problem in high dimension: Σ̂ is not invertible!

Take your favorite M ∈ R^(p×p):

θ̂* = (1/n) M XᵀY = (1/n) M XᵀX θ0 + (1/n) M XᵀW
    = θ0 + (MΣ̂ − I)θ0  [bias]  + (1/n) M XᵀW  [Gaussian error]
SLIDE 14

Debiased estimator

θ̂* = θ0 + (MΣ̂ − I)θ0  [bias]  + (1/n) M XᵀW  [Gaussian error]

Let us (try to) subtract the bias:

θ̂^u = θ̂* − (MΣ̂ − I) θ̂^Lasso

Debiased estimator (writing θ̂ = θ̂^Lasso):

θ̂^d ≡ θ̂ + (1/n) M Xᵀ(Y − Xθ̂)
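The debiased estimator above is one line of linear algebra. A minimal sketch, with a sanity check we added ourselves: in the well-conditioned case n > p, taking M = Σ̂⁻¹ returns exactly the least-squares estimator, whatever pilot estimate θ̂ is fed in:

```python
import numpy as np

def debias(theta_hat, X, y, M):
    # theta_d = theta_hat + (1/n) M X^T (y - X theta_hat)
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

rng = np.random.default_rng(3)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat)          # possible only because n > p here

theta_pilot = rng.standard_normal(p)  # an arbitrary pilot estimate
theta_d = debias(theta_pilot, X, y, M)
theta_ls = M @ X.T @ y / n            # least-squares estimator
```

The identity holds because M Σ̂ = I makes the pilot's contribution cancel exactly; in high dimension M Σ̂ ≈ I only approximately, which is where the bias term comes from.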
SLIDE 17

Debiased estimator: Choosing M?

θ̂^d ≡ θ̂ + (1/n) M Xᵀ(y − Xθ̂)

Low-dimensional projection estimator (LDPE): start with a linear estimator, debias by a nonlinear estimator; M is constructed via nodewise LASSO on X. [C-H. Zhang, S. S. Zhang]

Approximate inverse of Σ̂: nodewise LASSO on X (under a row-sparsity assumption on Σ⁻¹). [S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure]
SLIDE 19

Debiased estimator: Choosing M?

Our approach: optimize the two objectives (bias and variance of θ̂^d). [A. Javanmard, A. Montanari]

√n(θ̂^d − θ0) = √n(MΣ̂ − I)(θ0 − θ̂)  [bias ↓]  + Z,   Z|X ∼ N(0, σ²MΣ̂Mᵀ)  [noise covariance],

with Σ̂ = (XᵀX)/n.
SLIDE 20

Debiased estimator: Choosing M?

Our approach: find M by solving an optimization problem: [A. Javanmard, A. Montanari]

minimize over M:  max over 1 ≤ i ≤ p of (MΣ̂Mᵀ)ᵢᵢ
subject to:  |MΣ̂ − I|∞ ≤ ξ

Writing mᵢ for the i-th row of M, the problem decouples over the rows:

minimize over mᵢ:  mᵢᵀ Σ̂ mᵢ
subject to:  ‖Σ̂mᵢ − eᵢ‖∞ ≤ ξ

The optimization can be decoupled and solved in parallel.
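Solving each row problem above requires a quadratic-programming solver, but its structure is easy to inspect in code. A hedged numpy sketch of ours (not the talk's solver): when Σ̂ is invertible, the candidate row mᵢ = Σ̂⁻¹eᵢ is feasible for every ξ ≥ 0 and attains objective value (Σ̂⁻¹)ᵢᵢ:

```python
import numpy as np

def row_objective(m_i, Sigma_hat):
    # Variance proxy m_i^T Sigma_hat m_i for one row of M.
    return m_i @ Sigma_hat @ m_i

def row_feasible(m_i, Sigma_hat, i, xi):
    # Constraint ||Sigma_hat m_i - e_i||_inf <= xi.
    e_i = np.zeros(Sigma_hat.shape[0])
    e_i[i] = 1.0
    return np.max(np.abs(Sigma_hat @ m_i - e_i)) <= xi

rng = np.random.default_rng(4)
n, p = 200, 5                      # well-conditioned toy case, n > p
X = rng.standard_normal((n, p))
Sigma_hat = X.T @ X / n
Sigma_inv = np.linalg.inv(Sigma_hat)

i = 2
m_i = Sigma_inv[:, i]              # candidate row: Sigma_hat m_i = e_i exactly
```

In the regime n < p of interest, Σ̂⁻¹ does not exist and ξ > 0 buys the needed slack; that trade-off between the constraint ξ and the variance objective is exactly what the program negotiates.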
SLIDE 22

Main theorems

Theorem [Javanmard, Montanari 2013] (Deterministic designs)

Let X be any deterministic design that satisfies the compatibility condition for the set S = supp(θ0) (|S| ≤ s0), with constant φ0. Further define the coherence parameter

μ* ≡ min over M ∈ R^(p×p) of |MΣ̂ − I|∞.

Let K ≡ max over i ∈ [p] of Σ̂ᵢᵢ. Then, letting λ = cσ√(log p / n), we have

√n(θ̂^d − θ0) = Z + Δ,   Z ∼ N(0, σ²MΣ̂Mᵀ),

P( ‖Δ‖∞ ≥ (4cμ*σ s0 / φ0²) √(log p) ) ≤ 2 p^(−c0),   c0 = c²/(32K) − 1.

Remark:

μ* ≤ (1/n) max over i ≠ j of |⟨Xeᵢ, Xeⱼ⟩|.
SLIDE 24

Main theorems

Theorem [Javanmard, Montanari 2013] (Random designs)

Let Σ be such that σmin(Σ) ≥ Cmin > 0, σmax(Σ) ≤ Cmax < ∞, and max over i ∈ [p] of Σᵢᵢ ≤ 1. Assume XΣ^(−1/2) has independent subgaussian rows with mean zero and subgaussian norm K. Letting λ = cσ√(log p / n), we have

√n(θ̂^d − θ0) = Z + Δ,   Z|X ∼ N(0, σ²MΣ̂Mᵀ),

P( ‖Δ‖∞ ≥ (16cσ/Cmin) · s0 log p / √n ) ≤ 4e^(−c1 n) + 4p^(−c2),

for some explicit constants c1 = C(K), c2 = C(c, K, Cmin, Cmax).

Remark on sample size: if n/(s0 log p)² → ∞, then ‖Δ‖∞ = oP(1).
SLIDE 26

Consequences

Confidence intervals for single parameters:

lim as n → ∞ of P( θ0,i ∈ [θ̲ᵢ, θ̄ᵢ] ) ≥ 1 − α,

|θ̄ᵢ − θ̲ᵢ| ≤ (2 + o(1)) c_α √( σ²(Σ⁻¹)ᵢᵢ / n )   (n < p).

Compare with least squares (n > p):

|θ̄ᵢ − θ̲ᵢ| ≤ 2 c_α √( σ²(Σ̂⁻¹)ᵢᵢ / n ).

Remark: no need for an irrepresentability or θmin condition (common assumptions for support recovery).

Hypothesis testing: minimax optimal statistical power.
SLIDE 30

Framework

Hypothesis testing:

H0,i : θ0,i = 0,   HA,i : θ0,i ≠ 0.

Two-sided p-values:

Pᵢ = 2( 1 − Φ( |θ̂^d_i| / τ ) ),

with Φ(·) the cdf of the standard normal.

Decision rule:

T_{i,X}(y) = 1 if Pᵢ ≤ α (reject the null hypothesis H0,i), and 0 otherwise (accept the null hypothesis).
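The p-values and decision rule above need nothing beyond the standard normal cdf, available in the Python standard library via the error function. A minimal sketch (function names are ours):

```python
import math

def Phi(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(theta_d_i, tau):
    # Two-sided p-value: P_i = 2 (1 - Phi(|theta_d_i| / tau)).
    return 2.0 * (1.0 - Phi(abs(theta_d_i) / tau))

def decision(theta_d_i, tau, alpha):
    # T_{i,X}(y): 1 = reject H_{0,i}, 0 = accept.
    return 1 if p_value(theta_d_i, tau) <= alpha else 0
```

For example, at α = 0.05 and τ = 1 a coordinate with θ̂^d_i = 3 is rejected (p ≈ 0.003), while θ̂^d_i = 1 is not (p ≈ 0.32).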
SLIDE 31

Theorem [Javanmard, Montanari, 2013]

Consider designs with subgaussian rows and let S ≡ supp(θ0). Assume that s0 ≡ |S| = o(√n / log p). Then, for any fixed sequence of integers i = i(n) with i ∉ S, we have

lim as n → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) ≤ α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

liminf as p → ∞ of [ 1 / (1 − β_{i,n}(α, μ)) ] · P_{θ0}( T_{i,X}(y) = 1 ) ≥ 1,

1 − β_{i,n}(α, μ) ≡ G( α, √n μ / (σ √((Σ⁻¹)ᵢᵢ)) ),

with G(α, u) given by ...

Minimax optimal power over the family of s0-sparse vectors θ0.
SLIDE 33

Related work on bias correction

Ridge projection and bias correction [P. Bühlmann]: the (remaining) bias is not negligible; conservative tests.

Low-dimensional projection estimator (LDPE) [C-H. Zhang, S. S. Zhang]: initial projection based on nodewise LASSO on X; bias correction via LASSO.
SLIDE 34

Hypothesis testing under nearly optimal sample size

SLIDE 35

Smaller sample size

Estimation, prediction: n ≳ s0 log p. [Candès, Tao 2007; Bickel et al. 2009]

Hypothesis testing, confidence intervals: n ≳ (s0 log p)².
- [This talk]
- Bias-corrected ridge regression [P. Bühlmann]
- LDPE [C-H. Zhang, S. S. Zhang]
- Desparsified LASSO [S. van de Geer et al.]

Can we match the optimal sample size, n ≳ s0 log p?
SLIDE 37

Theorem [Javanmard, Montanari 2013]

Consider designs with subgaussian rows and assume n ≳ s0 (log p)². Then

limsup as p → ∞ of [ 1/(p − s0) ] Σ over i ∈ Sᶜ of P_{θ0}( T_{i,X}(y) = 1 ) ≤ α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

liminf as p → ∞ of [ 1 / (1 − β*_n(α, θ0)) ] · (1/s0) Σ over i ∈ S of P_{θ0}( T_{i,X}(y) = 1 ) ≥ 1,

where

1 − β*_n(α, θ0) ≡ (1/s0) Σ over i ∈ S of G( α, √n |θ0,i| / (σ √((Σ⁻¹)ᵢᵢ)) ).

Controls the average type I error. Minimax optimal average power.
SLIDE 39

High-level idea of the proof

Recall

√n(θ̂^d − θ0) = Z + Δ  [bias],   ‖Δ‖∞ = OP( s0 log p / √n )   ⟹   need n ≳ (s0 log p)².

To ensure average performance, we do not need to control ‖Δ‖∞.
SLIDE 41

High-level idea of the proof

A new norm:

‖Δ‖_(∞,k) ≡ max over A ⊂ [p], |A| ≥ k of ‖Δ_A‖₂ / √|A|.

Properties of ‖·‖_(∞,k):
- Non-increasing in k.
- As k gets smaller, it gives tighter control on the individual entries of Δ.

Lemma

‖Δ‖_(∞, c s0) = O( √(s0/n) · log p ).
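The maximum over A in the norm above can be computed exactly: for each size m ≥ k, the maximizing set collects the m largest |Δᵢ|, so it suffices to scan prefix means of the sorted squared entries. A small pure-Python sketch of ours:

```python
import math

def norm_inf_k(delta, k):
    # ||delta||_(inf,k) = max over A, |A| >= k, of ||delta_A||_2 / sqrt(|A|).
    # For each size m, the best A takes the m largest |delta_i|, so we scan
    # prefix averages of the squared entries sorted in decreasing order.
    sq = sorted((d * d for d in delta), reverse=True)
    best, total = 0.0, 0.0
    for m, s in enumerate(sq, start=1):
        total += s
        if m >= k:
            best = max(best, math.sqrt(total / m))
    return best

delta = [3.0, -1.0, 0.5, 0.0, 2.0]
```

Note that k = 1 recovers ‖Δ‖∞ (the prefix average is largest at m = 1), and the value can only shrink as k grows, matching the two properties listed above.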
SLIDE 44

A few steps of the proof

Any set with |A| ≥ c s0 can be partitioned as

A = A1 ∪ A2 ∪ ... ∪ A_L

with the Aᵢ disjoint and c s0 ≤ |Aᵢ| ≤ 2c s0. (So, WLOG, we can assume c s0 ≤ |A| ≤ 2c s0.)

Taking M = Σ⁻¹,

Δ ≡ √n (Σ⁻¹Σ̂ − I)(θ̂ − θ0).

Let T = supp(θ0) ∪ supp(θ̂). We have

‖Δ_A‖₂ ≤ √n ‖(Σ⁻¹Σ̂ − I)_{A,T}‖₂ ‖(θ̂ − θ0)_T‖₂.
SLIDE 49

A few steps of the proof (cont’d)

Applying tail bounds, we get

sup over |A| ≤ c s0, |T| ≤ c′s0 of ‖(Σ⁻¹Σ̂ − I)_{A,T}‖₂ = O( √(s0 log p / n) ).

We also know that

‖θ̂ − θ0‖₂ = O( √(s0 log p / n) ).

Combining the bounds, we get

‖Δ_A‖₂ / √|A| ≤ O( √(s0/n) · log p )   ⟹   n ≳ s0 (log p)² suffices.
SLIDE 51

Standard Gaussian design

Suppose Xᵢⱼ ∼ N(0,1) independently.

θ̂^d = θ̂ + (1/n) Xᵀ(y − Xθ̂)

SDL test [J., Montanari 2013]:

θ̂^d = θ̂ + (d/n) Xᵀ(y − Xθ̂),   d = ( 1 − ‖θ̂‖₀/n )⁻¹.

Based on the analysis of Approximate Message Passing (AMP). [M. Bayati, D. Donoho, A. Maleki, A. Montanari]
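The SDL correction above is a one-line change to the debiased estimator: M is the identity and the residual term is inflated by d. A minimal numpy sketch of ours, with the easy sanity check that a zero pilot estimate gives d = 1 and reduces θ̂^d to Xᵀy/n:

```python
import numpy as np

def sdl_debias(theta_hat, X, y):
    # SDL-style debiasing for standard Gaussian designs:
    #   theta_d = theta_hat + (d/n) X^T (y - X theta_hat),
    #   d = (1 - ||theta_hat||_0 / n)^{-1}.
    n = X.shape[0]
    d = 1.0 / (1.0 - np.count_nonzero(theta_hat) / n)
    return theta_hat + (d / n) * (X.T @ (y - X @ theta_hat))

rng = np.random.default_rng(5)
n, p = 40, 60
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# With a zero pilot estimate, d = 1 and theta_d reduces to X^T y / n.
theta_d = sdl_debias(np.zeros(p), X, y)
```

Since ‖θ̂‖₀ = O(s0) for the LASSO pilot, d = 1 + O(s0/n): a small but, as the histograms below-stated in the talk show, consequential correction when n is only a few multiples of s0.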
SLIDE 54

Exact asymptotic characterization

θ̂^d ≡ θ̂ + (d/n) Xᵀ(y − Xθ̂)

Theorem [M. Bayati, A. Montanari 2012]

Consider the standard Gaussian setting where n/p → δ, s0/p → ε, and nσ² → σ²∞. If δ ≳ ε log(1/ε), then, on finite-dimensional marginals,

θ̂^d = θ0 + τZ,   Z ∼ N(0, I_{p×p}),

with τ, d given by ...

δ ≳ ε log(1/ε)   ⟺   n ≳ s0 log(p/s0)
SLIDE 55

Effect of factor d

d = ( 1 − ‖θ̂‖₀/n )⁻¹ = 1 + O(s0/n)

[Figure: histograms of v = (θ̂^d − θ0)/τ, with and without the factor d, for n = 3 s0 (ε = 0.2, δ = 0.6) and p = 3000.]

[Figure: histograms of v = (θ̂^d − θ0)/τ, with and without the factor d, for n = 30 s0 (ε = 0.02, δ = 0.6) and p = 3000.]
SLIDE 58

Exact asymptotic characterization

[Figure: empirical kurtosis of v = (θ̂^d − θ0)/τ, with and without the normalization factor d, for n = 3 s0 and n = 30 s0.]
SLIDE 59

Theorem [Javanmard, Montanari, 2013]

Consider the setting where n/p → δ, s0/p → ε, and δ ≳ ε log(1/ε). Then, for i ∉ S we have

lim as p → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) = α.

Further, assuming that |θ0,i| ≥ μ for all i ∈ S, we have

lim as p → ∞ of P_{θ0}( T_{i,X}(y) = 1 ) ≥ G( α, μ/τ ),

with

G(α, u) ≡ 2 − Φ( Φ⁻¹(1 − α/2) + u ) − Φ( Φ⁻¹(1 − α/2) − u ).
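The power curve G above is a two-line computation with the standard library's normal distribution. A minimal sketch (function name is ours):

```python
from statistics import NormalDist

def G(alpha, u):
    # G(alpha, u) = 2 - Phi(Phi^{-1}(1 - alpha/2) + u)
    #                 - Phi(Phi^{-1}(1 - alpha/2) - u)
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - nd.cdf(z + u) - nd.cdf(z - u)
```

At u = 0 this gives G(α, 0) = α (the test has exact size α under the null), and G increases towards 1 as the signal-to-noise ratio u = μ/τ grows.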
SLIDE 60

Summary

Random designs, n ≳ s0 (log p)²:
- guarantees on average type I error and power
- requires a good estimate of the precision matrix (can be done, e.g., under a sparsity assumption)

n ≳ s0 log p?

Standard Gaussian designs: n ≳ s0 log(p/s0).
SLIDE 61

[1] A. Javanmard and A. Montanari, Confidence Intervals and Hypothesis Testing for High-Dimensional Regression. JMLR, 2014.

[2] A. Javanmard and A. Montanari, Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression. Allerton, 2013.

[3] A. Javanmard and A. Montanari, Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory. IEEE Transactions on Information Theory, 2013.

SLIDE 62

Thanks!
