


De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

Adel Javanmard

USC Marshall School of Business, Data Science and Operations Department

Based on joint work with

Andrea Montanari

Oct 2015


An example

Kaggle challenge: Identify patients diagnosed with type-2 diabetes


Statistical model

Data (Y1,X1),...,(Yn,Xn):

Y_i = patient i gets type-2 diabetes, Y_i ∈ {0,1}
X_i = features of patient i, X_i ∈ R^p
Y_i ∼ f_{θ0}(·|X_i),   θ0 ∈ R^p
θ0,j = contribution of feature j


Regularized estimator

θ̂ ≡ argmin_{θ ∈ R^p} { L(θ) + λ ||θ||_1 },      L(θ): logistic loss,   λ||θ||_1: regularizer

Convex optimization
Variable selection
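A minimal sketch of this regularized logistic regression, assuming scikit-learn and synthetic placeholder data (the Practice Fusion features and the cross-validated λ are not reproduced here); scikit-learn's C plays the role of 1/λ up to scaling.

```python
# Hypothetical illustration: l1-regularized logistic regression on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 805                                  # sizes of the data set used later in the talk
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:10] = 1.0                                # sparse "true" contributions (illustrative)
y = (X @ theta0 + rng.standard_normal(n) > 0).astype(int)

# penalty="l1" gives argmin { logistic loss + lambda*||theta||_1 }, with C ~ 1/lambda
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, y)
print("selected features:", int(np.sum(model.coef_ != 0)))
```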


Practice fusion data set (Kaggle)

Database

n = 500 patients
p = 805 items of medical information (meds, lab results, diagnoses, ...)


[Figure: estimated coefficients θ̂ across the 805 features; labeled features include (HDL) cholesterol, year of birth, globulin, blood pressure, bilirubin]

Regularized logistic regression selects 62 features

(λ chosen via cross-validation; resulting AUC = 0.75)

Shall we trust our findings?


In summary

Will focus on the linear model and the Lasso
Compute confidence intervals / p-values


Outline

1. Problem definition
2. Debiasing approach
3. Hypothesis testing under nearly optimal sample size


Problem definition


Linear model

We focus on linear models:

Y = Xθ0 + W,      Y ∈ R^n (response),  X ∈ R^{n×p} (design matrix),  θ0 ∈ R^p (parameters)

The noise vector W has independent entries with

E(W_i) = 0,   E(W_i²) = σ²,   E(|W_i|^{2+κ}) < ∞  for some κ > 0.


Problem

Confidence intervals: for each i ∈ {1,...,p}, find θ̲_i, θ̄_i ∈ R such that

P( θ0,i ∈ [θ̲_i, θ̄_i] ) ≥ 1 − α.

We would like |θ̄_i − θ̲_i| to be as small as possible.

Hypothesis testing:

H0,i : θ0,i = 0      vs.      HA,i : θ0,i ≠ 0


LASSO

θ̂ ≡ argmin_{θ ∈ R^p} { (1/2n) ||y − Xθ||_2² + λ ||θ||_1 }

[Tibshirani 1996; Chen, Donoho 1996]

Distribution of θ̂?

Debiasing approach (the LASSO is biased towards small ℓ1 norm):

θ̂  --(debiasing)-->  θ̂^d

We characterize the distribution of θ̂^d.
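A minimal sketch of the Lasso above, assuming scikit-learn and synthetic placeholder data; sklearn's alpha plays the role of λ, since its objective is exactly (1/2n)||y − Xθ||_2² + α||θ||_1.

```python
# Hypothetical illustration: the Lasso on synthetic data from the linear model Y = X theta0 + W.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s0 = 200, 500, 10
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:s0] = 2.0
y = X @ theta0 + rng.standard_normal(n)          # noise W with unit variance

lam = 2.0 * np.sqrt(np.log(p) / n)               # the usual order sigma*sqrt(log p / n) for lambda
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
print("nonzero coefficients:", int(np.count_nonzero(theta_hat)))
```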


Debiasing approach


Classical setting (n ≫ p)

We know everything about the least-square estimator:

θ̂^LS = (1/n) Σ̂^{-1} X^T Y,      where Σ̂ ≡ (X^T X)/n is the empirical covariance.

Confidence intervals:

[θ̲_i, θ̄_i] = [ θ̂^LS_i − c_α ∆_i ,  θ̂^LS_i + c_α ∆_i ],      ∆_i ≡ σ √( (Σ̂^{-1})_ii / n )
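For reference, a minimal sketch of these classical intervals (numpy/scipy assumed); σ is treated as known here, whereas in practice it would be estimated from residuals.

```python
# Classical setting (n >> p): least-squares estimate and per-coordinate normal intervals.
import numpy as np
from scipy.stats import norm

def ls_confidence_intervals(X, y, sigma, alpha=0.05):
    n = X.shape[0]
    Sigma_hat = X.T @ X / n                       # empirical covariance
    Sigma_inv = np.linalg.inv(Sigma_hat)
    theta_ls = Sigma_inv @ X.T @ y / n            # least-squares estimator
    Delta = sigma * np.sqrt(np.diag(Sigma_inv) / n)
    c_alpha = norm.ppf(1 - alpha / 2)             # two-sided normal quantile
    return theta_ls - c_alpha * Delta, theta_ls + c_alpha * Delta
```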


High-dimensional setting (n < p)

θ̂^LS = (1/n) Σ̂^{-1} X^T Y

Problem in high dimension: Σ̂ is not invertible!


Take your favorite M ∈ Rp×p:

θ̂* = (1/n) M X^T Y = (1/n) M X^T X θ0 + (1/n) M X^T W
    = θ0 + (M Σ̂ − I) θ0   [bias]   + (1/n) M X^T W   [Gaussian error]


Debiased estimator

θ̂* = θ0 + (M Σ̂ − I) θ0   [bias]   + (1/n) M X^T W   [Gaussian error]

Let us (try to) subtract the bias:

θ̂^d = θ̂* − (M Σ̂ − I) θ̂^Lasso

Debiased estimator (writing θ̂ = θ̂^Lasso):

θ̂^d ≡ θ̂ + (1/n) M X^T (Y − X θ̂)
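A one-line sketch of this debiasing step (numpy assumed); the choice of M is the subject of the next slides, so here it is simply passed in.

```python
# The debiasing step: theta_d = theta_hat + (1/n) * M X^T (y - X theta_hat).
import numpy as np

def debias(theta_hat, X, y, M):
    """One-step bias correction of an initial (Lasso) estimate theta_hat."""
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

# e.g., with the Lasso fit from before and a placeholder M:
# theta_d = debias(theta_hat, X, y, np.eye(X.shape[1]))
```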


Debiased estimator: Choosing M?

θ̂^d ≡ θ̂ + (1/n) M X^T (y − X θ̂)

Gaussian design (x_i ∼ N(0,Σ)). Assume Σ is known (relevant in semi-supervised learning): take M = Σ^{-1}.
[Javanmard, Montanari 2012]

Does this remind you of anything?

θ̂^d ≡ θ̂ + Σ^{-1} (1/n) X^T (y − X θ̂)

(pseudo-) Newton method


Debiased estimator: Choosing M?

Alternative: approximate inverse of Σ̂ via nodewise LASSO on X
(under a row-sparsity assumption on Σ^{-1})
[S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure 2014]


Debiased estimator: Choosing M?

Our approach: optimizing two objectives (the bias and the variance of θ̂^d)
[A. Javanmard, A. Montanari 2014]

√n (θ̂^d − θ0) = √n (M Σ̂ − I)(θ0 − θ̂)   [bias ↓]   +   Z

Z | X ∼ N(0, σ² M Σ̂ M^T)   [noise covariance],      Σ̂ = (1/n) X^T X


Debiased estimator: Choosing M?

Our approach: find M by solving an optimization problem
[A. Javanmard, A. Montanari]

minimize_M      max_{1≤i≤p} (M Σ̂ M^T)_{i,i}
subject to      |M Σ̂ − I|_∞ ≤ ξ


Equivalently, row by row (m_i is the ith row of M):

minimize_{m_i}      m_i^T Σ̂ m_i
subject to          ||Σ̂ m_i − e_i||_∞ ≤ ξ

The optimization decouples over the rows m_i and can be solved in parallel.
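A sketch of this row-wise program using cvxpy (an assumed dependency; any QP solver would do). The objective m^T Σ̂ m is written as ||Xm||²/n, and the solved rows m_i are stacked into M.

```python
# Row-wise construction of M: minimize m^T Sigma_hat m s.t. ||Sigma_hat m - e_i||_inf <= xi.
import numpy as np
import cvxpy as cp

def choose_M(X, xi):
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    M = np.zeros((p, p))
    for i in range(p):                            # independent problems, parallelizable
        e_i = np.zeros(p)
        e_i[i] = 1.0
        m = cp.Variable(p)
        objective = cp.Minimize(cp.sum_squares(X @ m) / n)        # equals m^T Sigma_hat m
        constraints = [cp.norm(Sigma_hat @ m - e_i, "inf") <= xi]
        cp.Problem(objective, constraints).solve()
        M[i] = m.value
    return M

# A typical choice is xi of order sqrt(log(p) / n).
```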


What does it look like?

[Figure: empirical density of the debiased coordinates θ̂^d_i]

Can estimate σ.
'Ground truth' from n_tot = 10,000 records.


Confidence intervals

Neglecting the bias (with σ̂ an estimator of σ):

θ̂^d_i ≈ N( θ0,i , ∆_i² ),      ∆_i² ≡ (σ̂²/n) (M Σ̂ M^T)_{ii}

[θ̲_i, θ̄_i] = [ θ̂^d_i − c_α ∆_i ,  θ̂^d_i + c_α ∆_i ]
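A sketch of these intervals (numpy/scipy assumed), taking M, θ̂^d, and a noise estimate σ̂ as already computed; the slides only say σ can be estimated, so how σ̂ is obtained (e.g. a scaled-Lasso fit) is an assumption here.

```python
# Confidence intervals from the debiased estimate:
# Delta_i = sigma_hat * sqrt((M Sigma_hat M^T)_{ii} / n),  interval = theta_d_i +/- c_alpha * Delta_i.
import numpy as np
from scipy.stats import norm

def confidence_intervals(theta_d, X, M, sigma_hat, alpha=0.05):
    n = X.shape[0]
    Sigma_hat = X.T @ X / n
    Delta = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
    c_alpha = norm.ppf(1 - alpha / 2)
    return theta_d - c_alpha * Delta, theta_d + c_alpha * Delta
```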


What does it look like?

[Figure: coefficients vs. features for the diabetes data, showing the Lasso estimate θ̂ and the debiased estimate θ̂^d; labeled features include bilirubin, globulin, (HDL) cholesterol, year of birth]


UCI crime dataset

[Figure: histogram of the debiased coordinates θ̂^d_i for the UCI crime data]

n = 84, p = 102, n_tot = 1994.


A theorem

Theorem [Javanmard, Montanari 2013] (Deterministic designs)

Let X be any deterministic design satisfying the compatibility condition. Define the coherence parameter

µ* ≡ min_{M ∈ R^{p×p}} |M Σ̂ − I|_∞ .

Let s0 = |supp(θ0)|. Then

√n (θ̂^d − θ0) = Z [Gaussian] + ∆ [bias],      ||∆||_∞ ≤ c µ* σ s0 √(log p),   w.h.p.

Remark:   µ* ≤ (1/n) max_{i≠j} |⟨X e_i, X e_j⟩|.


A theorem

Theorem [Javanmard, Montanari 2013] (Random designs)

Consider a population covariance Σ with bounded eigenvalues, and assume that XΣ^{-1} has independent subgaussian rows. Then

√n (θ̂^d − θ0) = Z [Gaussian] + ∆ [bias],      ||∆||_∞ ≤ c σ s0 log p / √n,   w.h.p.

Remark on sample size: if n / (s0 log p)² → ∞, then ||∆||_∞ = o_P(1).


Consequences

Confidence intervals for single parameters:

lim_{n→∞} P( θ0,i ∈ [θ̲_i, θ̄_i] ) ≥ 1 − α

|θ̄_i − θ̲_i| ≤ 2 c_α √( σ² (Σ^{-1})_ii / n )        (n < p)

|θ̄_i − θ̲_i| ≤ 2 c_α √( σ² (Σ̂^{-1})_ii / n )        (least squares, n > p)

Remark: No need for an irrepresentability / θ_min condition (common assumptions for support recovery).


Hypothesis testing (based on de-biased estimator)

Null/alternative hypotheses:

H0,i : θ0,i = 0,      HA,i : θ0,i ≠ 0.

Two-sided p-values:

P_i = 2 ( 1 − Φ( |θ̂^d_i| / τ ) ),

with Φ(·) the cdf of the standard normal.

We provide a precise characterization of the type I and type II errors. The test (based on the de-biased estimator) has minimax optimal statistical power.
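A sketch of these p-values (scipy assumed), taking the scale τ for coordinate i to be the standard error ∆_i from the confidence-interval slide (an assumption about what τ denotes here).

```python
# Two-sided p-values P_i = 2 * (1 - Phi(|theta_d_i| / Delta_i)).
import numpy as np
from scipy.stats import norm

def p_values(theta_d, Delta):
    return 2.0 * norm.sf(np.abs(theta_d) / Delta)   # sf(x) = 1 - cdf(x)

# Reject H_{0,i} at level alpha whenever p_values(theta_d, Delta)[i] <= alpha.
```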


Related work on bias-correction

Ridge projection and bias correction [P. Bühlmann]
(Remaining) bias is not negligible. Conservative tests.

Low-dimensional projection estimator (LDPE) [C.-H. Zhang, S. S. Zhang]
Initial projection based on nodewise LASSO on X. Bias correction via LASSO.


Further related work

Debiasing:

Group sparsity [R. Mitra & C.-H. Zhang 2014]
Confidence intervals for inverse covariance estimation [J. Jankova, S. van de Geer 2015]
Genomics [Q. Zhao et al. 2015; B. Rakitsch 2015]
Econometrics [A. Belloni & V. Chernozhukov 2014; D. Kozbur 2015]

Other methods for uncertainty assessment:

Uncertainty quantification under group sparsity [Q. Zhou 2015]
Post double selection [Belloni et al. 2014]


Hypothesis testing under nearly optimal sample size


Smaller sample size

Estimation, prediction: n ≳ s0 log p.   [Candès, Tao 2007; Bickel et al. 2009]

Hypothesis testing, confidence intervals: n ≳ (s0 log p)².
[This talk], bias-corrected ridge regression [P. Bühlmann], LDPE [C.-H. Zhang, S. S. Zhang], desparsified LASSO [S. van de Geer et al.]

Can we match the optimal sample size, n ≳ s0 log p?


Where is the bottleneck?

The bias is given by

∆ = √n (M Σ̂ − I)(θ0 − θ̂^Lasso).

Earlier work bounds the bias by a simple ℓ1-ℓ∞ inequality:

||∆||_∞ ≤ √n |M Σ̂ − I|_∞ ||θ0 − θ̂^Lasso||_1 ≤ √n × C √(log p / n) × C s0 σ √(log p / n) ≤ C² σ s0 log p / √n .


Plan for this part

Focus on Gaussian designs: x_i ∼ N(0, Σ).
Assume that Σ is known. (See the paper for unknown covariance.)
We show that the (s0 log p)² sample-size requirement is an artifact of the argument!
The de-biased estimator is asymptotically Gaussian under the condition n ≫ s0 (log p)².


‘Leave-one-out’ technique

Fix coordinate i. Define the constrained Lasso

θ̂^p ≡ argmin_θ { (1/2n) ||y − Xθ||² + λ ||θ||_1 }      subject to      θ_i = θ0,i

We then have

y − X θ̂^p = w + x̃_i (θ0,i − θ̂^p_i) + X_{∼i} (θ0,∼i − θ̂^p_{∼i})
          = w + X_{∼i} (θ0,∼i − θ̂^p_{∼i}),

since the crossed-out term x̃_i (θ0,i − θ̂^p_i) vanishes by the constraint θ̂^p_i = θ0,i.

θ̂^p is the Lasso estimator when x̃_i is left out!


‘Leave-one-out’ technique

Let v be the ith column of XΣ^{-1}. The bias is given by

∆_i = R1 + R2 + R3

R1 = √n (1 − ⟨v, x̃_i⟩/n) (θ̂^Lasso_i − θ0,i)              → handled by concentration
R2 = (v^T/√n) X_{∼i} (θ0,∼i − θ̂^p_{∼i})                   → handled by independence
R3 = (v^T/√n) X_{∼i} (θ̂^p_{∼i} − θ̂^Lasso_{∼i})            → handled by a perturbation argument


Summary

Combining the bounds on R1, R2, R3, we obtain

||∆||_∞ ≤ C √(s0/n) log p,    w.h.p.

Therefore, ||∆||_∞ → 0 provided that n ≫ s0 (log p)².


Numerical illustration

Fix p = 3000. Design matrix X with rows i.i.d. from N(0,Σ), where Σ_ij = 0.8^|i−j|.

Define δ = n/p (undersampling rate) and ε = s0/p (sparsity proportion).

δ_c: critical value of δ above which the de-biased estimator is Gaussian.

[Figure: empirical critical value δ_c as a function of ε = s0/p]
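A sketch of this simulation setup (numpy assumed, and a smaller p than 3000 for speed); the support and nonzero values of θ0 are placeholders, since the slide does not specify them.

```python
# Gaussian design with Sigma_ij = 0.8^|i-j|, undersampling delta = n/p, sparsity eps = s0/p.
import numpy as np

def simulate(p=300, delta=0.5, eps=0.1, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, s0 = int(delta * p), int(eps * p)
    idx = np.arange(p)
    Sigma = 0.8 ** np.abs(idx[:, None] - idx[None, :])        # AR(1)-type covariance
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    theta0 = np.zeros(p)
    theta0[rng.choice(p, size=s0, replace=False)] = 1.0       # placeholder signal
    y = X @ theta0 + sigma * rng.standard_normal(n)
    return X, y, theta0, Sigma

# With Sigma known, one can take M = np.linalg.inv(Sigma) in the debiasing step.
```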


How to define δc?

Fix ε and vary δ = n/p.

[Figure: empirical kurtosis m(γ_δ) ± SE(γ_δ) over 100 realizations, plotted against δ = n/p for ε = 0.2; the annotated critical value is δ_c = 0.57.]


Conclusion

De-biasing regularized estimators
Compute confidence intervals / p-values for high-dimensional models
Optimal sample size for Gaussian designs


Thanks!
