
SLIDE 1

From selective inference to adaptive data analysis

Xiaoying Tian Harris
December 9, 2016

SLIDE 2

Acknowledgement

My advisor:

◮ Jonathan Taylor

Other coauthors:

◮ Snigdha Panigrahi
◮ Jelena Markovic
◮ Nan Bi

SLIDE 3

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n

SLIDE 4

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)

SLIDE 5

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)

model = lm(y ∼ X1 + X2 + X4)

SLIDE 6

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)

model = lm(y ∼ X1 + X2 + X4)
model = lm(y ∼ X1 + X3 + X4)

SLIDE 7

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)

model = lm(y ∼ X1 + X2 + X4)
model = lm(y ∼ X1 + X3 + X4)

◮ Inference after model selection

  • 1. Use data to select a set of variables E
  • 2. Normal z-test to get p-values
SLIDE 8

Model selection

◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)

model = lm(y ∼ X1 + X2 + X4)
model = lm(y ∼ X1 + X3 + X4)

◮ Inference after model selection

  • 1. Use data to select a set of variables E
  • 2. Normal z-test to get p-values

◮ Problem: inflated significance

  • 1. Normal z-tests need adjustment
  • 2. Selection is biased towards “significance”
SLIDE 9

Inflated Significance

Setup:

◮ X ∈ R^{100×200} has i.i.d. normal entries
◮ y = Xβ + ε, ε ∼ N(0, I)
◮ β = (5, …, 5, 0, …, 0), with the first 10 entries equal to 5
◮ LASSO, with nonzero coefficient set E
◮ z-test: null p-values for i ∈ E, i ∉ {1, …, 10} (a simulation sketch follows after the figure)

[Figure: histogram of null p-values after selection (naive z-tests); x-axis: p-values, y-axis: frequencies. The mass is concentrated near 0.]
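The inflated-significance phenomenon above is easy to reproduce. Below is a minimal simulation sketch of the slide's setup, assuming scikit-learn's Lasso with an arbitrary penalty level (the slides do not specify the tuning parameter):

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 5.0                                   # first 10 coefficients are signal
y = X @ beta + rng.standard_normal(n)

# Step 1: use the data to select a set of variables E (the lasso active set).
# alpha is an assumption; the slides do not give the tuning parameter.
E = np.flatnonzero(Lasso(alpha=0.5, fit_intercept=False).fit(X, y).coef_)

# Step 2: naive z-tests in the selected model, ignoring the selection step.
XE = X[:, E]
beta_hat = np.linalg.lstsq(XE, y, rcond=None)[0]
se = np.sqrt(np.diag(np.linalg.inv(XE.T @ XE)))   # noise variance taken as 1
pvals = 2 * norm.sf(np.abs(beta_hat / se))

# Null p-values: selected variables whose true coefficient is zero.
print(pvals[E >= 10])    # tend to pile up near 0 instead of Uniform(0, 1)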

SLIDE 10

Inflated Significance

Setup:

◮ X ∈ R^{100×200} has i.i.d. normal entries
◮ y = Xβ + ε, ε ∼ N(0, I)
◮ β = (5, …, 5, 0, …, 0), with the first 10 entries equal to 5
◮ LASSO, with nonzero coefficient set E
◮ z-test: null p-values for i ∈ E, i ∉ {1, …, 10}

[Figure: histogram of selective p-values after selection; x-axis: p-values, y-axis: frequencies. The distribution is close to Uniform(0, 1).]

SLIDE 11

Selective inference: features and caveat

◮ Specific to particular selection procedures
◮ Exact post-selection test
◮ More powerful test

SLIDE 12

Selective inference: popping the hood

Consider the selection for “big effects”:

◮ X1, …, Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) ∑_{i=1}^n Xi
◮ Select for "big effects": X̄ > 1
◮ Observation: X̄_obs = 1.1, with n = 5
◮ Normal z-test vs. selective test for H0 : µ = 0.

[Figure: the original distribution for X̄ (left) next to the conditional distribution of X̄ after selection (right).]

SLIDE 13

Selective inference: popping the hood

Consider the selection for “big effects”:

◮ X1, …, Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) ∑_{i=1}^n Xi
◮ Select for "big effects": X̄ > 1
◮ Observation: X̄_obs = 1.1, with n = 5
◮ Normal z-test vs. selective test for H0 : µ = 0.

[Figure: the original distribution for X̄ (left) next to the conditional distribution of X̄ after selection (right).]

SLIDE 14

Selective inference: in a nutshell

◮ Selection, e.g. X̄ > 1.
◮ Change of the reference measure:
  the conditional distribution, e.g. N(µ, 1/n) truncated at 1 (a worked sketch of this computation follows below).
◮ Target of inference may depend on the outcome of selection
◮ Example: selection by LASSO
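For the toy example, the selective test has a closed form: under H0 the reference measure is N(0, 1/n) truncated to (1, ∞). A minimal sketch comparing it with the naive z-test:

```python
import numpy as np
from scipy.stats import norm

n, xbar_obs, cutoff = 5, 1.1, 1.0
sd = 1 / np.sqrt(n)                    # sd of the sample mean under H0: mu = 0

p_naive = norm.sf(xbar_obs / sd)       # ignores that we only test when Xbar > 1

# Selective p-value: P(Xbar > 1.1 | Xbar > 1) under N(0, 1/n),
# i.e. the tail probability of the truncated reference measure.
p_selective = norm.sf(xbar_obs / sd) / norm.sf(cutoff / sd)

print(p_naive)       # ~0.007: looks highly significant
print(p_selective)   # ~0.55: no evidence against H0 once selection is accounted for
```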

SLIDE 15

What is the “selected” model?

Suppose a set of variables E is suggested by the data for further investigation.

◮ Selected model by Fithian et al. (2014):

  M_E = {N(X_E β_E, σ_E^2 I), β_E ∈ R^{|E|}, σ_E^2 > 0}.  Target is β_E.

◮ Full model by Lee et al. (2016), Berk et al. (2013):

  M = {N(µ, σ²I), µ ∈ R^n}.  Target is β_E(µ) = X_E^† µ.

◮ Nonparametric model:

  M = {F^{⊗n} : (X, Y) ∼ F}.  Target is β_E(F) = E_F[X_E^T X_E]^{−1} E_F[X_E · Y].

SLIDE 16

What is the “selected” model?

Suppose a set of variables E is suggested by the data for further investigation.

◮ Selected model by Fithian et al. (2014):

  M_E = {N(X_E β_E, σ_E^2 I), β_E ∈ R^{|E|}, σ_E^2 > 0}.  Target is β_E.

◮ Full model by Lee et al. (2016), Berk et al. (2013):

  M = {N(µ, σ²I), µ ∈ R^n}.  Target is β_E(µ) = X_E^† µ.

◮ Nonparametric model:

  M = {F^{⊗n} : (X, Y) ∼ F}.  Target is β_E(F) = E_F[X_E^T X_E]^{−1} E_F[X_E · Y] (a numeric sketch of these targets follows below).

A tool for valid inference after exploratory data analysis.
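A small numeric illustration of how the targets differ, with a hypothetical selected set E (all names here are illustrative): the full-model target projects the true mean onto the selected columns, while the nonparametric target is estimated by its sample analogue.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 200
X = rng.standard_normal((n, p))
mu = X[:, :10] @ np.full(10, 5.0)       # true mean of y in the full model
y = mu + rng.standard_normal(n)
E = np.array([0, 1, 3, 17])             # hypothetical selected set

# Full-model target: beta_E(mu) = pinv(X_E) @ mu, the best linear
# approximation to mu using only the selected columns.
beta_E_mu = np.linalg.pinv(X[:, E]) @ mu

# Nonparametric target, estimated by the sample analogue of
# E_F[X_E^T X_E]^{-1} E_F[X_E * Y]:
XE = X[:, E]
beta_E_F = np.linalg.solve(XE.T @ XE / n, XE.T @ y / n)

print(beta_E_mu)    # population-level target under the full model
print(beta_E_F)     # plug-in estimate of the nonparametric target
```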

SLIDE 17

Selective inference on a DAG

[DAG: the randomization ω and the data (X, y) produce (X∗, y∗); the selection E (the yellow node) is computed from (X∗, y∗); the target Ē depends on the data.]

◮ Incorporate randomness through ω (see the sketch after this list):

  • 1. (X∗, y∗) = (X, y)
  • 2. (X∗, y∗) = (X1, y1)
  • 3. (X∗, y∗) = (X, y + ω)

◮ Reference measure: condition on E, the yellow node.

◮ Target of inference can be Ē:

  • 1. Not E, but depending on the data through E
  • 2. "Liberating" the target of inference from selection
  • 3. Ē incorporates knowledge from the previous literature.
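The three randomization schemes in the list above amount to different choices of (X∗, y∗). A minimal sketch (the randomization scale tau is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, tau = 100, 200, 1.0
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)

# 1. No extra randomness: select on the data itself.
X_star, y_star = X, y

# 2. Data splitting: select on a random half (X1, y1).
half = rng.choice(n, size=n // 2, replace=False)
X1, y1 = X[half], y[half]

# 3. Additive randomization: select on a noisy copy of the response.
omega = rng.normal(scale=tau, size=n)
X_star, y_star = X, y + omega
```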

SLIDE 18

From selective inference to adaptive data analysis

Denote the data by S

[DAG: ω and S determine the selection E; the target Ē depends on S.]

SLIDE 19

From selective inference to adaptive data analysis

Denote the data by S

[DAG: ω1 and S determine E1, ω2 and S determine E2; the target Ē depends on S.]

SLIDE 20

Reference measure after selection

◮ Given any point null F0, use the conditional distribution F0∗ as the reference measure,

  dF0∗/dF0(S) = ℓF(S).

◮ ℓF is called the selective likelihood ratio. It depends on the selection algorithm and on the randomization distribution ω ∼ G (see the sketch after this list).

◮ Tests of the form H0 : θ(F) = θ0 can be reduced to testing point nulls, e.g. via

  ◮ the score test
  ◮ conditioning in exponential families
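For the toy rule of the earlier slides with an added Gaussian randomization (select when X̄ + ω > 1, ω ∼ N(0, τ²); the randomized variant and the scale τ are assumptions for illustration), the selective likelihood ratio is, up to normalization, the selection probability given the data:

```python
import numpy as np
from scipy.stats import norm

tau = 0.5        # randomization scale; an assumption for illustration

def selective_lr(xbar, cutoff=1.0):
    """dF0*/dF0 at S = xbar, up to normalization: the probability of
    selection when omega ~ N(0, tau^2) is added, P(xbar + omega > cutoff)."""
    return norm.sf((cutoff - xbar) / tau)

# The reference measure reweights F0 by this factor: data values that
# are more likely to be selected carry more mass after conditioning.
print(selective_lr(np.array([0.5, 1.0, 1.5])))
```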

SLIDE 21

Computing the reference measure after selection

◮ The selection map Q̂ results from an optimization problem,

  β̂(S, ω) = argmin_β ℓ(S; β) + P(β) + ωᵀβ.

  E is the active set of β̂ (see the sketch below for a concrete instance).

◮ Selection region A(S) = {ω : Q̂(S, ω) = E}, ω ∼ G,

  dF0∗/dF0(S) = ∫_{A(S)} dG(ω).

[DAG: S and ω determine the selection E.]

{Q̂(S, ω) = E} is difficult to describe.
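As a concrete instance of Q̂, here is a minimal proximal-gradient (ISTA) sketch of the randomized lasso, i.e. squared-error loss, ℓ1 penalty P, and the linear randomization term ωᵀβ. The penalty level and iteration count are illustrative:

```python
import numpy as np

def randomized_lasso(X, y, lam, omega, n_iter=2000):
    """ISTA for argmin_b 0.5*||y - X b||^2 + lam*||b||_1 + omega @ b."""
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1/L for the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) + omega            # gradient of the smooth part
        b = b - step * grad
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # prox of lam*||.||_1
    return b

rng = np.random.default_rng(3)
n, p = 100, 200
X = rng.standard_normal((n, p))
y = X[:, :10] @ np.full(10, 5.0) + rng.standard_normal(n)
omega = rng.normal(scale=1.0, size=p)               # randomization, omega ~ G
beta_hat = randomized_lasso(X, y, lam=20.0, omega=omega)
E = np.flatnonzero(beta_hat)                        # active set: the output of Q-hat
```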

SLIDE 22

Computing the reference measure after selection

◮ The selection map Q̂ results from an optimization problem,

  β̂(S, ω) = argmin_β ℓ(S; β) + P(β) + ωᵀβ.

  E is the active set of β̂.

◮ Selection region A(S) = {ω : Q̂(S, ω) = E}, ω ∼ G,

  dF0∗/dF0(S) = ∫_{A(S)} dG(ω).

  Let ẑ(S, ω) be the subgradient of the optimization problem.

[DAG: S determines (β̂_E, ẑ_{−E}), which determine the selection E.]

{(β̂_E, ẑ_{−E}) ∈ B}, where B depends only on the penalty P.

SLIDE 23

Monte-Carlo sampler for the conditional distribution

Suppose F0 has density f0 and G has density g.

[DAG: S determines (β̂_E, ẑ_{−E}), which determine the selection E.]

dF0∗/dF0(S) = ∫_B g(ψ(S, β̂_E, ẑ_{−E})) dβ̂_E dẑ_{−E}, where ω = ψ(S, β̂_E, ẑ_{−E}).

◮ The reparametrization map ψ is easy to compute, Harris et al. (2016).

◮ In simulation, we jointly sample (S, β̂_E, ẑ_{−E}) from the density

  f0(S) g(ψ(S, β̂_E, ẑ_{−E})) 1_B

  (a sampler sketch follows below). Samples of S can then be used as the reference measure for selective inference.
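A hedged sketch of such a sampler for the randomized lasso with squared-error loss: the KKT conditions give ω = ψ(S, β̂_E, ẑ_{−E}) = −Xᵀ(X_E β̂_E − y) − λu, with u_E the observed signs and u_{−E} = ẑ_{−E}, and B is the set where those signs and the subgradient bounds hold. Here ψ is affine, so its constant Jacobian is absorbed into the normalization; f0 is taken to be a global-null N(0, σ²I) purely for illustration, and all names are assumptions.

```python
import numpy as np

def log_density(y, bE, zmE, X, E, notE, signs, lam, tau, sigma=1.0):
    """log of f0(y) * g(psi(y, bE, zmE)) * 1_B, up to additive constants."""
    if np.any(signs * bE <= 0) or np.any(np.abs(zmE) >= 1):
        return -np.inf                                    # outside B
    u = np.empty(X.shape[1])
    u[E], u[notE] = signs, zmE
    omega = -(X.T @ (X[:, E] @ bE - y)) - lam * u         # omega = psi(...)
    return -(y @ y) / (2 * sigma**2) - (omega @ omega) / (2 * tau**2)

def sample_reference(y0, bE0, zmE0, n_samples, step=0.1, **kw):
    """Random-walk Metropolis over the joint state (S, beta_E, z_{-E});
    returns the samples of S = y, the reference measure for inference."""
    rng = np.random.default_rng(4)
    state = [np.asarray(y0, float), np.asarray(bE0, float), np.asarray(zmE0, float)]
    cur = log_density(*state, **kw)
    draws = []
    for _ in range(n_samples):
        prop = [s + step * rng.standard_normal(s.shape) for s in state]
        new = log_density(*prop, **kw)
        if np.log(rng.random()) < new - cur:              # Metropolis accept
            state, cur = prop, new
        draws.append(state[0].copy())
    return np.array(draws)
```

Started at the observed (y, β̂_E, ẑ_{−E}) (which lies in B by construction), the samples of S replace the unconditional null distribution when calibrating a test statistic.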

SLIDE 24

Interactive Data Analysis

Easily generalizable in a sequential/interactive fashion.

[DAG: S determines (β̂_E, ẑ_{−E}), which determine the selection E.]

f0(S) g(ψ(S, β̂_E, ẑ_{−E})) 1_B.

SLIDE 25

Interactive Data Analysis

Easily generalizable in a sequential/interactive fashion.

[DAG: S determines (β̂_{E1}, ẑ_{−E1}) and (β̂_{E2}, ẑ_{−E2}), which determine the two selections E1 and E2.]

f0(S) g(ψ1(S, β̂_{E1}, ẑ_{−E1})) 1_{B1} · g(ψ2(S, β̂_{E2}, ẑ_{−E2})) 1_{B2}.

◮ Flexible framework: any selection procedure resulting from a "Loss + Penalty" convex problem.

◮ Examples such as the lasso, logistic lasso, marginal screening, forward stepwise, the graphical lasso, and the group lasso are considered in Harris et al. (2016).

◮ Many more are possible.

SLIDE 26

Summary

◮ Selective inference on a DAG
◮ Selection: more than one shot
◮ Feasible implementation of the selective tests

https://github.com/selective-inference/Python-software

Thank you!

SLIDE 27

Berk, R., Brown, L., Buja, A., Zhang, K. & Zhao, L. (2013), 'Valid post-selection inference', The Annals of Statistics 41(2), 802–837. URL: http://projecteuclid.org/euclid.aos/1369836961

Fithian, W., Sun, D. & Taylor, J. (2014), 'Optimal Inference After Model Selection', arXiv preprint arXiv:1410.2597. URL: http://arxiv.org/abs/1410.2597

Harris, X. T., Panigrahi, S., Markovic, J., Bi, N. & Taylor, J. (2016), 'Selective sampling after solving a convex problem', arXiv preprint arXiv:1609.05609.

Lee, J. D., Sun, D. L., Sun, Y. & Taylor, J. E. (2016), 'Exact post-selection inference with the lasso', The Annals of Statistics 44(3), 907–927. URL: http://projecteuclid.org/euclid.aos/1460381681