  1. From selective inference to adaptive data analysis
     Xiaoying Tian Harris
     December 9, 2016

  2. Acknowledgement
     My advisor:
     ◮ Jonathan Taylor
     Other coauthors:
     ◮ Snigdha Panigrahi
     ◮ Jelena Markovic
     ◮ Nan Bi

  3–8. Model selection
     ◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
     ◮ Candidate models:
       model = lm(y ~ X1 + X2 + X3 + X4)
       model = lm(y ~ X1 + X2 + X4)
       model = lm(y ~ X1 + X3 + X4)
     ◮ Inference after model selection:
       1. Use the data to select a set of variables E
       2. Normal z-tests to get p-values
     ◮ Problem: inflated significance
       1. The normal z-tests need adjustment
       2. Selection is biased towards "significance"

  9. Inflated Significance
     Setup:
     ◮ X ∈ R^{100×200} has i.i.d. normal entries
     ◮ y = Xβ + ε, ε ∼ N(0, I)
     ◮ β = (5, ..., 5, 0, ..., 0), with 10 nonzero entries
     ◮ LASSO, nonzero coefficient set E
     ◮ z-test, null p-values for i ∈ E, i ∉ {1, ..., 10} (simulated in the sketch below)
     [Figure: histogram of null p-values after selection]
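A minimal Python sketch of the simulation above (my own reconstruction; the LASSO penalty level, the number of replications, and the use of scikit-learn are assumptions not stated on the slide). It selects variables with the LASSO, refits least squares on the selected set, and computes naive z-test p-values for the truly null selected variables.

    # Naive inference after LASSO selection: null p-values concentrate near 0.
    import numpy as np
    from scipy.stats import norm
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, s, signal = 100, 200, 10, 5.0

    naive_null_pvals = []
    for _ in range(200):
        X = rng.standard_normal((n, p))
        beta = np.zeros(p)
        beta[:s] = signal
        y = X @ beta + rng.standard_normal(n)

        # Select variables with the LASSO (penalty level is an assumption).
        E = np.flatnonzero(Lasso(alpha=0.3, fit_intercept=False).fit(X, y).coef_)
        if E.size == 0:
            continue

        # Naive z-tests: refit OLS on the selected columns, ignoring selection.
        XE = X[:, E]
        beta_hat = np.linalg.lstsq(XE, y, rcond=None)[0]
        se = np.sqrt(np.diag(np.linalg.inv(XE.T @ XE)))   # sigma^2 = 1 is known here
        pvals = 2 * norm.sf(np.abs(beta_hat / se))

        # Keep p-values of the truly null selected variables (indices >= s).
        naive_null_pvals.extend(pvals[E >= s])

    # Valid null p-values would be roughly uniform; these pile up near zero.
    print(np.mean(np.array(naive_null_pvals) < 0.05))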

  10. Inflated Significance
     Same setup as slide 9.
     [Figure: histogram of selective p-values after selection, roughly uniform on [0, 1]]

  11. Selective inference: features and caveat
     ◮ Specific to particular selection procedures
     ◮ Exact post-selection test
     ◮ More powerful test

  12–13. Selective inference: popping the hood
     Consider selection for "big effects":
     ◮ X_1, ..., X_n i.i.d. ∼ N(0, 1), X̄ = (Σ_{i=1}^n X_i) / n
     ◮ Select for "big effects": X̄ > 1
     ◮ Observation: X̄_obs = 1.1, with n = 5
     ◮ Normal z-test vs. selective test for H_0: µ = 0 (see the numeric sketch below)
     [Figures: the original distribution of X̄, and the conditional distribution after selection, truncated at 1]
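A small numeric sketch of this one-dimensional example (the values n = 5, threshold 1, and X̄_obs = 1.1 come from the slide; the one-sided form of the test is my assumption). The selective test uses N(0, 1/n) truncated to (1, ∞) as the reference distribution.

    # Naive vs. selective p-value for H0: mu = 0 after selecting on Xbar > 1.
    import numpy as np
    from scipy.stats import norm

    n, threshold, xbar_obs = 5, 1.0, 1.1
    sd = 1 / np.sqrt(n)            # Xbar ~ N(0, 1/n) under the null

    # Naive one-sided z-test: ignores that Xbar was only examined because it exceeded 1.
    naive_p = norm.sf(xbar_obs, scale=sd)

    # Selective test: P(Xbar > xbar_obs | Xbar > threshold) under the null,
    # i.e. the same tail probability computed in the truncated distribution.
    selective_p = norm.sf(xbar_obs, scale=sd) / norm.sf(threshold, scale=sd)

    print(naive_p, selective_p)    # the naive p-value is far too small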

  14. Selective inference: in a nutshell
     ◮ Selection, e.g. X̄ > 1
     ◮ Change of the reference measure
     ◮ The conditional distribution, e.g. N(µ, 1/n) truncated at 1
     ◮ The target of inference may depend on the outcome of selection
     ◮ Example: selection by the LASSO

  15–16. What is the "selected" model?
     Suppose a set of variables E is suggested by the data for further investigation.
     ◮ Selected model, Fithian et al. (2014): M_E = { N(X_E β_E, σ_E^2 I) : β_E ∈ R^{|E|}, σ_E^2 > 0 }. Target is β_E.
     ◮ Full model, Lee et al. (2016), Berk et al. (2013): M = { N(µ, σ^2 I) : µ ∈ R^n }. Target is β_E(µ) = X_E^† µ (illustrated in the sketch below).
     ◮ Nonparametric model: M = { F^{⊗n} : (X, Y) ∼ F }. Target is β_E(F) = E_F[X_E^T X_E]^{-1} E_F[X_E Y].
     A tool for valid inference after exploratory data analysis.
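To make the full-model target concrete, here is a small illustrative sketch (the data, the true mean µ, and the selected set E are all made up for illustration). It computes β_E(µ) = X_E^† µ and the corresponding plug-in estimate, the least-squares refit on the selected columns.

    # Full-model target X_E^dagger mu and its least-squares plug-in estimate.
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    mu = X[:, :3] @ np.array([2.0, -1.0, 0.5])   # true mean of y (assumed)
    y = mu + rng.standard_normal(n)
    E = [0, 1, 5]                                # pretend this set was selected

    XE = X[:, E]
    beta_E_target = np.linalg.pinv(XE) @ mu      # beta_E(mu): best linear fit of mu on X_E
    beta_E_hat = np.linalg.pinv(XE) @ y          # its plug-in estimate from the data

    print(beta_E_target, beta_E_hat)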

  17. Selective inference on a DAG
     ◮ Incorporate randomness through ω, e.g.
       1. (X*, y*) = (X, y)
       2. (X*, y*) = (X_1, y_1)
       3. (X*, y*) = (X, y + ω)
     ◮ Reference measure: condition on E, the selection outcome computed from (X*, y*) (the yellow node in the DAG).
     ◮ The target of inference can be Ē:
       1. Not E itself, but depending on the data through E
       2. "Liberating" the target of inference from selection
       3. Ē can incorporate knowledge from previous literature
     [DAG figure: the data (X, y) and the randomization ω feed the selection E, which feeds Ē]

  18. From selective inference to adaptive data analysis
     Denote the data by S.
     [DAG figure: S and the randomization ω feed the selection E, which feeds Ē]

  19. From selective inference to adaptive data analysis
     Denote the data by S.
     [DAG figure: S and randomizations ω_1, ω_2 feed selections E_1, E_2, which feed Ē]

  20. Reference measure after selection
     ◮ Given any point null F_0, use the conditional distribution F_0^* as reference measure,
       dF_0^*/dF_0(S) = ℓ_F(S).
     ◮ ℓ_F is called the selective likelihood ratio. Depends on the selection algorithm and the randomization distribution ω ∼ G (toy computation below).
     ◮ Tests of the form H_0: θ(F) = θ_0 can be reduced to testing point nulls, e.g.
       ◮ Score test
       ◮ Conditioning in exponential families
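For the non-randomized X̄ > 1 example from slides 12–13 the selective likelihood ratio is explicit, so a toy computation (my own illustration, not from the slides) shows how conditioning reweights the point null F_0 = N(0, 1/n):

    # Selective likelihood ratio for the selection event {Xbar > 1}:
    # dF0*/dF0 (xbar) = 1{xbar > 1} / P_F0(Xbar > 1).
    import numpy as np
    from scipy.stats import norm

    n, threshold = 5, 1.0
    sd = 1 / np.sqrt(n)

    def selective_likelihood_ratio(xbar):
        return (xbar > threshold) / norm.sf(threshold, scale=sd)

    # Density of the conditional (truncated) reference measure at the observed value.
    xbar = 1.1
    print(norm.pdf(xbar, scale=sd) * selective_likelihood_ratio(xbar))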

  21–22. Computing the reference measure after selection
     ◮ The selection map Q̂ results from an optimization problem,
       β̂(S, ω) = argmin_β ℓ(S; β) + P(β) + ω^T β,
       and E is the active set of β̂.
     ◮ Selection region A(S) = { ω : Q̂(S, ω) = E }, with ω ∼ G:
       dF_0^*/dF_0(S) = ∫_{A(S)} dG(ω).
     ◮ The set { Q̂(S, ω) = E } is difficult to describe directly.
     ◮ Instead, let ẑ_{−E}(S, ω) be the subgradient of the optimization problem at the inactive coordinates. The selection event becomes { (β̂_E, ẑ_{−E}) ∈ B }, where B depends only on the penalty P.

  23. Monte-Carlo sampler for the conditional distribution
     Suppose F_0 has density f_0 and G has density g. Then
       dF_0^*/dF_0(S) = ∫_B g(ψ(S, β̂_E, ẑ_{−E})) dβ̂_E dẑ_{−E},
     where ω = ψ(S, β̂_E, ẑ_{−E}).
     ◮ The reparametrization map ψ is easy to compute, Harris et al. (2016) (see the sketch below).
     ◮ In simulation, we jointly sample (S, β̂_E, ẑ_{−E}) from the density
       f_0(S) g(ψ(S, β̂_E, ẑ_{−E})) 1_B.
     Samples of S can then be used as the reference measure for selective inference.
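Slide 23 states that the reparametrization map ψ is easy to compute. As a concrete special case (my own sketch, assuming a squared-error loss and an ℓ1 penalty, i.e. a randomized LASSO; not code from the referenced software), the KKT conditions of the problem on slides 21–22 give ψ in closed form:

    # KKT conditions of  min_beta 0.5*||y - X beta||^2 + lam*||beta||_1 + omega^T beta
    # imply  omega = X^T (y - X beta_hat) - lam * z_hat,
    # so omega is recovered from the data and (beta_hat_E, z_hat_{-E}).
    import numpy as np

    def psi(X, y, lam, E, beta_E, z_minus_E):
        """Map the data and the optimization variables back to the randomization omega."""
        p = X.shape[1]
        beta = np.zeros(p)
        beta[E] = beta_E                        # active coefficients
        z = np.empty(p)
        z[E] = np.sign(beta_E)                  # active subgradient: the signs
        inactive = np.setdiff1d(np.arange(p), E)
        z[inactive] = z_minus_E                 # inactive subgradient, entries in [-1, 1]
        return X.T @ (y - X @ beta) - lam * z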

  24. Interactive Data Analysis
     Easily generalizable in a sequential/interactive fashion.
     [DAG figure: S, β̂_E, ẑ_{−E} and E]
     Sampling density: f_0(S) g(ψ(S, β̂_E, ẑ_{−E})) 1_B.

  25. Interactive Data Analysis
     Easily generalizable in a sequential/interactive fashion. With two selection stages E_1 and E_2, the joint sampling density becomes
       f_0(S) · g(ψ_1(S, β̂_{E_1}, ẑ_{−E_1})) 1_{B_1} · g(ψ_2(S, β̂_{E_1}, β̂_{E_2}, ẑ_{−E_2})) 1_{B_2}.
     ◮ Flexible framework: any selection procedure resulting from a "Loss + Penalty" convex problem.
     ◮ Examples such as the Lasso, logistic Lasso, marginal screening, forward stepwise, graphical Lasso, and group Lasso are considered in Harris et al. (2016).
     ◮ Many more are possible.

  26. Summary
     ◮ Selective inference on a DAG
     ◮ Selection: more than one shot
     ◮ Feasible implementation of the selective tests:
       https://github.com/selective-inference/Python-software
     Thank you!

  27. References
     Berk, R., Brown, L., Buja, A., Zhang, K. & Zhao, L. (2013), 'Valid post-selection inference', The Annals of Statistics 41(2), 802–837. URL: http://projecteuclid.org/euclid.aos/1369836961
     Fithian, W., Sun, D. & Taylor, J. (2014), 'Optimal Inference After Model Selection', arXiv preprint arXiv:1410.2597. URL: http://arxiv.org/abs/1410.2597
     Harris, X. T., Panigrahi, S., Markovic, J., Bi, N. & Taylor, J. (2016), 'Selective sampling after solving a convex problem', arXiv preprint arXiv:1609.05609.
     Lee, J. D., Sun, D. L., Sun, Y. & Taylor, J. E. (2016), 'Exact post-selection inference with the lasso', The Annals of Statistics 44(3), 907–927. URL: http://projecteuclid.org/euclid.aos/1460381681
