From selective inference to adaptive data analysis
Xiaoying Tian Harris
December 9, 2016
Acknowledgement
My advisor:
◮ Jonathan Taylor
Other coauthors:
◮ Snigdha Panigrahi
◮ Jelena Markovic
◮ Nan Bi
Model selection
◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
◮ model = lm(y ∼ X1 + X2 + X3 + X4)
  model = lm(y ∼ X1 + X2 + X4)
  model = lm(y ∼ X1 + X3 + X4)
◮ Inference after model selection
  1. Use the data to select a set of variables E
  2. Normal z-tests to get p-values
◮ Problem: inflated significance
  1. Normal z-tests need adjustment
  2. Selection is biased towards “significance”
Inflated Significance
Setup:
◮ X ∈ R^{100×200} has i.i.d. normal entries
◮ y = Xβ + ε, ε ∼ N(0, I)
◮ β = (5, . . . , 5, 0, . . . , 0), with the first 10 entries equal to 5
◮ LASSO, with nonzero coefficient set E
◮ z-tests, null p-values for i ∈ E, i ∉ {1, . . . , 10}
[Figure: histogram of null p-values after selection; concentrated near 0 rather than uniform]
Inflated Significance
Same setup, now showing the selective p-values:
[Figure: histogram of selective null p-values after selection; approximately uniform]
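The inflation in the first histogram can be reproduced with a much smaller toy example. This is a sketch, not the slide's LASSO simulation: we draw many independent null observations, keep the ones that look like “big effects”, and apply naive two-sided z-tests to the kept ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 20,000 independent null observations y_i ~ N(0, 1); every H0: mu_i = 0 is true
y = rng.standard_normal(20_000)

# Select the apparent "big effects" (a stand-in for the slide's LASSO selection)
selected = y[y > 1]

# Naive two-sided z-test p-values for the selected coordinates
pvals = 2 * stats.norm.sf(np.abs(selected))

# Without selection, ~5% of null p-values fall below 0.05;
# after selection the exact fraction is P(|y| > 1.96 | y > 1), about three times larger
print(round((pvals < 0.05).mean(), 3))
```

The selection step is what breaks the z-test: among the survivors of `y > 1`, small p-values are over-represented even though every null hypothesis is true.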
Selective inference: features and caveat
◮ Specific to particular selection procedures
◮ Exact post-selection tests
◮ More powerful tests
Selective inference: popping the hood
Consider the selection for “big effects”:
◮ X1, . . . , Xn i.i.d. ∼ N(0, 1), X̄ = (1/n) Σ_{i=1}^n X_i
◮ Select for “big effects”: X̄ > 1
◮ Observation: X̄_obs = 1.1, with n = 5
◮ Normal z-test vs. selective test for H0 : µ = 0
[Figure: original distribution of X̄ vs. the conditional distribution after selection, truncated at 1]
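For this toy example the selective test is just a truncated-normal tail probability, so both p-values have closed forms. A sketch of the computation (not the talk's code):

```python
import numpy as np
from scipy.stats import norm

n, xbar_obs, cutoff = 5, 1.1, 1.0
sd = 1 / np.sqrt(n)                  # X̄ ~ N(0, 1/n) under H0: mu = 0

# Naive z-test: ignores that we only test because X̄ > 1
p_naive = norm.sf(xbar_obs, scale=sd)

# Selective test: reference measure is N(0, 1/n) truncated to (1, ∞),
# so the p-value is a ratio of tail probabilities
p_selective = norm.sf(xbar_obs, scale=sd) / norm.sf(cutoff, scale=sd)

# p_naive is well below 0.05; p_selective is not significant at all
print(round(p_naive, 4), round(p_selective, 4))
```

The same observation X̄_obs = 1.1 looks highly significant to the naive z-test but unremarkable once the reference measure accounts for the selection X̄ > 1.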
Selective inference: in a nutshell
◮ Selection, e.g. X̄ > 1
◮ Change of the reference measure
  ◮ the conditional distribution, e.g. N(µ, 1/n) truncated at 1
◮ Target of inference may depend on the outcome of selection
  ◮ Example: selection by the LASSO
What is the “selected” model?
Suppose a set of variables E is suggested by the data for further investigation.
◮ Selected model, Fithian et al. (2014):
  M_E = {N(X_E β_E, σ_E² I) : β_E ∈ R^{|E|}, σ_E² > 0}. Target is β_E.
◮ Full model, Lee et al. (2016), Berk et al. (2013):
  M = {N(µ, σ²I) : µ ∈ R^n}. Target is β_E(µ) = X_E^† µ.
◮ Nonparametric model:
  M = {F^{⊗n} : (X, Y) ∼ F}. Target is β_E(F) = E_F[X_E^T X_E]^{−1} E_F[X_E · Y].
A tool for valid inference after exploratory data analysis.
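For a fixed selected set E, the plug-in estimates of these targets all reduce to the least-squares fit on X_E: X_E^† y estimates the full-model target X_E^† µ, and the sample-moment version of E_F[X_E^T X_E]^{−1} E_F[X_E · Y] gives the same vector. A small sketch (the data and the set E here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, 0.0]) + rng.standard_normal(n)
E = [0, 1, 3]                        # a selected set, assumed given

XE = X[:, E]
# Full-model / selected-model plug-in: least squares on E, i.e. X_E^† y
beta_E = np.linalg.pinv(XE) @ y
# Nonparametric plug-in: sample moments for E_F[X_E^T X_E]^{-1} E_F[X_E · Y]
beta_F = np.linalg.solve(XE.T @ XE / n, XE.T @ y / n)
print(np.allclose(beta_E, beta_F))
```

The models differ in what the target *means* (a true coefficient, a projection of µ, or a population moment), not in how the point estimate is computed.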
Selective inference on a DAG
[Diagram: ω → (X*, y*) → E, with (X, y) → Ē]
◮ Incorporate randomness through ω
  1. (X*, y*) = (X, y)
  2. (X*, y*) = (X1, y1)
  3. (X*, y*) = (X, y + ω)
◮ Reference measure conditions on E, the yellow node
◮ Target of inference can be Ē
  1. Not E, but depends on the data through E
  2. “Liberating” the target of inference from selection
  3. Ē incorporates knowledge from the previous literature
From selective inference to adaptive data analysis
Denote the data by S.
[Diagram: ω → E, with S → Ē]

From selective inference to adaptive data analysis
Denote the data by S.
[Diagram: two rounds of selection, ω1 → E1 and ω2 → E2, with S feeding both and target Ē]
Reference measure after selection
◮ Given any point null F0, use the conditional distribution F0* as the reference measure,
  dF0*/dF0 (S) = ℓ_F(S).
◮ ℓ_F is called the selective likelihood ratio. It depends on the selection algorithm and the randomization distribution ω ∼ G.
◮ Tests of the form H0 : θ(F) = θ0 can be reduced to tests of point nulls, e.g. via
  ◮ the score test
  ◮ conditioning in exponential families
Computing the reference measure after selection
◮ The selection map Q̂ results from an optimization problem,
  β̂(S, ω) = argmin_β ℓ(S; β) + P(β) + ω^T β.
  E is the active set of β̂.
◮ Selection region: A(S) = {ω : Q̂(S, ω) = E}, ω ∼ G,
  dF0*/dF0 (S) = ∫_{A(S)} dG(ω).
[Diagram: ω → E, with S feeding the selection]
◮ The event {Q̂(S, ω) = E} is difficult to describe directly.
Computing the reference measure after selection
◮ Instead of describing {Q̂(S, ω) = E} directly, let ẑ(S, ω) be the subgradient of the optimization problem and change variables from ω to (β̂_E, ẑ_{−E}).
[Diagram: the selection node E now determined by (β̂_E, ẑ_{−E}) and S]
◮ The selection event becomes {(β̂_E, ẑ_{−E}) ∈ B}, where B depends only on the penalty P.
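For the randomized LASSO (squared-error loss ℓ, penalty P = λ‖·‖₁), the KKT conditions make the map between ω and (β̂_E, ẑ_{−E}) explicit: at the optimum, X^T(Xβ̂ − y) + λẑ + ω = 0. A sketch with a hand-rolled coordinate-descent solver (an illustration of the change of variables, not the talk's implementation):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def randomized_lasso(X, y, omega, lam, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 + omega^T b by coordinate descent."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]       # partial residual
            beta[j] = soft_threshold(X[:, j] @ r - omega[j], lam) / col_sq[j]
    return beta

rng = np.random.default_rng(3)
n, p, lam = 50, 10, 10.0
X = rng.standard_normal((n, p))
y = X @ np.r_[2.0, 2.0, 2.0, np.zeros(p - 3)] + rng.standard_normal(n)
omega = rng.standard_normal(p)                         # randomization, omega ~ G

beta_hat = randomized_lasso(X, y, omega, lam)
E = beta_hat != 0

# KKT stationarity: X^T(X beta_hat - y) + lam * z_hat + omega = 0, so the
# subgradient z_hat (and hence omega) is recoverable from (beta_hat, z_hat)
z_hat = -(X.T @ (X @ beta_hat - y) + omega) / lam
print(E.sum(), np.allclose(z_hat[E], np.sign(beta_hat[E]), atol=1e-3))
```

On the active set the subgradient equals sign(β̂_E), and off the active set it lies in [−1, 1]; that box constraint is exactly the set B, which depends only on the penalty.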
Monte-Carlo sampler for the conditional distribution
Suppose F0 has density f0 and G has density g.
[Diagram: E determined by (β̂_E, ẑ_{−E}) and S]
dF0*/dF0 (S) = ∫_B g(ω(S, β̂_E, ẑ_{−E})) |J(S, β̂_E, ẑ_{−E})| dβ̂_E dẑ_{−E},
where ω(·) is the reconstruction map given by the KKT conditions and J is its Jacobian.
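With randomization, the conditional distribution can also be approximated by simple rejection: draw (S, ω) jointly under (F0, G) and keep S whenever the selection event occurs. A sketch for the running mean example, with a Gaussian G as an assumed randomization (the slide's sampler targets the change-of-variables density directly; this is only the brute-force version of the same measure):

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_draws = 5, 200_000
sd = 1 / np.sqrt(n)

# Joint draws under the null F0 (X̄ ~ N(0, 1/n)) and randomization G = N(0, 0.25)
xbar = rng.normal(0.0, sd, size=n_draws)
omega = rng.normal(0.0, 0.5, size=n_draws)

# Keep the draws where the *randomized* statistic crosses the selection cutoff
kept = xbar[xbar + omega > 1.0]

# The conditional (selective) distribution of X̄ is shifted upward, but
# smoother than the hard truncation at 1 obtained without randomization
print(round(kept.mean(), 2), round(xbar.mean(), 2))
```

Rejection sampling is wasteful when the selection event is rare, which is why a sampler built on the explicit density over B is preferable in practice.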