High-Dimensional Variable Selection in Nonlinear Models

High-Dimensional Variable Selection in Nonlinear Models that Controls the False Discovery Rate
Lucas Janson, Harvard University Department of Statistics
CMSA Big Data Conference, August 18, 2017
Collaborators: Emmanuel Candès


Knockoffs (Barber and Candès, 2015)

y and X_j are n × 1 column vectors of data: n draws from the random variables Y and X_j, respectively; design matrix X := [X_1 ⋯ X_p].

(1) Construct knockoffs: the knockoffs X̃_j (collected into X̃ := [X̃_1 ⋯ X̃_p]) must satisfy

$$[X\ \tilde{X}]^\top [X\ \tilde{X}] = \begin{pmatrix} X^\top X & X^\top X - \operatorname{diag}\{s\} \\ X^\top X - \operatorname{diag}\{s\} & X^\top X \end{pmatrix}$$

(2) Compute knockoff statistics:
- Sufficiency: W_j is a function only of [X X̃]^⊤[X X̃] and [X X̃]^⊤ y.
- Antisymmetry: swapping the values of X_j and X̃_j flips the sign of W_j.

(3) Find the knockoff threshold:
- Order the variables by decreasing |W_j| and proceed down the list.
- Select only the variables with positive W_j, stopping the last time #{negatives}/#{positives} ≤ q.

Comments:
- Finite-sample FDR control, and leverages sparsity for power.
- Requires the data to follow a low-dimensional (n ≥ p) Gaussian linear model.
- Canonical approach: condition on X, rely heavily on the model for y. (A construction sketch follows below.)
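To make step (1) concrete, here is a minimal numpy sketch of the equi-correlated fixed-X construction. It is a sketch under assumptions, not code from the talk: it assumes n ≥ 2p and a full-column-rank X, and the function name, the slight shrinkage of s, and the null-space construction are choices made here for illustration.

```python
# Minimal sketch of equi-correlated fixed-X knockoffs (assumes n >= 2p and
# full-column-rank X; function name and details are illustrative).
import numpy as np
from scipy.linalg import cholesky, null_space

def fixed_x_knockoffs(X):
    n, p = X.shape
    X = X / np.linalg.norm(X, axis=0)                 # normalize columns
    Sigma = X.T @ X
    s = min(2.0 * np.linalg.eigvalsh(Sigma)[0], 1.0) * (1 - 1e-6)  # equi-correlated s
    Sigma_inv = np.linalg.inv(Sigma)
    # Factor C with C.T @ C = 2*diag{s} - diag{s} Sigma^{-1} diag{s} (PSD by choice of s)
    C = cholesky(2.0 * s * np.eye(p) - s**2 * Sigma_inv)
    U = null_space(X.T)[:, :p]                        # orthonormal columns orthogonal to col(X)
    X_tilde = X @ (np.eye(p) - s * Sigma_inv) + U @ C
    return X, X_tilde

rng = np.random.default_rng(0)
Xn, Xt = fixed_x_knockoffs(rng.standard_normal((100, 20)))
print(np.allclose(Xt.T @ Xt, Xn.T @ Xn))              # one block of the Gram condition
```

The final check verifies the X̃^⊤X̃ block of the Gram condition; the cross block X^⊤X̃ = X^⊤X − diag{s} follows from the same algebra.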

Generalizing the Knockoffs Procedure

(1) Construct knockoffs:
- Artificial versions ("knockoffs") of each variable.
- They act as controls for assessing the importance of the original variables.

(2) Compute knockoff statistics:
- A scalar statistic W_j for each variable.
- It measures how much more important a variable appears than its knockoff.
- Positive W_j means the original looks more important, with strength measured by the magnitude.

(3) Find the knockoff threshold: (same as before)
- Order the variables by decreasing |W_j| and proceed down the list.
- Select only the variables with positive W_j, stopping the last time #{negatives}/#{positives} ≤ q.

Coin-flipping property: the key to knockoffs is that steps (1) and (2) are done specifically to ensure that, conditional on |W_1|, …, |W_p|, the signs of the unimportant (null) W_j are independently ±1 with probability 1/2.

New Interpretation of Knockoffs

Knockoffs Without a Model for Y (Candès et al., 2016)

- Instead of modeling y and conditioning on X, condition on y and model X (this shifts the burden of knowledge from y onto X).
- Explicitly, the rows of X are i.i.d.: (X_{i,1}, …, X_{i,p}) ∼ G, where G can be arbitrary but is assumed known.
- Compared with the original knockoffs, this removes:
  - the restriction on dimension,
  - the linear-model requirement for Y | X_1, …, X_p,
  - the "sufficiency" constraint on W_j.
- The rows of X must be i.i.d., not the columns (covariates).
- Nothing about y's distribution is assumed or need be known.
- Robust to overfitting X's distribution in preliminary experiments.

Robustness

[Figure: two panels plotting FDR (left) and power (right) against the relative Frobenius-norm error of the covariance estimate used to build the knockoffs. The series compare the exact covariance, a graphical-lasso estimate, and empirical covariance estimates computed from 50%, 62.5%, 75%, 87.5%, and 100% of the data.]

Figure caption: Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500, and the target FDR is 10%. Y comes from a binomial linear model with a logit link and 50 nonzero coefficients.

Shifting the Burden of Knowledge

When is it appropriate?
1. Subjects are sampled from a population, and
2a. the X_j are highly structured, well-studied, or well-understood, OR
2b. a large set of unsupervised X data (without Y's) is available.

For instance, many genome-wide association studies satisfy all of the conditions:
1. Subjects are sampled from a population (oversampling cases is still valid).
2a. Strong spatial structure: linkage-disequilibrium models, e.g., Markov chains, are well-studied and work well.
2b. Other studies have collected the same or similar SNP arrays on different subjects.

The New Knockoffs Procedure

(1) Construct knockoffs: exchangeability

$$[X_1 \cdots X_j \cdots X_p\ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]$$

(2) Compute knockoff statistics:
- A variable-importance measure Z.
- An antisymmetric function f_j : ℝ² → ℝ, i.e., f_j(z_1, z_2) = −f_j(z_2, z_1).
- W_j = f_j(Z_j, Z̃_j), where Z_j and Z̃_j are the variable importances of X_j and X̃_j, respectively.

(3) Find the knockoff threshold: (same as before)
- Order the variables by decreasing |W_j| and proceed down the list.
- Select only the variables with positive W_j, stopping the last time #{negatives}/#{positives} ≤ q.

(A small numerical check of the exchangeability condition in the Gaussian case is sketched below.)
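As a sanity check on the exchangeability condition, the following toy snippet (an assumption-laden illustration: Gaussian case, equi-correlated diag{s} = sI, a made-up AR(1) Σ) verifies that swapping X_j with X̃_j leaves the joint covariance, and hence the joint Gaussian distribution, unchanged.

```python
# Swapping X_j and X_j-tilde permutes rows/columns j and j+p of the joint
# covariance G; in the Gaussian case, exchangeability means G is invariant.
import numpy as np

p, s, j = 4, 0.5, 2
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # toy AR(1)
G = np.block([[Sigma, Sigma - s * np.eye(p)],
              [Sigma - s * np.eye(p), Sigma]])
perm = np.arange(2 * p)
perm[[j, j + p]] = [j + p, j]                      # swap X_j <-> X_j-tilde
print(np.allclose(G, G[np.ix_(perm, perm)]))       # True, for every j
```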

Step (1): Construct Knockoffs

Knockoff Construction

- There is a proof that valid knockoff variables can be generated for any distribution of X.
- If (X_1, …, X_p) is multivariate Gaussian, exchangeability reduces to matching first and second moments when X_j and X̃_j are swapped. For Cov(X_1, …, X_p) = Σ:

$$\operatorname{Cov}(X_1, \dots, X_p, \tilde{X}_1, \dots, \tilde{X}_p) = \begin{pmatrix} \Sigma & \Sigma - \operatorname{diag}\{s\} \\ \Sigma - \operatorname{diag}\{s\} & \Sigma \end{pmatrix}$$

- For non-Gaussian X, this still yields second-order-correct approximate knockoffs.
- Linear algebra and semidefinite programming are used to find a good s.
- Recently: constructions for Markov chains and HMMs (Sesia et al., 2017).
- Constructions are also possible for grouped variables (Dai and Barber, 2016).

(A sampling sketch for the Gaussian case follows below.)
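Here is a minimal sketch of sampling exact Gaussian model-X knockoffs from the conditional distribution implied by the joint covariance above. The equi-correlated choice of s and the function name are simplifications made here; as the slide notes, practical implementations find s by semidefinite programming.

```python
# Minimal sketch: sample X_tilde | X when the rows of X are i.i.d. N(mu, Sigma).
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = min(2.0 * lam_min, np.min(np.diag(Sigma))) * 0.999   # equi-correlated s
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)                  # Sigma^{-1} diag{s}
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sigma_inv_D)    # E[X_tilde | X], row-wise
    cond_cov = 2.0 * D - D @ Sigma_inv_D                     # Cov(X_tilde | X) = 2D - D Sigma^{-1} D
    return cond_mean + rng.standard_normal((n, p)) @ np.linalg.cholesky(cond_cov).T
```

By construction, the joint covariance of (X, X̃) is exactly the 2p × 2p matrix displayed above, so exchangeability holds exactly in the Gaussian case.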

Step (2): Compute Knockoff Statistics

Strategy for Choosing Knockoff Statistics

Recall that W_j is an antisymmetric function f_j of Z_j and Z̃_j (the variable importances of X_j and X̃_j, respectively): W_j = f_j(Z_j, Z̃_j) = −f_j(Z̃_j, Z_j).

For example:
- Z is the magnitude of the fitted coefficient β from a lasso regression of y on [X X̃].
- f_j(z_1, z_2) = z_1 − z_2.
- This gives the Lasso Coefficient Difference (LCD) statistic: W_j = |β_j| − |β̃_j| (see the sketch below).
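A minimal sketch of the LCD statistic, assuming a continuous response and a fixed placeholder penalty alpha; the talk suggests choosing the penalty by cross-validation on [X X̃], and for a binary y one would use an ℓ1-penalized logistic regression instead.

```python
# Minimal sketch of the Lasso Coefficient Difference (LCD) statistic.
import numpy as np
from sklearn.linear_model import Lasso

def lcd_statistics(X, X_tilde, y, alpha=0.05):
    p = X.shape[1]
    beta = Lasso(alpha=alpha).fit(np.hstack([X, X_tilde]), y).coef_
    return np.abs(beta[:p]) - np.abs(beta[p:])   # W_j = |beta_j| - |beta_tilde_j|
```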

Exchangeability Endows Coin-Flipping

Recall the exchangeability property: for any j,

$$[X_1 \cdots X_j \cdots X_p\ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]$$

Coin-flipping property for W_j: for any unimportant (null) variable j,

$$\begin{aligned}
(Z_j, \tilde{Z}_j) &:= \Big( Z_j\big(y, [\cdots X_j \cdots\ \cdots \tilde{X}_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots X_j \cdots\ \cdots \tilde{X}_j \cdots]\big) \Big) \\
&\overset{D}{=} \Big( Z_j\big(y, [\cdots \tilde{X}_j \cdots\ \cdots X_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots \tilde{X}_j \cdots\ \cdots X_j \cdots]\big) \Big) && \text{(exchangeability, and } j \text{ is null)} \\
&= \Big( \tilde{Z}_j\big(y, [\cdots X_j \cdots\ \cdots \tilde{X}_j \cdots]\big),\ Z_j\big(y, [\cdots X_j \cdots\ \cdots \tilde{X}_j \cdots]\big) \Big) && \text{(swapping the columns swaps the two importances)} \\
&= (\tilde{Z}_j, Z_j)
\end{aligned}$$

Hence

$$W_j = f_j(Z_j, \tilde{Z}_j) \overset{D}{=} f_j(\tilde{Z}_j, Z_j) = -f_j(Z_j, \tilde{Z}_j) = -W_j,$$

so the sign of each null W_j is a fair coin flip.

Adaptivity and Prior Information in W_j

Recall LCD: W_j = |β_j| − |β̃_j|, where β_j and β̃_j come from an ℓ1-penalized regression.

Adaptivity:
- Use cross-validation (on [X X̃]) to choose the penalty parameter in LCD.
- Higher-level adaptivity: use CV to choose the best-fitting model for inference. E.g., fit a random forest and an ℓ1-penalized regression, and derive the feature importances from whichever has lower CV error; FDR control still holds exactly (see the sketch below).
- One can even let the analyst look at a (masked version of) the data to choose the Z function.

Prior information:
- Bayesian approach: choose a prior and a model, and Z_j could be the posterior probability that X_j contributes to the model.
- FDR control still holds exactly, even if the prior is wrong or the MCMC has not converged.
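A minimal sketch of the higher-level adaptivity described above: fit both a random forest and an ℓ1-penalized regression on the augmented design and keep whichever cross-validates better. The specific models, scoring, and function name are illustrative assumptions (and assume a continuous response), not the talk's exact recipe.

```python
# Choose the importance measure Z by CV, then W_j = Z_j - Z_tilde_j.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

def adaptive_statistics(X, X_tilde, y, seed=0):
    p = X.shape[1]
    XX = np.hstack([X, X_tilde])
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    lasso = LassoCV(cv=5)
    # The model choice looks only at ([X X_tilde], y), never at which column is
    # a knockoff, so the coin-flipping property (and FDR control) is preserved.
    if cross_val_score(rf, XX, y, cv=5).mean() >= cross_val_score(lasso, XX, y, cv=5).mean():
        Z = rf.fit(XX, y).feature_importances_
    else:
        Z = np.abs(lasso.fit(XX, y).coef_)
    return Z[:p] - Z[p:]
```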

Step (3): Find the Knockoff Threshold

Find the Knockoff Threshold

Example with p = 10 and q = 20% = 1/5:

[Figure: the ten W_j are first shown by signed value around 0, then ordered by decreasing |W_j|. Proceeding down the ordered list, the running counts #{negative W_j} and #{positive W_j} are tracked at each step, and their ratio is compared with q = 20%. The threshold τ̂ is set at the last point where #{negative W_j}/#{positive W_j} ≤ q; here the positives W_1, W_4, W_5, W_6, W_7 lie above the threshold, so the selected set is S = {1, 4, 5, 6, 7}.]

A code sketch of this rule follows.
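A minimal sketch of the threshold rule: τ̂ is the smallest magnitude t among the |W_j| such that #{W_j ≤ −t}/#{W_j ≥ t} ≤ q, and the selection is {j : W_j ≥ τ̂}. The W values below are made up for illustration (the slide's actual values are not recoverable), arranged so that five positives survive, as in the example.

```python
# Knockoff threshold and selection (the slides' ratio rule; the knockoff+
# variant would use 1 + #negatives in the numerator).
import numpy as np

def knockoff_select(W, q):
    for t in np.sort(np.abs(W[W != 0])):          # candidate thresholds
        if np.sum(W >= t) > 0 and np.sum(W <= -t) / np.sum(W >= t) <= q:
            return np.where(W >= t)[0]            # selected variable indices
    return np.array([], dtype=int)

W = np.array([1.8, -0.2, -0.4, 2.5, 3.1, 1.1, 1.5, -0.3, -0.6, -0.5])  # made-up values
print(knockoff_select(W, q=0.2))                  # [0 3 4 5 6], i.e. S = {1,4,5,6,7} 1-indexed
```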

Intuition for FDR Control

$$\begin{aligned}
\mathrm{FDR} &= \mathbb{E}\left[ \frac{\#\{\text{null } X_j \text{ selected}\}}{\#\{\text{total } X_j \text{ selected}\}} \right] \\
&= \mathbb{E}\left[ \frac{\#\{\text{null positive } W_j \text{ with } |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \hat{\tau}\}} \right] \\
&\approx \mathbb{E}\left[ \frac{\#\{\text{null negative } W_j \text{ with } |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \hat{\tau}\}} \right] && \text{(coin-flipping: null } W_j \text{ are equally likely to be positive or negative)} \\
&\leq \mathbb{E}\left[ \frac{\#\{\text{negative } W_j \text{ with } |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j \text{ with } |W_j| > \hat{\tau}\}} \right] \leq q && \text{(by the definition of } \hat{\tau}\text{)}
\end{aligned}$$

(An empirical check is sketched below.)
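To see the bound empirically, here is a rough end-to-end simulation combining the sketches above (gaussian_knockoffs, lcd_statistics, knockoff_select); it assumes those have been defined, and the data-generating parameters are made up and far smaller than the talk's experiments, with a Gaussian linear response in place of the logit model for simplicity.

```python
# Rough Monte Carlo check that the average false discovery proportion
# stays near or below q (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(1)
n, p, k, q = 300, 50, 10, 0.2
Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1), rho = 0.3
L = np.linalg.cholesky(Sigma)
fdps = []
for rep in range(20):
    X = rng.standard_normal((n, p)) @ L.T
    beta = np.zeros(p)
    signal = rng.choice(p, size=k, replace=False)       # true nonzero coefficients
    beta[signal] = 0.5 * rng.choice([-1.0, 1.0], size=k)
    y = X @ beta + rng.standard_normal(n)
    X_tilde = gaussian_knockoffs(X, np.zeros(p), Sigma, seed=rep)
    S = knockoff_select(lcd_statistics(X, X_tilde, y), q)
    fdps.append(np.mean(~np.isin(S, signal)) if len(S) else 0.0)  # realized FDP
print("average FDP:", round(float(np.mean(fdps)), 3))
```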

GWAS Application

Genetic Analysis of Crohn's Disease

- 2007 case-control study by the WTCCC; n ≈ 5,000, p ≈ 375,000; preprocessing mirrored the original analysis.
- Strong spatial structure: second-order knockoffs were generated using a genetic covariance estimate (Wen and Stephens, 2010).
