
What's Happening in Selective Inference II? Emmanuel Candès, Stanford University. The 2017 Wald Lectures, Joint Statistical Meetings, Baltimore, August 2017. Lecture 2. Special dedication: Chiara Sabatti. Agenda: the knockoff machine.


  1. Knockoffs as negative controls

[Figure: feature importance (0 to 4) of original and knockoff variables, plotted against variable index (0 to 1000); the knockoffs serve as negative controls.]

  2. Exchangeability of feature importance statistics

Knockoff-agnostic feature importance Z:

(Z_1, ..., Z_p, Z̃_1, ..., Z̃_p) = z([X, X̃], y)

where the first block scores the originals and the second block the knockoffs.

[Figure: importance plot of originals vs. knockoffs, as on the previous slide.]

  3. Exchangeability of feature importance statistics

Knockoff-agnostic feature importance Z:

(Z_1, ..., Z_p, Z̃_1, ..., Z̃_p) = z([X, X̃], y)

This lecture: we can construct knockoff features such that

j null  ⟹  (Z_j, Z̃_j) =_d (Z̃_j, Z_j),

and more generally, for any subset T of nulls,

(Z, Z̃)_swap(T) =_d (Z, Z̃).

  4. Knockoffs-adjusted scores

Ordering of variables + 1-bit p-values: adjusted scores W_j with a flip-sign property. Combine Z_j and Z̃_j into a single (knockoff) score W_j:

W_j = w_j(Z_j, Z̃_j),   with   w_j(Z̃_j, Z_j) = −w_j(Z_j, Z̃_j),

e.g.   W_j = Z_j − Z̃_j,   or   W_j = (Z_j ∨ Z̃_j) · (+1 if Z_j > Z̃_j, −1 if Z_j ≤ Z̃_j).

⟹ Conditional on |W|, the signs of the null W_j's are i.i.d. coin flips.

[Diagram: null scores fall on either side of 0 with equal probability along the |W| axis.]
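To make the two example combinations concrete, here is a minimal Python sketch (not part of the original slides), with hypothetical importance vectors Z and Z_tilde standing in for original and knockoff importances; the asserts check the flip-sign property that makes null signs fair coin flips.

```python
import numpy as np

def difference_stat(Z, Z_tilde):
    # W_j = Z_j - Z~_j: swapping the two arguments flips the sign
    return Z - Z_tilde

def signed_max_stat(Z, Z_tilde):
    # W_j = (Z_j v Z~_j) * (+1 if Z_j > Z~_j, -1 if Z_j <= Z~_j)
    return np.maximum(Z, Z_tilde) * np.where(Z > Z_tilde, 1.0, -1.0)

# Quick check of the flip-sign property on made-up scores
rng = np.random.default_rng(0)
Z, Z_tilde = rng.exponential(size=5), rng.exponential(size=5)
assert np.allclose(difference_stat(Z_tilde, Z), -difference_stat(Z, Z_tilde))
assert np.allclose(signed_max_stat(Z_tilde, Z), -signed_max_stat(Z, Z_tilde))
```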

  5. Selection by sequential testing

S_+(t) = { j : W_j ≥ t },   S_−(t) = { j : W_j ≤ −t }

Estimate of the false discovery proportion when selecting S_+(t):

FDP̂(t) = (1 + |S_−(t)|) / (1 ∨ |S_+(t)|)

Theorem (Barber and C., '15). Select S_+(τ), with τ = min{ t : FDP̂(t) ≤ q }. Then
Knockoff:   E[ #false positives / (#selections + q⁻¹) ] ≤ q,
Knockoff+:  E[ #false positives / (#selections ∨ 1) ] ≤ q.
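A bare-bones sketch of this threshold and selection step (again not from the slides): W is any score vector with the flip-sign property, offset = 1 gives the knockoff+ rule (FDR ≤ q) and offset = 0 the slightly more liberal knockoff variant. The released knockoff packages implement this more carefully; this is only illustrative.

```python
import numpy as np

def knockoff_threshold(W, q=0.2, offset=1):
    """Smallest t among the |W_j| with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

def knockoff_select(W, q=0.2, offset=1):
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))
```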

  6. Why Can We Invert the Estimate of FDP? Proof Sketch of FDR Control

  7. Why does all this work?

S_+(t) = { j : W_j ≥ t },   S_−(t) = { j : W_j ≤ −t },   τ = min{ t : (1 + |S_−(t)|) / (|S_+(t)| ∨ 1) ≤ q }

  8. Why does all this work? (continued)

FDP(τ) = #{ j null : j ∈ S_+(τ) } / ( #{ j : j ∈ S_+(τ) } ∨ 1 )

  9. Why does all this work? (continued)

FDP(τ) = [ #{ j null : j ∈ S_+(τ) } / ( #{ j : j ∈ S_+(τ) } ∨ 1 ) ] · [ (1 + #{ j null : j ∈ S_−(τ) }) / (1 + #{ j null : j ∈ S_−(τ) }) ]

  10. Why does all this work? (continued)

Writing V_+(τ) = #{ j null : j ∈ S_+(τ) } and V_−(τ) = #{ j null : j ∈ S_−(τ) },

FDP(τ) ≤ q · V_+(τ) / (1 + V_−(τ)),

since 1 + V_−(τ) ≤ 1 + |S_−(τ)| and τ obeys (1 + |S_−(τ)|) / (|S_+(τ)| ∨ 1) ≤ q.

  11. Why does all this work? (continued)

FDP(τ) ≤ q · V_+(τ) / (1 + V_−(τ)).   It remains to show that E[ V_+(τ) / (1 + V_−(τ)) ] ≤ 1.

  12. Martingales

V_+(t) / (1 + V_−(t)) is a (super)martingale with respect to F_t = σ( { V_±(u) }_{u ≤ t} ).

[Diagram: V_+(t) and V_−(t) count the null W_j's beyond +t and −t along the |W| axis.]

  13. Martingales (continued)

[Diagram: the same picture, now with two thresholds s ≤ t marked on the |W| axis.]

  14. Martingales (continued)

Write V_+(s) + V_−(s) = m for a threshold s ≤ t. Conditioned on V_±(s), the count V_+(t) is hypergeometric (the null signs beyond s are exchangeable).

  15. Martingales (continued)

E[ V_+(t) / (1 + V_−(t)) | V_±(s) ]  ≤  V_+(s) / (1 + V_−(s))

  16. Optional stopping theorem

FDR ≤ q · E[ V_+(τ) / (1 + V_−(τ)) ] ≤ q · E[ V_+(0) / (1 + V_−(0)) ] ≤ q,

since V_+(0) ~ Bin(#nulls, 1/2) given V_+(0) + V_−(0) = #nulls.
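The last inequality uses only that V_+(0) is Binomial(#nulls, 1/2) given the total count of nulls. A quick Monte Carlo check (not on the slides) that E[B/(1 + m − B)] ≤ 1 for B ~ Bin(m, 1/2); in fact the expectation equals 1 − 2^(−m).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50                                    # number of nulls
B = rng.binomial(m, 0.5, size=1_000_000)  # B plays the role of V+(0)
print(np.mean(B / (1 + m - B)))           # ~ 1 - 2**(-m), hence <= 1
```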

  17. Knockoffs for Fixed Features Joint with Barber

  18. Linear model

y = Xβ + z,   z ~ N(0, σ²I),   so that   y ~ N(Xβ, σ²I),

with y ∈ R^n, X ∈ R^{n×p} a fixed design, β ∈ R^p, and noise level σ unknown.

Multiple testing: H_j : β_j = 0 (is the j-th variable in the model?)

Identifiability requires p ≤ n. Inference (FDR control) will hold conditionally on X.

  19. Knockoff features (fixed X)  [Diagram: the original columns and their knockoff copies.]

  20. Knockoff features (fixed X) (continued)

X̃'_j X̃_k = X'_j X_k  for all j, k,        X̃'_j X_k = X'_j X_k  for all j ≠ k.

  21. Knockoff features (fixed X) (continued)

Same correlation constraints as above. No need for new data or a new experiment, and no knowledge of the response y.

  22. Knockoff construction (n ≥ 2p)

Problem: given X ∈ R^{n×p}, find X̃ ∈ R^{n×p} such that

[X X̃]' [X X̃] = G := ( Σ             Σ − diag{s}
                        Σ − diag{s}   Σ           ),   where Σ = X'X.

  23. Knockoff construction (n ≥ 2p) (continued)

In addition, G must be positive semidefinite: G ⪰ 0.

  24. Knockoff construction (n ≥ 2p) (continued)

G ⪰ 0   ⟺   diag{s} ⪰ 0  and  2Σ − diag{s} ⪰ 0.

  25. Knockoff construction (n ≥ 2p) (continued)

Solution:

X̃ = X( I − Σ⁻¹ diag{s} ) + Ũ C,

where Ũ ∈ R^{n×p} has column space orthogonal to that of X, and C'C is a Cholesky factorization of 2 diag{s} − diag{s} Σ⁻¹ diag{s} ⪰ 0.
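A numerical sketch of this construction (not from the slides), using the equi-correlated choice of s from the next slide. It assumes n ≥ 2p and full-rank X, standardizes the columns, and substitutes an eigenvalue square root for the Cholesky factor for numerical safety; the authors' knockoff software should be preferred in practice. One can check numerically that X_tilde.T @ X_tilde ≈ Σ and X_tilde.T @ X ≈ Σ − diag{s}.

```python
import numpy as np

def fixed_X_knockoffs(X):
    """Equi-correlated fixed-design knockoffs (assumes n >= 2p and full-rank X)."""
    n, p = X.shape
    X = X / np.linalg.norm(X, axis=0)                  # standardize columns
    Sigma = X.T @ X
    Sigma_inv = np.linalg.inv(Sigma)
    # Equi-correlated s_j = min(2 * lambda_min(Sigma), 1), shrunk slightly for stability
    s = np.full(p, min(2 * np.linalg.eigvalsh(Sigma)[0], 1.0) * (1 - 1e-6))
    S = np.diag(s)
    # C'C = 2S - S Sigma^{-1} S, built here via a symmetric eigen square root
    A = 2 * S - S @ Sigma_inv @ S
    w, V = np.linalg.eigh((A + A.T) / 2)
    C = np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    # U_tilde: orthonormal columns orthogonal to the column space of X
    Q, _ = np.linalg.qr(X, mode="complete")
    U_tilde = Q[:, p:2 * p]
    X_tilde = X @ (np.eye(p) - Sigma_inv @ S) + U_tilde @ C
    return X, X_tilde   # standardized design and its knockoff copy
```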

  26. Knockoff construction (n ≥ 2p): choice of s

With standardized columns, X̃'_j X_j = 1 − s_j.

Equi-correlated knockoffs: s_j = 2 λ_min(Σ) ∧ 1. Under equivariance, this minimizes the value of |⟨X_j, X̃_j⟩|.

  27. Knockoff construction (n ≥ 2p): choice of s (continued)

SDP knockoffs:
minimize   Σ_j |1 − s_j|
subject to   s_j ≥ 0,   diag{s} ⪯ 2Σ,

a highly structured semidefinite program (SDP).
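For reference, this SDP is small enough to hand to a generic convex solver; a hedged sketch assuming the cvxpy package is available (the dedicated knockoff packages use faster specialized solvers):

```python
import cvxpy as cp
import numpy as np

def sdp_s(Sigma):
    """minimize sum_j |1 - s_j|  subject to  s_j >= 0  and  diag(s) <= 2*Sigma in the PSD order."""
    Sigma = (Sigma + Sigma.T) / 2          # enforce exact symmetry for the PSD constraint
    p = Sigma.shape[0]
    s = cp.Variable(p)
    constraints = [s >= 0, 2 * Sigma - cp.diag(s) >> 0]
    cp.Problem(cp.Minimize(cp.sum(cp.abs(1 - s))), constraints).solve()
    return np.clip(np.asarray(s.value), 0, None)
```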

  28. Knockoff construction (n ≥ 2p): choice of s (continued)

Other possibilities ...

  29. Why?

For a null feature j,

X̃'_j y = X̃'_j Xβ + X̃'_j z  =_d  X'_j Xβ + X'_j z = X'_j y.

  30. Why? (continued)

The same identity: for a null feature j, X̃'_j y =_d X'_j y.

  31. Why? (continued)

For any subset T of nulls,

[X X̃]'_swap(T) y  =_d  [X X̃]' y.

  32. Exchangeability of feature importance statistics

Sufficiency: the scores depend on the data only through the Gram matrix and inner products with y,

(Z, Z̃) = z( [X X̃]'[X X̃],  [X X̃]'y ).

Knockoff-agnostic: swapping originals and knockoffs swaps the Z's,

z( [X X̃]_swap(T), y ) = (Z, Z̃)_swap(T).

  33. Exchangeability of feature importance statistics (continued)

Theorem (Barber and C., '15). For any subset T of nulls,

(Z, Z̃)_swap(T) =_d (Z, Z̃)   ⟹   FDR control (conditional on X).

  34. Telling the effect direction

"[...] in classical statistics, the significance of comparisons (e.g., θ_1 − θ_2) is calibrated using Type I error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. [...] a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state 'θ_1 > θ_2 with confidence', 'θ_2 > θ_1 with confidence', or 'no claim with confidence'."   A. Gelman & F. Tuerlinckx

  35. Directional FDR

Are any effects exactly zero?

FDR_dir = E[ #selections with wrong effect direction / #selections ]   (directional false discovery rate; the ratio inside the expectation is the directional false discovery proportion)

Directional FDR (Benjamini & Yekutieli, '05); sign errors (Type S) (Gelman & Tuerlinckx, '00).

Important for misspecified models, where exact sparsity is unlikely.

  36. Directional FDR control

(X_j − X̃_j)'y  ~ind  N( s_j · β_j,  2σ² · s_j ),   with s_j ≥ 0.

Sign estimate: sgn( (X_j − X̃_j)'y ).

  37. Directional FDR control (continued)

Theorem (Barber and C., '16). With the exact same knockoff selection plus this sign estimate,

FDR ≤ FDR_dir ≤ q.

  38. Directional FDR control (continued)

Null coin flips are unbiased. [Diagram: null and non-null scores along the |W| axis.]

  39. Directional FDR control (continued)

Great subtlety: the coin flips are now biased. [Same diagram.]

  40. Empirical results

Features ~ N(0, I_n), n = 3000, p = 1000; k = 30 variables with regression coefficients of magnitude 3.5; nominal level q = 20%.

Method                         FDR (%)   Power (%)   Theoretical control?
Knockoff+ (equivariant)         14.40      60.99      Yes
Knockoff (equivariant)          17.82      66.73      No
Knockoff+ (SDP)                 15.05      61.54      Yes
Knockoff (SDP)                  18.72      67.50      No
BHq                             18.70      48.88      No
BHq + log-factor correction      2.20      19.09      Yes
BHq with whitened noise         18.79       2.33      Yes

  41. Effect of signal amplitude

Same setup with k = 30 and q = 0.2.

[Figure: FDR (%) and power (%) versus signal amplitude A from 2.8 to 4.2, for knockoff, knockoff+, and BHq; the nominal level is marked on the FDR panel.]

  42. Effect of feature correlation

Features ~ N(0, Θ) with Θ_jk = ρ^{|j−k|}; n = 3000, p = 1000, k = 30, amplitude 3.5.

[Figure: FDR (%) and power (%) versus feature correlation ρ from 0 to 0.8, for knockoff, knockoff+, and BHq.]

  43. Fixed Design Knockoff Data Analysis

  44. HIV drug resistance

Drug type   # drugs   Sample size   # protease or RT positions genotyped   # mutations appearing ≥ 3 times in sample
PI          6         848           99                                      209
NRTI        6         639           240                                     294
NNRTI       3         747           240                                     319

Response y: log-fold increase of lab-tested drug resistance. Covariate X_j: presence or absence of mutation #j.

Data from R. Shafer (Stanford), available at:
http://hivdb.stanford.edu/pages/published_analysis/genophenoPNAS2006/

  45. HIV data

TSM list: mutations associated with the PI class of drugs in general; it is not specialized to the individual drugs in the class.

[Figure, results for PI-type drugs: for each drug (APV, ATV, IDV, LPV, NFV, SQV), the number of HIV-1 protease positions selected by knockoff and by BHq, split into positions that appear in the TSM list and positions that do not. Data set sizes range from n = 329, p = 147 to n = 843, p = 209.]

  46. HIV data

[Figure, results for NRTI-type drugs (3TC, ABC, AZT, D4T, DDI, TDF) and NNRTI-type drugs (DLV, EFV, NVP): the number of HIV-1 RT positions selected by knockoff and by BHq, split by appearance in the TSM list. Data set sizes range from n = 353, p = 218 to n = 746, p = 319.]

  47. High-dimensional setting

n ≈ 5,000 subjects, p ≈ 330,000 SNPs/variables to test.

[Figure: Manhattan plot of −log10(P value) for HDL cholesterol across chromosomes 1–22, with peaks at GALNT2, ABCA1, MVK/MMAB, LIPC, CETP, LCAT, LIPG, and LPL.]

p > n  ⟹  cannot construct knockoffs as before:

X̃'_j X̃_k = X'_j X_k  ∀ j, k   and   X̃'_j X_k = X'_j X_k  ∀ j ≠ k   ⟹   X̃_j = X_j  ∀ j

(with p > n the Gram matrix is singular, forcing s = 0 and hence X̃ = X).

  48. High-dimensional knockoffs: screen and confirm. Start from the original data set.

  49. High-dimensional knockoffs: screen and confirm (continued). Exploratory part (y^(0), X^(0)): screen on sample 1.

  50. High-dimensional knockoffs: screen and confirm (continued). Confirmatory part (y^(1), X^(1)): inference on sample 2.

  51. High-dimensional knockoffs: screen and confirm (continued). Theory (Barber and C., '16); safe data re-use to improve power (Barber and C., '16).

  52. Some extensions

y = X_1 β_1 + X_2 β_2 + · · · + N(0, σ²I_n),   with X_1 ∈ R^{n×p_1}, X_2 ∈ R^{n×p_2}, ...

Group sparsity: build knockoffs at the group-wise level (Dai & Barber, 2015).
Identify key groups with PCA: build knockoffs only for the top PC in each group (Chen, Hou & Hou, 2017).
Build knockoffs only for prototypes selected from each group (Reid & Tibshirani, 2015).
Multilayer knockoffs to control FDR at the individual and group levels simultaneously (Katsevich & Sabatti, 2017).

  53. Knockoffs for Random Features Joint with Fan, Janson & Lv

  54. Variable selection in arbitrary models

Random pair (X, Y) (perhaps thousands/millions of covariates). p(Y | X) depends on X through which variables?

  55. Variable selection in arbitrary models (continued)

Working definition of null variables: say j ∈ H_0 is null iff Y ⊥⊥ X_j | X_{−j}.

  56. Variable selection in arbitrary models (continued)

Local Markov property ⟹ the non-nulls form the smallest subset S (Markov blanket) such that Y ⊥⊥ {X_j}_{j ∈ S^c} | {X_j}_{j ∈ S}.

  57. Variable selection in arbitrary models (continued)

Logistic model: P(Y = 0 | X) = 1 / (1 + e^{X'β}). If the variables X_{1:p} are not perfectly dependent, then j ∈ H_0 ⟺ β_j = 0.

  58. Knockoff features (random X)

i.i.d. samples from p(X, Y). Distribution of X known; distribution of Y | X (the likelihood) completely unknown.

  59. Knockoff features (random X) (continued)

Originals X = (X_1, ..., X_p); knockoffs X̃ = (X̃_1, ..., X̃_p).

  60. Knockoff features (random X) (continued)

(1) Pairwise exchangeability: (X, X̃)_swap(S) =_d (X, X̃); e.g.,

(X_1, X_2, X_3, X̃_1, X̃_2, X̃_3)_swap({2,3}) = (X_1, X̃_2, X̃_3, X̃_1, X_2, X_3).

  61. Knockoff features (random X) (continued)

(2) X̃ ⊥⊥ Y | X (ignore Y when constructing knockoffs).

  62. Exchangeability of feature importance statistics

Theorem (C., Fan, Janson & Lv, '16). For knockoff-agnostic scores and any subset T of nulls,

(Z, Z̃)_swap(T) =_d (Z, Z̃).

This holds no matter the relationship between Y and X, and it holds conditionally on Y.

  63. Exchangeability of feature importance statistics (continued)

⟹ FDR control (conditional on Y), no matter the relationship between X and Y.

  64. Knockoffs for Gaussian features

Swapping any subset of original and knockoff features leaves the joint distribution invariant; e.g., for T = {2, 3},

(X_1, X̃_2, X̃_3, X̃_1, X_2, X_3) =_d (X_1, X_2, X_3, X̃_1, X̃_2, X̃_3).

Note that X̃ =_d X.

  65. Knockoffs for Gaussian features (continued). Suppose X ~ N(μ, Σ).

  66. Knockoffs for Gaussian features (continued). Possible solution: (X, X̃) ~ N(∗, ∗∗).

  67. Knockoffs for Gaussian features (continued). Take ∗ = (μ, μ) and ∗∗ = ( Σ, Σ − diag{s} ; Σ − diag{s}, Σ ).

  68. Knockoffs for Gaussian features (continued). Choose s such that ∗∗ ⪰ 0.

  69. Knockoffs for Gaussian features (continued). Given X, sample X̃ from the conditional distribution X̃ | X (Gaussian regression formula). This is different from the knockoff features for fixed X!
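A minimal sketch of this sampler (not on the slides). From the joint covariance above, X̃ | X is Gaussian with mean μ + (Σ − diag{s})Σ⁻¹(x − μ) and covariance 2 diag{s} − diag{s}Σ⁻¹diag{s}. The equi-correlated choice of s is assumed here, with Σ a correlation matrix; the authors' software handles the general case.

```python
import numpy as np

def gaussian_mx_knockoffs(X, mu, Sigma, rng=None):
    """Model-X knockoffs for rows X_i ~ N(mu, Sigma); equi-correlated s, Sigma a correlation matrix."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    s = np.full(p, min(2 * np.linalg.eigvalsh(Sigma)[0], 1.0) * (1 - 1e-6))
    S = np.diag(s)
    # Conditional distribution of X~ given X (Gaussian regression formula), applied row-wise
    cond_mean = mu + (X - mu) @ (Sigma_inv @ (Sigma - S))
    cond_cov = 2 * S - S @ Sigma_inv @ S
    w, V = np.linalg.eigh((cond_cov + cond_cov.T) / 2)
    L = V @ np.diag(np.sqrt(np.clip(w, 0, None)))          # L L' = cond_cov
    return cond_mean + rng.standard_normal((n, p)) @ L.T

# Example (made-up data): p = 20; Sigma = 0.3 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
# X = np.random.default_rng(0).multivariate_normal(np.zeros(p), Sigma, size=200)
# X_tilde = gaussian_mx_knockoffs(X, np.zeros(p), Sigma)
```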

  70.–76. Robustness (figure built up curve by curve)

[Figure: FDR and power versus the relative Frobenius-norm error of the covariance matrix used to construct the knockoffs. Curves are added one per slide: exact covariance, graphical lasso, and the empirical covariance computed from 50%, 62.5%, 75%, 87.5%, and 100% of the data. Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500; the target FDR is 10%; Y | X follows a logistic model with 50 nonzero entries.]

  77. Robustness theory. Ongoing work with R. Barber and R. Samworth; (partial) subject of Rina F. Barber's 2017 Tweedie Award Lecture.

  78. Knockoff inference with random features

Pros: holds in finite samples; no p-values needed; valid no matter the dependence between Y and X; valid no matter the dimensionality; no parameters.

Cons: need to know the distribution of the covariates.

  79. Relationship with classical setup: Classical vs. MF knockoffs (table built up over the next slides).

  80. Classical: observations of X are fixed, and inference is conditional on the observed values. MF knockoffs: observations of X are random¹. (¹ Often appropriate in 'big' data applications, e.g., SNPs of randomly sampled subjects.)

  81. (continued) Classical: strong model linking Y and X. MF knockoffs: model-free². (² Shifts the 'burden' of knowledge.)

  82. (continued) Classical: useful inference even if the model is inexact. MF knockoffs: useful inference even if the model is inexact³. (³ More later.)

  83. Shift in the burden of knowledge When are our assumptions useful? When we have large amounts of unsupervised data (e.g. economic studies with same covariate info but different responses) When we have more prior information about the covariates than about their relationship with a response (e.g. GWAS) When we control the distribution of X (experimental crosses in genetics, gene knockout experiments,...)

  84. Obstacles to obtaining p-values

Y | X ~ Bernoulli( logit(X'β) )

[Figure: histograms of null logistic-regression p-values with n = 500 and p = 200, under the global null and with 20 nonzero coefficients, both with an AR(1) design; the null p-values are visibly non-uniform.]

  85. Obstacles to obtaining p-values

P{ p-val ≤ ... }   Setting (1)     Setting (2)     Setting (3)     Setting (4)
5%                 16.89% (0.37)   19.17% (0.39)   16.88% (0.37)   16.78% (0.37)
1%                  6.78% (0.25)    8.49% (0.28)    7.02% (0.26)    7.03% (0.26)
0.1%                1.53% (0.12)    2.27% (0.15)    1.87% (0.14)    2.04% (0.14)

Table: inflated p-value probabilities, with estimated Monte Carlo standard errors in parentheses.

  86. Shameless plug: distribution of high-dimensional LRTs

Wilks' phenomenon (1938):   2 log L →_d χ²_df

[Figure: histogram of p-values computed from the classical χ² calibration.]

  87. Shameless plug: distribution of high-dimensional LRTs (continued)

Wilks' phenomenon (1938):       2 log L →_d χ²_df
Sur, Chen, Candès (2017):       2 log L →_d κ(p/n) · χ²_df

[Figure: p-value histograms under the classical and the rescaled χ² calibrations.]

  88. 'Low'-dimensional linear model with dependent covariates

Z_j = | β̂_j(λ̂_CV) |,   W_j = Z_j − Z̃_j

[Figure: FDR and power versus the autocorrelation coefficient for BHq marginal, BHq max-likelihood, MF knockoffs, and original (fixed-X) knockoffs; low-dimensional setting with n = 3000, p = 1000.]
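A sketch of this lasso coefficient-difference statistic with scikit-learn's cross-validated lasso (not from the slides): fit the lasso on the augmented design [X, X̃] (in practice one also randomly swaps each original/knockoff pair of columns to avoid ordering artifacts, omitted here), set Z_j = |β̂_j(λ̂_CV)| and Z̃_j = |β̂_{j+p}|, and feed W = Z − Z̃ to the earlier threshold sketch.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lcd_statistics(X, X_tilde, y, cv=5):
    """W_j = |beta_hat_j| - |beta_hat_{j+p}| from a cross-validated lasso on [X, X_tilde]."""
    p = X.shape[1]
    coef = LassoCV(cv=cv).fit(np.hstack([X, X_tilde]), y).coef_
    return np.abs(coef[:p]) - np.abs(coef[p:])
```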

  89. 'Low'-dimensional logistic model with independent covariates

Z_j = | β̂_j(λ̂_CV) |,   W_j = Z_j − Z̃_j

[Figure: FDR and power versus the coefficient amplitude for BHq marginal, BHq max-likelihood, and MF knockoffs; low-dimensional setting with n = 3000, p = 1000.]

  90. 'High'-dimensional logistic model with dependent covariates

Z_j = | β̂_j(λ̂_CV) |,   W_j = Z_j − Z̃_j

[Figure: FDR and power versus the autocorrelation coefficient for BHq marginal and MF knockoffs; high-dimensional setting with n = 3000, p = 6000.]

  91. Bayesian knockoff statistics

BVS (Bayesian variable selection): Z_j = P( β_j ≠ 0 | y, X )
LCD (lasso coefficient difference): W_j = Z_j − Z̃_j

  92. Bayesian knockoff statistics (continued)

[Figure: FDR and power versus signal amplitude for BVS knockoffs and LCD knockoffs; n = 300, p = 1000, Bayesian linear model with 60 expected variables.]

Inference is correct even if the prior is wrong or the MCMC has not converged.
