slide-1
SLIDE 1

What’s Happening in Selective Inference II?

Emmanuel Candès, Stanford University. The 2017 Wald Lectures, Joint Statistical Meetings, Baltimore, August 2017

slide-2
SLIDE 2

Lecture 2: Special dedication

Chiara Sabatti

slide-3
SLIDE 3

Agenda: The knockoff machine

(1) The knockoff framework (mostly from yesterday)
(2) Knockoffs for fixed covariates
(3) Knockoffs for random covariates
(4) Knockoffs for genome-wide association studies (GWAS)
(5) Genetic data analysis

slide-4
SLIDE 4

The Knockoffs Framework (Summary from Lecture 1)

slide-5
SLIDE 5

Controlled variable selection

[Figure: Manhattan plot of −log10(P) across chromosomes 1–22 and X, Crohn's disease]

Response Y (e.g. disease status); features X_1, ..., X_p (e.g. SNPs)
Question: the distribution of Y | X depends on X through which variables?
Goal: select a set of features X_j that are likely to be relevant without too many false positives; do not run into the problem of irreproducibility

FDR = E[ (# false positives) / (# features selected) ]
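Spelled out, with the standard convention of a ∨1 in the denominator guarding against division by zero (nothing here beyond the definition on the slide):

```latex
\mathrm{FDP} \;=\; \frac{\#\{\text{false positives}\}}{\#\{\text{features selected}\}\vee 1},
\qquad
\mathrm{FDR} \;=\; \mathbb{E}\,[\mathrm{FDP}].
```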
slide-7
SLIDE 7

Which variables should we report?

Feature importance Z_j from random forests

[Figure: feature importances for 500 variables; a handful of large values stand out. Which of these are true positives?]
slide-9
SLIDE 9

Knockoffs as negative controls

[Figure: feature importances of 1000 variables, originals alongside their knockoffs; the knockoff importances serve as negative controls]

slide-10
SLIDE 10

Exchangeability of feature importance statistics

Knockoff-agnostic feature importance $Z = (\underbrace{Z_1, \ldots, Z_p}_{\text{originals}},\ \underbrace{\tilde Z_1, \ldots, \tilde Z_p}_{\text{knockoffs}}) = z([X\ \tilde X], y)$

This lecture: we can construct knockoff features such that

$$j \text{ null} \implies (Z_j, \tilde Z_j) \overset{d}{=} (\tilde Z_j, Z_j),
\qquad \text{and more generally,} \qquad
T \text{ subset of nulls} \implies (Z, \tilde Z)_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde Z)$$

slide-12
SLIDE 12

Knockoffs-adjusted scores

[Diagram: variables ordered by decreasing |W|, each carrying a + or − sign; ordering of variables + 1-bit p-values]

Adjusted scores $W_j$ with the flip-sign property: combine $Z_j$ and $\tilde Z_j$ into a single (knockoff) score $W_j = w_j(Z_j, \tilde Z_j)$ with $w_j(\tilde Z_j, Z_j) = -w_j(Z_j, \tilde Z_j)$, e.g.

$$W_j = Z_j - \tilde Z_j \qquad \text{or} \qquad W_j = (Z_j \vee \tilde Z_j)\cdot\begin{cases} +1 & Z_j > \tilde Z_j \\ -1 & Z_j \le \tilde Z_j \end{cases}$$

$\implies$ conditional on $|W|$, the signs of the null $W_j$'s are i.i.d. coin flips.
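To make the score construction concrete, here is a minimal sketch using lasso coefficient magnitudes on the augmented design as the importances Z; the choice of LassoCV and all names are illustrative assumptions, and any knockoff-agnostic statistic works:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def knockoff_scores(X, X_tilde, y, seed=0):
    """W_j = Z_j - Z~_j with Z taken from one lasso fit on [X, X~].
    Fitting originals and knockoffs jointly keeps the statistic
    knockoff-agnostic: swapping X_j and X~_j merely swaps Z_j and Z~_j,
    hence flips the sign of W_j."""
    p = X.shape[1]
    coef = LassoCV(cv=5, random_state=seed).fit(np.hstack([X, X_tilde]), y).coef_
    Z, Z_tilde = np.abs(coef[:p]), np.abs(coef[p:])
    return Z - Z_tilde
```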

slide-13
SLIDE 13

Selection by sequential testing

[Diagram: signed scores ordered by |W|, with threshold t]

$$\widehat{\mathrm{FDP}}(t) = \frac{1+|S_-(t)|}{|S_+(t)|\vee 1},
\qquad S_+(t) = \{j : W_j \ge t\}, \quad S_-(t) = \{j : W_j \le -t\}$$

Theorem (Barber and C. ('15)). Select $S_+(\tau)$ with $\tau = \min\{t : \widehat{\mathrm{FDP}}(t) \le q\}$. Then

Knockoff: $\;\mathbb{E}\left[\dfrac{\#\,\text{false positives}}{\#\,\text{selections} + q^{-1}}\right] \le q$

Knockoff+: $\;\mathbb{E}\left[\dfrac{\#\,\text{false positives}}{\#\,\text{selections}}\right] \le q$
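The selection rule of the theorem is a few lines of code; a sketch of the knockoff+ variant (the +1 in the numerator of the FDP estimate is what buys exact FDR control; names are mine):

```python
import numpy as np

def knockoff_plus_select(W, q=0.2):
    """Find tau = min{t : (1 + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q}
    over the candidate thresholds t = |W_j|, then return {j : W_j >= tau}."""
    for t in np.sort(np.abs(W[W != 0])):          # scan smallest t first
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)                # no feasible threshold
```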
slide-14
SLIDE 14

Why Can We Invert the Estimate of FDP?

Proof Sketch of FDR Control

slide-15
SLIDE 15

Why does all this work?

$$\tau = \min\Big\{t : \frac{1+|S_-(t)|}{|S_+(t)|\vee 1} \le q\Big\},
\qquad S_+(t) = \{j : W_j \ge t\}, \quad S_-(t) = \{j : W_j \le -t\}$$

[Diagram: signed scores ordered by |W|]

Start from the false discovery proportion at the stopping point and multiply and divide by $1 + \#\{j \text{ null} : j \in S_-(\tau)\}$:

$$\mathrm{FDP}(\tau)
= \frac{\#\{j \text{ null} : j \in S_+(\tau)\}}{\#\{j \in S_+(\tau)\}\vee 1}
= \frac{\#\{j \text{ null} : j \in S_+(\tau)\}}{1 + \#\{j \text{ null} : j \in S_-(\tau)\}}
\cdot \frac{1 + \#\{j \text{ null} : j \in S_-(\tau)\}}{\#\{j \in S_+(\tau)\}\vee 1}$$

The nulls in $S_-(\tau)$ are a subset of $S_-(\tau)$, so the second factor is at most $\frac{1+|S_-(\tau)|}{|S_+(\tau)|\vee 1} \le q$ by the definition of $\tau$. Writing $V^\pm(\tau) = \#\{j \text{ null} : j \in S_\pm(\tau)\}$,

$$\mathrm{FDP}(\tau) \le q\cdot\frac{V^+(\tau)}{1 + V^-(\tau)}$$

It remains to show that $\mathbb{E}\Big[\dfrac{V^+(\tau)}{1+V^-(\tau)}\Big] \le 1$.
slide-20
SLIDE 20

Martingales

$$\frac{V^+(t)}{1+V^-(t)} \ \text{ is a (super)martingale with respect to } \ \mathcal F_t = \sigma\big(\{V^\pm(u)\}_{u\le t}\big)$$

[Diagram: null signs along the |W| axis; as the threshold moves from s to t, $V^+$ and $V^-$ count the null positives and negatives]

Key observation: conditioned on $V^+(s)+V^-(s) = m$, the count $V^+(s)$ is hypergeometric (null signs are exchangeable coin flips), which gives

$$\mathbb{E}\left[\frac{V^+(s)}{1+V^-(s)} \,\middle|\, V^\pm(t),\ V^+(s)+V^-(s)\right] = \frac{V^+(t)}{1+V^-(t)}$$

slide-24
SLIDE 24

Optional stopping theorem

By the optional stopping theorem applied at $\tau$,

$$\mathrm{FDR} \le q\,\mathbb{E}\left[\frac{V^+(\tau)}{1+V^-(\tau)}\right]
\le q\,\mathbb{E}\left[\frac{V^+(0)}{1+V^-(0)}\right] \le q,$$

where, over the nulls, $V^+(0) \mid V^+(0)+V^-(0) = m \,\sim\, \mathrm{Bin}(m, 1/2)$.
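The last inequality is a binomial computation: for $V \sim \mathrm{Bin}(m, 1/2)$ the expectation equals $1 - 2^{-m}$, hence is at most 1. A short verification, using the identity $\binom{m}{k}\frac{k}{m-k+1} = \binom{m}{k-1}$:

```latex
\mathbb{E}\!\left[\frac{V}{1+m-V}\right]
  = 2^{-m}\sum_{k=1}^{m}\binom{m}{k}\frac{k}{m-k+1}
  = 2^{-m}\sum_{k=1}^{m}\binom{m}{k-1}
  = 2^{-m}\left(2^{m}-1\right)
  = 1-2^{-m} \;\le\; 1 .
```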

slide-25
SLIDE 25

Knockoffs for Fixed Features

Joint with Barber

slide-26
SLIDE 26

Linear model

$$y = \sum_j \beta_j X_j + z, \qquad y \sim \mathcal N(X\beta, \sigma^2 I)
\qquad (y:\ n\times 1,\ X:\ n\times p,\ \beta:\ p\times 1,\ z:\ n\times 1)$$

Fixed design X; noise level σ unknown. Multiple testing: $H_j : \beta_j = 0$ (is the jth variable in the model?). Identifiability ⟹ p ≤ n. Inference (FDR control) will hold conditionally on X.

slide-27
SLIDE 27

Knockoff features (fixed X)

[Diagram: original columns X_j and knockoff columns X̃_j]

$$\tilde X_j'\tilde X_k = X_j'X_k \ \text{ for all } j, k,
\qquad \tilde X_j'X_k = X_j'X_k \ \text{ for all } j \ne k$$

No need for new data or a new experiment; no knowledge of the response y is required.

slide-30
SLIDE 30

Knockoff construction (n ≥ 2p)

Problem: given $X \in \mathbb R^{n\times p}$, find $\tilde X \in \mathbb R^{n\times p}$ such that

$$[X\ \tilde X]'[X\ \tilde X] = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix} := G \succeq 0$$

$$G \succeq 0 \iff \mathrm{diag}\{s\} \succeq 0 \ \text{ and } \ 2\Sigma - \mathrm{diag}\{s\} \succeq 0$$

Solution:

$$\tilde X = X\big(I - \Sigma^{-1}\mathrm{diag}\{s\}\big) + \tilde U C,$$

where $\tilde U \in \mathbb R^{n\times p}$ has column space orthogonal to that of X, and $C'C$ is a Cholesky factorization of $2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\,\Sigma^{-1}\,\mathrm{diag}\{s\} \succeq 0$.
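A minimal numerical sketch of this construction; the function and variable names are mine, and it assumes standardized columns, n ≥ 2p, and an s already chosen so the Cholesky factor exists:

```python
import numpy as np

def fixed_X_knockoffs(X, s, seed=0):
    """X~ = X (I - Sigma^{-1} diag{s}) + U~ C with Sigma = X'X,
    C'C = 2 diag{s} - diag{s} Sigma^{-1} diag{s}, and U~ an n x p matrix
    whose columns are orthonormal and orthogonal to the column space of X."""
    n, p = X.shape
    assert n >= 2 * p
    Sigma_inv_S = np.linalg.solve(X.T @ X, np.diag(s))     # Sigma^{-1} diag{s}
    M = 2 * np.diag(s) - np.diag(s) @ Sigma_inv_S          # = C'C, PSD
    C = np.linalg.cholesky(M + 1e-10 * np.eye(p)).T        # tiny ridge for safety
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U_tilde = Q[:, p:2 * p]                                # orthogonal complement of col(X)
    return X @ (np.eye(p) - Sigma_inv_S) + U_tilde @ C
```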

slide-34
SLIDE 34

Knockoff construction (n ≥ 2p)

With standardized columns, $\tilde X_j'X_j = 1 - s_j$.

Equi-correlated knockoffs: $s_j = 2\lambda_{\min}(\Sigma) \wedge 1$; under equivariance, this minimizes the value of $|\langle X_j, \tilde X_j\rangle|$.

SDP knockoffs: minimize $\sum_j |1 - s_j|$ subject to $s_j \ge 0$ and $\mathrm{diag}\{s\} \preceq 2\Sigma$; a highly structured semidefinite program (SDP). A solver sketch follows.

Other possibilities ...
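The SDP is small enough to hand to an off-the-shelf solver; a sketch, assuming cvxpy with an SDP-capable backend (e.g. SCS) is available (the equicorrelated choice, by contrast, needs no solver at all):

```python
import numpy as np
import cvxpy as cp

def sdp_s(Sigma):
    """minimize sum_j |1 - s_j|  subject to  s >= 0,  diag{s} <= 2*Sigma (PSD order)."""
    p = Sigma.shape[0]
    s = cp.Variable(p)
    problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(1 - s))),
                         [s >= 0, 2 * Sigma - cp.diag(s) >> 0])
    problem.solve()
    return np.clip(s.value, 0, None)
```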

slide-37
SLIDE 37

Why?

For a null feature $X_j$ (so $\beta_j = 0$),

$$X_j'y = X_j'X\beta + X_j'z \ \overset{d}{=}\ \tilde X_j'X\beta + \tilde X_j'z = \tilde X_j'y$$

slide-39
SLIDE 39

Why?

For any subset T of nulls,

$$[X\ \tilde X]'_{\mathrm{swap}(T)}\,y \ \overset{d}{=}\ [X\ \tilde X]'\,y
\qquad \text{and} \qquad
[X\ \tilde X]'_{\mathrm{swap}(T)}[X\ \tilde X]_{\mathrm{swap}(T)} = [X\ \tilde X]'[X\ \tilde X]$$

slide-40
SLIDE 40

Exchangeability of feature importance statistics

Sufficiency: $(Z, \tilde Z) = z\big([X\ \tilde X]'[X\ \tilde X],\ [X\ \tilde X]'y\big)$

Knockoff-agnostic: swapping originals and knockoffs swaps the Z's: $z([X\ \tilde X]_{\mathrm{swap}(T)}, y) = (Z, \tilde Z)_{\mathrm{swap}(T)}$

Theorem (Barber and C. ('15)). For any subset T of nulls, $(Z, \tilde Z)_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde Z)$ ⟹ FDR control (conditional on X)

slide-42
SLIDE 42

Telling the effect direction

"[...] in classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using the Type I error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. [...] a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state 'θ1 > θ2 with confidence', 'θ2 > θ1 with confidence', or 'no claim with confidence'."

(A. Gelman & F. Tuerlinckx)
slide-43
SLIDE 43

Directional FDR

Are any effects exactly zero? Important for misspecified models: exact sparsity is unlikely.

$$\mathrm{FDR}_{\mathrm{dir}} = \mathbb E\left[\frac{\#\,\text{selections with wrong effect direction}}{\#\,\text{selections}}\right]
\quad \text{(directional FDR; the ratio inside is the directional FDP)}$$

Directional FDR (Benjamini & Yekutieli, '05); sign errors (Type S) (Gelman & Tuerlinckx, '00)

slide-44
SLIDE 44

Directional FDR control

$$(X_j - \tilde X_j)'y \ \overset{\mathrm{ind}}{\sim}\ \mathcal N\big(s_j\beta_j,\ 2\sigma^2 s_j\big), \qquad s_j \ge 0;
\qquad \text{sign estimate: } \mathrm{sgn}\big((X_j - \tilde X_j)'y\big)$$

Theorem (Barber and C., '16). With the exact same knockoff selection plus this sign estimate,

$$\mathrm{FDR} \le \mathrm{FDR}_{\mathrm{dir}} \le q$$

[Diagram: signed scores ordered by |W|, nulls and non-nulls] In the basic setting the null coin flips are unbiased; here lies a great subtlety: for directional claims, the coin flips are now biased.
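The displayed distribution is a one-line consequence of the Gram conditions; a short check, for standardized columns (so $\|X_j\|^2 = \|\tilde X_j\|^2 = 1$ and $X_j'\tilde X_j = 1 - s_j$):

```latex
(X_j-\tilde X_j)'X\beta = \big(X_j'X_j - \tilde X_j'X_j\big)\beta_j = s_j\,\beta_j
\quad\text{(cross terms cancel since } \tilde X_j'X_k = X_j'X_k \text{ for } k\neq j\text{)},
\qquad
\|X_j-\tilde X_j\|^2 = 1 - 2(1-s_j) + 1 = 2s_j,
```

so $(X_j-\tilde X_j)'y \sim \mathcal N(s_j\beta_j,\ 2\sigma^2 s_j)$.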

slide-48
SLIDE 48

Empirical results

Features $\sim \mathcal N(0, I_n)$; n = 3000, p = 1000; k = 30 variables with regression coefficients of magnitude 3.5; nominal level q = 20%.

Method                      | FDR (%) | Power (%) | Theor. FDR control?
Knockoff+ (equivariant)     | 14.40   | 60.99     | Yes
Knockoff (equivariant)      | 17.82   | 66.73     | No
Knockoff+ (SDP)             | 15.05   | 61.54     | Yes
Knockoff (SDP)              | 18.72   | 67.50     | No
BHq                         | 18.70   | 48.88     | No
BHq + log-factor correction |  2.20   | 19.09     | Yes
BHq with whitened noise     | 18.79   |  2.33     | Yes

slide-49
SLIDE 49

Effect of signal amplitude

Same setup with k = 30 (q = 0.2)

[Figure: FDR (%) and Power (%) vs. signal amplitude A ∈ [2.8, 4.2] for Knockoff, Knockoff+, and BHq, with the nominal level marked]

slide-50
SLIDE 50

Effect of feature correlation

Features $\sim \mathcal N(0, \Theta)$ with $\Theta_{jk} = \rho^{|j-k|}$; n = 3000, p = 1000, k = 30, amplitude 3.5

[Figure: FDR (%) and Power (%) vs. feature correlation ρ ∈ [0, 0.8] for Knockoff, Knockoff+, and BHq, with the nominal level marked]

slide-51
SLIDE 51

Fixed Design Knockoff Data Analysis

slide-52
SLIDE 52

HIV drug resistance

Drug type | # drugs | Sample size | # protease or RT positions genotyped | # mutations appearing ≥ 3 times in sample
PI        | 6       | 848         | 99                                   | 209
NRTI      | 6       | 639         | 240                                  | 294
NNRTI     | 3       | 747         | 240                                  | 319

Response y: log-fold-increase of lab-tested drug resistance. Covariate X_j: presence or absence of mutation #j. Data from R. Shafer (Stanford), available at: http://hivdb.stanford.edu/pages/published_analysis/genophenoPNAS2006/

slide-53
SLIDE 53

HIV data

TSM list: mutations associated with the PI class of drugs in general; the list is not specialized to the individual drugs in the class. Results for PI-type drugs:

[Figure: number of HIV-1 protease positions selected by Knockoff vs. BHq for each PI drug, split by whether the position appears in the TSM list. Panels: APV (n=768, p=201), ATV (n=329, p=147), IDV (n=826, p=208), LPV (n=516, p=184), NFV (n=843, p=209), SQV (n=825, p=208)]

slide-54
SLIDE 54

HIV data

Results for NRTI-type and NNRTI-type drugs:

[Figure: number of HIV-1 RT positions selected by Knockoff vs. BHq, split by TSM-list membership. NRTI panels: 3TC (n=633, p=292), ABC (n=628, p=294), AZT (n=630, p=292), D4T (n=630, p=293), DDI (n=632, p=292), TDF (n=353, p=218). NNRTI panels: DLV (n=732, p=311), EFV (n=734, p=318), NVP (n=746, p=319)]

slide-55
SLIDE 55

High-dimensional setting

n ≈ 5,000 subjects; p ≈ 330,000 SNPs/variables to test

[Figure: Manhattan plot of −log10(P) across chromosomes for HDL cholesterol, with hits at GALNT2, LPL, ABCA1, MVK/MMAB, LIPC, LCAT, LIPG, CETP]

$p > n$ ⟹ we cannot construct knockoffs as before:

$$\tilde X_j'\tilde X_k = X_j'X_k \ \ \forall\, j,k \quad\text{and}\quad \tilde X_j'X_k = X_j'X_k \ \ \forall\, j \ne k
\quad\Longrightarrow\quad \tilde X_j = X_j \ \ \forall\, j$$

slide-56
SLIDE 56

High dimensional knockoffs: screen and confirm

Original data set, split in two:
exploratory half $(X^{(0)}, y^{(0)})$: screen on sample 1
confirmatory half $(X^{(1)}, y^{(1)})$: inference on sample 2

Theory (Barber and C., '16); safe data re-use to improve power (Barber and C., '16)

slide-60
SLIDE 60

Some extensions

$$y = \underbrace{X_1}_{n\times p_1}\beta_1 + \underbrace{X_2}_{n\times p_2}\beta_2 + \cdots + \mathcal N(0, \sigma^2 I_n)$$

Group sparsity: build knockoffs at the group-wise level (Dai & Barber, 2015)
Identify key groups with PCA: build knockoffs only for the top PC in each group (Chen, Hou, Hou, 2017)
Build knockoffs only for prototypes selected from each group (Reid & Tibshirani, 2015)
Multilayer knockoffs to control FDR at the individual and group levels simultaneously (Katsevich & Sabatti, 2017)

slide-61
SLIDE 61

Knockoffs for Random Features

Joint with Fan, Janson & Lv

slide-62
SLIDE 62

Variable selection in arbitrary models

Random pair (X, Y) (perhaps thousands or millions of covariates). p(Y | X) depends on X through which variables?

Working definition of null variables: say $j \in \mathcal H_0$ is null iff $Y \perp\!\!\!\perp X_j \mid X_{-j}$.

Local Markov property ⟹ the non-nulls form the smallest subset S (the Markov blanket) such that $Y \perp\!\!\!\perp \{X_j\}_{j\in S^c} \mid \{X_j\}_{j\in S}$.

Logistic model: $\mathbb P(Y = 0 \mid X) = \dfrac{1}{1 + e^{X^\top\beta}}$. If the variables $X_{1:p}$ are not perfectly dependent, then $j \in \mathcal H_0 \iff \beta_j = 0$.

slide-66
SLIDE 66

Knockoff features (random X)

i.i.d. samples from p(X, Y); distribution of X known; distribution of Y | X (the likelihood) completely unknown.

Originals $X = (X_1, \ldots, X_p)$; knockoffs $\tilde X = (\tilde X_1, \ldots, \tilde X_p)$, required to satisfy:

(1) Pairwise exchangeability: $(X, \tilde X)_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde X)$, e.g. $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{2,3\})} \overset{d}{=} (X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3)$

(2) $\tilde X \perp\!\!\!\perp Y \mid X$ (ignore Y when constructing knockoffs)

slide-70
SLIDE 70

Exchangeability of feature importance statistics

Theorem (C., Fan, Janson & Lv ('16)). For knockoff-agnostic scores and any subset T of nulls,

$$(Z, \tilde Z)_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde Z)$$

This holds no matter the relationship between Y and X, and it holds conditionally on Y ⟹ FDR control (conditional on Y), no matter the relationship between X and Y.

slide-72
SLIDE 72

Knockoffs for Gaussian features

Swapping any subset of original and knockoff features leaves the joint distribution invariant; e.g. for T = {2, 3}, $(X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)$. Note that $\tilde X \overset{d}{=} X$.

For $X \sim \mathcal N(\mu, \Sigma)$, a possible solution: $(X, \tilde X) \sim \mathcal N(*, **)$ with

$$* = \begin{pmatrix} \mu \\ \mu \end{pmatrix},
\qquad ** = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix},
\qquad s \text{ chosen so that } ** \succeq 0$$

Given X, sample $\tilde X$ from $\tilde X \mid X$ (Gaussian regression formula). Different from knockoff features for fixed X!
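Sampling from $\tilde X \mid X$ is the standard Gaussian conditioning formula; a sketch (names mine; assumes Σ is the covariance of X and s has already been chosen so the joint covariance is PSD):

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, seed=0):
    """Sample X~ | X ~ N(mu + (X - mu)(I - Sigma^{-1} D), 2D - D Sigma^{-1} D),
    where D = diag{s}; this is the conditional law implied by the joint (*, **)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma_inv_D = np.linalg.solve(Sigma, np.diag(s))        # Sigma^{-1} D
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sigma_inv_D)
    cond_cov = 2 * np.diag(s) - np.diag(s) @ Sigma_inv_D    # 2D - D Sigma^{-1} D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))    # tiny ridge for safety
    return cond_mean + rng.standard_normal((n, p)) @ L.T
```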

slide-78
SLIDE 78

Robustness

Knockoffs built from the exact covariance and from estimates of increasing quality: Exact Cov, Graph. Lasso, and empirical covariances computed from 50%, 62.5%, 75%, 87.5%, and 100% of the data.

[Figure: Power and FDR vs. relative Frobenius-norm error of the covariance estimate. Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500, target FDR 10%; Y | X follows a logistic model with 50 nonzero entries]

slide-85
SLIDE 85

Robustness theory

Ongoing work with R. Barber and R. Samworth; the (partial) subject of Rina F. Barber's 2017 Tweedie Award Lecture.

slide-86
SLIDE 86

Knockoffs inference with random features

Pros: no parameters; no p-values; holds in finite samples; no matter the dependence between Y and X; no matter the dimensionality.
Cons: need to know the distribution of the covariates.

slide-87
SLIDE 87

Relationship with classical setup

Classical                                       | MF Knockoffs
Observations of X are fixed;                    | Observations of X are random¹
inference is conditional on the observed values |
Strong model linking Y and X                    | Model-free²
Useful inference even if the model is inexact   | Useful inference even if the model is inexact³

¹ Often appropriate in 'big' data apps: e.g. SNPs of subjects randomly sampled
² Shifts the 'burden' of knowledge
³ More later

slide-91
SLIDE 91

Shift in the burden of knowledge

When are our assumptions useful?
When we have large amounts of unsupervised data (e.g. economic studies with the same covariate info but different responses)
When we have more prior information about the covariates than about their relationship with a response (e.g. GWAS)
When we control the distribution of X (experimental crosses in genetics, gene knockout experiments, ...)

slide-92
SLIDE 92

Obstacles to obtaining p-values

$Y \mid X \sim \mathrm{Bernoulli}\big(\mathrm{logit}^{-1}(X^\top\beta)\big)$

[Figure: histograms of null logistic-regression p-values with n = 500 and p = 200; left panel: global null, AR(1) design; right panel: 20 nonzero coefficients, AR(1) design. Both are far from uniform]

slide-93
SLIDE 93

Obstacles to obtaining p-values

Empirical $\mathbb P\{\text{p-value} \le \alpha\}$ for null coefficients (which should equal α):

α    | Setting (1)   | Setting (2)   | Setting (3)   | Setting (4)
5%   | 16.89% (0.37) | 19.17% (0.39) | 16.88% (0.37) | 16.78% (0.37)
1%   |  6.78% (0.25) |  8.49% (0.28) |  7.02% (0.26) |  7.03% (0.26)
0.1% |  1.53% (0.12) |  2.27% (0.15) |  1.87% (0.14) |  2.04% (0.14)

Table: Inflated p-value probabilities, with estimated Monte Carlo SEs in parentheses

slide-94
SLIDE 94

Shameless plug: distribution of high-dimensional LRTs

Wilks' phenomenon (1938): $2\log\Lambda \overset{d}{\to} \chi^2_{\mathrm{df}}$, but in high dimensions the null p-values computed from this limit are far from uniform.

Sur, Chen, Candès (2017): $2\log\Lambda \overset{d}{\to} \kappa(p/n)\,\chi^2_{\mathrm{df}}$, a rescaled limit.

[Figure: histograms of p-values under the classical χ² calibration (heavily non-uniform) and under the rescaled limit (approximately uniform)]

slide-96
SLIDE 96

‘Low’ dim. linear model with dependent covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde Z_j$

[Figure: Power and FDR vs. autocorrelation coefficient for BHq Marginal, BHq Max Lik., MF Knockoffs, and Orig. Knockoffs; low-dimensional setting, n = 3000, p = 1000]

slide-97
SLIDE 97

‘Low’ dim. logistic model with indep. covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde Z_j$

[Figure: Power and FDR vs. coefficient amplitude for BHq Marginal, BHq Max Lik., and MF Knockoffs; low-dimensional setting, n = 3000, p = 1000]

slide-98
SLIDE 98

‘High’ dim. logistic model with dependent covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde Z_j$

[Figure: Power and FDR vs. autocorrelation coefficient for BHq Marginal and MF Knockoffs; high-dimensional setting, n = 3000, p = 6000]

slide-99
SLIDE 99

Bayesian knockoff statistics

LCD (lasso coefficient difference) vs. BVS (Bayesian variable selection): $Z_j = \mathbb P(\beta_j \ne 0 \mid y, X)$, $W_j = Z_j - \tilde Z_j$

[Figure: Power and FDR vs. amplitude for BVS Knockoffs and LCD Knockoffs; n = 300, p = 1000, Bayesian linear model with 60 expected variables]

Inference is correct even if the prior is wrong or the MCMC has not converged.

slide-101
SLIDE 101

Partial summary

No valid p-values, even for logistic regression.
Shifts the burden of knowledge to X (the covariates); this makes sense in many contexts.
Robustness: simulations show that the properties of the inference hold even when the model for X is only approximately right, and we always have access to diagnostic checks (later).
When the assumptions are appropriate, we gain a lot of power and can use sophisticated selection techniques.

slide-102
SLIDE 102

How to Construct Knockoffs for Hidden Markov Models

Joint with Sabatti & Sesia

slide-103
SLIDE 103

A general construction (C., Fan, Janson and Lv, '16)

Goal: e.g. $(X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)$

Algorithm: Sequential Conditional Independent Pairs (SCIP)
for j = 1, ..., p do
    sample $\tilde X_j$ from the law of $X_j \mid X_{-j}, \tilde X_{1:j-1}$
end

For p = 3:
Sample $\tilde X_1$ from $X_1 \mid X_{-1}$; the joint law of $(X, \tilde X_1)$ is then known.
Sample $\tilde X_2$ from $X_2 \mid X_{-2}, \tilde X_1$; the joint law of $(X, \tilde X_{1:2})$ is then known.
Sample $\tilde X_3$ from $X_3 \mid X_{-3}, \tilde X_{1:2}$; the joint law of $(X, \tilde X)$ is then known, and it is pairwise exchangeable!

Usually not practical, but easy in some cases (e.g. Markov chains); a brute-force illustration follows.

slide-112
SLIDE 112

Knockoff copies of a Markov chain

$X = (X_1, X_2, \ldots, X_p)$ is a Markov chain:

$$p(X_1, \ldots, X_p) = q_1(X_1)\prod_{j=2}^{p} Q_j(X_j \mid X_{j-1}) \qquad (X \sim \mathrm{MC}(q_1, Q))$$

[Diagram: observed variables $X_1, \ldots, X_4$ and knockoff variables $\tilde X_1, \ldots, \tilde X_4$, built one coordinate at a time]

The general SCIP algorithm can be implemented efficiently in the case of a Markov chain.

slide-123
SLIDE 123

Recursive update of normalizing constants

slide-124
SLIDE 124

Hidden Markov Models (HMMs)

$X = (X_1, X_2, \ldots, X_p)$ is an HMM if

$$H \sim \mathrm{MC}(q_1, Q) \ \text{ (latent Markov chain)},
\qquad X_j \mid H \sim X_j \mid H_j \overset{\mathrm{ind.}}{\sim} f_j(X_j; H_j) \ \text{ (emission distribution)}$$

[Diagram: latent chain $H_1 \to H_2 \to H_3$ with emissions $X_1, X_2, X_3$]

The H variables are latent; only the X variables are observed.

slide-130
SLIDE 130

Haplotypes and genotypes

Haplotype: the set of alleles on a single chromosome, coded 0/1 for the common/rare allele. Genotype: the unordered pair of alleles at a single marker.

[Diagram: haplotype M and haplotype P (0/1 sequences) summing to the genotype sequence (values 0/1/2)]

slide-131
SLIDE 131

A phenomenological HMM for haplotype & genotype data

[Figure: six haplotypes; color indicates the 'ancestor' at each marker (Scheet, '06)]

Used for haplotype estimation/phasing (Browning, '11) and imputation of missing SNPs (Marchini, '10): fastPHASE (Scheet, '06), IMPUTE (Marchini, '07), MaCH (Li, '10).

New application of the same HMM: generation of knockoff copies of genotypes! Each genotype is the sum of two independent HMM haplotype sequences.

slide-134
SLIDE 134

Knockoff copies of a hidden Markov model

Theorem (Sesia, Sabatti, C. '17). A knockoff copy $\tilde X$ of $X$ can be constructed as follows:
(1) Sample H from p(H | X) using the forward-backward algorithm (imputed latent variables).
(2) Generate a knockoff $\tilde H$ of H using the SCIP algorithm for a Markov chain (knockoff latent variables).
(3) Sample $\tilde X$ from the emission distribution of X given $H = \tilde H$ (knockoff variables).

[Diagram: observed $X_{1:3}$; imputed latent $H_{1:3}$; knockoff latent $\tilde H_{1:3}$; knockoffs $\tilde X_{1:3}$]
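Step (1) is classical forward filtering, backward sampling; a self-contained sketch for a discrete HMM (names are mine; f holds the emission likelihoods already evaluated at the observed x):

```python
import numpy as np

def sample_hidden_path(q1, Q, f, seed=0):
    """Draw H ~ p(H | X = x). q1: (K,) initial law; Q: (p-1, K, K) transitions;
    f: (p, K) with f[j, h] = f_j(x_j; h) evaluated at the observed x_j."""
    rng = np.random.default_rng(seed)
    p, K = f.shape
    alpha = np.empty((p, K))                    # filtered p(H_j | x_{1:j})
    alpha[0] = q1 * f[0]; alpha[0] /= alpha[0].sum()
    for j in range(1, p):
        alpha[j] = (alpha[j - 1] @ Q[j - 1]) * f[j]
        alpha[j] /= alpha[j].sum()              # normalize for stability
    h = np.empty(p, dtype=int)                  # backward sampling pass
    h[-1] = rng.choice(K, p=alpha[-1])
    for j in range(p - 2, -1, -1):
        w = alpha[j] * Q[j][:, h[j + 1]]        # prop. to p(H_j | H_{j+1}, x_{1:j})
        h[j] = rng.choice(K, p=w / w.sum())
    return h
```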

slide-138
SLIDE 138

Some Examples

slide-139
SLIDE 139

Simulations with synthetic Markov chain

Markov chain covariates with 5 hidden states; binomial response.

[Figure: Power and FDP vs. signal amplitude over 100 repetitions (true F_X); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$]

slide-140
SLIDE 140

Robustness

Markov chain covariates with 5 hidden states; binomial response.

[Figure: Power and FDP vs. signal amplitude over 100 repetitions (estimated F_X); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$]

slide-141
SLIDE 141

Simulations with synthetic HMM

HMM covariates with a latent "clockwise" Markov chain; binomial response.

[Figure: Power and FDP vs. signal amplitude over 100 repetitions (true F_X); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$]

slide-142
SLIDE 142

Robustness

HMM covariates with a latent "clockwise" Markov chain; binomial response.

[Figure: Power and FDP vs. signal amplitude over 100 repetitions (estimated F_X); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$]

slide-143
SLIDE 143

Out-of-sample parameter estimation

Inhomogeneous Markov chain covariates with 5 hidden states; binomial response.

[Figure: Power and FDP vs. number of unsupervised observations used to estimate F_X from an independent dataset, over 100 repetitions; n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$]

slide-144
SLIDE 144

Genetic Data Analysis

slide-145
SLIDE 145

Genetic analysis

Crohn's disease (CD): Wellcome Trust Case Control Consortium (WTCCC); n ≈ 5,000 subjects (≈ 2,000 patients, ≈ 3,000 healthy controls); p ≈ 400,000 SNPs; previously analyzed in WTCCC (2007).

Lipid traits (HDL, LDL cholesterol): Northern Finland 1966 Birth Cohort study of metabolic syndrome (NFBC); n ≈ 4,700 subjects; p ≈ 330,000 SNPs; previously analyzed in Sabatti et al. (2009).

slide-147
SLIDE 147

High-level results

Knockoffs with a nominal FDR level of 10%. Power is much higher than in the original studies:

Dataset | Original study | Knockoffs (average)
CD      | 9              | 22.8
HDL     | 5              | 8
LDL     | 6              | 9.8

Quite a few of the discoveries made by knockoffs were confirmed by larger GWAS (Franke et al., '10; Willer et al., '13). Knockoffs also made a number of new discoveries. We expect some (roughly 10%) of these to be false discoveries, but it is likely that many correspond to true discoveries: evidence from independent studies about adjacent genes shows some of the top unconfirmed hits to be promising candidates.

slide-154
SLIDE 154

Selection freq. | SNP (cluster size) | Chr. | Position range (Mb) | Franke et al. '10 | WTCCC '07
100% | rs11209026 (2)   |  1 | 67.31–67.42   | yes | yes
 99% | rs6431654 (20)   |  2 | 233.94–234.11 | yes | yes
 98% | rs6688532 (33)   |  1 | 169.4–169.65  | yes |
 97% | rs17234657 (1)   |  5 | 40.44–40.44   | yes | yes
 95% | rs11805303 (16)  |  1 | 67.31–67.46   | yes | yes
 91% | rs7095491 (18)   | 10 | 101.26–101.32 | yes | yes
 91% | rs3135503 (16)   | 16 | 49.28–49.36   | yes | yes
 81% | rs7768538 (1145) |  6 | 25.19–32.91   | yes | yes
 80% | rs6601764 (1)    | 10 | 3.85–3.85     | yes |
 75% | rs7655059 (5)    |  4 | 89.5–89.53    |     |
 73% | rs6500315 (4)    | 16 | 49.03–49.07   | yes | yes
 72% | rs2738758 (5)    | 20 | 61.71–61.82   | yes |
 70% | rs7726744 (46)   |  5 | 40.35–40.71   | yes | yes
 68% | rs11627513 (7)   | 14 | 96.61–96.63   |     |
 66% | rs4246045 (46)   |  5 | 150.07–150.41 | yes | yes
 62% | rs9783122 (234)  | 10 | 106.43–107.61 |     |
 61% | rs6825958 (3)    |  4 | 55.73–55.77   |     |

Table: SNP clusters found to be important for CD over 100 repetitions of knockoffs.

slide-155
SLIDE 155

Selection freq. | SNP (cluster size) | Chr. | Position range (Mb) | Confirmed in Willer et al. '13 | Found in Sabatti et al. '09
100% | rs1532085 (4)  | 15 | 58.68–58.7  | yes | yes
100% | rs7499892 (1)  | 16 | 57.01–57.01 | yes | yes
100% | rs1800961 (1)  | 20 | 43.04–43.04 | yes |
 99% | rs1532624 (2)  | 16 | 56.99–57.01 | yes | yes
 95% | rs255049 (142) | 16 | 66.41–69.41 | yes | yes

Table: SNP clusters found to be important for HDL over 100 repetitions of knockoffs.

Selection freq. | SNP (cluster size) | Chr. | Position range (Mb) | Confirmed in Willer et al. '13 | Found in Sabatti et al. '09
 99% | rs4844614 (34)  |  1 | 207.3–207.88 | yes |
 97% | rs646776 (5)    |  1 | 109.8–109.82 | yes | yes
 97% | rs2228671 (2)   | 19 | 11.2–11.21   | yes | yes
 94% | rs157580 (4)    | 19 | 45.4–45.41   | yes | yes
 92% | rs557435 (21)   |  1 | 55.52–55.72  | yes |
 80% | rs10198175 (1)  |  2 | 21.13–21.13  | yes | yes
 76% | rs10953541 (58) |  7 | 106.48–107.3 |     |
 62% | rs6575501 (1)   | 14 | 95.64–95.64  |     |

Table: SNP clusters found to be important for LDL over 100 repetitions of knockoffs.

slide-156
SLIDE 156

[Figure: number of discoveries made on the HDL, LDL, and CD GWAS datasets (left) and proportion of discoveries confirmed by a meta-analysis (right); red lines mark the results published in the papers that first analyzed these datasets]

slide-157
SLIDE 157

Data analysis issues

(1) Estimate the distribution of the SNPs (an HMM) to build knockoffs
(2) Highly correlated SNPs

(1) Estimating the HMM: methodology of Scheet and Stephens '06, fitted with fastPHASE (EM), K ≈ 10 possible hidden states. For each individual, making a knockoff copy of 70,000 SNPs takes about 1.3 sec on an Intel Xeon CPU (2.6 GHz), after parameter estimation.

slide-159
SLIDE 159

Highly correlated SNPs

It is hard to choose between two or more nearly identical variables when the data supports at least one of them being selected.

[Diagram: a block of highly correlated SNPs]

slide-160
SLIDE 160

Clustering

[Diagram: SNPs grouped into clusters, with one representative chosen per cluster]

Cluster SNPs using estimated correlations as the similarity measure with a single-linkage cutoff of 0.5; settle for discovering important SNP clusters among 71,145 candidates for CD and 59,005 for cholesterol. (A sketch of this step appears below.)

Cluster variables? Choose a representative SNP from each cluster (see also Reid and Tibshirani, '15); approximate null: cluster rep ⊥⊥ Y | other reps.

Which rep? The most significant SNP as computed on 20% of the samples, with safe data re-use (to optimize power) as in Barber and C. ('16).
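A sketch of the clustering step with standard tools (assuming scipy is available; the 0.5 cutoff on correlation becomes a 0.5 cutoff on the distance 1 − |corr|; names are mine):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def snp_clusters(X, cutoff=0.5):
    """Single-linkage clustering of the columns of X with |correlation| as the
    similarity; returns one cluster label per SNP."""
    dist = 1 - np.abs(np.corrcoef(X, rowvar=False))   # similarity -> distance
    np.fill_diagonal(dist, 0)
    Z = linkage(squareform(dist, checks=False), method="single")
    return fcluster(Z, t=1 - cutoff, criterion="distance")
```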

slide-165
SLIDE 165

Safe data re-use

We used an independent split of the data to select representative SNPs: one part, $X^{(0)}$, is used for selecting the reps and safely re-used for inference; the other, $(X^{(1)}, \tilde X^{(1)})$, is used only for inference.

[Diagram: signed scores ordered by |W|]

Re-use data to improve the ordering, but not to compute the signs (1-bit p-values).

slide-166
SLIDE 166

Simulations with genetic covariates

Real genetic covariates X; logistic conditional model Y | X with 60 variables.

[Figure: Power and FDP vs. signal amplitude over 100 repetitions; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$, target FDR α = 0.1]

slide-168
SLIDE 168

Diagnostic plot: simulation with data from Chromosome 1

Feature importance $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$

[Figure: feature importances of roughly 10,000 variables, originals alongside their knockoffs]

slide-170
SLIDE 170

Results of data analysis

The SNP cluster tables for CD, HDL, and LDL shown earlier (selection frequencies over 100 repetitions of knockoffs) summarize the results.

slide-172
SLIDE 172

Summary and open questions

Knockoffs offer finite-sample inferential guarantees in subtle and important problems. Knockoffs are a powerful, flexible, and robust solution whenever there is considerable outside information on the distribution of X, as in GWAS. Knockoffs address the replicability issue. Where is the burden of knowledge?

Open directions: robustness theory (Barber, Samworth and C.); derandomization (multiple knockoffs); knockoff constructions and statistics for other applications.

slide-174
SLIDE 174

What’s happening in selective inference III?

Lecture 3 (Thu. 8:30 a.m.)

Other views on selective inference: geography & vignettes
False coverage rate (Benjamini & Yekutieli)
PoSI (Berk, Brown, Buja, Zhang, Zhao)
Inference after the lasso (Taylor et al.)
Selective hypothesis testing (Fithian et al.)

slide-175
SLIDE 175

Thank You!

slide-176
SLIDE 176

Derandomization

Combine information from multiple knockoffs: who's consistently showing up?

[Figure: cartoon of the W's from different sample realizations of knockoffs; a few variables rank near the top of |W| in every realization]

slide-177
SLIDE 177

Sampling $\tilde X_1$:

$$p(X_1 \mid X_{-1}) = p(X_1 \mid X_2) = \frac{p(X_1, X_2)}{p(X_2)} = \frac{q_1(X_1)\,Q_2(X_2 \mid X_1)}{Z_1(X_2)},
\qquad Z_1(z) = \sum_u q_1(u)\,Q_2(z \mid u)$$

Sampling $\tilde X_2$:

$$p(X_2 \mid X_{-2}, \tilde X_1) = p(X_2 \mid X_1, X_3, \tilde X_1)
\propto \frac{Q_2(X_2 \mid X_1)\,Q_3(X_3 \mid X_2)\,Q_2(X_2 \mid \tilde X_1)}{Z_1(X_2)}$$

with normalization constant $Z_2(X_3)$, where

$$Z_2(z) = \sum_u \frac{Q_2(u \mid X_1)\,Q_3(z \mid u)\,Q_2(u \mid \tilde X_1)}{Z_1(u)}$$

Sampling $\tilde X_3$:

$$p(X_3 \mid X_{-3}, \tilde X_1, \tilde X_2) = p(X_3 \mid X_2, X_4, \tilde X_1, \tilde X_2)
\propto \frac{Q_3(X_3 \mid X_2)\,Q_4(X_4 \mid X_3)\,Q_3(X_3 \mid \tilde X_2)}{Z_2(X_3)}$$

with normalization constant $Z_3(X_4)$, where

$$Z_3(z) = \sum_u \frac{Q_3(u \mid X_2)\,Q_4(z \mid u)\,Q_3(u \mid \tilde X_2)}{Z_2(u)}$$

And so on, sampling each $\tilde X_j$ in turn: computationally efficient, O(p).
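Collecting the recursions, a compact sketch for a discrete chain; the 0-based indexing and all names are mine, with Q[j] the transition matrix from position j to j+1, so one pass costs O(p K²):

```python
import numpy as np

def markov_knockoff(x, q1, Q, seed=0):
    """Knockoff copy of one Markov chain path x with states 0..K-1.
    At step j the weight of candidate v is
    Q_j(v | x_{j-1}) * Q_{j+1}(x_{j+1} | v) * Q_j(v | x~_{j-1}) / Z_{j-1}(v),
    and each normalizing constant Z_j is carried as a length-K vector."""
    rng = np.random.default_rng(seed)
    p, K = len(x), len(q1)
    xt = np.empty(p, dtype=int)
    Z_prev = np.ones(K)                   # no constraint before the first step
    for j in range(p):
        # 'incoming' weight: q1 at j = 0, else transitions from both x_{j-1}
        # and the already-sampled knockoff x~_{j-1}, divided by Z_{j-1}
        a = q1 if j == 0 else Q[j - 1][x[j - 1]] * Q[j - 1][xt[j - 1]] / Z_prev
        # multiply by the 'outgoing' transition to x_{j+1} (absent at the end)
        w = a * Q[j][:, x[j + 1]] if j < p - 1 else a
        xt[j] = rng.choice(K, p=w / w.sum())
        if j < p - 1:
            Z_prev = a @ Q[j]             # Z_j(z) = sum_u a(u) Q_{j+1}(z | u)
    return xt
```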