SLIDE 1

Sailing Through Data: Discoveries and Mirages

Emmanuel Candès, Stanford University. 2018 Machine Learning Summer School, Buenos Aires, June 2018

SLIDE 2

Robustness

SLIDES 3-9

Robustness

[Figure: Power and FDR versus the relative Frobenius norm error of the covariance estimate used to construct the knockoffs, with one curve per estimate: Exact Cov, Graph. Lasso, 50% Emp. Cov, 62.5% Emp. Cov, 75% Emp. Cov, 87.5% Emp. Cov, 100% Emp. Cov.]

Figure: Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500, and the target FDR is 10%. Y | X follows a logistic model with 50 nonzero entries.

SLIDE 10

Simulations with synthetic Markov chain

Markov chain covariates with 5 hidden states. Binomial response

[Figure: Power (left) and FDP (right) versus signal amplitude, 4-20.]

Figure: Power and FDP over 100 repetitions (true $F_X$); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat{\beta}_j(\hat{\lambda}_{CV})|$, $W_j = Z_j - \tilde{Z}_j$.
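These statistics are easy to reproduce with off-the-shelf tools. Below is a minimal sketch (not the talk's own code) of the statistic $W_j = Z_j - \tilde{Z}_j$ for a binomial response using scikit-learn; the inputs `X`, `X_tilde`, and `y` are assumed given, and sklearn's `C` grid plays the role of $\hat{\lambda}_{CV}$ since C = 1/λ.

```python
# Sketch: lasso-coefficient-difference statistics W_j = Z_j - Z~_j, where
# Z_j = |beta^_j(lambda_CV)| from an L1-penalized logistic fit on the
# augmented design [X, X~]. In practice X and X_tilde should be treated
# symmetrically (e.g., randomize the column order) so the fit cannot favor
# originals over knockoffs.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def knockoff_statistics(X, X_tilde, y, n_folds=5):
    n, p = X.shape
    XX = np.hstack([X, X_tilde])          # n x 2p augmented design
    fit = LogisticRegressionCV(
        Cs=50, cv=n_folds, penalty="l1", solver="liblinear", max_iter=5000
    ).fit(XX, y)
    beta = fit.coef_.ravel()              # length 2p
    Z, Z_tilde = np.abs(beta[:p]), np.abs(beta[p:])
    return Z - Z_tilde                    # W_j = Z_j - Z~_j
```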

SLIDE 11

Robustness

Markov chain covariates with 5 hidden states. Binomial response

[Figure: Power (left) and FDP (right) versus signal amplitude, 4-20.]

Figure: Power and FDP over 100 repetitions (estimated $F_X$); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat{\beta}_j(\hat{\lambda}_{CV})|$, $W_j = Z_j - \tilde{Z}_j$.

SLIDE 12

Simulations with synthetic HMM

HMM covariates with latent “clockwise” Markov chain. Binomial response

[Figure: Power (left) and FDP (right) versus signal amplitude, 3-20.]

Figure: Power and FDP over 100 repetitions (true $F_X$); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat{\beta}_j(\hat{\lambda}_{CV})|$, $W_j = Z_j - \tilde{Z}_j$.

SLIDE 13

Robustness

HMM covariates with latent “clockwise” Markov chain. Binomial response

[Figure: Power (left) and FDP (right) versus signal amplitude, 3-20.]

Figure: Power and FDP over 100 repetitions (estimated $F_X$); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat{\beta}_j(\hat{\lambda}_{CV})|$, $W_j = Z_j - \tilde{Z}_j$.

SLIDE 14

Out-of-sample parameter estimation

Inhomogeneous Markov chain covariates with 5 hidden states. Binomial response

[Figure: Power (left) and FDP (right) versus the number of unsupervised observations, 10-10000.]

Figure: Power and FDP over 100 repetitions (estimated $F_X$ from an independent dataset); n = 1000, p = 1000, target FDR α = 0.1; $Z_j = |\hat{\beta}_j(\hat{\lambda}_{CV})|$, $W_j = Z_j - \tilde{Z}_j$.

SLIDES 15-19

Model-X knockoff variables (robust version)

i.i.d. samples from $P_{XY}$

  • Distribution $P_X$ of $X$ only 'approximately' known
  • Distribution $P_{Y|X}$ of $Y \mid X$ completely unknown

Knockoffs w.r.t. the user input $Q_X$ (Barber, C. and Samworth, '18)

Originals $X = (X_1, \ldots, X_p)$, knockoffs $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$:

(1) Pairwise exchangeability w.r.t. $Q_X$: if $X \sim Q_X$, then $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$, e.g.
$(X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)_{\mathrm{swap}(\{2,3\})} \overset{d}{=} (X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3)$

(2) Ignore $Y$ when constructing knockoffs: $\tilde{X} \perp\!\!\perp Y \mid X$

Only the conditionals $Q(X_j \mid X_{-j})$ are required, and they do not have to be compatible.
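The talk's simulations use Markov-chain and HMM constructions; requirement (1) is easiest to see in the Gaussian special case of Candès et al. ('18). A minimal sketch, assuming rows of X are i.i.d. $\mathcal{N}(\mu, \Sigma)$ with $\Sigma$ a known correlation matrix (so $Q_X = P_X$ and exchangeability holds exactly):

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, rng=None):
    """Sample exact Gaussian model-X knockoffs (equicorrelated construction)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, p = X.shape
    # Equicorrelated choice s_j = min(2 * lambda_min(Sigma), 1), valid when
    # Sigma has unit diagonal (correlation scale).
    s = np.full(p, min(2 * np.linalg.eigvalsh(Sigma)[0], 1.0))
    Sinv_D = np.linalg.solve(Sigma, np.diag(s))            # Sigma^{-1} diag{s}
    # Conditional law of X~ given X for the joint Gaussian (X, X~):
    #   mean = mu + (X - mu)(I - Sigma^{-1} D),  cov = 2D - D Sigma^{-1} D
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sinv_D)
    cond_cov = 2 * np.diag(s) - np.diag(s) @ Sinv_D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))   # jitter: cov may be singular
    return cond_mean + rng.standard_normal((n, p)) @ L.T
```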

SLIDES 20-21

FDR control

$\hat{S} = \{j : W_j \ge \tau\}$, where

$\tau = \min\Big\{ t : \underbrace{\frac{1 + |\{j : W_j \le -t\}|}{1 \vee |\{j : W_j \ge t\}|}}_{\widehat{\mathrm{FDP}}(t)} \le q \Big\}$

Theorem (Barber and C., '15)
If the user input $Q_X$ is correct ($Q_X = P_X$), then for knockoff+,
$E\left[\dfrac{\#\text{ false positives}}{\#\text{ selections}}\right] \le q$.
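The selection rule is only a few lines of code. A minimal sketch of knockoff+ for an arbitrary vector of statistics W:

```python
import numpy as np

def knockoff_plus_select(W, q=0.10):
    """Knockoff+ selection: S^ = {j : W_j >= tau} at the data-dependent tau."""
    for t in np.sort(np.abs(W[W != 0])):               # candidate thresholds
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:                               # first t with FDP^(t) <= q
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)                     # no feasible threshold
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff filter, and it is what yields the exact FDR bound in the theorem.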
SLIDE 22

Robustness of knockoffs?

Does exchangeability hold approximately when $Q_X \neq P_X$?

[Cartoon: signs (+/−) of the $W_j$ laid out along the $|W|$ axis.]

If $P_X = Q_X$, the coin flips are unbiased and independent. Problem: if $P_X \neq Q_X$, the coin flips may be (slightly) biased and (slightly) dependent.

SLIDE 23

KL divergence condition

The KL condition

$\widehat{\mathrm{KL}}_j := \sum_i \log \frac{P_j(X_{ij} \mid X_{i,-j})\, Q_j(\tilde{X}_{ij} \mid X_{i,-j})}{Q_j(X_{ij} \mid X_{i,-j})\, P_j(\tilde{X}_{ij} \mid X_{i,-j})} \le \epsilon$

$E[\widehat{\mathrm{KL}}_j]$ is the KL divergence between the distributions of $(X_j, \tilde{X}_j, X_{-j}, \tilde{X}_{-j})$ and $(\tilde{X}_j, X_j, X_{-j}, \tilde{X}_{-j})$.
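When both the true conditional $P_j$ and the user's guess $Q_j$ have tractable densities, $\widehat{\mathrm{KL}}_j$ can be evaluated directly. A minimal sketch for univariate Gaussian conditionals; the arrays `mu_P`, `sd_P`, `mu_Q`, `sd_Q` (per-observation conditional means and standard deviations computed from $X_{i,-j}$) are hypothetical inputs:

```python
import numpy as np
from scipy.stats import norm

def kl_hat_j(x_j, x_tilde_j, mu_P, sd_P, mu_Q, sd_Q):
    """Observed KL statistic for coordinate j with Gaussian conditionals."""
    # sum_i log[ P_j(x_ij) Q_j(x~_ij) / (Q_j(x_ij) P_j(x~_ij)) ]
    return np.sum(norm.logpdf(x_j, mu_P, sd_P) + norm.logpdf(x_tilde_j, mu_Q, sd_Q)
                  - norm.logpdf(x_j, mu_Q, sd_Q) - norm.logpdf(x_tilde_j, mu_P, sd_P))
```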

SLIDES 24-25

From KL condition to FDR control

Theorem (Barber, C. and Samworth, 2018)
For any $\epsilon \ge 0$,
$E\left[\dfrac{\#\{\text{false positives } j \text{ with } \widehat{\mathrm{KL}}_j \le \epsilon\}}{\#\text{ selections}}\right] \le q\, e^{\epsilon}$

Corollary
$\mathrm{FDR} \le \min_{\epsilon \ge 0}\ \Big\{ q\, e^{\epsilon} + P\Big(\max_{\text{null } j} \widehat{\mathrm{KL}}_j > \epsilon\Big) \Big\}$

For instance, with q = 10% and ε = 0.1 the first term is 0.1·e^0.1 ≈ 11%, so a small KL error inflates the bound only slightly.

  • Information-theoretically optimal
SLIDE 26

New directions

SLIDE 27

ML inspired knockoffs

Joint with S. Bates, Y. Romano, M. Sesia and J. Zhou

  • Knockoffs for graphical models
  • Knockoffs via restricted Boltzmann machines
  • Knockoffs via variational auto-encoders?
  • Knockoffs via generative adversarial networks?

SLIDE 28

Improving power?

Joint with Z. Ren and M. Sesia

SLIDE 29

Derandomization

Combine information from multiple knockoffs: who's consistently showing up?

[Figure: Cartoon representation of the W's from different sample realizations of the knockoffs; the ordering of variables 1-9 along $|W|$ changes from realization to realization.]

SLIDE 30

Knockoffs for Fixed Features

Joint with Barber

SLIDE 31

Linear model

$y = X\beta + z$ with $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\beta \in \mathbb{R}^p$, $z \in \mathbb{R}^n$, i.e. $y \sim \mathcal{N}(X\beta, \sigma^2 I)$

  • Fixed design $X$
  • Noise level $\sigma$ unknown
  • Multiple testing: $H_j : \beta_j = 0$ (is the $j$th variable in the model?)
  • Identifiability $\Rightarrow$ $p \le n$
  • Inference (FDR control) will hold conditionally on $X$

SLIDES 32-34

Knockoff features (fixed X)

Originals $X$, knockoffs $\tilde{X}$:

$\tilde{X}_j' \tilde{X}_k = X_j' X_k$ for all $j, k$
$\tilde{X}_j' X_k = X_j' X_k$ for all $j \neq k$

  • No need for new data or a new experiment
  • No knowledge of the response $y$

SLIDES 35-38

Knockoff construction (n ≥ 2p)

Problem: given $X \in \mathbb{R}^{n \times p}$, find $\tilde{X} \in \mathbb{R}^{n \times p}$ s.t.

$[X\ \tilde{X}]' [X\ \tilde{X}] = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix} := G \succeq 0$

$G \succeq 0 \iff \mathrm{diag}\{s\} \succeq 0$ and $2\Sigma - \mathrm{diag}\{s\} \succeq 0$

Solution

$\tilde{X} = X(I - \Sigma^{-1} \mathrm{diag}\{s\}) + \tilde{U} C$

where $\tilde{U} \in \mathbb{R}^{n \times p}$ has column space orthogonal to that of $X$, and $C'C$ is a Cholesky factorization of $2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\, \Sigma^{-1}\, \mathrm{diag}\{s\} \succeq 0$.

SLIDES 39-41

Knockoff construction (n ≥ 2p)

$\tilde{X}_j' X_j = 1 - s_j$ (standardized columns)

  • Equicorrelated knockoffs: $s_j = 2\lambda_{\min}(\Sigma) \wedge 1$; under equivariance, this minimizes the value of $|\langle X_j, \tilde{X}_j \rangle|$.
  • SDP knockoffs: minimize $\sum_j |1 - s_j|$ subject to $s_j \ge 0$ and $\mathrm{diag}\{s\} \preceq 2\Sigma$, a highly structured semidefinite program (SDP).
  • Other possibilities ...
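Slides 35-41 combine into a short recipe. A minimal sketch of the equicorrelated fixed-X construction, assuming X has full column rank and n ≥ 2p:

```python
import numpy as np

def fixed_X_knockoffs(X):
    """Equicorrelated fixed-X knockoffs; assumes full column rank and n >= 2p."""
    n, p = X.shape
    assert n >= 2 * p, "this construction needs n >= 2p"
    X = X / np.linalg.norm(X, axis=0)                  # standardize columns
    Sigma = X.T @ X                                    # unit-diagonal Gram matrix
    s = np.full(p, min(2 * np.linalg.eigvalsh(Sigma)[0], 1.0))
    Sinv_D = np.linalg.solve(Sigma, np.diag(s))        # Sigma^{-1} diag{s}
    Q, _ = np.linalg.qr(X, mode="complete")
    U = Q[:, p:2 * p]                                  # col(U) orthogonal to col(X)
    A = 2 * np.diag(s) - np.diag(s) @ Sinv_D           # 2D - D Sigma^{-1} D
    C = np.linalg.cholesky(A + 1e-10 * np.eye(p)).T    # C'C = A (Cholesky)
    return X @ (np.eye(p) - Sinv_D) + U @ C            # X(I - Sigma^{-1}D) + U~ C
```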

SLIDES 42-43

Why?

For a null feature $X_j$ (so $\beta_j = 0$):

$X_j' y = X_j' X \beta + X_j' z \overset{d}{=} \tilde{X}_j' X \beta + \tilde{X}_j' z = \tilde{X}_j' y$

(The Gram conditions give $X_j' X \beta = \tilde{X}_j' X \beta$ exactly when $\beta_j = 0$, and $X_j' z$ and $\tilde{X}_j' z$ have the same Gaussian law because $\|X_j\| = \|\tilde{X}_j\|$.)

SLIDE 44

Why?

For any subset of nulls $T$:

$[X\ \tilde{X}]_{\mathrm{swap}(T)}'\, y \overset{d}{=} [X\ \tilde{X}]'\, y$ and $[X\ \tilde{X}]_{\mathrm{swap}(T)}' [X\ \tilde{X}]_{\mathrm{swap}(T)} = [X\ \tilde{X}]' [X\ \tilde{X}]$

SLIDES 45-46

Exchangeability of feature importance statistics

  • Sufficiency: $(Z, \tilde{Z}) = z\big([X\ \tilde{X}]'[X\ \tilde{X}],\ [X\ \tilde{X}]'y\big)$
  • Knockoff-agnostic: swapping originals and knockoffs swaps the $Z$'s: $z\big([X\ \tilde{X}]_{\mathrm{swap}(T)},\ y\big) = (Z, \tilde{Z})_{\mathrm{swap}(T)}$

Theorem (Barber and C., '15)
For any subset $T$ of nulls, $(Z, \tilde{Z})_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde{Z})$, which implies FDR control (conditional on $X$).
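A concrete statistic satisfying both properties is the marginal-correlation pair $Z_j = |X_j'y|$, $\tilde{Z}_j = |\tilde{X}_j'y|$. The toy check below (an illustration, not from the talk) verifies the knockoff-agnostic property numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X, X_tilde = rng.standard_normal((n, p)), rng.standard_normal((n, p))
y = rng.standard_normal(n)

def z_stats(X, X_tilde, y):
    # Function of [X X~]' y only, so the sufficiency property holds trivially.
    v = np.abs(np.hstack([X, X_tilde]).T @ y)
    return v[:X.shape[1]], v[X.shape[1]:]

T = [1, 3]                                         # swap originals/knockoffs in T
Xs, Xs_tilde = X.copy(), X_tilde.copy()
Xs[:, T], Xs_tilde[:, T] = X_tilde[:, T], X[:, T]

Z, Z_tilde = z_stats(X, X_tilde, y)
Zs, Zs_tilde = z_stats(Xs, Xs_tilde, y)
assert np.allclose(Zs[T], Z_tilde[T]) and np.allclose(Zs_tilde[T], Z[T])
```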

SLIDE 47

Telling the effect direction

[...] in classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using Type I error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. [...] a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state "θ1 > θ2 with confidence", "θ2 > θ1 with confidence", or "no claim with confidence".

  • A. Gelman & F. Tuerlinckx
SLIDE 48

Directional FDR

Are any effects exactly zero?

$\mathrm{FDR}_{\mathrm{dir}} = E\left[\dfrac{\#\text{ selections with wrong effect direction}}{\#\text{ selections}}\right]$

The ratio is the directional false discovery proportion; its expectation is the directional FDR.

  • Directional FDR (Benjamini & Yekutieli, '05)
  • Sign errors (Type S) (Gelman & Tuerlinckx, '00)

Important for misspecified models, where exact sparsity is unlikely.

SLIDES 49-50

Directional FDR control

$(X_j - \tilde{X}_j)'y \overset{\mathrm{ind}}{\sim} \mathcal{N}(s_j \beta_j,\ 2\sigma^2 s_j), \qquad s_j \ge 0$

Sign estimate: $\mathrm{sgn}\big((X_j - \tilde{X}_j)'y\big)$

Theorem (Barber and C., '16)
Exact same knockoff selection + sign estimate: FDR ≤ FDR_dir ≤ q
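Operationally, the directional guarantee costs nothing extra: keep the usual knockoff+ selections and attach the sign estimate. A minimal sketch, reusing the hypothetical `knockoff_plus_select` helper sketched earlier:

```python
import numpy as np

def directional_selections(X, X_tilde, y, W, q=0.10):
    """Knockoff+ selections plus a direction for each selected effect."""
    selected = knockoff_plus_select(W, q)              # unchanged selection rule
    signs = np.sign((X - X_tilde).T @ y)               # sgn((X_j - X~_j)' y)
    return [(j, int(signs[j])) for j in selected]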

SLIDE 51

Directional FDR control

[Cartoon: signs (+/−) of the $W_j$ along the $|W|$ axis, with null and non-null variables marked.]

Null coin flips are unbiased.

SLIDE 52

Directional FDR control

[Cartoon: signs (+/−) of the $W_j$ along the $|W|$ axis.]

Great subtlety: the coin flips are now biased.

SLIDE 53

Empirical results

Features $\sim \mathcal{N}(0, I_n)$; n = 3000, p = 1000; k = 30 variables with regression coefficients of magnitude 3.5; nominal level q = 20%.

Method                        FDR (%)   Power (%)   Theor. FDR control?
Knockoff+ (equivariant)         14.40       60.99   Yes
Knockoff (equivariant)          17.82       66.73   No
Knockoff+ (SDP)                 15.05       61.54   Yes
Knockoff (SDP)                  18.72       67.50   No
BHq                             18.70       48.88   No
BHq + log-factor correction      2.20       19.09   Yes
BHq with whitened noise         18.79        2.33   Yes

SLIDE 54

Effect of signal amplitude

Same setup with k = 30 (q = 0.2)

[Figure: FDR (%) and Power (%) versus amplitude A, 2.8-4.2, for Knockoff, Knockoff+, and BHq, with the nominal level marked.]

SLIDE 55

Effect of feature correlation

Features $\sim \mathcal{N}(0, \Theta)$ with $\Theta_{jk} = \rho^{|j-k|}$; n = 3000, p = 1000, k = 30, amplitude 3.5.

[Figure: FDR (%) and Power (%) versus feature correlation ρ, 0.0-0.8, for Knockoff, Knockoff+, and BHq, with the nominal level marked.]

SLIDE 56

Fixed Design Knockoff Data Analysis

SLIDE 57

HIV drug resistance

Drug type   # drugs   Sample size   # protease or RT positions genotyped   # mutations appearing ≥ 3 times in sample
PI          6         848           99                                      209
NRTI        6         639           240                                     294
NNRTI       3         747           240                                     319

Response y: log-fold-increase of lab-tested drug resistance. Covariate X_j: presence or absence of mutation #j.

Data from R. Shafer (Stanford), available at: http://hivdb.stanford.edu/pages/published_analysis/genophenoPNAS2006/

SLIDE 58

HIV data

The TSM list contains mutations associated with the PI class of drugs in general; it is not specialized to the individual drugs in the class.

Results for PI-type drugs:

[Figure: for each PI drug, the number of HIV-1 protease positions selected by Knockoff and by BHq, split into positions that appear in the TSM list and positions that do not. Data set sizes: APV n=768, p=201; ATV n=329, p=147; IDV n=826, p=208; LPV n=516, p=184; NFV n=843, p=209; SQV n=825, p=208.]

SLIDE 59

HIV data

Results for NRTI-type and NNRTI-type drugs:

[Figure: number of HIV-1 RT positions selected by Knockoff and by BHq, split by TSM-list membership. NRTI drugs: X3TC n=633, p=292; ABC n=628, p=294; AZT n=630, p=292; D4T n=630, p=293; DDI n=632, p=292; TDF n=353, p=218. NNRTI drugs: DLV n=732, p=311; EFV n=734, p=318; NVP n=746, p=319.]

SLIDE 60

High-dimensional setting

n ≈ 5,000 subjects; p ≈ 330,000 SNPs/variants to test.

[Figure: Manhattan plot of −log10(P value) across chromosomes 1-22 for HDL cholesterol, with hits at GALNT2, LPL, ABCA1, MVK/MMAB, LIPC, LCAT, LIPG, CETP.]

$p > n$ means we cannot construct knockoffs as before:

$\tilde{X}_j' \tilde{X}_k = X_j' X_k\ \ \forall\, j, k$ and $\tilde{X}_j' X_k = X_j' X_k\ \ \forall\, j \neq k \implies \tilde{X}_j = X_j\ \ \forall\, j$

SLIDES 61-64

High dimensional knockoffs: screen and confirm

Split the original data set:

  • exploratory sample $(X^{(0)}, y^{(0)})$: screen on sample 1
  • confirmatory sample $(X^{(1)}, y^{(1)})$: inference on sample 2

Theory (Barber and C., '16). Safe data re-use to improve power (Barber and C., '16).
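A minimal sketch of screen-and-confirm for a continuous response: a lasso screen on the exploratory half, then fixed-X knockoff inference on the confirmatory half restricted to the screened columns. `fixed_X_knockoffs` and `knockoff_plus_select` are the hypothetical helpers sketched earlier, and the marginal-correlation statistic stands in for any valid choice:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def screen_and_confirm(X, y, q=0.10, rng=None):
    """Lasso screening on sample 1, fixed-X knockoff inference on sample 2."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    idx = rng.permutation(n)
    explore, confirm = idx[: n // 2], idx[n // 2:]
    # Screen: keep variables with nonzero lasso coefficients on the exploratory half.
    keep = np.flatnonzero(LassoCV(cv=5).fit(X[explore], y[explore]).coef_ != 0)
    # Confirm: knockoffs on the held-out half, screened columns only. The fixed-X
    # construction needs len(keep) <= len(confirm) / 2, so cap the screen size.
    X1 = X[confirm][:, keep]
    X1_tilde = fixed_X_knockoffs(X1)
    W = np.abs(X1.T @ y[confirm]) - np.abs(X1_tilde.T @ y[confirm])
    return keep[knockoff_plus_select(W, q)]
```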

SLIDE 65

Some extensions

$y = X_1 \beta_1 + X_2 \beta_2 + \cdots + \mathcal{N}(0, \sigma^2 I_n)$, where $X_g$ is $n \times p_g$.

  • Group sparsity — build knockoffs at the group-wise level (Dai & Barber 2015)
  • Identify key groups with PCA — build knockoffs only for the top PC in each group (Chen, Hou & Hou 2017)
  • Build knockoffs only for prototypes selected from each group (Reid & Tibshirani 2015)
  • Multilayer knockoffs to control FDR at the individual and group levels simultaneously (Katsevich & Sabatti 2017)

SLIDE 66

Learning from data is not trivial

  • 'Wrapper' around a black-box algorithm rigorously addresses the reproducibility issue
  • How to make valid knockoffs (controls)? Importance of correct statistical reasoning
  • Which level of significance is appropriate? Importance of mathematics (martingale theory)
  • Sensitivity to modeling assumptions? Importance of mathematics

SLIDES 67-69

Beyond replicability: grand challenges in data-driven science

  • Reducing our irreproducibility
  • Establishing causality
  • Guaranteeing fairness and robustness of AI systems

In some cases, variables with the property p(response | variable, others) ≠ p(response | others) are 'causal'. If a predictive algorithm uses causal variables, then it is likely to be fair.

SLIDE 70

This is not just about not being wrong (irreproducibility)

Robustness? We would want predictions to remain valid in different samples collected in different circumstances. "Constant conjunction" is a property of causal effects (Hume).

SLIDE 71

Fairness: can computer programs be racist and sexist?

[Image: Guido Rosa/Getty Images/Ikon Images]

Blind application of machine learning runs the risk of amplifying biases and prejudices. Identifying variables gives us a chance to scrutinize a model built from one sample: do we believe these variables are "structurally" important, or are they just reflecting a spurious association in this sample? Are we learning something about the world, or reifying our prejudices?