Sailing Through Data: Discoveries and Mirages
Emmanuel Candès, Stanford University
2018 Machine Learning Summer School, Buenos Aires, June 2018

Robustness

Covariance estimates compared: Exact Cov; Graph. Lasso; 50% Emp. Cov; 62.5% Emp. Cov; 75% Emp. Cov; 87.5% Emp. Cov; 100% Emp. Cov

[Figure: Power and FDR vs. relative Frobenius norm error of the covariance estimate, one curve per estimate. Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500; target FDR is 10%; Y | X follows a logistic model with 50 nonzero entries.]
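The simulated data described in the caption are easy to reproduce; a minimal numpy sketch, in which the signal amplitude is an assumed placeholder (the slides do not give it):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, rho = 800, 1500, 50, 0.3

# AR(1) covariates: each row is a stationary AR(1) sequence with
# autocorrelation rho, so Cov(X_j, X_k) = rho^{|j-k|}.
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for j in range(1, p):
    X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)

# Logistic model Y | X with k nonzero coefficients; the amplitude below is an
# arbitrary placeholder, not the value used in the talk.
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.choice([-1.0, 1.0], size=k) * 10 / np.sqrt(n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
```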
Simulations with synthetic Markov chain
Markov chain covariates with 5 hidden states. Binomial response
[Figure: Power and FDP vs. signal amplitude (4-20), over 100 repetitions (true $F_X$). $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.]
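The statistics in the caption come from an L1-penalized logistic fit on the augmented design; a sketch assuming scikit-learn, with Xk denoting knockoffs from any valid construction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def lasso_coef_diff_stats(X, Xk, y):
    """Z_j = |beta_hat_j(lambda_CV)| from an L1-penalized logistic fit on the
    augmented design [X, Xk]; W_j = Z_j - Z~_j."""
    p = X.shape[1]
    fit = LogisticRegressionCV(penalty="l1", solver="liblinear",
                               Cs=10).fit(np.hstack([X, Xk]), y)
    Z = np.abs(fit.coef_.ravel())
    return Z[:p] - Z[p:]
```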
Robustness
Markov chain covariates with 5 hidden states. Binomial response
[Figure: Power and FDP vs. signal amplitude (4-20), over 100 repetitions (estimated $F_X$). $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.]
Simulations with synthetic HMM
HMM covariates with latent “clockwise” Markov chain. Binomial response
[Figure: Power and FDP vs. signal amplitude (3-20), over 100 repetitions (true $F_X$). $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.]
Robustness
HMM covariates with latent “clockwise” Markov chain. Binomial response
[Figure: Power and FDP vs. signal amplitude (3-20), over 100 repetitions (estimated $F_X$). $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.]
Out-of-sample parameter estimation
Inhomogeneous Markov chain covariates with 5 hidden states. Binomial response
[Figure: Power and FDP vs. number of unsupervised observations (10-10000), over 100 repetitions (estimated $F_X$ from an independent dataset). $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.]
Model-X knockoff variables (robust version)

i.i.d. samples from $P_{XY}$
- Distribution $P_X$ of $X$ only 'approximately' known
- Distribution $P_{Y|X}$ of $Y \mid X$ completely unknown

Knockoffs w.r.t. user input $Q_X$ (Barber, C. and Samworth, '18)

Originals $X = (X_1, \ldots, X_p)$; knockoffs $\tilde X = (\tilde X_1, \ldots, \tilde X_p)$

(1) Pairwise exchangeability w.r.t. $Q_X$: if $X \sim Q_X$, then $(X, \tilde X)_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde X)$,
e.g. $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{2,3\})} \overset{d}{=} (X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3)$

(2) Ignore $Y$ when constructing knockoffs: $\tilde X \perp\!\!\!\perp Y \mid X$

Only the conditionals $Q(X_j \mid X_{-j})$ are required, and they do not have to be compatible.
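When $Q_X$ is Gaussian, pairwise exchangeability can be achieved in closed form; a minimal sketch of the standard Gaussian model-X construction from the knockoffs literature (the jitter, and the requirement that $\mathrm{diag}\{s\}$ keep the joint covariance PSD, are implementation details assumed here, not from the slide):

```python
import numpy as np

def gaussian_model_x_knockoffs(X, Sigma, s, rng):
    """Sample knockoffs for rows X_i ~ N(0, Sigma). With D = diag(s) chosen so
    that [[Sigma, Sigma - D], [Sigma - D, Sigma]] is PSD, the conditional law
        X~ | X  ~  N(X - X Sigma^{-1} D,  2D - D Sigma^{-1} D)
    makes (X, X~) pairwise exchangeable with respect to Q_X = N(0, Sigma)."""
    D = np.diag(s)
    Sinv_D = np.linalg.solve(Sigma, D)           # Sigma^{-1} D
    mean = X - X @ Sinv_D
    cond_cov = 2 * D - D @ Sinv_D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(len(s)))  # small jitter
    return mean + rng.standard_normal(X.shape) @ L.T
```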
FDR control

$\hat S = \{j : W_j \ge \tau\}$, where

$\tau = \min\left\{ t : \dfrac{1 + |\{j : W_j \le -t\}|}{1 \vee |\{j : W_j \ge t\}|} \le q \right\}$

(the ratio is the estimate $\widehat{\mathrm{FDP}}(t)$)

Theorem (Barber and C., '15)
If the user input $Q_X$ is correct ($Q_X = P_X$), then knockoff+ satisfies
$\mathbb{E}\left[\dfrac{\#\ \text{false positives}}{\#\ \text{selections}}\right] \le q$
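The threshold translates directly into code; a numpy sketch:

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t among the |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q."""
    ts = np.sort(np.abs(W[W != 0]))
    for t in ts:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

# Selection: tau = knockoff_plus_threshold(W, q=0.1)
#            selected = np.where(W >= tau)[0]
```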
Robustness of knockoffs?
Does exchangeability hold approximately when $Q_X \neq P_X$?

[Cartoon: signs of the $W_j$'s ordered by $|W|$, viewed as coin flips.]

If $P_X = Q_X$, the coin flips are unbiased and independent. Problem: if $P_X \neq Q_X$, the coin flips may be (slightly) biased and (slightly) dependent.
KL divergence condition

The KL condition:

$\widehat{\mathrm{KL}}_j := \sum_i \log \dfrac{P_j(X_{ij} \mid X_{i,-j})\, Q_j(\tilde X_{ij} \mid X_{i,-j})}{Q_j(X_{ij} \mid X_{i,-j})\, P_j(\tilde X_{ij} \mid X_{i,-j})} \le \epsilon$

$\mathbb{E}[\widehat{\mathrm{KL}}_j]$ is the KL divergence between the distributions of $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})$ and $(\tilde X_j, X_j, X_{-j}, \tilde X_{-j})$
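$\widehat{\mathrm{KL}}_j$ can be evaluated whenever both conditionals are available as log-densities; a sketch in which logP_cond and logQ_cond are hypothetical callables, not anything defined on the slides:

```python
import numpy as np

def KL_hat_j(j, X, Xk, logP_cond, logQ_cond):
    """KL_hat_j = sum_i log[ P_j(X_ij | X_i,-j) Q_j(Xk_ij | X_i,-j)
                           / (Q_j(X_ij | X_i,-j) P_j(Xk_ij | X_i,-j)) ].
    logP_cond(j, v, x_minus_j) returns log P_j(v | x_minus_j); same for logQ_cond."""
    total = 0.0
    for i in range(X.shape[0]):
        x_mj = np.delete(X[i], j)
        total += (logP_cond(j, X[i, j], x_mj) + logQ_cond(j, Xk[i, j], x_mj)
                  - logQ_cond(j, X[i, j], x_mj) - logP_cond(j, Xk[i, j], x_mj))
    return total
```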
From KL condition to FDR control

Theorem (Barber, C. and Samworth, 2018)
For any $\epsilon \ge 0$,
$\mathbb{E}\left[\dfrac{\#\{\text{false positives } j \text{ with } \widehat{\mathrm{KL}}_j \le \epsilon\}}{\#\ \text{selections}}\right] \le q\, e^{\epsilon}$

Corollary
$\mathrm{FDR} \le \min_{\epsilon \ge 0} \left\{ q\, e^{\epsilon} + \mathbb{P}\left(\max_{\text{null } j} \widehat{\mathrm{KL}}_j > \epsilon\right) \right\}$

Information-theoretically optimal
New directions
ML inspired knockoffs
Joint with S. Bates, Y. Romano, M. Sesia and J. Zhou
- Knockoffs for graphical models
- Knockoffs via restricted Boltzmann machines
- Knockoffs via variational auto-encoders?
- Knockoffs via generative adversarial networks?
Improving power?
Joint with Z. Ren and M. Sesia
Derandomization
Combine information from multiple knockoffs: who's consistently showing up?
[Figure: Cartoon representation of the $W$'s from different sample realizations of the knockoffs; the same features (here 9, 2, 7, 3, 4, ...) keep appearing near the top of the $|W|$ ordering across realizations.]
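As an illustration of "who's consistently showing up", here is a majority-vote aggregation over repeated knockoff runs; this is only a sketch of the idea, not the actual derandomization procedure developed with Z. Ren and M. Sesia (it reuses knockoff_plus_threshold from the earlier sketch):

```python
import numpy as np

def derandomized_selection(X, y, q, sample_knockoffs, statistics,
                           M=30, eta=0.5, rng=None):
    """Illustrative aggregation only: run the knockoff filter M times with
    independently sampled knockoffs, then keep features whose selection
    frequency is at least eta."""
    rng = rng or np.random.default_rng()
    counts = np.zeros(X.shape[1])
    for _ in range(M):
        Xk = sample_knockoffs(X, rng)            # fresh knockoff realization
        W = statistics(X, Xk, y)                 # e.g. lasso coefficient differences
        tau = knockoff_plus_threshold(W, q)      # threshold from the earlier sketch
        counts[W >= tau] += 1
    return np.flatnonzero(counts / M >= eta)
```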
Knockoffs for Fixed Features
Joint with Barber
Linear model
$y = \sum_j \beta_j X_j = X\beta + z$, with $y \in \mathbb{R}^{n \times 1}$, $X \in \mathbb{R}^{n \times p}$, $\beta \in \mathbb{R}^{p \times 1}$, $z \in \mathbb{R}^{n \times 1}$

$y \sim N(X\beta, \sigma^2 I)$
- Fixed design $X$
- Noise level $\sigma$ unknown
- Multiple testing: $H_j : \beta_j = 0$ (is the $j$th variable in the model?)
- Identifiability $\Longrightarrow p \le n$
- Inference (FDR control) will hold conditionally on $X$
Knockoff features (fixed X)

Originals $X$ and knockoffs $\tilde X$ satisfying

$\tilde X_j' \tilde X_k = X_j' X_k$ for all $j, k$
$\tilde X_j' X_k = X_j' X_k$ for all $j \neq k$

- No need for new data or a new experiment
- No knowledge of the response $y$
Knockoff construction (n ≥ 2p)

Problem: given $X \in \mathbb{R}^{n \times p}$, find $\tilde X \in \mathbb{R}^{n \times p}$ s.t.

$[X\ \tilde X]'[X\ \tilde X] = \begin{bmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{bmatrix} := G \succeq 0$

$G \succeq 0 \iff \mathrm{diag}\{s\} \succeq 0$ and $2\Sigma - \mathrm{diag}\{s\} \succeq 0$

Solution
$\tilde X = X\left(I - \Sigma^{-1}\mathrm{diag}\{s\}\right) + \tilde U C$
- $\tilde U \in \mathbb{R}^{n \times p}$ with column space orthogonal to that of $X$
- $C'C$ the Cholesky factorization of $2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\,\Sigma^{-1}\,\mathrm{diag}\{s\} \succeq 0$
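A numpy sketch of this construction, using the equi-correlated choice of $s$ from the next slide; it assumes standardized columns, full-rank $X$, and $n \ge 2p$:

```python
import numpy as np

def fixed_X_knockoffs(X):
    """Fixed-design knockoffs X~ = X (I - Sigma^{-1} diag{s}) + U~ C with
    equi-correlated s_j = min(2 lambda_min(Sigma), 1). Assumes standardized
    columns (X_j' X_j = 1), full column rank, and n >= 2p."""
    n, p = X.shape
    Sigma = X.T @ X
    lam_min = max(np.linalg.eigvalsh(Sigma)[0], 0.0)   # guard tiny negatives
    S = np.diag(np.full(p, min(2 * lam_min, 1.0)))
    Sinv_S = np.linalg.solve(Sigma, S)
    # C'C = 2 diag{s} - diag{s} Sigma^{-1} diag{s}  (PSD by construction)
    A = 2 * S - S @ Sinv_S
    C = np.linalg.cholesky(A + 1e-12 * np.eye(p)).T    # upper factor, C'C = A
    # U~: n x p orthonormal basis orthogonal to col(X)
    Q, _ = np.linalg.qr(np.hstack([X, np.random.default_rng(0).standard_normal((n, p))]))
    U = Q[:, p:2 * p]
    return X @ (np.eye(p) - Sinv_S) + U @ C
```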
Knockoff construction (n ≥ 2p)

$\tilde X_j' X_j = 1 - s_j$ (standardized columns)

Equi-correlated knockoffs: $s_j = 2\lambda_{\min}(\Sigma) \wedge 1$; among equivariant constructions, this minimizes $|\langle X_j, \tilde X_j \rangle|$

SDP knockoffs: minimize $\sum_j |1 - s_j|$ subject to $s_j \ge 0$ and $\mathrm{diag}\{s\} \preceq 2\Sigma$; a highly structured semidefinite program (SDP)

Other possibilities ...
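The SDP is small and highly structured; a sketch assuming cvxpy (a tool choice of ours, not mentioned on the slide):

```python
import cvxpy as cp
import numpy as np

def sdp_s(Sigma):
    """Solve: minimize sum_j |1 - s_j|  s.t.  s >= 0, diag(s) <= 2 Sigma
    in the positive semidefinite order."""
    p = Sigma.shape[0]
    s = cp.Variable(p)
    constraints = [s >= 0, 2 * Sigma - cp.diag(s) >> 0]
    cp.Problem(cp.Minimize(cp.sum(cp.abs(1 - s))), constraints).solve()
    return np.clip(s.value, 0, None)
```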
Why?

For a null feature $X_j$:
$X_j' y = X_j' X\beta + X_j' z \overset{d}{=} \tilde X_j' X\beta + \tilde X_j' z = \tilde X_j' y$
Why?

For any subset of nulls $T$:
$[X\ \tilde X]'_{\mathrm{swap}(T)}\, y \overset{d}{=} [X\ \tilde X]'\, y$,
since the swap leaves the Gram matrix unchanged: $[X\ \tilde X]'_{\mathrm{swap}(T)}\,[X\ \tilde X]_{\mathrm{swap}(T)} = G$.
Sufficiency: (Z, ˜ Z) = z
- X
˜ X ′ X ˜ X
- ,
- X
˜ X ′ y
- Knockoff-agnostic: swapping originals and knockoffs =
⇒ swaps Z’s z(
- X
˜ X
- swap(T ), y) = (Z, ˜
Z)swap(T )
Exchangeability of feature importance statistics
Sufficiency: (Z, ˜ Z) = z
- X
˜ X ′ X ˜ X
- ,
- X
˜ X ′ y
- Knockoff-agnostic: swapping originals and knockoffs =
⇒ swaps Z’s z(
- X
˜ X
- swap(T ), y) = (Z, ˜
Z)swap(T )
Theorem (Barber and C. (15))
For any subset T of nulls (Z, Z)swap(T )
d
= (Z, ˜ Z) = ⇒ FDR control (conditional on X)
Z1 Zp Z2 ˜ Zp ˜ Z2 ˜ Z1
Telling the effect direction
[...] in classical statistics, the significance of comparisons (e.g., $\theta_1 - \theta_2$) is calibrated using Type I error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. [...] a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state "$\theta_1 > \theta_2$ with confidence", "$\theta_2 > \theta_1$ with confidence", or "no claim with confidence".
- A. Gelman & F. Tuerlinckx
Directional FDR
Are any effects exactly zero?

$\mathrm{FDR}_{\mathrm{dir}} = \mathbb{E}\left[\dfrac{\#\ \text{selections with wrong effect direction}}{\#\ \text{selections}}\right]$ (the ratio is the directional false discovery proportion)

- Directional FDR (Benjamini & Yekutieli, '05)
- Sign errors (Type-S) (Gelman & Tuerlinckx, '00)
- Important for misspecified models, where exact sparsity is unlikely
Directional FDR control

$(X_j - \tilde X_j)' y \overset{\mathrm{ind}}{\sim} N(s_j \beta_j,\ 2\sigma^2 s_j)$, with $s_j \ge 0$

Sign estimate: $\mathrm{sgn}\big((X_j - \tilde X_j)' y\big)$

Theorem (Barber and C., '16)
Exact same knockoff selection + sign estimate: $\mathrm{FDR} \le \mathrm{FDR}_{\mathrm{dir}} \le q$

[Cartoon: signs of the $W_j$'s ordered by $|W|$, nulls vs. non-nulls. In the exact-sparsity setting, the null coin flips are unbiased. Great subtlety: for the directional problem, the coin flips are now biased.]
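Putting selection and sign estimation together; a sketch that reuses knockoff_plus_threshold from the earlier sketch:

```python
import numpy as np

def directional_selection(X, Xk, y, W, q):
    """Run the usual knockoff+ selection, then attach the sign estimate
    sgn((X_j - X~_j)' y) to each selected feature; per the theorem above,
    the same selection controls FDR_dir at level q."""
    tau = knockoff_plus_threshold(W, q)          # from the earlier sketch
    selected = np.flatnonzero(W >= tau)
    signs = np.sign((X[:, selected] - Xk[:, selected]).T @ y)
    return selected, signs
```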
Empirical results
Features $\sim N(0, I_n)$; n = 3000, p = 1000; k = 30 variables with regression coefficients of magnitude 3.5; nominal level q = 20%.

Method                        FDR (%)   Power (%)   Theor. FDR control?
Knockoff+ (equivariant)         14.40      60.99     Yes
Knockoff (equivariant)          17.82      66.73     No
Knockoff+ (SDP)                 15.05      61.54     Yes
Knockoff (SDP)                  18.72      67.50     No
BHq                             18.70      48.88     No
BHq + log-factor correction      2.20      19.09     Yes
BHq with whitened noise         18.79       2.33     Yes
Effect of signal amplitude
Same setup with k = 30 (q = 0.2)

[Figure: FDR (%) and Power (%) vs. signal amplitude A (2.8-4.2) for Knockoff, Knockoff+, and BHq; nominal level marked.]
Effect of feature correlation
Features $\sim N(0, \Theta)$, $\Theta_{jk} = \rho^{|j-k|}$; n = 3000, p = 1000, k = 30, amplitude = 3.5

[Figure: FDR (%) and Power (%) vs. feature correlation $\rho$ (0.0-0.8) for Knockoff, Knockoff+, and BHq; nominal level marked.]
Fixed Design Knockoff Data Analysis
HIV drug resistance
Drug type   # drugs   Sample size   # protease or RT positions genotyped   # mutations appearing ≥ 3 times in sample
PI          6         848            99                                     209
NRTI        6         639           240                                     294
NNRTI       3         747           240                                     319

- Response y: log-fold increase of lab-tested drug resistance
- Covariate $X_j$: presence or absence of mutation #j

Data from R. Shafer (Stanford), available at:
http://hivdb.stanford.edu/pages/published_analysis/genophenoPNAS2006/
HIV data

TSM list: mutations associated with the PI class of drugs in general; it is not specialized to the individual drugs in the class.

Results for PI-type drugs

[Figure: number of HIV-1 protease positions selected by Knockoff and BHq, split into "appear in TSM list" vs. "not in TSM list", for resistance to each drug: APV (n=768, p=201), ATV (n=329, p=147), IDV (n=826, p=208), LPV (n=516, p=184), NFV (n=843, p=209), SQV (n=825, p=208).]
HIV data

Results for NRTI-type and NNRTI-type drugs

[Figure: number of HIV-1 RT positions selected by Knockoff and BHq, split into "appear in TSM list" vs. "not in TSM list". NRTI drugs: 3TC (n=633, p=292), ABC (n=628, p=294), AZT (n=630, p=292), D4T (n=630, p=293), DDI (n=632, p=292), TDF (n=353, p=218). NNRTI drugs: DLV (n=732, p=311), EFV (n=734, p=318), NVP (n=746, p=319).]
High-dimensional setting
n ≈ 5, 000 subjects p ≈ 330, 000 SNPs/vars to test
[Figure: Manhattan plot of $-\log_{10}(P\text{ value})$ for HDL cholesterol across chromosomes 1-22, highlighting GALNT2, LPL, ABCA1, MVK/MMAB, LIPC, LCAT, LIPG and CETP.]

$p > n \longrightarrow$ cannot construct knockoffs as before:

$\tilde X_j' \tilde X_k = X_j' X_k\ \forall j, k$ and $\tilde X_j' X_k = X_j' X_k\ \forall j \neq k \Longrightarrow \tilde X_j = X_j\ \forall j$
High dimensional knockoffs: screen and confirm

Original data set
- Exploratory: screen on sample 1, $(X^{(0)}, y^{(0)})$
- Confirmatory: inference on sample 2, $(X^{(1)}, y^{(1)})$

Theory (Barber and C., '16)
Safe data re-use to improve power (Barber and C., '16)
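A minimal sketch of the split, where the lasso screening rule is an illustrative stand-in and fixed_X_knockoffs is the construction sketched earlier:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def screen_and_confirm(X, y, q, rng):
    """Split the rows in half: screen candidates on sample 1 (here with a
    cross-validated lasso, an illustrative choice), then run the fixed-X
    knockoff filter on sample 2 restricted to the screened columns.
    Assumes the screened set is small enough that n/2 >= 2 p'."""
    n = X.shape[0]
    idx = rng.permutation(n)
    s0, s1 = idx[: n // 2], idx[n // 2:]
    screened = np.flatnonzero(LassoCV(cv=5).fit(X[s0], y[s0]).coef_)
    X1 = X[np.ix_(s1, screened)]
    Xk1 = fixed_X_knockoffs(X1)   # construction from the earlier sketch
    # ...compute W on (X1, Xk1, y[s1]) and apply the knockoff+ threshold
    return screened, X1, Xk1, y[s1]
```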
Some extensions
$y = \underbrace{X_1}_{n \times p_1}\beta_1 + \underbrace{X_2}_{n \times p_2}\beta_2 + \cdots + N(0, \sigma^2 I_n)$

- Group sparsity: build knockoffs at the group-wise level (Dai & Barber, 2015)
- Identify key groups with PCA: build knockoffs only for the top PC in each group (Chen, Hou & Hou, 2017)
- Build knockoffs only for prototypes selected from each group (Reid & Tibshirani, 2015)
- Multilayer knockoffs to control FDR at the individual and group levels simultaneously (Katsevich & Sabatti, 2017)
Learning from data is not trivial
- A 'wrapper' around a black-box algorithm rigorously addresses the reproducibility issue
- How do we make valid knockoffs (controls)? Importance of correct statistical reasoning
- Which level of significance is appropriate? Importance of mathematics (martingale theory)
- Sensitivity to modeling assumptions? Importance of mathematics
Beyond replicability: grand challenges in data-driven science

- Reducing our irreproducibility
- Establishing causality
- Guaranteeing fairness and robustness of AI systems

In some cases, variables with the property $p(\text{response} \mid \text{variable, others}) \neq p(\text{response} \mid \text{others})$ are 'causal'. If a predictive algorithm uses causal variables, then it is likely to be fair.

This is not just about not being wrong (irreproducibility).

Robustness? We would want predictions to remain valid in different samples collected under different circumstances. "Constant conjunction" is a property of causal effects (Hume).
Fairness: can computer programs be racist and sexist?
[Image credit: Guido Rosa/Getty Images/Ikon Images]