COMPUTATIONAL ASPECTS OF SELECTION OF EXPERIMENTS
Yining Wang
Machine Learning Department, Carnegie Mellon University
Talk given at Georgia Institute of Technology, Atlanta GA, USA
arXiv:1711.05174
Joint work with Zeyuan Allen-Zhu, Yuanzhi Li and Aarti Singh
Motivating example: structural stress prediction (Ulu et al.'17).
Simulating the stress response at all candidate force locations (~4000 surface nodes) would be computationally too expensive, so we select a small subset (200 nodes), measure responses there, and use a regression model to predict the response at the rest of the locations.
Justification for a single, normal, compressive load can be found in Ulu et al.'17, based on Rockafellar's Theorem.
Linear model: y_i = x_i^T θ_0 + ε_i, where the features x_i ∈ R^p (dimension p) come from the top eigenvectors of the surface Laplacian, θ_0 is the unknown regression model, and ε_i is modeling error.
[Figure: design matrix with one row of dimension p per force location 1, ..., n; only k rows (selected locations 1, ..., k) are measured, yielding responses y_1, ..., y_k; ~4000 nodes vs. 200 nodes.]
OLS estimate from the selected subset S:
  θ̂ = (∑_{i∈S} x_i x_i^T)^{-1} (∑_{i∈S} y_i x_i),
with covariance proportional to (∑_{i∈S} x_i x_i^T)^{-1}.
The matrix ∑_{j∈S} x_j x_j^T is the (scaled) sample covariance, i.e., Fisher's Information.
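A minimal numpy sketch of the estimator above (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def ols_on_subset(X, y, S):
    """OLS estimate from the selected design points S.

    X : (n, p) design matrix, one row x_i per candidate location
    y : (n,) responses (only y[S] is observed/used)
    S : list of indices of the k selected locations
    """
    XS, yS = X[S], y[S]
    fisher = XS.T @ XS                       # sum_{i in S} x_i x_i^T
    theta_hat = np.linalg.solve(fisher, XS.T @ yS)
    return theta_hat, np.linalg.inv(fisher)  # inverse Fisher info: Cov up to sigma^2
```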
We judge a design subset S by f(∑_{j∈S} x_j x_j^T) for an "optimality criterion" f.
Examples: A-optimality, D-optimality, E-optimality, V-optimality, ...
  f_A(Σ) = tr(Σ^{-1})/p   (proportional to the MSE E‖θ̂ − θ_0‖_2^2)
  f_D(Σ) = det(Σ)^{-1/p}
  f_E(Σ) = ‖Σ^{-1}‖_op = 1/λ_min(Σ)
These normalizations make the "optimality criteria" "scale invariant".
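The three criteria above, computed with numpy (a sketch; names are illustrative):

```python
import numpy as np

def f_A(Sigma):
    """A-optimality: tr(Sigma^{-1}) / p."""
    return np.trace(np.linalg.inv(Sigma)) / Sigma.shape[0]

def f_D(Sigma):
    """D-optimality: det(Sigma)^{-1/p}, via log-det for numerical stability."""
    _, logdet = np.linalg.slogdet(Sigma)
    return np.exp(-logdet / Sigma.shape[0])

def f_E(Sigma):
    """E-optimality: ||Sigma^{-1}||_op = 1 / lambda_min(Sigma)."""
    return 1.0 / np.linalg.eigvalsh(Sigma)[0]  # eigvalsh sorts ascending
```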
Goal: find Ŝ, |Ŝ| ≤ k, with
  f(∑_{j∈Ŝ} x_j x_j^T) ≤ C · min_{|S|≤k} f(∑_{j∈S} x_j x_j^T),
where C ≥ 1 is the "approximation ratio".
Prior work: (Nikolov & Singh, STOC'15), (Avron & Boutsidis, SIMAX'13), (Summa et al., SODA'15), (Cerny & Hladik, Comput. Optim. Appl.'12) — each applicable to only one or two criteria f.
"Regular" criteria:
(A1) Convexity: f (or its surrogate) is convex;
(A2) Monotonicity: A ⪰ B ⟹ f(A) ≤ f(B);
(A3) Reciprocal linearity: f(tA) = t^{-1} f(A) for all t > 0.
All popular optimality criteria are "regular", e.g., A/D/E/V/G-optimality.
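A quick numerical sanity check of (A2) and (A3) for f_A (a sketch on a random positive definite matrix):

```python
import numpy as np

def f_A(Sigma):
    return np.trace(np.linalg.inv(Sigma)) / Sigma.shape[0]

rng = np.random.default_rng(0)
p = 4
M = rng.standard_normal((p, 2 * p))
A = M @ M.T                              # random positive definite matrix

# (A2) Monotonicity: A + I >= A in the Loewner order, so f must not increase.
assert f_A(A + np.eye(p)) <= f_A(A)

# (A3) Reciprocal linearity: f(tA) = f(A) / t, since tr((tA)^{-1}) = tr(A^{-1}) / t.
t = 3.0
assert np.isclose(f_A(t * A), f_A(A) / t)
```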
Prior guarantees via continuous-relaxation-type methods:
  k = Ω(p/ε²)  (Singh & Xie, SODA'18; Nikolov et al., arXiv'18)
  k = Ω(p/ε + 1/ε²)  (Nikolov et al., arXiv'18)
Here k is the size of the selected design subset and p is the number of variables (the dimension).
This work: continuous relaxation, then rounding via regret minimization (mirror descent).
Combinatorial formulation:
  min_{s_1,...,s_n} f(∑_{i=1}^n s_i x_i x_i^T)  s.t.  ∑_{i=1}^n s_i ≤ k,  s_i ∈ {0, 1}.
Relaxation: 0 ≤ s_i ≤ 1.
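A minimal sketch of the relaxed problem, instantiated for D-optimality, whose convex surrogate is −log det (the use of cvxpy here is an illustrative assumption, not necessarily the paper's implementation):

```python
import cvxpy as cp

def relaxed_d_optimal(X, k):
    """Continuous relaxation: min -log det(sum_i s_i x_i x_i^T)
    s.t. 0 <= s_i <= 1 and sum_i s_i <= k."""
    n, p = X.shape
    s = cp.Variable(n)
    Sigma = X.T @ cp.diag(s) @ X         # sum_i s_i x_i x_i^T
    prob = cp.Problem(cp.Minimize(-cp.log_det(Sigma)),
                      [s >= 0, s <= 1, cp.sum(s) <= k])
    prob.solve()
    return s.value                       # fractional weights pi, to be rounded
```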
Rounding: given an optimal fractional solution π (0 ≤ π_i ≤ 1, ∑_i π_i ≤ k), produce ŝ ∈ {0, 1}^n such that
  ∑_i ŝ_i x_i x_i^T ⪰ (1 − O(ε)) · ∑_i π_i x_i x_i^T,
which by monotonicity (A2) and reciprocal linearity (A3) implies
  f(∑_i ŝ_i x_i x_i^T) ≤ (1 + O(ε)) · f(∑_i π_i x_i x_i^T).
Tool: regret minimization over PSD matrices.
At each round t the learner plays A_t ∈ Δ_p, then observes the reference matrix F_t and suffers loss ⟨A_t, F_t⟩.
Action space: Δ_p = {A ⪰ 0, tr(A) = 1}.
Regret: R(A) := ∑_{t=1}^T ⟨F_t, A_t⟩ − inf_{U∈Δ_p} ∑_{t=1}^T ⟨F_t, U⟩.
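Note that inf_{U∈Δ_p} ⟨M, U⟩ = λ_min(M) for symmetric M, attained at U = vv^T with v the bottom eigenvector; this standard fact is why low regret translates into a lower bound on the spectrum of the final design matrix. A small numpy check:

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 5, 8
# Random symmetric loss matrices F_1, ..., F_T.
Fs = [(M + M.T) / 2 for M in rng.standard_normal((T, p, p))]

F_total = sum(Fs)
lam, V = np.linalg.eigh(F_total)             # eigenvalues ascending
U_star = np.outer(V[:, 0], V[:, 0])          # vv^T: rank one, trace 1

# The comparator's best loss equals the smallest eigenvalue.
assert np.isclose(np.sum(F_total * U_star), lam[0])
```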
Follow-The-Regularized-Leader (FTRL):
  A_t = argmin_{A∈Δ_p} { w(A) + α · ∑_{τ=1}^{t−1} ⟨F_τ, A⟩ },
where w is the "regularizer" and α the penalty parameter.
Example regularizers:
  w(A) = tr(A^T(log A − I))  ⟹  A_t = exp(cI − α ∑_{τ=1}^{t−1} F_τ)  (matrix entropy);
  w(A) = −2 tr(A^{1/2})  ⟹  A_t = (cI − α ∑_{τ=1}^{t−1} F_τ)^{−2},
where in each case the constant c is chosen so that tr(A_t) = 1.
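A sketch of the second update: the normalizing constant c is found by bisection on the trace (the bisection is an illustrative choice; any root-finder enforcing tr(A_t) = 1 works):

```python
import numpy as np

def ftrl_step(F_sum, alpha, tol=1e-12):
    """FTRL iterate A_t = (c I - alpha * F_sum)^{-2}, with the scalar c
    chosen so that tr(A_t) = 1, i.e., A_t lies in Delta_p.

    F_sum : symmetric matrix, the running sum F_1 + ... + F_{t-1}.
    """
    lam, V = np.linalg.eigh(alpha * F_sum)        # eigenvalues ascending
    trace = lambda c: np.sum(1.0 / (c - lam) ** 2)
    # trace(c) decreases from +inf to 0 as c grows past lam_max, so the
    # root of trace(c) = 1 is unique; bracket it and bisect.
    lo = lam[-1] + 1e-12                          # trace(lo) >> 1
    hi = lam[-1] + len(lam)                       # trace(hi) <= 1/p <= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if trace(mid) > 1.0 else (lo, mid)
    c = 0.5 * (lo + hi)
    return V @ np.diag(1.0 / (c - lam) ** 2) @ V.T
```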
Loss matrices: F_t = u_t u_t^T − v_t v_t^T, encoding the swapping of two design points (v_t removed, u_t added).
Regret bound for the w(A) = −2 tr(A^{1/2}) regularizer (α: penalty parameter in FTRL; A_t: FTRL solution at time t):
  inf_{U∈Δ_p} ∑_{t=0}^k ⟨F_t, U⟩ ≥ ∑_{t=1}^k [ u_t^T A_t u_t / (1 + 2α u_t^T A_t^{1/2} u_t) − v_t^T A_t v_t / (1 − 2α v_t^T A_t^{1/2} v_t) ] − 2√p/α.
Algorithm (greedy swaps guided by regret minimization):
1. Initialize S_0 ⊆ [n] with |S_0| = k.
2. At step t, choose the swap pair (i_t, j_t) via the potential ψ(x_{j_t}, x_{i_t}; A_{t−1}).
3. Update S_t ← S_{t−1} ∪ {j_t} \ {i_t}.
4. Repeat until the swap potential is bounded by the target threshold.
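A schematic sketch of this loop; the potential used here is a simplified stand-in (log-det improvement of the swap) rather than the paper's exact ψ:

```python
import numpy as np

def swap_rounding(X, S0, max_iters=200):
    """Greedy swap loop: S_t <- S_{t-1} + {j_t} - {i_t} while some swap improves."""
    S = set(S0)
    n = len(X)
    for _ in range(max_iters):
        Sigma = X[list(S)].T @ X[list(S)]
        _, logdet0 = np.linalg.slogdet(Sigma)
        best, best_gain = None, 1e-9             # require strict improvement
        for i in list(S):                        # candidate to remove
            for j in set(range(n)) - S:          # candidate to add
                Sig = Sigma - np.outer(X[i], X[i]) + np.outer(X[j], X[j])
                sign, logdet = np.linalg.slogdet(Sig)
                if sign > 0 and logdet - logdet0 > best_gain:
                    best, best_gain = (i, j), logdet - logdet0
        if best is None:                         # no improving swap: stop
            break
        i, j = best
        S.remove(i)
        S.add(j)
    return sorted(S)
```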
Main result: with k ≥ 5p/ε² and α = √p/ε, the algorithm returns a (1 + O(ε))-approximate solution to
  min f(∑_i s_i x_i x_i^T)  s.t.  s_i ∈ {0, 1},  ∑_i s_i ≤ k
for any regular criterion f.
Back to the application (~4000 candidate nodes, 200 selected; features: top eigenvectors of the surface Laplacian): the fitted regression model predicts the response at the rest of the locations, up to modeling error.
[Figure: equidistant (naive) sampling vs. our solution. Our algorithm samples "sensitive" regions (e.g., arms, wingtips) more and "easy" regions less.]
Extension: robust design under adversarial perturbations of the design points,
  min_{s_1,...,s_n} max_{ξ_1,...,ξ_n} f(∑_{i=1}^n s_i (x_i + ξ_i)(x_i + ξ_i)^T)  s.t.  ∑_i s_i ≤ k,  ‖ξ_i‖_2 ≤ δ.
Extension: distribution shift — control the worst-case prediction error sup_{D∈D} E_D |x^T β_S − x^T β_0|², where
  S: selected design subset;  β_0: best linear predictor w.r.t. D;  β_S: OLS on X_S.
Variance term: f(∑_{i∈S} x_i x_i^T);  bias term: dependent on D.