SLIDE 1

COMPUTATIONAL ASPECTS OF SELECTION OF EXPERIMENTS

Yining Wang

Machine Learning Department, Carnegie Mellon University
Georgia Institute of Technology, Atlanta GA, USA

arXiv:1711.05174

Joint work with Zeyuan Allen-Zhu, Yuanzhi Li and Aarti Singh

SLIDE 2

MOTIVATING APPLICATION

Worst-case structural analysis

  • Maximum stress resulting from worst-case external forces
  • Example application: lightweight structural design in an automated fiber process

SLIDE 3

MOTIVATING APPLICATION

Worst-case structural analysis

  • Challenge: running Finite Element Analysis (FEA) for every external force location would be computationally too expensive

A justification for considering a single, normal, compressive load can be found in Ulu et al. ’17, based on Rockafellar’s Theorem.

SLIDE 4

MOTIVATING APPLICATION

Worst-case structural analysis

  • Idea: sample a few “representative” force locations and build a predictive model for the remaining locations
  • Challenge: how to determine the “best” representative locations

[Figure: mesh with ~4000 candidate nodes; 200 selected nodes]

SLIDE 5

PROBLEM FORMULATION

Linear regression model:

  y_i = ⟨x_i, θ_0⟩ + ε_i

  • y_i: max. stress response
  • x_i: top eigenvectors of the surface Laplacian (features of dimension p)
  • θ_0: unknown regression model
  • ε_i: modeling error

Experiment selection: stack the feature vectors x_1, x_2, …, x_n of force locations 1, …, n (each of dimension p) into the design matrix X ∈ R^{n×p}; select k rows (selected locations 1, …, k) to form X_S ∈ R^{k×p}, with observed responses y_1, y_2, …, y_k.

[Figure: mesh with ~4000 candidate nodes; 200 selected nodes]

SLIDE 6

PROBLEM FORMULATION

Linear regression model:  y_i = ⟨x_i, θ_0⟩ + ε_i

Ordinary Least Squares:

  θ̂ = (Σ_{i∈S} x_i x_i^T)^{−1} (Σ_{i∈S} y_i x_i)

  • By the CLT:  √n (θ̂ − θ_0) →_d N(0, (Σ_{i∈S} x_i x_i^T)^{−1})

  (Σ_{i∈S} x_i x_i^T: the (scaled) sample covariance, i.e., Fisher’s information)

Optimal experimental design: find a subset S ⊆ [n] with |S| ≤ k, so as to minimize the “optimality criterion”

  f(Σ_{j∈S} x_j x_j^T)
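To make the OLS step concrete, here is a minimal numpy sketch (not from the talk; the function name is illustrative) of the estimator θ̂ = (Σ_{i∈S} x_i x_i^T)^{−1}(Σ_{i∈S} y_i x_i) computed on a selected subset S:

```python
import numpy as np

def ols_on_subset(X, y, S):
    """OLS restricted to the selected rows S.

    Returns (theta_hat, Sigma_S), where Sigma_S = sum_{i in S} x_i x_i^T
    is the Fisher-information matrix that the optimality criteria act on.
    """
    XS, yS = X[S], y[S]
    Sigma_S = XS.T @ XS                        # sum of x_i x_i^T over i in S
    theta_hat = np.linalg.solve(Sigma_S, XS.T @ yS)
    return theta_hat, Sigma_S

# Usage: with zero modeling error, any full-rank subset recovers theta_0.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta0 = np.array([1.0, -2.0, 0.5])
y = X @ theta0                                 # epsilon_i = 0 for this check
theta_hat, Sigma_S = ols_on_subset(X, y, list(range(10)))
```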

SLIDE 7

PROBLEM FORMULATION

Predictive model:  y_i = ⟨x_i, θ_0⟩ + ε_i

Optimal experimental design: find a subset S ⊆ [n] with |S| ≤ k, so as to minimize the “optimality criterion” f(Σ_{j∈S} x_j x_j^T).

Example (“scale-invariant”) criteria: A-optimality, D-optimality, E-optimality, V-optimality, …

  f_A(Σ) = tr(Σ^{−1})/p        (proportional to the MSE  E‖θ̂ − θ_0‖₂²)
  f_D(Σ) = det(Σ)^{−1/p}
  f_E(Σ) = ‖Σ^{−1}‖_op = 1/λ_min(Σ)
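The criteria above can be evaluated directly from Σ; a small numpy sketch (illustrative, not the authors’ code):

```python
import numpy as np

def f_A(Sigma):
    """A-optimality: tr(Sigma^{-1}) / p (proportional to the estimator's MSE)."""
    p = Sigma.shape[0]
    return np.trace(np.linalg.inv(Sigma)) / p

def f_D(Sigma):
    """D-optimality: det(Sigma)^{-1/p}, computed via slogdet for stability."""
    p = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "Sigma must be positive definite"
    return np.exp(-logdet / p)

def f_E(Sigma):
    """E-optimality: ||Sigma^{-1}||_op = 1 / lambda_min(Sigma)."""
    return 1.0 / np.linalg.eigvalsh(Sigma)[0]  # eigvalsh is sorted ascending
```

For Σ = 2I all three criteria coincide at 1/2, which makes a convenient sanity check.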

SLIDE 8

PROBLEM FORMULATION

Predictive model:  y_i = ⟨x_i, θ_0⟩ + ε_i

Optimal experimental design: find a subset S ⊆ [n] with |S| ≤ k, so as to minimize f(Σ_{j∈S} x_j x_j^T).

Objective: efficient approximation algorithms achieving an “approximation ratio” C(n, p):

  f(Σ_{j∈Ŝ} x_j x_j^T) ≤ C(n, p) · min_{|S|≤k} f(Σ_{j∈S} x_j x_j^T)

SLIDE 9

EXISTING RESULTS

Existing positive results

  • O(1) approximation for D-optimality (Nikolov & Singh, STOC’15)
  • O(n/k) approximation for A-optimality (Avron & Boutsidis, SIMAX’13)

Existing negative results

  • NP-hard to optimize D/E-optimality exactly (Summa et al., SODA’15)
  • NP-hard to (1+ε)-approximate D-optimality when k = p (Cerny & Hladik, Comput. Optim. Appl.’12)

These results apply to only one or two criteria f.

SLIDE 10

REGULAR CRITERIA

Optimal experimental design: find a subset S ⊆ [n] with |S| ≤ k, so as to minimize f(Σ_{j∈S} x_j x_j^T).

“Regular” criteria:

  (A1) Convexity: f (or its surrogate) is convex;
  (A2) Monotonicity: A ⪰ B ⟹ f(A) ≤ f(B);
  (A3) Reciprocal linearity: f(tA) = t^{−1} f(A).

All popular optimality criteria are “regular”, e.g., A/D/E/V/G-optimality.
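A quick numeric spot-check (an illustrative sketch, not part of the talk) of (A2) and (A3) for the A-optimality criterion:

```python
import numpy as np

def f_A(Sigma):
    """A-optimality criterion: tr(Sigma^{-1}) / p."""
    return np.trace(np.linalg.inv(Sigma)) / Sigma.shape[0]

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)   # positive definite
B = A + np.eye(4)         # B - A = I is PSD, so B >= A in the Loewner order
t = 3.0
# (A3) reciprocal linearity: f(tA) = f(A) / t
# (A2) monotonicity: B >= A implies f(B) <= f(A)
```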

SLIDE 11

OUR RESULT

  • Theorem. For all regular criteria f, there exists a polynomial-time (1+ε) approximation algorithm provided that

  k = Ω(p/ε²)

  (k: number of selected design points; p: number of variables / dimension)

  • Remark 1: Concurrent to or after our work, (1+ε) approximations for D/A-optimality were obtained under the condition k = Ω(p/ε + 1/ε²) (Singh & Xie, SODA’18; Nikolov et al., arXiv’18).
  • Remark 2: The condition k = Ω(p/ε²) is tight for E-optimality and continuous-relaxation-type methods (Nikolov et al., arXiv’18).

SLIDE 12

ALGORITHMIC FRAMEWORK

  1. Continuous relaxation of the discrete problem
  2. Whitening of candidate design points
  3. Regret minimization characterization of least eigenvalues
  4. Greedy swapping based on FTRL potential functions

SLIDE 13

ALGORITHMIC FRAMEWORK

  1. Continuous relaxation of the discrete problem
  2. Whitening of candidate design points
  3. Regret minimization characterization of least eigenvalues
  4. Greedy swapping based on FTRL potential functions

SLIDE 14

CONTINUOUS RELAXATION

Optimal experimental design: find a subset S ⊆ [n] with |S| ≤ k, so as to minimize f(Σ_{j∈S} x_j x_j^T).

  • Equivalent formulation:

      min_{s_1,…,s_n} f(Σ_{i=1}^n s_i x_i x_i^T)   s.t.   Σ_{i=1}^n s_i ≤ k,   s_i ∈ {0, 1}

  • Relaxation: 0 ≤ s_i ≤ 1
  • Convex! Can be solved using classical methods (e.g., projected gradient / mirror descent)
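As a sketch of this step (illustrative, not the authors’ implementation), projected gradient descent on the relaxed A-optimality objective, with an exact Euclidean projection onto {0 ≤ s_i ≤ 1, Σ_i s_i ≤ k} computed by bisection:

```python
import numpy as np

def project(v, k, iters=80):
    """Euclidean projection onto {s : 0 <= s_i <= 1, sum_i s_i <= k}.

    Clip to the box; if the budget is still violated, bisect on a shift
    tau so that sum_i clip(v_i - tau, 0, 1) == k.
    """
    s = np.clip(v, 0.0, 1.0)
    if s.sum() <= k:
        return s
    lo, hi = 0.0, float(v.max())
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(v - hi, 0.0, 1.0)

def relax_A_optimal(X, k, eta=0.5, iters=300):
    """Projected gradient descent on the relaxed A-optimality objective
    f(s) = tr(Sigma(s)^{-1}) / p, where Sigma(s) = sum_i s_i x_i x_i^T."""
    n, p = X.shape
    s = np.full(n, k / n)                  # feasible interior starting point
    for _ in range(iters):
        Sigma = X.T @ (s[:, None] * X)
        G = X @ np.linalg.inv(Sigma)       # row i is (Sigma^{-1} x_i)^T
        grad = -np.sum(G * G, axis=1) / p  # df/ds_i = -x_i^T Sigma^{-2} x_i / p
        s = project(s - eta * grad, k)
    return s
```

The fixed step size `eta` is a simplification; a production solver would use a line search or mirror descent as the slide suggests.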

SLIDE 15

CONTINUOUS RELAXATION

  • Equivalent formulation:

      min_{s_1,…,s_n} f(Σ_{i=1}^n s_i x_i x_i^T)   s.t.   Σ_{i=1}^n s_i ≤ k,   s_i ∈ {0, 1}

  • Relaxation: 0 ≤ s_i ≤ 1
  • Question: how to round the relaxed solution {s_i} to integer values?

SLIDE 16

ALGORITHMIC FRAMEWORK

  1. Continuous relaxation of the discrete problem
  2. Whitening of candidate design points
  3. Regret minimization characterization of least eigenvalues
  4. Greedy swapping based on FTRL potential functions

SLIDE 17

WHITENING

Rounding problem. Given the optimal continuous solution π, round it to ŝ ∈ {0, 1}^n with Σ_i ŝ_i ≤ k, such that

  f(Σ_i ŝ_i x_i x_i^T) ≤ (1 + O(ε)) · f(Σ_i π_i x_i x_i^T)

  • Whitening:  x̃_i = W^{−1/2} x_i,  where  W = Σ_i π_i x_i x_i^T
  • By monotonicity of f, the rounding problem is reduced to

  λ_min(Σ_i ŝ_i x̃_i x̃_i^T) ≥ 1 − O(ε)
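The whitening step can be sketched as follows (illustrative numpy; after whitening, Σ_i π_i x̃_i x̃_i^T = I by construction):

```python
import numpy as np

def whiten(X, pi):
    """Whiten design points: x_tilde_i = W^{-1/2} x_i with W = sum_i pi_i x_i x_i^T.

    After whitening, sum_i pi_i x_tilde_i x_tilde_i^T = I, so the rounding
    problem becomes a lower bound on lambda_min over the selected subset.
    """
    W = X.T @ (pi[:, None] * X)
    lam, U = np.linalg.eigh(W)                   # W = U diag(lam) U^T, lam > 0
    W_inv_sqrt = U @ np.diag(lam ** -0.5) @ U.T  # symmetric inverse square root
    return X @ W_inv_sqrt                        # rows are x_tilde_i^T
```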

SLIDE 18

ALGORITHMIC FRAMEWORK

  1. Continuous relaxation of the discrete problem
  2. Whitening of candidate design points
  3. Regret minimization characterization of least eigenvalues
  4. Greedy swapping based on FTRL potential functions

SLIDE 19

REGRET MINIMIZATION

Matrix linear bandit / online learning:

  • At each time t, a player picks an action A_t ∈ Δ_p, observes a reference F_t, and suffers loss ⟨A_t, F_t⟩
  • Action space:  Δ_p = {A ⪰ 0, tr(A) = 1}
  • Objective: minimize the regret of the action sequence

  R(A) := Σ_{t=1}^T ⟨F_t, A_t⟩ − inf_{U∈Δ_p} Σ_{t=1}^T ⟨F_t, U⟩

  (the second term is precisely λ_min(Σ_t F_t))

SLIDE 20

REGRET MINIMIZATION

Matrix linear bandit / online learning:

  • At each time t, a player picks an action A_t ∈ Δ_p, observes a reference F_t, and suffers loss ⟨A_t, F_t⟩
  • Objective: minimize the regret R(A) of the action sequence
  • Follow-The-Regularized-Leader (FTRL) policy, with “regularizer” w:

  A_t = argmin_{A∈Δ_p} { w(A) + α · Σ_{τ=1}^{t−1} ⟨F_τ, A⟩ }

Example regularizers:

  1. MWU (entropy):  w(A) = tr(A(log A − I)),  giving  A_t = exp(cI − α Σ_{τ=1}^{t−1} F_τ)
  2. l_{1/2}-regularization:  w(A) = −2 tr(A^{1/2}),  giving  A_t = (cI − α Σ_{τ=1}^{t−1} F_τ)^{−2}

  (in both cases the constant c is chosen so that tr(A_t) = 1, i.e., A_t ∈ Δ_p)
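For the l_{1/2} regularizer, the action has the closed form A_t = (cI − α Σ_τ F_τ)^{−2}. Here is an illustrative sketch; solving for the normalizing constant c by bisection is my assumption about how tr(A_t) = 1 is enforced (it is implied by A_t ∈ Δ_p, but the talk does not spell it out):

```python
import numpy as np

def ftrl_halfreg_action(F_sum, alpha):
    """FTRL action for the l_{1/2} regularizer w(A) = -2 tr(A^{1/2}):
    A_t = (c I - alpha * F_sum)^{-2}, with c found by bisection so that
    tr(A_t) = 1, i.e., A_t lies in the action space Delta_p."""
    lam = np.linalg.eigvalsh(alpha * F_sum)  # ascending eigenvalues
    def trace_at(c):
        return np.sum((c - lam) ** -2.0)     # tr((cI - alpha F_sum)^{-2})
    lo = lam[-1] + 1e-12                     # c must exceed alpha * lambda_max
    hi = lam[-1] + np.sqrt(len(lam))         # here trace_at(hi) <= 1
    for _ in range(200):
        c = 0.5 * (lo + hi)
        if trace_at(c) > 1.0:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    M = c * np.eye(F_sum.shape[0]) - alpha * F_sum
    return np.linalg.inv(M @ M)              # (c I - alpha F_sum)^{-2}
```

Since trace_at(c) is strictly decreasing on (α·λ_max, ∞), the bisection converges to the unique admissible c.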

SLIDE 21

REGRET MINIMIZATION

Regret lemma. Suppose F_t = u_t u_t^T − v_t v_t^T. Then

  inf_{U∈Δ_p} Σ_{t=0}^k ⟨F_t, U⟩ ≥ Σ_{t=1}^k [ u_t^T A_t u_t / (1 + 2α u_t^T A_t^{1/2} u_t) − v_t^T A_t v_t / (1 − 2α v_t^T A_t^{1/2} v_t) ] − 2√p/α

  (α: the penalty parameter in FTRL; A_t: the FTRL solution at time t; F_t: a swapping of two design points from the pool)

  • Proved using the classical analysis of the regret of FTRL policies

SLIDE 22

ALGORITHMIC FRAMEWORK

  1. Continuous relaxation of the discrete problem
  2. Whitening of candidate design points
  3. Regret minimization characterization of least eigenvalues
  4. Greedy swapping based on FTRL potential functions

SLIDE 23

GREEDY SWAPPING

A “potential” function, taken from the per-step terms of the regret lemma (Slide 21):

  ψ(u, v; A) := u^T A u / (1 + 2α u^T A^{1/2} u)  −  v^T A v / (1 − 2α v^T A^{1/2} v)

SLIDE 24

GREEDY SWAPPING

The “greedy swapping” algorithm (with ψ the potential from the regret lemma, Slide 23):

  • Start with an arbitrary set S_0 ⊆ [n] of size k
  • At each step t, find i_t ∈ S_{t−1}, j_t ∉ S_{t−1} that maximize ψ(x_{j_t}, x_{i_t}; A_{t−1})
  • Greedy swapping:  S_t ← S_{t−1} ∪ {j_t} \ {i_t}
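One swapping step can be sketched as follows (illustrative only: it assumes `A` is the current FTRL matrix, scans all pairs naively, and ignores the degenerate case where a denominator of ψ becomes non-positive):

```python
import numpy as np

def mat_sqrt(A):
    """Symmetric PSD square root via eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.clip(lam, 0.0, None) ** 0.5) @ U.T

def psi(u, v, A, alpha):
    """Potential from the regret lemma: gain of swapping v out and u in."""
    Ah = mat_sqrt(A)
    return (u @ A @ u) / (1.0 + 2.0 * alpha * (u @ Ah @ u)) \
         - (v @ A @ v) / (1.0 - 2.0 * alpha * (v @ Ah @ v))

def greedy_swap_step(X, S, A, alpha):
    """One greedy-swapping step: pick i in S and j outside S maximizing
    psi(x_j, x_i; A), then swap them."""
    S = set(S)
    best, best_pair = -np.inf, None
    for i in S:
        for j in set(range(len(X))) - S:
            val = psi(X[j], X[i], A, alpha)
            if val > best:
                best, best_pair = val, (i, j)
    i, j = best_pair
    return sorted((S - {i}) | {j})
```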

SLIDE 25

GREEDY SWAPPING

Proof framework:

  • If k ≥ 5p/ε² and α = √p/ε, then the “progress” of each swapping is lower bounded by ε/k until λ_min ≥ 1 − O(ε)
  • Repeat the swapping for at most O(k/ε) iterations until we’re done

SLIDE 26

SUMMARY

Summary of our result (re-cap):

  • Objective: discrete optimization  min_s f(Σ_i s_i x_i x_i^T)  s.t.  s_i ∈ {0, 1},  Σ_i s_i ≤ k
  • Regularity: f is “regular” if it is convex, monotone, and reciprocally linear
  • Method: continuous relaxation + greedy swapping

Theorem. For all regular criteria f, there exists a polynomial-time (1+ε) approximation algorithm provided that k = Ω(p/ε²).

SLIDE 27

APPLICATION

Worst-case structural analysis

  • Sample a few “representative” force locations and build a predictive model for the remaining locations

[Figure: mesh with ~4000 candidate nodes; 200 selected nodes]

SLIDE 28

APPLICATION

Worst-case structural analysis

  • Sample a few “representative” force locations and build a predictive model for the remaining locations
  • Predictive model: Laplacian (linear) smoothing,  y_i = ⟨x_i, θ_0⟩ + ε_i
    (y_i: max. stress response; x_i: top eigenvectors of the surface Laplacian; θ_0: unknown regression model; ε_i: modeling error)
  • Typical problem parameter range:  n = 4000 ∼ 6000,  p = 10 ∼ 15,  k = 25 ∼ 300

SLIDE 29

ALGORITHMIC FRAMEWORK

Input: a structure with fixed boundary conditions (blue) and contact regions (red).

SLIDE 30

RESULTS

Results for the “Fertility” model (n_F = 3914)

[Figure: results of our algorithm]

SLIDE 31

RESULTS

Results for the “RockingChair” model (n_F = 5348)

SLIDE 32

RESULTS

Results for the “Shark” model (n_F = 4281)

SLIDE 33

SAMPLING ALGORITHM

Results: comparison with equidistant sampling

  • k = 100 sampling points
  • “Sensitive” regions (e.g., arms, wingtips) are sampled more; “easy” regions are sampled less

[Figure: equidistant (naive) sampling vs. our solution]

SLIDE 34

SAMPLING ALGORITHM

Results: comparison with equidistant sampling

  • k = 200 sampling points
  • “Sensitive” regions (e.g., arms, wingtips) are sampled more; “easy” regions are sampled less

SLIDE 35

EXTENSIONS

“Robust” experimental design

  • Design points are subject to adversarial “perturbations” ξ_i
  • Example discrete optimization problem:

  min_{s_1,…,s_n} max_{ξ_1,…,ξ_n} f(Σ_{i=1}^n s_i (x_i + ξ_i)(x_i + ξ_i)^T)   s.t.   s_i ∈ {0, 1},   Σ_i s_i ≤ k,   ‖ξ_i‖₂ ≤ δ

SLIDE 36

EXTENSIONS

“Random design” linear regression

  • Random designs (x_i, y_i) ∼ D, but E[y_i | x_i] ≠ ⟨x_i, β_0⟩
  • Worst-case optimal designs:

  min_S sup_{D∈𝒟} E_{D,S}[ ‖β̂_S − β_0‖₂² ]

  (S: selected design subset; β_0: best linear predictor w.r.t. D; β̂_S: OLS on X_S)

  • Variance term:  f(Σ_{i∈S} x_i x_i^T);  Bias term: dependent on D

SLIDE 37

Thank you!