
COMPUTATIONAL ASPECTS OF SELECTION OF EXPERIMENTS - Yining Wang (presentation transcript)



  1. COMPUTATIONAL ASPECTS OF SELECTION OF EXPERIMENTS
  Georgia Institute of Technology, Atlanta GA, USA
  Yining Wang, Machine Learning Department, Carnegie Mellon University
  arXiv:1711.05174. Joint work with Zeyuan Allen-Zhu, Yuanzhi Li and Aarti Singh

  2. MOTIVATING APPLICATION: Worst-case structural analysis
  - Maximum stress resulting from worst-case external forces
  - Example application: lightweight structural design in an automated fiber process

  3. MOTIVATING APPLICATION: Worst-case structural analysis
  - Challenge: running Finite Element Analysis (FEA) for every external force location would be computationally too expensive
  - Justification for a single, normal, compressive load can be found in Ulu et al. '17, based on Rockafellar's Theorem

  4. MOTIVATING APPLICATION: Worst-case structural analysis
  - Idea: sample a few "representative" force locations and build a predictive model for the remaining locations
  - Challenge: how to determine the "best" representative locations
  [Figure: surface mesh with ~4000 candidate force locations, 200 selected nodes]

  5. PROBLEM FORMULATION
  - Linear regression model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$, where $y_i$ is the max. stress response, $\varepsilon_i$ the modeling error, and $\theta_0 \in \mathbb{R}^p$ the unknown regression model
  - $x_i \in \mathbb{R}^p$: features of force location $i$ (top eigenvectors of the surface Laplacian)
  - Experiment selection: from the full design $X \in \mathbb{R}^{n \times p}$ (one row per force location), keep $k$ rows $X_S \in \mathbb{R}^{k \times p}$ and observe the responses $y_1, \dots, y_k$ only at the selected locations
  [Figure: ~4000 candidate force locations, 200 selected nodes]

  6. PROBLEM FORMULATION
  - Linear regression model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
  - Ordinary Least Squares: $\hat\theta = \big(\sum_{i \in S} x_i x_i^\top\big)^{-1} \big(\sum_{i \in S} y_i x_i\big)$
  - By the CLT: $\sqrt{n}\,(\hat\theta - \theta_0) \to N\big(0, (\sum_{i \in S} x_i x_i^\top)^{-1}\big)$; the (scaled) sample covariance $\sum_{i \in S} x_i x_i^\top$ is the Fisher information
  - Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$ for an "optimality criterion" $f$
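For concreteness, the estimator above is a few lines of numpy; this minimal sketch (the function name and interface are illustrative, not from the talk) returns both $\hat\theta$ and the covariance proxy $(\sum_{i \in S} x_i x_i^\top)^{-1}$:

```python
import numpy as np

def ols_on_subset(X, y, S):
    """OLS restricted to the selected rows S (illustrative helper).

    Returns theta_hat and (sum_{i in S} x_i x_i^T)^{-1}, i.e. the inverse
    Fisher information / asymptotic covariance proxy from the slide.
    Assumes X[S] has full column rank.
    """
    XS, yS = X[S], y[S]                       # k x p design, k responses
    Sigma = XS.T @ XS                         # sum_{i in S} x_i x_i^T
    theta_hat = np.linalg.solve(Sigma, XS.T @ yS)
    return theta_hat, np.linalg.inv(Sigma)
```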

  7. PROBLEM FORMULATION
  - Predictive model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
  - Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$ for an "optimality criterion" $f$
  - Example criteria: $A$-optimality $f_A(\Sigma) = \mathrm{tr}(\Sigma^{-1})/p$ (proportional to the MSE $\mathbb{E}\|\hat\theta - \theta_0\|_2^2$); $D$-optimality $f_D(\Sigma) = \det(\Sigma)^{-1/p}$ ("scale invariant"); $E$-optimality $f_E(\Sigma) = \|\Sigma^{-1}\|_{\mathrm{op}} = 1/\lambda_{\min}(\Sigma)$; $V$-optimality; ...
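All three criteria are one-liners in numpy; a sketch, assuming the input $\Sigma = \sum_{j \in S} x_j x_j^\top$ is positive definite:

```python
import numpy as np

def f_A(Sigma):
    """A-optimality: tr(Sigma^{-1}) / p."""
    return np.trace(np.linalg.inv(Sigma)) / Sigma.shape[0]

def f_D(Sigma):
    """D-optimality: det(Sigma)^{-1/p}, via slogdet for numerical stability."""
    _, logdet = np.linalg.slogdet(Sigma)
    return np.exp(-logdet / Sigma.shape[0])

def f_E(Sigma):
    """E-optimality: ||Sigma^{-1}||_op = 1 / lambda_min(Sigma)."""
    return 1.0 / np.linalg.eigvalsh(Sigma)[0]  # eigvalsh sorts ascending
```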

  8. PROBLEM FORMULATION
  - Predictive model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
  - Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
  - Objective: efficient approximation algorithms producing $\hat S$ with $f\big(\sum_{j \in \hat S} x_j x_j^\top\big) \le C(n, p) \cdot \min_{|S| \le k} f\big(\sum_{j \in S} x_j x_j^\top\big)$, where $C(n, p)$ is the "approximation ratio"

  9. EXISTING RESULTS
  - Existing positive results: $O(1)$ approximation for $D$-optimality (Nikolov & Singh, STOC '15); $O(n/k)$ approximation for $A$-optimality (Avron & Boutsidis, SIMAX '13)
  - Existing negative results: NP-hard to optimize $D$/$E$-optimality exactly (Summa et al., SODA '15); NP-hard to $(1+\varepsilon)$-approximate $D$-optimality when $k = p$ (Cerny & Hladik, Comput. Optim. Appl. '12)
  - Each of these results applies to only one or two criteria $f$

  10. REGULAR CRITERIA
  - Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
  - "Regular" criteria: (A1) Convexity: $f$ (or its surrogate) is convex; (A2) Monotonicity: $A \succeq B \implies f(A) \le f(B)$; (A3) Reciprocal linearity: $f(tA) = t^{-1} f(A)$
  - All popular optimality criteria are "regular", e.g., $A$/$D$/$E$/$V$/$G$-optimality

  11. OUR RESULT
  Theorem. For every regular criterion $f$, there exists a polynomial-time $(1+\varepsilon)$-approximation algorithm provided that $k = \Omega(p/\varepsilon^2)$, where $k$ is the size of the design subset and $p$ is the number of variables (the dimension).
  - Remark 1: concurrent to or after our work, $(1+\varepsilon)$ approximations for $D$/$A$-optimality were obtained under the condition $k = \Omega(p/\varepsilon + 1/\varepsilon^2)$ (Singh & Xie, SODA '18; Nikolov et al., arXiv '18)
  - Remark 2: the condition $k = \Omega(p/\varepsilon^2)$ is tight for $E$-optimality and for continuous-relaxation-type methods (Nikolov et al., arXiv '18)

  12. ALGORITHMIC FRAMEWORK
  - Continuous relaxation of the discrete problem
  - Whitening of candidate design points
  - Regret minimization characterization of least eigenvalues
  - Greedy swapping based on FTRL potential functions

  13. ALGORITHMIC FRAMEWORK (recap; next step: continuous relaxation of the discrete problem)

  14. CONTINUOUS RELAXATION
  - Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
  - Equivalent formulation: $\min_{s_1, \dots, s_n} f\big(\sum_{i=1}^n s_i x_i x_i^\top\big)$ s.t. $\sum_{i=1}^n s_i \le k$, $s_i \in \{0, 1\}$
  - Relaxation: replace the integrality constraint with $0 \le s_i \le 1$
  - The relaxed problem is convex! It can be solved using classical methods (e.g., projected gradient / mirror descent), as in the sketch below
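As a sketch of this step, the following projected gradient descent minimizes the relaxed $A$-optimality objective; the capped-simplex projection, step size, and iteration count are illustrative choices rather than the talk's exact method:

```python
import numpy as np

def project_capped_simplex(v, k, tol=1e-9):
    """Euclidean projection onto {s : 0 <= s_i <= 1, sum_i s_i <= k}."""
    s = np.clip(v, 0.0, 1.0)
    if s.sum() <= k:
        return s
    # Budget is tight: bisect on the shift tau so that
    # sum_i clip(v_i - tau, 0, 1) = k (the sum is decreasing in tau).
    lo, hi = v.min() - 1.0, v.max()
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def relax_A_optimal(X, k, iters=500, lr=0.1):
    """Projected gradient descent on the relaxed A-optimal design problem:
    min_s tr((sum_i s_i x_i x_i^T)^{-1}) / p over the capped simplex.
    Assumes X has full column rank; lr is an illustrative fixed step size."""
    n, p = X.shape
    s = np.full(n, k / n)                     # feasible interior start
    for _ in range(iters):
        Sinv = np.linalg.inv((X * s[:, None]).T @ X)
        # d/ds_i tr(Sigma^{-1})/p = -x_i^T Sigma^{-2} x_i / p
        grad = -np.einsum('ij,jk,ik->i', X @ Sinv, Sinv, X) / p
        s = project_capped_simplex(s - lr * grad, k)
    return s
```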

  15. CONTINUOUS RELAXATION
  - Relaxed formulation: $\min_{s_1, \dots, s_n} f\big(\sum_{i=1}^n s_i x_i x_i^\top\big)$ s.t. $\sum_{i=1}^n s_i \le k$, $0 \le s_i \le 1$
  - Question: how to round $\{s_i\}$ back to integer values?

  16. ALGORITHMIC FRAMEWORK (recap; next step: whitening of candidate design points)

  17. WHITENING
  - Rounding problem. Given the optimal continuous solution $\pi$, round it to $\hat s \in \{0, 1\}^n$ with $\sum_i \hat s_i \le k$, such that $f\big(\sum_i \hat s_i x_i x_i^\top\big) \le (1 + O(\varepsilon)) \cdot f\big(\sum_i \pi_i x_i x_i^\top\big)$
  - Whitening: $\tilde x_i = W^{-1/2} x_i$, where $W = \sum_i \pi_i x_i x_i^\top$
  - By monotonicity of $f$, the rounding problem reduces to guaranteeing $\lambda_{\min}\big(\sum_i \hat s_i \tilde x_i \tilde x_i^\top\big) \ge 1 - O(\varepsilon)$
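A minimal sketch of the whitening transform, assuming $W$ is positive definite:

```python
import numpy as np

def whiten(X, pi):
    """Whitening: x_tilde_i = W^{-1/2} x_i, with W = sum_i pi_i x_i x_i^T
    (assumed positive definite), via an eigendecomposition of W."""
    W = (X * pi[:, None]).T @ X
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return X @ W_inv_sqrt                     # rows are the whitened x_tilde_i
```

After this transform, `(Xt * pi[:, None]).T @ Xt` is (numerically) the $p \times p$ identity, which is what makes the target $\lambda_{\min} \ge 1 - O(\varepsilon)$ meaningful.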

  18. ALGORITHMIC FRAMEWORK (recap; next step: regret minimization characterization of least eigenvalues)

  19. REGRET MINIMIZATION
  - Matrix linear bandit / online learning, with action space $\Delta_p = \{A \succeq 0,\; \mathrm{tr}(A) = 1\}$
  - At each time $t$, a player picks an action $A_t \in \Delta_p$, observes a reference $F_t$, and suffers loss $\langle A_t, F_t \rangle$
  - Objective: minimize the regret of the action sequence, $R(A) := \sum_{t=1}^T \langle F_t, A_t \rangle - \inf_{U \in \Delta_p} \sum_{t=1}^T \langle F_t, U \rangle$
  - The benchmark $\inf_{U \in \Delta_p} \big\langle \sum_t F_t, U \big\rangle$ is precisely $\lambda_{\min}\big(\sum_t F_t\big)$
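The identity in the last bullet is easy to check numerically: over $\Delta_p$ the infimum of $\langle F, U \rangle$ is attained at the rank-one projector onto the least eigenvector. A quick check with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
F = B + B.T                                   # an arbitrary symmetric "loss" matrix

vals, vecs = np.linalg.eigh(F)
U = np.outer(vecs[:, 0], vecs[:, 0])          # rank-one projector, U in Delta_p
# <F, U> = lambda_min(F); no U in Delta_p does better, since
# <F, U> >= lambda_min(F) * tr(U) = lambda_min(F).
assert np.isclose(np.sum(F * U), vals[0])
```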

  20. REGRET MINIMIZATION
  - Matrix linear bandit / online learning: at each time $t$, a player picks an action $A_t \in \Delta_p$, observes a reference $F_t$, and suffers loss $\langle A_t, F_t \rangle$; the objective is to minimize the regret $R(A)$
  - Follow-The-Regularized-Leader (FTRL) policy: $A_t = \arg\min_{A \in \Delta_p} \; w(A) + \alpha \cdot \sum_{\tau=1}^{t-1} \langle F_\tau, A \rangle$, where $w$ is the "regularizer"
  - Example regularizers: 1. MWU: $w(A) = \mathrm{tr}(A(\log A - I))$ gives $A_t = \exp\big(cI - \alpha \sum_{\tau=1}^{t-1} F_\tau\big)$; 2. $\ell_{1/2}$-regularization: $w(A) = -2\,\mathrm{tr}(A^{1/2})$ gives $A_t = \big(cI - \alpha \sum_{\tau=1}^{t-1} F_\tau\big)^{-2}$; in both cases $c$ is the constant that normalizes $A_t$ to lie in $\Delta_p$
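For the $\ell_{1/2}$ regularizer, the constant $c$ is pinned down by $\mathrm{tr}(A_t) = 1$, and since $\mathrm{tr}\big((cI - \alpha Z)^{-2}\big)$ is strictly decreasing in $c$ it can be found by bisection. A sketch (interface and tolerances are illustrative):

```python
import numpy as np

def ftrl_action(Z, alpha, tol=1e-10):
    """FTRL action for the l_{1/2} regularizer: A = (c I - alpha Z)^{-2},
    with c > alpha * lambda_max(Z) chosen by bisection so that tr(A) = 1.
    Z = sum_{tau < t} F_tau is the (symmetric) cumulative loss matrix."""
    p = Z.shape[0]
    lam = np.linalg.eigvalsh(Z)                # ascending eigenvalues
    lo = alpha * lam[-1] + 1e-12               # trace -> infinity as c -> lo
    hi = alpha * lam[-1] + np.sqrt(p)          # here each term <= 1/p, so trace <= 1
    while hi - lo > tol:
        c = 0.5 * (lo + hi)
        if np.sum((c - alpha * lam) ** -2.0) > 1.0:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    vals, vecs = np.linalg.eigh(Z)
    return vecs @ np.diag((c - alpha * vals) ** -2.0) @ vecs.T
```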

  21. REGRET MINIMIZATION
  Regret lemma. Suppose $F_t = u_t u_t^\top - v_t v_t^\top$ (a swap of two design points). Then
  $$\inf_{U \in \Delta_p} \sum_{t=1}^{k} \langle F_t, U \rangle \;\ge\; -\frac{2\sqrt{p}}{\alpha} + \sum_{t=1}^{k} \left[ \frac{u_t^\top A_t u_t}{1 + 2\alpha\, u_t^\top A_t^{1/2} u_t} - \frac{v_t^\top A_t v_t}{1 - 2\alpha\, v_t^\top A_t^{1/2} v_t} \right],$$
  where $\alpha$ is the penalty parameter in FTRL and $A_t$ is the FTRL solution at time $t$.
  - Proved using the classical regret analysis of FTRL policies
  - $F_t$ encodes the swap of two design points from the pool

  22. ALGORITHMIC FRAMEWORK (recap; next step: greedy swapping based on FTRL potential functions)

  23. GREEDY SWAPPING
  - Regret lemma (restated). Suppose $F_t = u_t u_t^\top - v_t v_t^\top$. Then $\inf_{U \in \Delta_p} \sum_{t=1}^{k} \langle F_t, U \rangle \ge -\frac{2\sqrt{p}}{\alpha} + \sum_{t=1}^{k} \big[ \frac{u_t^\top A_t u_t}{1 + 2\alpha\, u_t^\top A_t^{1/2} u_t} - \frac{v_t^\top A_t v_t}{1 - 2\alpha\, v_t^\top A_t^{1/2} v_t} \big]$
  - Each summand suggests a "potential" function: $\psi(u, v; A) := \frac{u^\top A u}{1 + 2\alpha\, u^\top A^{1/2} u} - \frac{v^\top A v}{1 - 2\alpha\, v^\top A^{1/2} v}$

  24. GREEDY SWAPPING
  The "greedy swapping" algorithm (sketched in code below):
  - Start with an arbitrary set $S_0 \subseteq [n]$ of size $k$
  - At each step $t$, find $i_t \in S_{t-1}$ and $j_t \notin S_{t-1}$ that maximize $\psi(\tilde x_{j_t}, \tilde x_{i_t}; A_{t-1})$
  - Greedy swap: $S_t \leftarrow S_{t-1} \cup \{j_t\} \setminus \{i_t\}$
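Putting the pieces together, here is a sketch of the greedy swapping loop on the whitened points; it reuses `ftrl_action` from the FTRL sketch above, and the brute-force pair search, the $1 - 3\varepsilon$ stopping threshold, and `max_iters` are illustrative simplifications of the talk's algorithm:

```python
import numpy as np

def mat_sqrt(A):
    """Symmetric PSD matrix square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def psi(u, v, A, A_half, alpha):
    """The potential psi(u, v; A) from the regret lemma."""
    den_v = 1.0 - 2.0 * alpha * (v @ A_half @ v)
    if den_v <= 0:                            # guard: swap-out term undefined
        return -np.inf
    return (u @ A @ u) / (1.0 + 2.0 * alpha * (u @ A_half @ u)) \
         - (v @ A @ v) / den_v

def greedy_swap(Xt, k, eps, max_iters):
    """Greedy swapping on whitened points Xt (rows are x_tilde_i)."""
    n, p = Xt.shape
    alpha = np.sqrt(p) / eps                  # penalty parameter, as on slide 25
    S = set(range(k))                         # arbitrary initial subset of size k
    for _ in range(max_iters):
        idx = sorted(S)
        Z = Xt[idx].T @ Xt[idx]               # sum_{i in S} x_i x_i^T
        if np.linalg.eigvalsh(Z)[0] >= 1.0 - 3.0 * eps:
            break                             # lambda_min target reached
        A = ftrl_action(Z, alpha)             # l_{1/2}-FTRL action (sketch above)
        A_half = mat_sqrt(A)
        # Brute-force search for the best swap (i out, j in).
        best_val, best = 0.0, None
        for i in S:
            for j in range(n):
                if j in S:
                    continue
                val = psi(Xt[j], Xt[i], A, A_half, alpha)
                if val > best_val:
                    best_val, best = val, (i, j)
        if best is None:
            break                             # no improving swap remains
        i, j = best
        S.remove(i)
        S.add(j)
    return sorted(S)
```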

  25. GREEDY SWAPPING
  Proof framework:
  - If $k \ge 5p/\varepsilon^2$ and $\alpha = \sqrt{p}/\varepsilon$, then the "progress" of each swap is lower bounded by $\varepsilon/k$ until $\lambda_{\min} \ge 1 - O(\varepsilon)$
  - Hence repeating the swap for at most $O(k/\varepsilon)$ iterations suffices
