  1. ZEROTH-ORDER NON-CONVEX SMOOTH OPTIMIZATION: LOCAL MINIMAX RATES Yining Wang, CMU joint work with Sivaraman Balakrishnan and Aarti Singh

  2. BACKGROUND ➤ Optimization: min_{x ∈ X} f(x) ➤ Classical setting (first-order): ✴ f is known (e.g., a likelihood function or an NN objective) ✴ ∇f(x) can be evaluated, or unbiasedly approximated ➤ Zeroth-order setting: ✴ f is unknown, or very complicated ✴ ∇f(x) is unknown, or very difficult to evaluate ✴ f(x) can be evaluated, or unbiasedly approximated

  3. BACKGROUND ➤ Hyper-parameter tuning ✴ f maps hyper-parameter θ to system performance ✴ f is essentially unknown ➤ Experimental design ✴ f maps experimental setting (pressure, temperature, etc.) to synthesized material quality ➤ Communication-efficient optimization ✴ Data defining the objective is scattered across machines ✴ Communicating ∇f(x) is expensive, but f(x) is ok

  4. PROBLEM FORMULATION ➤ Compact domain X = [0, 1]^d ➤ Objective function f : X → ℝ ✴ f belongs to the Hölder class of order α: ‖f^(α)‖_∞ ≤ M ✴ f may be non-convex ➤ Query model: adaptive queries x_1, x_2, …, x_n ∈ X ✴ y_t = f(x_t) + ξ_t, with noise ξ_t i.i.d. ∼ N(0, 1) ➤ Goal: minimize f(x̂_n) − inf_{x ∈ X} f(x) =: L(x̂_n; f)
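The query model above can be sketched as a thin wrapper around the objective. This is a minimal illustration only; the test function, the helper name `make_oracle`, and the noise level are placeholder assumptions, not details from the talk.

```python
import numpy as np

def make_oracle(f, sigma=1.0, rng=None):
    """Wrap f as a zeroth-order oracle: each query x_t returns
    y_t = f(x_t) + xi_t with xi_t ~ N(0, sigma^2)."""
    rng = rng or np.random.default_rng(0)
    def oracle(x):
        return f(x) + sigma * rng.standard_normal()
    return oracle

# A smooth non-convex placeholder objective on X = [0, 1]^2.
f = lambda x: np.sin(6 * x[0]) * np.cos(4 * x[1])
oracle = make_oracle(f, sigma=1.0)

# The optimizer may only see noisy function values, never f itself
# or its gradient.
x = np.array([0.3, 0.7])
y = oracle(x)  # one noisy evaluation
```

Averaging many queries at the same point recovers f(x), which is the unbiasedness the setting relies on.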

  11. A SIMPLE IDEA FIRST… ➤ Uniform sampling + nonparametric reconstruction ✴ Classical nonparametric analysis: ‖f̂_n − f‖_∞ = Õ_P(n^{−α/(2α+d)}) ✴ Implies optimization error: f(x̂_n) − f* ≤ 2 ‖f̂_n − f‖_∞ ➤ Can we do better? NO! inf_{x̂_n} sup_{f ∈ Σ_α(M)} E_f[L(x̂_n; f)] ≳ n^{−α/(2α+d)} ➤ Intuition: bandwidth h_n ∼ n^{−1/(2α+d)} yields error h_n^α ∼ n^{−α/(2α+d)}
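The "sample then reconstruct" baseline can be sketched in one dimension. This is a toy illustration under assumed choices: the objective, the noise level, and the grid size are placeholders, not the bandwidth-optimal settings from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: (x - 0.3) ** 2                              # toy smooth objective on [0, 1]
oracle = lambda x: f(x) + 0.02 * rng.standard_normal()    # noisy zeroth-order queries

def passive_minimize(oracle, n, m=100):
    """Uniform grid + averaging: split the n queries evenly over m grid
    points, average the noisy values at each point (a crude nonparametric
    reconstruction), and return the grid point with the smallest estimate."""
    reps = max(1, n // m)
    grid = np.linspace(0.0, 1.0, m)
    est = np.array([np.mean([oracle(g) for _ in range(reps)]) for g in grid])
    return grid[int(np.argmin(est))]

x_hat = passive_minimize(oracle, n=10000)  # should land near the minimizer 0.3
```

The design is non-adaptive: every query location is fixed in advance, which is exactly why this baseline cannot exploit favorable instances.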

  16. LOCAL RESULTS ➤ Characterize the error for functions “near” a reference function f_0 ➤ What is the error rate for f close to f_0 when f_0 is … ✴ a constant function? ✴ strongly convex? ✴ equipped with regular level sets? ✴ … ➤ Can an algorithm achieve instance-optimal error without knowing f_0?

  17. NOTATIONS ➤ Some definitions ✴ Level set: L_f(ε) := {x ∈ X : f(x) ≤ f* + ε} ✴ Distribution function: μ_f(ε) := vol(L_f(ε)) [figure: a level set L_f(ε)]
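The distribution function μ_f(ε) can be illustrated numerically by Monte Carlo. This is a rough sketch under assumptions: the quadratic test objective is a placeholder, and f* is approximated by the sample minimum, so the result is an estimate rather than an exact volume.

```python
import numpy as np

def mu_f(f, eps, d=2, n_samples=100_000, seed=0):
    """Monte Carlo estimate of mu_f(eps) = vol({x in [0,1]^d : f(x) <= f* + eps}).
    f* is approximated by the sample minimum over the same draws."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, d))
    vals = np.array([f(xi) for xi in x])
    f_star = vals.min()
    return float(np.mean(vals <= f_star + eps))  # vol([0,1]^d) = 1

# Toy example (not from the talk): f(x) = ||x - 0.5||^2 on [0,1]^2.
# Its eps-level set is a ball of radius sqrt(eps), so mu_f(eps) should be
# close to pi * eps while the ball stays inside the unit square.
f = lambda x: float(np.sum((x - 0.5) ** 2))
vol = mu_f(f, eps=0.01)  # roughly pi * 0.01
```

Monotonicity of μ_f in ε is immediate from the definition and is easy to check on the estimate.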

  18. REGULARITY CONDITIONS ➤ Some definitions ✴ Level set: L_f(ε) := {x ∈ X : f(x) ≤ f* + ε} ✴ Distribution function: μ_f(ε) := vol(L_f(ε)) ➤ Regularity condition (A1): ✴ # of δ-radius balls needed to cover L_f(ε) ≍ 1 + μ_f(ε)/δ^d [figure: a regular level set L_f(ε)]

  19. REGULARITY CONDITIONS ➤ Same definitions and condition (A1) as above [figure: an irregular level set L_f(ε)]

  21. REGULARITY CONDITIONS ➤ Some definitions ✴ Level set: L_f(ε) := {x ∈ X : f(x) ≤ f* + ε} ✴ Distribution function: μ_f(ε) := vol(L_f(ε)) ➤ Regularity condition (A2): ✴ μ_f(ε log n) ≤ μ_f(ε) × O(log^γ n) [figure: a regular distribution function μ_f(ε), comparing ε and ε log n]

  22. REGULARITY CONDITIONS ➤ Same definitions and condition (A2) as above [figure: an irregular distribution function μ_f(ε)]
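Condition (A2) can be sanity-checked in closed form for a toy distribution function. The polynomial form μ_f(ε) = ε^β is an assumed illustration (it reappears in Example 1 later), under which (A2) holds with γ = β.

```python
import math

# For polynomial growth mu_f(eps) = eps**beta (an assumed toy form),
# (A2) holds with gamma = beta, since
#   mu_f(eps * log n) = (eps * log n)**beta = mu_f(eps) * (log n)**beta.
beta, eps, n = 2.0, 1e-3, 10**6
mu = lambda e: e ** beta

lhs = mu(eps * math.log(n))           # mu_f(eps log n)
rhs = mu(eps) * math.log(n) ** beta   # mu_f(eps) * (log n)^gamma with gamma = beta
```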

  23. LOCAL UPPER BOUND ➤ Main result on local upper bound: THEOREM 1. Suppose the regularity conditions hold. There exists an algorithm such that, for sufficiently large n, sup_{f ∈ Σ_α(M)} Pr_f[L(x̂_n; f) ≥ C ε_n(f) log^c n] ≤ 1/4, where ε_n(f) := sup{ε > 0 : ε^{−(2+d/α)} μ_f(ε) ≥ n} ➤ Adaptivity: the algorithm does not know f ➤ Instance dependent: the error rate ε_n(f) depends on f

  30. LOCAL UPPER BOUND ➤ Example 1: polynomial growth μ_f(ε) ≍ ε^β, β ≥ 0 ⟹ ε_n(f) ≍ n^{−α/(2α+d−αβ)} ✴ Much faster than the “baseline” rate n^{−α/(2α+d)}
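The rate in Example 1 follows from the definition of ε_n(f) by a one-line computation, which can be sketched and checked numerically. This assumes the toy form μ_f(ε) = ε^β with constants dropped.

```python
import math

def eps_n(n, alpha, d, beta):
    """Closed form of eps_n(f) = sup{eps > 0 : eps**(-(2 + d/alpha)) * mu_f(eps) >= n}
    when mu_f(eps) = eps**beta (constants dropped):
      eps**(beta - 2 - d/alpha) >= n  <=>  eps <= n**(-1/(2 + d/alpha - beta)),
    and multiplying the exponent through by alpha gives
      n**(-alpha/(2*alpha + d - alpha*beta))."""
    return n ** (-1.0 / (2.0 + d / alpha - beta))

n, alpha, d, beta = 10**6, 1.0, 2.0, 1.0
local = eps_n(n, alpha, d, beta)             # instance-dependent rate
baseline = n ** (-alpha / (2 * alpha + d))   # global minimax rate n^{-alpha/(2 alpha + d)}
# With beta > 0, the local rate decays strictly faster than the baseline.
```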
