ZEROTH-ORDER NON-CONVEX SMOOTH OPTIMIZATION: LOCAL MINIMAX RATES
Yining Wang, CMU
joint work with Sivaraman Balakrishnan and Aarti Singh
BACKGROUND
➤ Optimization: min_{x ∈ X} f(x)
➤ Classical setting (first-order):
✴ f is known (e.g., a likelihood function or an NN objective)
✴ ∇f can be evaluated, or unbiasedly approximated.
➤ Zeroth-order setting:
✴ f is unknown, or very complicated.
✴ ∇f is unknown, or very difficult to evaluate.
✴ f(x) can be evaluated, or unbiasedly approximated.
EXAMPLES
➤ Hyper-parameter tuning
✴ f maps hyper-parameters to system performance
✴ f is essentially unknown
➤ Experimental design
✴ f maps the experimental setting (pressure, temperature, etc.) to the measured outcome of the experiment
➤ Communication-efficient optimization
✴ Data defining the objective are scattered across machines
✴ Communicating ∇f is expensive, but communicating f(x) is OK.
PROBLEM SETUP
➤ Compact domain X ⊂ R^d
➤ Objective function:
✴ f belongs to the Hölder class Σ^α(M) of order α
✴ f may be non-convex
➤ Query model: adaptive
✴ At step t, query x_t ∈ X and observe y_t = f(x_t) + ξ_t, with ξ_t i.i.d. noise
➤ Goal: minimize the optimization error L(x̂_n; f) := f(x̂_n) − min_{x ∈ X} f(x)
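To make the query model concrete, here is a minimal Python sketch of a noisy zeroth-order oracle. The Gaussian noise and the test objective are illustrative assumptions, not part of the formal setup.

```python
import numpy as np

class ZerothOrderOracle:
    """Noisy zeroth-order oracle: returns y = f(x) + xi, xi i.i.d. noise.

    The algorithm may query adaptively but never sees f or its gradient.
    """

    def __init__(self, f, noise_std=1.0, seed=0):
        self.f = f                   # objective; hidden from the algorithm
        self.noise_std = noise_std   # Gaussian noise level (illustrative choice)
        self.rng = np.random.default_rng(seed)
        self.n_queries = 0           # total query budget used so far

    def query(self, x):
        self.n_queries += 1
        return self.f(x) + self.rng.normal(0.0, self.noise_std)

# Illustrative smooth test objective on [0, 1]^2 (not from the talk):
f = lambda x: np.sum((x - 0.3) ** 2)
oracle = ZerothOrderOracle(f, noise_std=0.5)
y = oracle.query(np.array([0.5, 0.5]))  # one noisy evaluation
```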
BASELINE: THE GLOBAL RATE
➤ Uniform sampling + nonparametric reconstruction
✴ Classical nonparametric analysis: with bandwidth h_n ∼ n^{−1/(2α+d)}, the reconstruction f̂_n satisfies ‖f̂_n − f‖_∞ ≲ n^{−α/(2α+d)} up to log factors
✴ Implies optimization error: E_f[L(x̂_n; f)] ≲ n^{−α/(2α+d)} for the plug-in minimizer x̂_n
➤ Can we do better? NO! Intuition:
✴ In the worst case over f ∈ Σ^α(M), locating the minimum is as hard as estimating f in sup-norm, so n^{−α/(2α+d)} is the global minimax rate
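A minimal sketch of this baseline in Python, reusing the oracle sketch above; Nadaraya-Watson smoothing is my illustrative choice of reconstruction (the talk does not commit to a specific smoother).

```python
import numpy as np

def baseline_optimize(oracle, d, n, alpha=1.0):
    """Uniform sampling + kernel reconstruction + plug-in minimizer.

    Bandwidth follows the classical rate h_n ~ n^{-1/(2*alpha + d)}.
    """
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(n, d))      # uniform design on [0,1]^d
    y = np.array([oracle.query(x) for x in X])  # noisy evaluations
    h = n ** (-1.0 / (2 * alpha + d))           # bandwidth h_n

    def f_hat(x):
        # Nadaraya-Watson estimate with a Gaussian kernel.
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * h ** 2))
        return np.sum(w * y) / (np.sum(w) + 1e-12)

    # Plug-in minimizer over the sampled design points.
    estimates = np.array([f_hat(x) for x in X])
    return X[np.argmin(estimates)]
```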
LOCAL MINIMAX RATES
➤ Characterize the error for functions "near" a reference f0
➤ What is the error rate for f close to an f0 that is …
✴ a constant function?
✴ strongly convex?
✴ has regular level sets?
✴ …
➤ Can an algorithm achieve instance-optimal error without knowing f in advance?
DEFINITIONS AND REGULARITY
➤ Some definitions
✴ Level set: L_f(ε) := {x ∈ X : f(x) ≤ min_{x'} f(x') + ε}
✴ Distribution function: μ_f(ε) := λ(L_f(ε)), the volume of the level set
➤ Regularity condition (A1):
✴ The # of ε^{1/α}-radius balls needed to cover L_f(ε) satisfies N(L_f(ε), ε^{1/α}) ≍ 1 + μ_f(ε)/ε^{d/α}
➤ Regularity condition (A2):
✴ The level-set volume grows regularly (a doubling-type condition): μ_f(2ε) ≲ μ_f(ε) for all small ε
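A small numeric sketch estimating the distribution function μ_f(ε) by Monte Carlo; the quadratic test function on [0,1]^2 is my choice for illustration.

```python
import numpy as np

def mu_f(f, eps, d=2, n_mc=200_000, seed=0):
    """Monte-Carlo estimate of mu_f(eps) = vol({x : f(x) <= f* + eps}) on [0,1]^d.

    f* is approximated by the empirical minimum over the same sample.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_mc, d))
    vals = f(X)
    f_star = vals.min()                   # proxy for min_x f(x)
    return np.mean(vals <= f_star + eps)  # volume of [0,1]^d is 1

# Illustrative strongly convex f: the level set is a disk of radius sqrt(eps),
# so mu_f(eps) ~ pi * eps, i.e. eps^{d/2} with d = 2.
f = lambda X: np.sum((X - 0.5) ** 2, axis=1)
for eps in [0.04, 0.01, 0.0025]:
    print(eps, mu_f(f, eps))
```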
MAIN RESULT: LOCAL UPPER BOUND
➤ Main result on local upper bound:
THEOREM 1. Suppose the regularity conditions hold. There exists an algorithm producing x̂_n such that, for sufficiently large n and every f ∈ Σ^α(M),
E_f[L(x̂_n; f)] ≲ ε_n(f) · log^c n,
where ε_n(f) solves, up to constants, the critical equation n ≍ μ_f(ε) · ε^{−(2+d/α)}.
✴ The algorithm does not know f.
✴ Instance dependent: the error rate depends on f.
➤ Example 1: polynomial level-set growth, μ_f(ε) ≍ ε^κ with κ > 0:
✴ ε_n(f) ≍ n^{−α/(2α+d−ακ)}, much faster than the "baseline" rate n^{−α/(2α+d)}
➤ Example 2: constant function, μ_f(ε) ≍ 1:
✴ ε_n(f) ≍ n^{−α/(2α+d)}; this is the worst-case function, matching the global rate
➤ Example 3: strongly convex f (take α = 2), μ_f(ε) ≍ ε^{d/2}:
✴ ε_n(f) ≍ n^{−1/2}, with no curse of dimensionality
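A short LaTeX derivation of the example rates from the critical equation; the closed form of ε_n(f) is reconstructed from the examples stated above, so read it up to constants and log factors.

```latex
% Critical equation: n \asymp \mu_f(\varepsilon)\,\varepsilon^{-(2+d/\alpha)}.
% If \mu_f(\varepsilon) \asymp \varepsilon^{\kappa}, then
%   n \asymp \varepsilon^{\kappa - 2 - d/\alpha}
%   \implies \varepsilon_n(f) \asymp n^{-\alpha/(2\alpha + d - \alpha\kappa)}.
\begin{align*}
\text{constant } f\ (\kappa = 0):\quad
  \varepsilon_n(f) &\asymp n^{-\alpha/(2\alpha+d)}
  && \text{(matches the global rate)} \\
\text{strongly convex } f\ (\alpha = 2,\ \kappa = d/2):\quad
  \varepsilon_n(f) &\asymp n^{-2/(4 + d - d)} = n^{-1/2}
  && \text{(dimension-free)}
\end{align*}
```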
MAIN RESULT: LOCAL LOWER BOUND
➤ Main result on local lower bound:
THEOREM 2. Suppose f0 satisfies the regularity conditions. Then
inf_{x̂_n} sup_{f ∈ Σ^α(M), ‖f − f0‖_∞ ≲ ε_n(f0)} E_f[L(x̂_n; f)] ≳ ε_n(f0),
where ε_n(·) is as in Theorem 1, up to log factors.
✴ Full knowledge of reference: the algorithm may know f0.
✴ Local minimaxity: worst-case error over f close to the reference f0.
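Read together, the two theorems pin down the local minimax rate around any regular reference; a LaTeX restatement of the sandwich, with constants and polylog(n) factors suppressed:

```latex
% Theorems 1 + 2 combined (constants and polylog(n) factors suppressed):
\inf_{\hat{x}_n}\;
\sup_{\substack{f \in \Sigma^{\alpha}(M) \\ \|f - f_0\|_\infty \lesssim \varepsilon_n(f_0)}}
\mathbb{E}_f\big[ L(\hat{x}_n; f) \big]
\;\asymp\; \varepsilon_n(f_0).
```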
ALGORITHM: SUCCESSIVE REJECTION
➤ Our algorithm: "successive rejection" (see the sketch after this list)
✴ Step 1: uniformly sample the active set S_1 and build confidence intervals (CIs)
✴ Step 2: remove points certified as sub-optimal by the CIs
✴ Step 3: uniformly sample the remaining points; repeat.
➤ Key observation between iterations: the surviving region is contained in an ever-smaller level set of f, so the budget concentrates where the minimum can lie.
➤ A logarithmic number of iterations suffices.
✴ Iterate until the CI width reaches ε ∼ ε_n(f) × log^c n
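A minimal Python sketch of the successive-rejection idea over a finite grid, assuming sub-Gaussian noise and Hoeffding-style confidence intervals; the epoch lengths, constants, and stopping rule are illustrative simplifications of the talk's algorithm.

```python
import numpy as np

def successive_rejection(oracle, d, budget, n_epochs=8, grid_per_dim=20, delta=0.01):
    """Successive rejection on a uniform grid over [0,1]^d (illustrative sketch).

    Each epoch: sample every active cell equally, build CIs, and reject
    cells whose lower confidence bound exceeds the best upper bound.
    """
    # Uniform grid of candidate points.
    axes = [np.linspace(0.0, 1.0, grid_per_dim) for _ in range(d)]
    grid = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, d)
    active = np.arange(len(grid))
    sums = np.zeros(len(grid))
    counts = np.zeros(len(grid))
    per_epoch = budget // n_epochs

    for _ in range(n_epochs):
        reps = max(1, per_epoch // len(active))   # equal allocation on active set
        for i in active:
            for _ in range(reps):
                sums[i] += oracle.query(grid[i])
                counts[i] += 1
        means = sums[active] / counts[active]
        # Hoeffding-style CI half-width (unit noise scale assumed known).
        width = np.sqrt(2 * np.log(2 * len(grid) / delta) / counts[active])
        best_ucb = np.min(means + width)
        keep = (means - width) <= best_ucb        # reject certified sub-optimal cells
        active = active[keep]
        if len(active) == 1:
            break

    return grid[active[np.argmin(sums[active] / counts[active])]]
```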
LOWER BOUND PROOF: STEP 1
➤ Step 1: constructing "packings" on L_{f0}(ε_n)
✴ Pack disjoint balls into the level set and plant a smooth bump of height ≍ ε_n in one of them; the discrepancy within that ball is 2ε_n
✴ Any successful algorithm must identify the perturbed ball w.h.p.
✴ Regularity (A1) controls the # of balls packed: ≍ 1 + μ_{f0}(ε_n)/ε_n^{d/α}
➤ Resembles bandit pure exploration: identify the non-zero arm
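The budget count behind the argument, as I reconstruct it from the slides: a ball of radius ε_n^{1/α} can carry an α-Hölder bump of height ≍ ε_n, and separating a 2ε_n gap under unit-variance noise costs ≍ ε_n^{−2} queries per ball.

```latex
% Budget needed to identify the perturbed ball w.h.p.:
%   (# balls packed) x (queries per ball)
\frac{\mu_{f_0}(\varepsilon_n)}{\varepsilon_n^{d/\alpha}}
\;\times\;
\varepsilon_n^{-2}
\;\asymp\;
\mu_{f_0}(\varepsilon_n)\,\varepsilon_n^{-(2+d/\alpha)}
\;\asymp\; n,
% i.e., exactly the critical equation that defines \varepsilon_n(f_0).
```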
SUMMARY
➤ (Noisy) zeroth-order optimization of smooth functions:
✴ In the worst case, as difficult as estimating the function in sup-norm.
➤ The optimal convergence rates exhibit significant gaps across instances:
✴ The local minimax rate is mostly dictated by level-set growth;
✴ The constant function is the hardest example;
✴ Strongly convex functions do not exhibit the curse of dimensionality.
➤ A successive-rejection type algorithm is near-optimal.
OPEN QUESTIONS
➤ Are the regularity conditions absolutely necessary?
✴ Can the level sets of f be irregular?
✴ Can the volumes of level sets of f grow heterogeneously?
➤ Are there more computationally efficient algorithms?
✴ Key challenge: avoiding building sup-norm CIs explicitly.
➤ Log factors: are they removable? (Conjecture: yes!)
➤ Beyond this work: active-query methods for nonparametric estimation / bandits