SLIDE 1

Minimization Problem with Smooth Components

Yu. Nesterov

Presenter: Lei Tang
Department of CSE, Arizona State University
Dec. 7th, 2008

SLIDE 2

Outline

  • MiniMax problem
  • Gradient mapping for the MiniMax problem
  • The complexity of the gradient and optimal methods
  • Optimization with functional constraints (the general constrained optimization problem)
  • Constrained minimization problem

SLIDE 3

MiniMax Problem

The objective function is composed of several components; the simplest problem of this type is the minimax problem. We focus on the smooth minimax problem
\[
\min_{x \in Q} f(x), \qquad f(x) = \max_{1 \le i \le m} f_i(x),
\]
where $f_i \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$, $i = 1, \dots, m$, and $Q$ is a closed convex set.

$f(x)$ is the max-type function composed of the components $f_i(x)$; in general, $f(x)$ is not differentiable. We write $f \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$ to mean that all $f_i \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$.
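For intuition, here is a minimal Python sketch (my own, not from the slides) of a max-type objective built from two smooth strongly convex quadratics; the max is nonsmooth exactly where the active component switches:

```python
import numpy as np

# Two smooth, strongly convex components (both have mu = L = 2 here).
f1 = lambda x: float(np.sum(x ** 2))            # f_1(x) = ||x||^2
f2 = lambda x: float(np.sum((x - 1.0) ** 2))    # f_2(x) = ||x - e||^2

def f(x):
    """Max-type objective f(x) = max_i f_i(x); smooth pieces, nonsmooth max."""
    return max(f1(x), f2(x))

# In 1-D the components cross at x = 0.5: both are active there,
# which is exactly where f has a kink (is not differentiable).
for x in (np.array([0.25]), np.array([0.5]), np.array([0.75])):
    print(x[0], f(x))
```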


SLIDE 5

Connection with the General Minimization Problem

General minimization problem:
\[
\min\ f_0(x) \quad (1) \qquad \text{s.t.}\ f_i(x) \le 0,\ i = 1, \dots, m \quad (2) \qquad x \in Q \quad (3)
\]
Parametric max-type function: $f(t; x) = \max\{f_0(x) - t;\ f_i(x)\}$.

It will be shown later that the optimal value of $f_0(x)$ corresponds to the root $t^*$ of $f^*(t) = 0$, and that the minimax problem is used as a subroutine for solving (1).

SLIDE 6

Linear Approximation

Max-type function: $f(x) = \max_{1 \le i \le m} f_i(x)$. Its linearization at $\bar{x}$:
\[
f(\bar{x}; x) = \max_{1 \le i \le m}\big[f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle\big].
\]
Essentially, this linearizes each component.

Properties:
  • $f(\bar{x}; x) + \frac{\mu}{2}\|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2}\|x - \bar{x}\|^2$;
  • $x^*$ is the solution over $Q$ $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$;
  • $f(x) \ge f(x^*) + \frac{\mu}{2}\|x - x^*\|^2$;
  • the solution $x^*$ exists and is unique.
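A tiny worked instance (my own, not from the slides) of the two-sided bound, with $f_1(x) = x^2$ and $f_2(x) = (x - 1)^2$ in one dimension, so $\mu = L = 2$ and both bounds are tight: linearizing at $\bar{x} = 0$,
\[
f(0; x) = \max\{0,\; 1 - 2x\}, \qquad
f(0; x) + \tfrac{2}{2}x^2 = \max\{x^2,\; x^2 - 2x + 1\} = \max\{f_1(x), f_2(x)\} = f(x).
\]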

SLIDE 7

Lemma 2.3.1

For $f_i \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$:
\[
f(\bar{x}; x) + \frac{\mu}{2}\|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2}\|x - \bar{x}\|^2.
\]
For strongly convex functions we have
\[
f_i(x) \ge f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle + \frac{\mu}{2}\|x - \bar{x}\|^2.
\]
Taking the max over $i$ on both sides: $f(x) \ge f(\bar{x}; x) + \frac{\mu}{2}\|x - \bar{x}\|^2$.

For Lipschitz-continuous gradients it follows that
\[
f_i(x) \le f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle + \frac{L}{2}\|x - \bar{x}\|^2,
\]
and taking the max: $f(x) \le f(\bar{x}; x) + \frac{L}{2}\|x - \bar{x}\|^2$.

The max operation preserves the properties of smooth strongly convex functions.


SLIDE 10

Theorem 2.3.1: $x^*$ is the solution $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$

($\Leftarrow$) Since $f(x) \ge f(\bar{x}; x) + \frac{\mu}{2}\|x - \bar{x}\|^2$, we have
\[
f(x) \ge f(x^*; x) + \frac{\mu}{2}\|x - x^*\|^2 \ge f(x^*; x^*) + 0 = f(x^*).
\]
($\Rightarrow$) By contradiction: if $f(x^*; x) < f(x^*)$ for some $x \in Q$, then for all $1 \le i \le m$
\[
f_i(x^*) + \langle f_i'(x^*), x - x^*\rangle < f(x^*) = \max_{1 \le i \le m} f_i(x^*).
\]
Define $\phi_i(\alpha) = f_i(x^* + \alpha(x - x^*))$, $\alpha \in [0, 1]$. Then either $\phi_i(0) = f_i(x^*) < f(x^*)$, or $\phi_i(0) = f(x^*)$ and $\phi_i'(0) = \langle f_i'(x^*), x - x^*\rangle < 0$. So for small enough $\alpha$,
\[
f_i(x^* + \alpha(x - x^*)) = \phi_i(\alpha) < f(x^*) \quad \forall\, 1 \le i \le m,
\]
a contradiction. Hence the linearization achieves its minimum at $x^*$.


SLIDE 14

Corollary 2.3.1: $f(x) \ge f(x^*) + \frac{\mu}{2}\|x - x^*\|^2$

\[
f(x) \ge f(x^*; x) + \frac{\mu}{2}\|x - x^*\|^2 \ge f(x^*; x^*) + \frac{\mu}{2}\|x - x^*\|^2 = f(x^*) + \frac{\mu}{2}\|x - x^*\|^2.
\]
So if $x^*$ exists, it must be unique.


SLIDE 16

Theorem 2.3.2

Let a max-type function $f(x)$ belong to $S^1_\mu(\mathbb{R}^n)$, $\mu > 0$, and let $Q$ be a closed convex set. Then the solution $x^*$ exists and is unique.

Let $\bar{x} \in Q$ and consider the set $\bar{Q} = \{x \in Q \mid f(x) \le f(\bar{x})\}$. The problem transforms to $\min\{f(x) \mid x \in \bar{Q}\}$, so we need to show that $\bar{Q}$ is bounded. For $x \in \bar{Q}$:
\[
f(\bar{x}) \ge f_i(x) \ge f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle + \frac{\mu}{2}\|x - \bar{x}\|^2
\;\Longrightarrow\;
\frac{\mu}{2}\|x - \bar{x}\|^2 \le \|f_i'(\bar{x})\| \cdot \|x - \bar{x}\| + f(\bar{x}) - f_i(\bar{x}).
\]
So $\bar{Q}$ is bounded, the solution $x^*$ exists, and uniqueness follows from Corollary 2.3.1.

SLIDE 17

Quick Summary

The minimax problem, though generally nonsmooth, shares all the properties of minimizing a smooth strongly convex function over a simple convex set.

Max-type function $f(x) = \max_{1 \le i \le m} f_i(x)$, with linearization
\[
f(\bar{x}; x) = \max_{1 \le i \le m}\big[f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle\big]
\]
(essentially, a linearization of each component).

Properties:
  • $f(\bar{x}; x) + \frac{\mu}{2}\|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2}\|x - \bar{x}\|^2$;
  • $x^*$ is the solution $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$;
  • $f(x) \ge f(x^*) + \frac{\mu}{2}\|x - x^*\|^2$;
  • the solution $x^*$ exists and is unique.

SLIDE 18

Road Map

  • MiniMax problem
  • Gradient mapping for the MiniMax problem
  • The complexity of the gradient and optimal methods
  • Optimization with functional constraints (the general constrained optimization problem)
  • Constrained minimization problem

As expected, this shares most of the properties of minimization over a simple convex set.

SLIDE 19

Gradient Mapping

As in the case of minimization over a convex set, we can define the gradient mapping:
\[
\begin{aligned}
f_\gamma(\bar{x}; x) &= f(\bar{x}; x) + \frac{\gamma}{2}\|x - \bar{x}\|^2 && \text{(quadratic approximation)} \quad (4)\\
f^*(\bar{x}; \gamma) &= \min_{x \in Q} f_\gamma(\bar{x}; x) && (5)\\
x_f(\bar{x}; \gamma) &= \arg\min_{x \in Q} f_\gamma(\bar{x}; x) && (6)\\
g_f(\bar{x}; \gamma) &= \gamma\,(\bar{x} - x_f(\bar{x}; \gamma)) && \text{(gradient mapping)} \quad (7)
\end{aligned}
\]
The only difference is the linearization part $f(\bar{x}; x)$:
  • when $m = 1$ (a single component), this is the same as minimization over a simple convex set;
  • the linearization point $\bar{x}$ does not necessarily belong to $Q$;
  • $f_\gamma(\bar{x}; x)$ is a max-type function composed of the components
\[
f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle + \frac{\gamma}{2}\|x - \bar{x}\|^2 \in S^{1,1}_{\gamma,\gamma}(\mathbb{R}^n), \qquad i = 1, \dots, m. \quad (8)
\]
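Definitions (4)-(7) translate directly into code. Below is a hedged sketch (my own, assuming $Q = \mathbb{R}^n$ and scipy available) that computes $x_f(\bar{x}; \gamma)$ and $g_f(\bar{x}; \gamma)$ by solving the model minimization in epigraph form:

```python
import numpy as np
from scipy.optimize import minimize

def gradient_mapping(x_bar, f_vals, g_vals, gamma):
    """Sketch of (4)-(7) for Q = R^n via the epigraph reformulation
        min_{x,t}  t + gamma/2 ||x - x_bar||^2
        s.t.       f_i(x_bar) + <f_i'(x_bar), x - x_bar> <= t,  i = 1..m.
    f_vals[i] = f_i(x_bar); g_vals[i] = f_i'(x_bar) (one row per component)."""
    n = len(x_bar)
    z0 = np.append(x_bar, np.max(f_vals))     # start at (x_bar, f(x_bar))

    def objective(z):
        x, t = z[:n], z[n]
        return t + 0.5 * gamma * np.sum((x - x_bar) ** 2)

    # SLSQP "ineq" constraints are of the form fun(z) >= 0.
    cons = [{"type": "ineq",
             "fun": (lambda z, i=i: z[n] - f_vals[i] - g_vals[i] @ (z[:n] - x_bar))}
            for i in range(len(f_vals))]

    z = minimize(objective, z0, constraints=cons, method="SLSQP").x
    x_f = z[:n]                               # x_f(x_bar; gamma), eq. (6)
    return x_f, gamma * (x_bar - x_f)         # g_f(x_bar; gamma), eq. (7)
```

For a general closed convex $Q$ one would add the constraints describing $Q$ to `cons`.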


SLIDE 21

Linearization and Gradient Mapping

$f(x)$ is bounded by the linearization (plus a quadratic term). Can we also bound the linearization itself using the gradient mapping?

Theorem 2.3.3. Let $f \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$. Then for all $x \in Q$:
\[
f(\bar{x}; x) \ge f^*(\bar{x}; \gamma) + \langle g_f(\bar{x}; \gamma), x - \bar{x}\rangle + \frac{1}{2\gamma}\|g_f(\bar{x}; \gamma)\|^2. \quad (9)
\]
Proof (writing $x_f = x_f(\bar{x}; \gamma)$, $g_f = g_f(\bar{x}; \gamma)$):
\[
\begin{aligned}
f(\bar{x}; x) &= f_\gamma(\bar{x}; x) - \frac{\gamma}{2}\|x - \bar{x}\|^2 && (10)\\
&\ge f_\gamma(\bar{x}; x_f) + \frac{\gamma}{2}\big(\|x - x_f\|^2 - \|x - \bar{x}\|^2\big) && \text{(since } f_\gamma(\bar{x}; \cdot) \in S^{1,1}_{\gamma,\gamma}(\mathbb{R}^n)\text{)} \quad (11)\\
&= f^*(\bar{x}; \gamma) + \frac{\gamma}{2}\langle \bar{x} - x_f,\; 2x - x_f - \bar{x}\rangle && (12)\\
&= f^*(\bar{x}; \gamma) + \frac{\gamma}{2}\langle \bar{x} - x_f,\; 2(x - \bar{x}) + (\bar{x} - x_f)\rangle && (13)\\
&= f^*(\bar{x}; \gamma) + \langle g_f, x - \bar{x}\rangle + \frac{1}{2\gamma}\|g_f\|^2 && (14)
\end{aligned}
\]
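As a quick numeric sanity check (my own construction, not from the slides), inequality (9) can be verified at random points with the `gradient_mapping` sketch from the previous slide:

```python
import numpy as np

# Components f_1(x) = ||x||^2, f_2(x) = ||x - e||^2, so mu = L = 2.
rng = np.random.default_rng(0)
e = np.array([1.0, 1.0])
x_bar, gamma = np.array([2.0, -1.0]), 2.0
f_vals = np.array([x_bar @ x_bar, (x_bar - e) @ (x_bar - e)])
g_vals = np.array([2 * x_bar, 2 * (x_bar - e)])

x_f, g_f = gradient_mapping(x_bar, f_vals, g_vals, gamma)

lin = lambda x: np.max(f_vals + g_vals @ (x - x_bar))             # f(x_bar; x)
f_star = lin(x_f) + 0.5 * gamma * (x_f - x_bar) @ (x_f - x_bar)   # f*(x_bar; gamma)

for _ in range(5):
    x = rng.normal(size=2)
    rhs = f_star + g_f @ (x - x_bar) + (g_f @ g_f) / (2 * gamma)
    assert lin(x) >= rhs - 1e-5                                   # inequality (9)
```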


SLIDE 23

Properties with Respect to the Gradient Mapping

Since
\[
f(\bar{x}; x) \ge f^*(\bar{x}; \gamma) + \langle g_f(\bar{x}; \gamma), x - \bar{x}\rangle + \frac{1}{2\gamma}\|g_f(\bar{x}; \gamma)\|^2,
\]
[the consequences listed on this slide were not captured in the extraction].

SLIDE 24

Variation with Respect to γ

[Slide content not captured in the extraction.]

SLIDE 25

Road Map

  • MiniMax problem
  • Gradient mapping for the MiniMax problem
  • The complexity of the gradient and optimal methods
  • Optimization with functional constraints (the general constrained optimization problem)
  • Constrained minimization problem

SLIDE 26

Gradient Method: Comparison

General scheme of the gradient method: $x_0 \in Q$, $x_{k+1} = x_k - h\,g_f(x_k; L)$, $k = 0, 1, \dots$

For the minimax problem (just as over a simple set): if we choose $h \le \frac{1}{L}$, then
\[
\|x_k - x^*\|^2 \le (1 - \mu h)^k\,\|x_0 - x^*\|^2. \quad (15)
\]
If $h = \frac{1}{L}$,
\[
\|x_k - x^*\|^2 \le \Big(1 - \frac{\mu}{L}\Big)^k\,\|x_0 - x^*\|^2, \quad (16)
\]
so the gradient method has the same rate of convergence as in the smooth case.

Proof sketch: let $r_k = \|x_k - x^*\|$ and $g = g_f(x_k; L)$. Since $2\langle g, x_k - x^*\rangle \ge \frac{1}{L}\|g\|^2 + \mu\|x_k - x^*\|^2$,
\[
r_{k+1}^2 = \|x_k - x^* - h g\|^2 = r_k^2 - 2h\langle g, x_k - x^*\rangle + h^2\|g\|^2
\le (1 - h\mu)\,r_k^2 + h\Big(h - \frac{1}{L}\Big)\|g\|^2 \le \Big(1 - \frac{\mu}{L}\Big) r_k^2.
\]
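A hedged sketch of this scheme with $h = 1/L$, reusing the `gradient_mapping` helper from the gradient-mapping slide (my own toy components, $Q = \mathbb{R}^n$):

```python
import numpy as np

e = np.array([1.0, 1.0])
fs    = [lambda x: x @ x,  lambda x: (x - e) @ (x - e)]   # f_1, f_2 (mu = L = 2)
grads = [lambda x: 2 * x,  lambda x: 2 * (x - e)]
L = 2.0

x = np.array([3.0, -2.0])
for k in range(50):
    f_vals = np.array([f(x) for f in fs])
    g_vals = np.array([g(x) for g in grads])
    _, g_f = gradient_mapping(x, f_vals, g_vals, gamma=L)
    x = x - (1.0 / L) * g_f               # x_{k+1} = x_k - h g_f(x_k; L)

print(x)  # approaches the minimax solution x* = (0.5, 0.5) of max(f_1, f_2)
```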

SLIDE 27

Minimization Method: Optimal Method

Step 1: define the estimate sequence. Assume we have $x_0 \in Q$. Define
\[
\phi_0(x) = \phi^*_0 + \frac{\gamma_0}{2}\|x - v_0\|^2, \quad (17)
\]
\[
\phi_{k+1}(x) = (1 - \alpha_k)\phi_k(x) + \alpha_k\Big[f(x_Q) + \langle g_Q, x - y_k\rangle + \frac{1}{2L}\|g_Q\|^2 + \frac{\mu}{2}\|x - y_k\|^2\Big], \quad (18)
\]
where $x_Q = x_f(y_k; L)$ and $g_Q = g_f(y_k; L)$.

Step 2: rewrite the sequence $\{\phi_k(x)\}$. For $k \ge 0$ we have
\[
\phi_k(x) = \phi^*_k + \frac{\gamma_k}{2}\|x - v_k\|^2, \quad (19)
\]
where $\gamma_k$, $v_k$, and $\phi^*_k$ obey the recursions
\[
\begin{aligned}
\gamma_{k+1} &= (1 - \alpha_k)\gamma_k + \alpha_k\mu, && (20)\\
v_{k+1} &= \frac{1}{\gamma_{k+1}}\big[(1 - \alpha_k)\gamma_k v_k + \alpha_k\mu y_k - \alpha_k g_Q\big], && (21)\\
\phi^*_{k+1} &= (1 - \alpha_k)\phi^*_k + \alpha_k f(x_Q) + \Big(\frac{\alpha_k}{2L} - \frac{\alpha_k^2}{2\gamma_{k+1}}\Big)\|g_Q\|^2\\
&\quad + \frac{\alpha_k(1 - \alpha_k)\gamma_k}{\gamma_{k+1}}\Big(\frac{\mu}{2}\|y_k - v_k\|^2 + \langle g_Q, v_k - y_k\rangle\Big). && (22)
\end{aligned}
\]

SLIDE 28

Minimization Method: Optimal Method

Step 3: ensure $\phi^*_k \ge f(x_k)$. Using the inequality
\[
f(x_k) \ge f(x_Q) + \langle g_Q, x_k - y_k\rangle + \frac{1}{2L}\|g_Q\|^2 + \frac{\mu}{2}\|x_k - y_k\|^2, \quad (23)
\]
we come to the following lower bound:
\[
\begin{aligned}
\phi^*_{k+1} &\ge (1 - \alpha_k)f(x_k) + \alpha_k f(x_Q) + \Big(\frac{\alpha_k}{2L} - \frac{\alpha_k^2}{2\gamma_{k+1}}\Big)\|g_Q\|^2 + \frac{\alpha_k(1 - \alpha_k)\gamma_k}{\gamma_{k+1}}\Big(\frac{\mu}{2}\|y_k - v_k\|^2 + \langle g_Q, v_k - y_k\rangle\Big)\\
&\ge f(x_Q) + \Big(\frac{1}{2L} - \frac{\alpha_k^2}{2\gamma_{k+1}}\Big)\|g_Q\|^2 + (1 - \alpha_k)\Big\langle g_Q,\ \frac{\alpha_k\gamma_k}{\gamma_{k+1}}(v_k - y_k) + x_k - y_k\Big\rangle.
\end{aligned}
\]
Therefore, we choose
\[
x_{k+1} = x_Q, \qquad L\alpha_k^2 = (1 - \alpha_k)\gamma_k + \alpha_k\mu = \gamma_{k+1}, \qquad y_k = \frac{1}{\gamma_k + \alpha_k\mu}\big(\alpha_k\gamma_k v_k + \gamma_{k+1}x_k\big).
\]

SLIDE 29

Constant Step Scheme III (as for a Simple Set)

1. Choose $x_0 \in Q$ and $\alpha_0 \in (0, 1)$. Set $y_0 = x_0$, $q = \mu/L$.
2. $k$-th iteration ($k \ge 0$): compute $f(y_k)$ and $f'(y_k)$, and set
\[
x_{k+1} = x_f(y_k; L). \quad (24)
\]
Compute $\alpha_{k+1} \in (0, 1)$ from the equation
\[
\alpha_{k+1}^2 = (1 - \alpha_{k+1})\alpha_k^2 + q\,\alpha_{k+1},
\]
and set
\[
\beta_k = \frac{\alpha_k(1 - \alpha_k)}{\alpha_k^2 + \alpha_{k+1}}, \qquad y_{k+1} = x_{k+1} + \beta_k(x_{k+1} - x_k). \quad (25)
\]
Note that only $\{x_k\}$ are guaranteed to be feasible for $Q$; the $\{y_k\}$ need not be. The scheme is completely identical to the unconstrained case, and the convergence rate is exactly the same.
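A hedged Python sketch of this scheme (again reusing the `gradient_mapping` helper, with $Q = \mathbb{R}^n$ and my own choice $\alpha_0 = \sqrt{\mu/L}$, which keeps $\alpha_k$ constant):

```python
import numpy as np

def constant_step_scheme(x0, fs, grads, mu, L, iters=50):
    """Sketch of (24)-(25): x_{k+1} = x_f(y_k; L), then a momentum step on y."""
    q = mu / L
    alpha, x, y = np.sqrt(q), x0.copy(), x0.copy()
    for _ in range(iters):
        f_vals = np.array([f(y) for f in fs])
        g_vals = np.array([g(y) for g in grads])
        x_next, _ = gradient_mapping(y, f_vals, g_vals, gamma=L)  # (24)
        # alpha_{k+1} in (0, 1) solves a^2 = (1 - a) alpha^2 + q a:
        b = alpha ** 2 - q
        alpha_next = 0.5 * (-b + np.sqrt(b ** 2 + 4 * alpha ** 2))
        beta = alpha * (1 - alpha) / (alpha ** 2 + alpha_next)
        y = x_next + beta * (x_next - x)                          # (25)
        x, alpha = x_next, alpha_next
    return x
```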

SLIDE 30

Convergence Rate

SLIDE 31

Convergence Rate

[The rate estimates on these two slides were not captured in the extraction.]

SLIDE 32

Optimization with Functional Constraints

Problem (2.3.16):
\[
\min\ f_0(x) \quad (26) \qquad \text{s.t.}\ f_i(x) \le 0,\ i = 1, \dots, m \quad (27) \qquad x \in Q \quad (28)
\]
Parametric max-type function: $f(t; x) = \max\{f_0(x) - t;\ f_i(x)\}$.

$f(t; \cdot)$ is strongly convex in $x$, so for any $t$ the minimizer $x^*(t)$ exists and is unique. Define
\[
f^*(t) = \min_{x \in Q} f(t; x).
\]
We will try to get close to the solution using a process based on approximate values of the function $f^*(t)$ (a.k.a. sequential quadratic programming).
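A small runnable sketch (my own toy instance with $Q = \mathbb{R}$, scipy assumed available) of the parametric function $f(t; x)$ and its minimum $f^*(t)$. For this instance the constrained optimum is $x^* = 1$ with $f_0(x^*) = 1$, so the root of $f^*(t)$ is $t^* = 1$:

```python
import numpy as np
from scipy.optimize import minimize

f0 = lambda x: float(np.sum((x - 2.0) ** 2))   # objective f_0
f1 = lambda x: float(np.sum(x ** 2)) - 1.0     # constraint f_1(x) <= 0

def f_param(t, x):
    """f(t; x) = max{f_0(x) - t; f_1(x)}."""
    return max(f0(x) - t, f1(x))

def f_star(t, x0=np.zeros(1)):
    """f*(t) = min_x f(t; x), computed numerically over Q = R."""
    return minimize(lambda x: f_param(t, x), x0, method="Nelder-Mead").fun

print(f_star(0.0))   # > 0: t below the root
print(f_star(1.0))   # ~ 0: t* = 1 is the root, i.e. the optimal value of f_0
print(f_star(2.0))   # < 0: t above the root
```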

SLIDE 33

Lemma 2.3.4

Note that, as $t$ increases, $f^*(t)$ decreases in a certain sense. Hence the smallest root of the function $f^*(t)$ corresponds to the optimal value of the problem with functional constraints. Our goal is to form a process for finding this root.

SLIDE 34

Properties of f*(t)

Thus, $f^*(t)$ decreases in $t$ and is Lipschitz continuous with constant equal to 1. Keep in mind that this property holds for any max-type function of this form, such as $f_\mu(t; \bar{x}; x)$ and $f_L(t; \bar{x}; x)$.

SLIDE 35

Lemma 2.3.6

For any $t_1 < t_2$ and $\Delta \ge 0$, we have
\[
f^*(t_1 - \Delta) \ge f^*(t_1) + \Delta\,\frac{f^*(t_1) - f^*(t_2)}{t_2 - t_1} = f^*(t_1) - \Delta\,\frac{f^*(t_2) - f^*(t_1)}{t_2 - t_1}. \quad (29)
\]
Proof: let $t_0 = t_1 - \Delta$ and $\alpha = \frac{\Delta}{t_2 - t_0}$; then $t_1 = (1 - \alpha)t_0 + \alpha t_2$, so (29) is equivalent to the convexity inequality
\[
f^*(t_1) \le (1 - \alpha)f^*(t_0) + \alpha f^*(t_2).
\]

SLIDE 36

Lemma 2.3.5: for any $\Delta > 0$, we have $f^*(t) - \Delta \le f^*(t + \Delta) \le f^*(t)$.

Lemma 2.3.6: for any $t_1 < t_2$ and $\Delta \ge 0$, we have
\[
f^*(t_1 - \Delta) \ge f^*(t_1) + \Delta\,\frac{f^*(t_1) - f^*(t_2)}{t_2 - t_1} = f^*(t_1) - \Delta\,\frac{f^*(t_2) - f^*(t_1)}{t_2 - t_1}.
\]
Both lemmas are valid for any parametric max-type function.
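A quick numeric check of Lemma 2.3.5 with the `f_star` sketch from the earlier slide (monotonicity and the Lipschitz-1 property in $t$, with a small tolerance for solver noise):

```python
# For several t and Delta = 0.3: f*(t) - Delta <= f*(t + Delta) <= f*(t).
d = 0.3
for t in (-1.0, 0.0, 0.5, 1.5):
    a, b = f_star(t), f_star(t + d)
    assert a - d - 1e-3 <= b <= a + 1e-3, (t, a, b)
print("Lemma 2.3.5 holds on the toy instance")
```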

SLIDE 37

Linearization and Gradient Mapping

Linearization of $f(t; x)$:
\[
f(t; \bar{x}; x) = \max_{1 \le i \le m}\big\{f_0(\bar{x}) + \langle f_0'(\bar{x}), x - \bar{x}\rangle - t;\ f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x}\rangle\big\}
\]
\[
\begin{aligned}
f_\gamma(t; \bar{x}; x) &= f(t; \bar{x}; x) + \frac{\gamma}{2}\|x - \bar{x}\|^2 && (30)\\
f^*(t; \bar{x}; \gamma) &= \min_{x \in Q} f_\gamma(t; \bar{x}; x) && (31)\\
x_f(t; \bar{x}; \gamma) &= \arg\min_{x \in Q} f_\gamma(t; \bar{x}; x) && (32)\\
g_f(t; \bar{x}; \gamma) &= \gamma\,(\bar{x} - x_f(t; \bar{x}; \gamma)) && (33)
\end{aligned}
\]
$g_f$ is the constrained gradient mapping; $\bar{x}$ is not necessarily in $Q$.

SLIDE 38

Bounds for the Linearization

$f_\gamma(t; \bar{x}; x) = f(t; \bar{x}; x) + \frac{\gamma}{2}\|x - \bar{x}\|^2$ is itself a max-type function, and $f_\gamma(t; \bar{x}; \cdot) \in S^{1,1}_{\gamma,\gamma}(\mathbb{R}^n)$, so for any $t$ the constrained gradient mapping is well defined.

Since $f(t; \cdot) \in S^{1,1}_{\mu,L}(\mathbb{R}^n)$, we have $f_\mu(t; \bar{x}; x) \le f(t; x) \le f_L(t; \bar{x}; x)$, and hence
\[
f^*(t; \bar{x}; \mu) \le f^*(t) \le f^*(t; \bar{x}; L).
\]
For any $\bar{x} \in \mathbb{R}^n$, $\gamma > 0$, $\Delta \ge 0$ and $t_1 < t_2$, we have
\[
f^*(t_1 - \Delta; \bar{x}; \gamma) \ge f^*(t_1; \bar{x}; \gamma) + \frac{\Delta}{t_2 - t_1}\big(f^*(t_1; \bar{x}; \gamma) - f^*(t_2; \bar{x}; \gamma)\big),
\]
\[
f^*(t; \bar{x}; \mu) \ge f^*(t; \bar{x}; L) - \frac{L - \mu}{2\mu L}\,\|g_f(t; \bar{x}; L)\|^2.
\]


SLIDE 41

Root of f*(t; x̄; µ)

We are interested in finding the root of the function $f^*(t)$, and we work with its lower approximation $f^*(t; \bar{x}; \mu)$. Define
\[
t^*(\bar{x}; t) = \operatorname{root}_t\big(f^*(t; \bar{x}; \mu)\big),
\]
the root of the lower-bound approximation. Notice that the notation is a little confusing here: $t^*(\bar{x}; t)$ actually depends on $\bar{x}$, not on $t$.


SLIDE 43

Lemma 2.3.7

(As quoted on slide 49:) for $t < \bar{t} < t^*(\bar{x}; \bar{t}) \le t^*$, we have
\[
f^*(t; \bar{x}; L) \ge 2(1 - \kappa)\, f^*(\bar{t}; \bar{x}; L)\,\sqrt{\frac{\bar{t} - t}{t^*(\bar{x}; \bar{t}) - \bar{t}}}.
\]

SLIDE 44

Denote $\Delta = \bar{t} - t$. Then
\[
\begin{aligned}
f^*(t; \bar{x}; L) \ \ge\ f^*(t)\ &\ge\ f^*(t; \bar{x}; \mu) && (34)\\
&\ge\ f^*(\bar{t}; \bar{x}; \mu) + \frac{\Delta}{t^*(\bar{x}; \bar{t}) - \bar{t}}\Big(f^*(\bar{t}; \bar{x}; \mu) - \underbrace{f^*\big(t^*(\bar{x}; \bar{t}); \bar{x}; \mu\big)}_{=\,0}\Big)\\
&\ge\ (1 - \kappa)\Big(1 + \frac{\Delta}{t^*(\bar{x}; \bar{t}) - \bar{t}}\Big)\, f^*(\bar{t}; \bar{x}; L) && (35)\\
&\ge\ 2(1 - \kappa)\,\sqrt{\frac{\Delta}{t^*(\bar{x}; \bar{t}) - \bar{t}}}\; f^*(\bar{t}; \bar{x}; L) && (36)\\
&=\ 2(1 - \kappa)\, f^*(\bar{t}; \bar{x}; L)\,\sqrt{\frac{\bar{t} - t}{t^*(\bar{x}; \bar{t}) - \bar{t}}} && (37)
\end{aligned}
\]
Here (35) uses $f^*(\bar{t}; \bar{x}; \mu) \ge (1 - \kappa)\, f^*(\bar{t}; \bar{x}; L)$ (the accuracy condition of the internal process), and (36) uses $1 + r \ge 2\sqrt{r}$.

SLIDE 45

[Slide content not captured in the extraction.]

SLIDE 46

Comment on the Scheme

Essentially two steps:
  • Given $t$, iterate on $x$ until the lower bound $f^*(t; \bar{x}; \mu)$ and the upper bound $f^*(t; \bar{x}; L)$ of $f^*(t)$ are not too far apart; then pick the minimum found during this internal process.
  • Given $x$, update $t$ by finding the root of the lower bound.

The master process continues until the upper bound is close enough to $0$ ($< \epsilon$). We start from a $t_0 < t^*$ and increase $t$ gradually. A rough sketch of this two-level process follows below.
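The scheme itself (slide 45) was not captured in the extraction, so the following is only a rough, hedged illustration of the two-level idea on the earlier toy instance, with the internal process collapsed to a single exact step (here $\mu = L = 2$, so the lower and upper bounds coincide):

```python
import numpy as np
from scipy.optimize import minimize, brentq

# Toy instance: min (x-2)^2 s.t. x^2 <= 1 over Q = R (so t* = 1, x* = 1).
f0, g0 = lambda x: (x - 2.0) ** 2, lambda x: 2.0 * (x - 2.0)
f1, g1 = lambda x: x ** 2 - 1.0,   lambda x: 2.0 * x
mu = L = 2.0
opts = {"xatol": 1e-10, "fatol": 1e-12}

def f_star_model(t, x_bar, gamma):
    """f*(t; x_bar; gamma): minimum of the linearized model (30)-(31)."""
    model = lambda x: (max(f0(x_bar) + g0(x_bar) * (x - x_bar) - t,
                           f1(x_bar) + g1(x_bar) * (x - x_bar))
                       + 0.5 * gamma * (x - x_bar) ** 2)
    res = minimize(lambda z: model(z[0]), [x_bar], method="Nelder-Mead", options=opts)
    return res.fun, res.x[0]

t, x = -3.0, 0.0                           # start below the root: t_0 < t*
for _ in range(20):
    upper, x = f_star_model(t, x, L)       # internal process: one exact step here
    if upper < 1e-6:                       # master stopping rule: upper bound < eps
        break
    # Master step: move t to the root t*(x; t) of the lower bound f*(.; x; mu).
    t = brentq(lambda s: f_star_model(s, x, mu)[0], t, t + 100.0)

print(t, x)                                # approaches t* = 1, x* = 1
```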

SLIDE 47

What Follows

Here we focus only on the analytical complexity of this method. The total cost is of the order
\[
\sqrt{\frac{L}{\mu}}\;\ln\frac{L}{\mu}\;\ln\frac{t^* - t_0}{\epsilon}.
\]
This value differs from the lower bound for the unconstrained minimization problem by a factor of $\ln\frac{L}{\mu}$ (not quite sure). Thus the scheme is suboptimal for constrained optimization problems, but we cannot say more, since the specific lower complexity bounds for constrained minimization are not known.

Plan:
  • estimate the complexity of the master process;
  • estimate the complexity of the internal process (given $t$, estimate an $x$);
  • combine to get the total complexity.


SLIDE 49

Lemma 2.3.8: Complexity of the Master Process

Lemma 2.3.8:
\[
f^*(t_k; x_{k+1}; L) \le \frac{t^* - t_0}{1 - \kappa}\Big[\frac{1}{2(1 - \kappa)}\Big]^k.
\]
Let $\beta = \frac{1}{2(1 - \kappa)}$ (which is $< 1$, as $\kappa < 0.5$) and $\delta_k = \frac{f^*(t_k; x_{k,j(k)}; L)}{\sqrt{t_{k+1} - t_k}}$.

Lemma 2.3.7: for $t < \bar{t} < t^*(\bar{x}; \bar{t}) \le t^*$, we have
\[
f^*(t; \bar{x}; L) \ge 2(1 - \kappa)\, f^*(\bar{t}; \bar{x}; L)\,\sqrt{\frac{\bar{t} - t}{t^*(\bar{x}; \bar{t}) - \bar{t}}}. \quad (38)
\]
Taking $t = t_{k-1}$, $\bar{t} = t_k$, and $t^*(\bar{x}; \bar{t}) = t_{k+1}$ (since $t_{k+1} = t^*(x_{k,j(k)}; t_k)$), we get
\[
2(1 - \kappa)\,\frac{f^*(t_k; x_{k,j(k)}; L)}{\sqrt{t_{k+1} - t_k}} \le \frac{f^*(t_{k-1}; x_{k-1,j(k-1)}; L)}{\sqrt{t_k - t_{k-1}}}
\;\Longrightarrow\; \delta_k \le \beta\,\delta_{k-1}. \quad (39)
\]
Hence
\[
f^*(t_k; x_{k,j(k)}; L) = \delta_k\sqrt{t_{k+1} - t_k} \le \beta^k \delta_0 \sqrt{t_{k+1} - t_k} \quad (40)
= \beta^k f^*(t_0; x_{0,j(0)}; L)\,\sqrt{\frac{t_{k+1} - t_k}{t_1 - t_0}}. \quad (41)
\]

SLIDE 50

Lemma 2.3.8: Complexity of the Master Process (cont.)

Lemma 2.3.5: for any $\Delta > 0$, $f^*(t) - \Delta \le f^*(t + \Delta) \le f^*(t)$. Setting $t_1 = t_0 + \Delta$ and applying the lemma to the parametric max-type model $f_\mu$, whose root is $t_1$, we get $t_1 - t_0 \ge f^*(t_0; x_{0,j(0)}; \mu)$. So
\[
\begin{aligned}
f^*(t_k; x_{k,j(k)}; L) &= \beta^k f^*(t_0; x_{0,j(0)}; L)\,\sqrt{\frac{t_{k+1} - t_k}{t_1 - t_0}} && (42)\\
&\le \beta^k f^*(t_0; x_{0,j(0)}; L)\,\sqrt{\frac{t_{k+1} - t_k}{f^*(t_0; x_{0,j(0)}; \mu)}} && (43)\\
&\le \frac{\beta^k}{1 - \kappa}\,\sqrt{f^*(t_0; x_{0,j(0)}; \mu)\,(t_{k+1} - t_k)} \ \le\ \frac{\beta^k}{1 - \kappa}\,\sqrt{f^*(t_0)\,(t^* - t_0)} && (44)\\
&\le \frac{t^* - t_0}{1 - \kappa}\Big[\frac{1}{2(1 - \kappa)}\Big]^k \qquad \text{(as } f^*(t_0) \le t^* - t_0\text{)}. && (45)
\end{aligned}
\]
Step (44) uses the internal stopping rule $f^*(\cdot\,; \mu) \ge (1 - \kappa)\,f^*(\cdot\,; L)$.

SLIDE 51

Lemma 2.3.8: Complexity of the Master Process

Master process:
\[
f^*(t_k; x_{k+1}; L) \le \frac{t^* - t_0}{1 - \kappa}\Big[\frac{1}{2(1 - \kappa)}\Big]^k < \epsilon
\;\Longrightarrow\;
N(\epsilon) = \frac{1}{\ln[2(1 - \kappa)]}\,\ln\frac{t^* - t_0}{(1 - \kappa)\,\epsilon}.
\]
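For a feel of the numbers (my own example, not from the slides): with $\kappa = 1/4$, $t^* - t_0 = 1$, and $\epsilon = 10^{-6}$,
\[
N(\epsilon) = \frac{1}{\ln 1.5}\,\ln\frac{1}{0.75 \cdot 10^{-6}} \approx \frac{14.10}{0.405} \approx 35,
\]
so the master process needs only a few dozen updates of $t$.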

SLIDE 52

Complexity of the Internal Process

Since $N(\epsilon) = \frac{1}{\ln[2(1 - \kappa)]}\ln\frac{t^* - t_0}{(1 - \kappa)\,\epsilon}$, the total cost is of the order stated earlier (slide 47):
\[
\sqrt{\frac{L}{\mu}}\;\ln\frac{L}{\mu}\;\ln\frac{t^* - t_0}{\epsilon}.
\]