Minimization Problem with Smooth Components (Yu. Nesterov)



  1. Minimization Problem with Smooth Components. Yu. Nesterov. Presenter: Lei Tang, Department of CSE, Arizona State University. Dec. 7th, 2008.

  2. Outline: MiniMax problem; Gradient mapping for the MiniMax problem; The complexity of the gradient and optimal methods; Optimization with functional constraints (general constrained optimization problem); Constrained minimization problem.

  3. MiniMax Problem
     The objective function is composed of several components. The simplest problem of that type is the minimax problem. We focus on the smooth minimax problem:
     $$\min_{x \in Q} \left[ f(x) = \max_{1 \le i \le m} f_i(x) \right]$$
     where $f_i \in S^{1,1}_{\mu,L}(R^n)$, $i = 1, \dots, m$, and $Q$ is a closed convex set.
     $f(x)$ is the max-type function composed of the components $f_i(x)$. In general, $f(x)$ is not differentiable.
     We write $f \in S^{1,1}_{\mu,L}(R^n)$ to mean that all $f_i \in S^{1,1}_{\mu,L}(R^n)$.
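
A minimal sketch of why a max-type function is generally not differentiable even when every component is smooth. The two quadratic components `f1` and `f2` are hypothetical examples chosen for illustration, not taken from the slides; the one-sided finite-difference slopes at the point where the components cross should disagree.

```python
# Hypothetical components: each is strongly convex with a Lipschitz gradient.
def f1(x):
    return (x - 1.0) ** 2


def f2(x):
    return (x + 1.0) ** 2


COMPONENTS = [f1, f2]


def max_type(x):
    """Max-type function f(x) = max_i f_i(x)."""
    return max(fi(x) for fi in COMPONENTS)


def one_sided_slopes(func, x, h=1e-6):
    """Finite-difference slopes just to the left and just to the right of x."""
    left = (func(x) - func(x - h)) / h
    right = (func(x + h) - func(x)) / h
    return left, right


# The components cross at x = 0, so f has a kink there.
left, right = one_sided_slopes(max_type, 0.0)
print("left slope:", left, "right slope:", right)
```

Away from the crossing point only one component is active, and the two slopes would agree up to the finite-difference error.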

  5. Connection with General Minimization Problem
     General minimization problem:
     $$\min f_0(x) \quad (1)$$
     $$\text{s.t. } f_i(x) \le 0, \quad i = 1, \dots, m \quad (2)$$
     $$x \in Q \quad (3)$$
     Parametric max-type function: $f(t; x) = \max\{ f_0(x) - t;\ f_i(x),\ i = 1, \dots, m \}$.
     To be shown later: the optimal value of $f_0(x)$ corresponds to the root $t$ of $f(t; x) = 0$; the minimax problem is used as a subroutine to solve (1).
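
A rough one-dimensional sketch of how the parametric max-type function could be used. It reads "root" as the smallest $t$ at which $\min_{x \in Q} f(t; x)$ reaches zero, which is an assumption about what the slides defer to later. The functions `f0`, `f1`, the interval `Q`, and the ternary-search and bisection helpers are all hypothetical choices for illustration, not the method from the talk.

```python
# Hypothetical problem: minimize f0(x) = x^2 subject to f1(x) = (x-2)^2 - 1 <= 0
# over Q = [-3, 3]. The feasible set is [1, 3], so the optimal value is 1 at x = 1.
def f0(x):
    return x * x


def f1(x):
    return (x - 2.0) ** 2 - 1.0


Q_LO, Q_HI = -3.0, 3.0


def parametric(t, x):
    """f(t; x) = max{ f0(x) - t, f1(x) }."""
    return max(f0(x) - t, f1(x))


def inner_min(t, iters=200):
    """Minimax subroutine: ternary search for min over Q of the convex f(t; .)."""
    lo, hi = Q_LO, Q_HI
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if parametric(t, a) <= parametric(t, b):
            hi = b
        else:
            lo = a
    x = 0.5 * (lo + hi)
    return parametric(t, x), x


def smallest_root(t_lo, t_hi, iters=60):
    """Bisection on t; assumes inner_min(t_lo) > 0 >= inner_min(t_hi)."""
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        value, _ = inner_min(mid)
        if value > 0.0:
            t_lo = mid
        else:
            t_hi = mid
    return t_hi


t_star = smallest_root(0.0, 9.0)
_, x_star = inner_min(t_star)
print("estimated optimal value:", t_star, "at x =", x_star)
```

Bisection is valid here because the inner minimum does not increase as $t$ grows; the ternary search relies on the problem being one-dimensional and convex.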

  6. Linear approximation
     Max-type function: $f(x) = \max_{1 \le i \le m} f_i(x)$.
     Linearization of $f(x)$ at $\bar{x}$: $f(\bar{x}; x) = \max_{1 \le i \le m} \left[ f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x} \rangle \right]$.
     Essentially, this linearizes each component separately.
     Properties:
     $f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2} \|x - \bar{x}\|^2$;
     $x^* \in Q$ is a solution $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$;
     $f(x) \ge f(x^*) + \frac{\mu}{2} \|x - x^*\|^2$;
     the solution $x^*$ exists and is unique.
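
The linearization and its two-sided bound can be checked numerically. The sketch below assumes two hypothetical quadratic components in $R^2$ (`comp1`, `comp2`) whose Hessians have eigenvalues between 2 and 4, so $\mu = 2$ and $L = 4$ are valid constants for both; none of these names come from the slides.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Each hypothetical component returns (value, gradient).
def comp1(x):
    # ||x - (1, 0)||^2, Hessian 2*I
    d = [x[0] - 1.0, x[1]]
    return dot(d, d), [2.0 * d[0], 2.0 * d[1]]


def comp2(x):
    # x0^2 + 2*x1^2 + x0, Hessian diag(2, 4)
    return x[0] ** 2 + 2.0 * x[1] ** 2 + x[0], [2.0 * x[0] + 1.0, 4.0 * x[1]]


COMPONENTS = [comp1, comp2]
MU, L = 2.0, 4.0  # assumed strong-convexity and Lipschitz constants


def f(x):
    """Max-type function f(x) = max_i f_i(x)."""
    return max(c(x)[0] for c in COMPONENTS)


def linearization(x_bar, x):
    """f(x_bar; x) = max_i [ f_i(x_bar) + <f_i'(x_bar), x - x_bar> ]."""
    step = [a - b for a, b in zip(x, x_bar)]
    return max(val + dot(grad, step) for val, grad in (c(x_bar) for c in COMPONENTS))


def bounds_hold(x_bar, x, tol=1e-9):
    """Check the sandwich: lin + mu/2*d^2 <= f(x) <= lin + L/2*d^2."""
    step = [a - b for a, b in zip(x, x_bar)]
    sq = dot(step, step)
    lin = linearization(x_bar, x)
    return lin + 0.5 * MU * sq - tol <= f(x) <= lin + 0.5 * L * sq + tol


X_BAR = [0.5, -0.25]
GRID = [[-2.0 + 0.5 * i, -2.0 + 0.5 * j] for i in range(9) for j in range(9)]
all_ok = all(bounds_hold(X_BAR, x) for x in GRID)
print("bounds hold on the grid:", all_ok)
```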

  7. Lemma 2.3.1
     For $f_i \in S^{1,1}_{\mu,L}(R^n)$:
     $$f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2} \|x - \bar{x}\|^2$$
     For strongly convex functions we have
     $$f_i(x) \ge f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x} \rangle + \frac{\mu}{2} \|x - \bar{x}\|^2.$$
     Taking the max over $i$ on both sides: $f(x) \ge f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2$.
     For functions with Lipschitz continuous gradient it follows that
     $$f_i(x) \le f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x} \rangle + \frac{L}{2} \|x - \bar{x}\|^2,$$
     and taking the max over $i$ gives $f(x) \le f(\bar{x}; x) + \frac{L}{2} \|x - \bar{x}\|^2$.
     The max operation preserves these properties of smooth strongly convex functions.

  10. Theorem 2.3.1
     $x^* \in Q$ is a solution $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$.
     ($\Leftarrow$) As $f(x) \ge f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2$, we have
     $$f(x) \ge f(x^*; x) + \frac{\mu}{2} \|x - x^*\|^2 \ge f(x^*; x^*) + 0 = f(x^*).$$
     ($\Rightarrow$) Prove by contradiction: if $f(x^*; x) < f(x^*)$ for some $x \in Q$, then for $1 \le i \le m$
     $$f_i(x^*) + \langle f_i'(x^*), x - x^* \rangle < f(x^*) = \max_{1 \le i \le m} f_i(x^*).$$
     Define $\varphi_i(\alpha) = f_i(x^* + \alpha (x - x^*))$, $\alpha \in [0, 1]$.
     Then either $\varphi_i(0) \equiv f_i(x^*) < f(x^*)$, or $\varphi_i(0) = f(x^*)$ and $\varphi_i'(0) = \langle f_i'(x^*), x - x^* \rangle < 0$.
     So for small enough $\alpha$, $f_i(x^* + \alpha (x - x^*)) = \varphi_i(\alpha) < f(x^*)$ for all $1 \le i \le m$. Since $x^* + \alpha (x - x^*) \in Q$ by convexity of $Q$, this contradicts the optimality of $x^*$.
     In other words, the linearization $f(x^*; \cdot)$ achieves its minimum over $Q$ at $x^*$.
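
The optimality condition can be tested on a small example. This is a sketch under assumptions: the two one-dimensional components, the grid standing in for $Q = [-2, 2]$, and the helper `satisfies_condition` are hypothetical and only illustrate the statement of the theorem.

```python
# Hypothetical components given as (function, derivative) pairs.
COMPONENTS = [
    (lambda x: (x - 1.0) ** 2, lambda x: 2.0 * (x - 1.0)),
    (lambda x: (x + 1.0) ** 2, lambda x: 2.0 * (x + 1.0)),
]


def f(x):
    """Max-type function f(x) = max_i f_i(x)."""
    return max(fi(x) for fi, _ in COMPONENTS)


def linearization(x_bar, x):
    """f(x_bar; x) = max_i [ f_i(x_bar) + f_i'(x_bar) * (x - x_bar) ]."""
    return max(fi(x_bar) + dfi(x_bar) * (x - x_bar) for fi, dfi in COMPONENTS)


def satisfies_condition(candidate, grid, tol=1e-12):
    """Check f(candidate; x) >= f(candidate) at every grid point x."""
    return all(linearization(candidate, x) >= f(candidate) - tol for x in grid)


# Grid standing in for Q = [-2, 2].
GRID = [-2.0 + 0.01 * k for k in range(401)]

print("x = 0.0 passes:", satisfies_condition(0.0, GRID))
print("x = 0.5 passes:", satisfies_condition(0.5, GRID))
```

In this example the minimizer of the max-type function is $x = 0$, where both components are active; at $x = 0.5$ the linearization drops below $f(0.5)$ for points to the left, so the condition fails.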

  14. Corollary 2.3.1
     $$f(x) \ge f(x^*) + \frac{\mu}{2} \|x - x^*\|^2$$
     Starting from $f(x) \ge f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2$ with $\bar{x} = x^*$:
     $$f(x) \ge f(x^*; x) + \frac{\mu}{2} \|x - x^*\|^2 \ge f(x^*; x^*) + \frac{\mu}{2} \|x - x^*\|^2 = f(x^*) + \frac{\mu}{2} \|x - x^*\|^2.$$
     So if $x^*$ exists, it must be unique.
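
A quick numerical illustration of this quadratic growth bound. The max-type function, its minimizer `X_STAR`, and the constant `MU` below are hypothetical one-dimensional choices, not values from the slides.

```python
MU = 2.0      # strong-convexity constant of both hypothetical components
X_STAR = 0.0  # minimizer of the max-type function below


def f(x):
    """Max of two hypothetical quadratics, (x - 1)^2 and (x + 1)^2."""
    return max((x - 1.0) ** 2, (x + 1.0) ** 2)


def growth_gap(x):
    """f(x) - [ f(x*) + mu/2 * (x - x*)^2 ]; the corollary says this is >= 0."""
    return f(x) - (f(X_STAR) + 0.5 * MU * (x - X_STAR) ** 2)


GRID = [-2.0 + 0.01 * k for k in range(401)]
min_gap = min(growth_gap(x) for x in GRID)
print("smallest gap on the grid:", min_gap)
```

Any second point with the same objective value as `X_STAR` would need a gap of at least $\frac{\mu}{2} \|x - x^*\|^2 > 0$, which is why the minimizer is unique.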

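  16. Theorem 3.2
     Let a max-type function $f(x) \in S^1_\mu(R^n)$, $\mu > 0$, and let $Q$ be a closed convex set. Then the solution $x^*$ exists and is unique.
     Let $\bar{x} \in Q$ and consider the set $\bar{Q} = \{ x \in Q \mid f(x) \le f(\bar{x}) \}$. The problem transforms to $\min \{ f(x) \mid x \in \bar{Q} \}$.
     We need to show that $\bar{Q}$ is bounded. For $x \in \bar{Q}$:
     $$f(\bar{x}) \ge f_i(x) \ge f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x} \rangle + \frac{\mu}{2} \|x - \bar{x}\|^2$$
     $$\Longrightarrow \quad \frac{\mu}{2} \|x - \bar{x}\|^2 \le \|f_i'(\bar{x})\| \cdot \|x - \bar{x}\| + f(\bar{x}) - f_i(\bar{x}).$$
     Hence $\bar{Q}$ is closed and bounded, so the continuous function $f$ attains its minimum on it. The solution $x^*$ therefore exists, and by Corollary 2.3.1 it is unique.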
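
The boundedness step can be made explicit by solving the quadratic inequality for the distance to $\bar{x}$. The shorthand $r$, $G$, and $c$ below is introduced only for this sketch and does not appear in the slides.

```latex
\[
r = \|x - \bar{x}\|, \qquad
G = \|f_i'(\bar{x})\|, \qquad
c = f(\bar{x}) - f_i(\bar{x}) \ge 0,
\]
\[
\frac{\mu}{2} r^2 \le G r + c
\quad \Longrightarrow \quad
r \le \frac{G + \sqrt{G^2 + 2 \mu c}}{\mu}.
\]
```

The right-hand side depends only on $\bar{x}$, so every point of $\bar{Q}$ lies in a ball of fixed radius around $\bar{x}$.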

  17. Quick Summary
     The minimax problem, though generally not smooth, shares all the properties of minimizing a smooth strongly convex function over a simple convex set.
     Max-type function: $f(x) = \max_{1 \le i \le m} f_i(x)$.
     Linearization of $f(x)$ at $\bar{x}$: $f(\bar{x}; x) = \max_{1 \le i \le m} \left[ f_i(\bar{x}) + \langle f_i'(\bar{x}), x - \bar{x} \rangle \right]$.
     Essentially, this linearizes each component separately.
     Properties:
     $f(\bar{x}; x) + \frac{\mu}{2} \|x - \bar{x}\|^2 \le f(x) \le f(\bar{x}; x) + \frac{L}{2} \|x - \bar{x}\|^2$;
     $x^* \in Q$ is a solution $\Leftrightarrow$ $f(x^*; x) \ge f(x^*; x^*) = f(x^*)$ for all $x \in Q$;
     $f(x) \ge f(x^*) + \frac{\mu}{2} \|x - x^*\|^2$;
     the solution $x^*$ exists and is unique.
