COMPSCI 514: Algorithms for Data Science


  1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Spring 2020. Lecture 23.

  2. Summary.
     Last Class:
     • Multivariable calculus review and gradient computation.
     • Introduction to gradient descent. Motivation as a greedy algorithm.
     • Conditions under which we will analyze gradient descent: convexity and Lipschitzness.
     This Class:
     • Analysis of gradient descent for Lipschitz, convex functions.
     • Simple extension to projected gradient descent for constrained optimization.

  3. Convexity.
     Definition – Convex Function: A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if and only if, for any $\vec{\theta}_1, \vec{\theta}_2 \in \mathbb{R}^d$ and $\lambda \in [0, 1]$:
       $(1 - \lambda) \cdot f(\vec{\theta}_1) + \lambda \cdot f(\vec{\theta}_2) \ge f\big((1 - \lambda) \cdot \vec{\theta}_1 + \lambda \cdot \vec{\theta}_2\big)$
     Corollary – Convex Function: A function $f : \mathbb{R}^d \to \mathbb{R}$ is convex if and only if, for any $\vec{\theta}_1, \vec{\theta}_2 \in \mathbb{R}^d$:
       $f(\vec{\theta}_2) - f(\vec{\theta}_1) \ge \vec{\nabla} f(\vec{\theta}_1)^T (\vec{\theta}_2 - \vec{\theta}_1)$
     Definition – Lipschitz Function: A function $f : \mathbb{R}^d \to \mathbb{R}$ is $G$-Lipschitz if $\|\vec{\nabla} f(\vec{\theta})\|_2 \le G$ for all $\vec{\theta}$.
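
As an illustration (not from the slides), here is a quick numerical sanity check of these three properties, using the assumed example $f(\vec{\theta}) = \|\vec{\theta}\|_2$, which is convex and $1$-Lipschitz:

```python
import numpy as np

# Illustrative example: f(theta) = ||theta||_2 is convex and 1-Lipschitz
# (its gradient theta / ||theta||_2 always has norm 1).
f = lambda th: np.linalg.norm(th)
grad_f = lambda th: th / np.linalg.norm(th)

rng = np.random.default_rng(1)
for _ in range(1000):
    t1, t2 = rng.normal(size=4), rng.normal(size=4)
    lam = rng.uniform()
    # Definition: chords lie above the function.
    assert (1 - lam) * f(t1) + lam * f(t2) >= f((1 - lam) * t1 + lam * t2) - 1e-12
    # Corollary (first-order condition): the function lies above its tangent planes.
    assert f(t2) - f(t1) >= grad_f(t1) @ (t2 - t1) - 1e-12
    # G-Lipschitzness with G = 1.
    assert np.linalg.norm(grad_f(t1)) <= 1 + 1e-12
print("all checks passed")
```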

  4. GD Analysis – Convex Functions.
     Assume that:
     • $f$ is convex.
     • $f$ is $G$-Lipschitz.
     • $\|\vec{\theta}_1 - \vec{\theta}^*\|_2 \le R$, where $\vec{\theta}_1$ is the initialization point.
     Gradient Descent:
     • Choose some initialization $\vec{\theta}_1$ and set $\eta = \frac{R}{G\sqrt{t}}$.
     • For $i = 1, \ldots, t - 1$: $\vec{\theta}_{i+1} = \vec{\theta}_i - \eta \vec{\nabla} f(\vec{\theta}_i)$.
     • Return $\hat{\theta} = \arg\min_{\vec{\theta}_1, \ldots, \vec{\theta}_t} f(\vec{\theta}_i)$.
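
A minimal Python sketch of this procedure (illustrative; the test function, its gradient, and the parameter values below are assumptions for the demo, not taken from the lecture):

```python
import numpy as np

def gradient_descent(grad_f, f, theta_1, G, R, t):
    """Run t iterations of gradient descent with fixed step size
    eta = R / (G * sqrt(t)) and return the best iterate seen."""
    eta = R / (G * np.sqrt(t))
    theta = theta_1.copy()
    iterates = [theta.copy()]
    for _ in range(t - 1):
        theta = theta - eta * grad_f(theta)   # theta_{i+1} = theta_i - eta * grad f(theta_i)
        iterates.append(theta.copy())
    return min(iterates, key=f)               # arg min over theta_1, ..., theta_t of f

# Assumed demo: f(theta) = ||theta||_2 is convex and 1-Lipschitz, minimized at 0.
f = lambda th: np.linalg.norm(th)
grad_f = lambda th: th / np.linalg.norm(th) if np.linalg.norm(th) > 0 else np.zeros_like(th)
theta_hat = gradient_descent(grad_f, f, theta_1=np.array([3.0, -4.0]), G=1.0, R=5.0, t=2500)
print(f(theta_hat))  # should be close to the minimum value 0
```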

  5. GD Analysis Proof.
     Theorem – GD on Convex Lipschitz Functions: For convex $G$-Lipschitz function $f$, GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
       $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon$.
     Step 1: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$.
     Visually: [figure on slide, omitted here]

  6. GD Analysis Proof.
     Theorem – GD on Convex Lipschitz Functions: For convex $G$-Lipschitz function $f$, GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
       $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon$.
     Step 1: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$.
     Formally:

  7. GD Analysis Proof.
     Theorem – GD on Convex Lipschitz Functions: For convex $G$-Lipschitz function $f$, GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
       $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon$.
     Step 1: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$.
     Step 1.1: $\vec{\nabla} f(\vec{\theta}_i)^T (\vec{\theta}_i - \vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2} \implies$ Step 1.
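
Filling in why Step 1.1 holds and why it implies Step 1 (a short derivation consistent with the definitions on slide 3, not verbatim from the slides):

```latex
% Expand the update \vec{\theta}_{i+1} = \vec{\theta}_i - \eta \vec{\nabla} f(\vec{\theta}_i):
\begin{align*}
\|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2
  &= \|\vec{\theta}_i - \vec{\theta}^*\|_2^2
     - 2\eta\, \vec{\nabla} f(\vec{\theta}_i)^T(\vec{\theta}_i - \vec{\theta}^*)
     + \eta^2 \|\vec{\nabla} f(\vec{\theta}_i)\|_2^2.
\end{align*}
% Rearranging, dividing by 2*eta, and bounding ||grad f(theta_i)||_2 <= G (G-Lipschitzness)
% gives Step 1.1. Step 1.1 then implies Step 1 via the convexity corollary of slide 3,
% applied with theta_1 = theta_i and theta_2 = theta^*:
\begin{align*}
f(\vec{\theta}_i) - f(\vec{\theta}^*)
  \le \vec{\nabla} f(\vec{\theta}_i)^T(\vec{\theta}_i - \vec{\theta}^*).
\end{align*}
```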

  8. GD Analysis Proof.
     Theorem – GD on Convex Lipschitz Functions: For convex $G$-Lipschitz function $f$, GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
       $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon$.
     Step 1: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$ $\implies$
     Step 2: $\frac{1}{t}\sum_{i=1}^{t}\big[f(\vec{\theta}_i) - f(\vec{\theta}^*)\big] \le \frac{R^2}{2\eta t} + \frac{\eta G^2}{2}$.
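
How Step 1 gives Step 2 (a short derivation filling in the averaging/telescoping argument, not verbatim from the slides):

```latex
% Average Step 1 over i = 1, ..., t; the squared-distance terms telescope.
\begin{align*}
\frac{1}{t}\sum_{i=1}^{t}\big[f(\vec{\theta}_i) - f(\vec{\theta}^*)\big]
  &\le \frac{1}{t}\sum_{i=1}^{t}\frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2} \\
  &= \frac{\|\vec{\theta}_1 - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{t+1} - \vec{\theta}^*\|_2^2}{2\eta t} + \frac{\eta G^2}{2}
   \le \frac{R^2}{2\eta t} + \frac{\eta G^2}{2},
\end{align*}
% using ||theta_1 - theta^*||_2 <= R and dropping the subtracted (nonnegative) term.
```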

  9. GD Analysis Proof.
     Theorem – GD on Convex Lipschitz Functions: For convex $G$-Lipschitz function $f$, GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
       $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon$.
     Step 2: $\frac{1}{t}\sum_{i=1}^{t}\big[f(\vec{\theta}_i) - f(\vec{\theta}^*)\big] \le \frac{R^2}{2\eta t} + \frac{\eta G^2}{2}$.
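
Filling in how Step 2 yields the theorem (a short derivation consistent with the parameter choices above, not verbatim from the slides):

```latex
% Plug the step size \eta = R/(G\sqrt{t}) from the algorithm into the Step 2 bound:
\begin{align*}
\frac{R^2}{2\eta t} + \frac{\eta G^2}{2}
  = \frac{RG}{2\sqrt{t}} + \frac{RG}{2\sqrt{t}}
  = \frac{RG}{\sqrt{t}}
  \le \epsilon
  \qquad \text{whenever } t \ge \frac{R^2 G^2}{\epsilon^2}.
\end{align*}
% Since \hat{\theta} minimizes f over theta_1, ..., theta_t, f(\hat{\theta}) is at most the
% average (1/t) * sum_i f(theta_i), so f(\hat{\theta}) - f(\theta^*) <= epsilon.
```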

  10. Constrained Convex Optimization.
      Often we want to perform convex optimization with convex constraints:
        $\vec{\theta}^* = \arg\min_{\vec{\theta} \in S} f(\vec{\theta})$, where $S$ is a convex set.
      Definition – Convex Set: A set $S \subseteq \mathbb{R}^d$ is convex if and only if, for any $\vec{\theta}_1, \vec{\theta}_2 \in S$ and $\lambda \in [0, 1]$:
        $(1 - \lambda) \cdot \vec{\theta}_1 + \lambda \cdot \vec{\theta}_2 \in S$.
      E.g. $S = \{\vec{\theta} \in \mathbb{R}^d : \|\vec{\theta}\|_2 \le 1\}$.

  11. Projected Gradient Descent.
      For any convex set $S$, let $P_S(\cdot)$ denote the projection function onto $S$:
      • $P_S(\vec{y}) = \arg\min_{\vec{\theta} \in S} \|\vec{\theta} - \vec{y}\|_2$.
      • For $S = \{\vec{\theta} \in \mathbb{R}^d : \|\vec{\theta}\|_2 \le 1\}$, what is $P_S(\vec{y})$?
      • For $S$ being a $k$-dimensional subspace of $\mathbb{R}^d$, what is $P_S(\vec{y})$?
      Projected Gradient Descent:
      • Choose some initialization $\vec{\theta}_1$ and set $\eta = \frac{R}{G\sqrt{t}}$.
      • For $i = 1, \ldots, t - 1$:
        • $\vec{\theta}^{(out)}_{i+1} = \vec{\theta}_i - \eta \cdot \vec{\nabla} f(\vec{\theta}_i)$
        • $\vec{\theta}_{i+1} = P_S(\vec{\theta}^{(out)}_{i+1})$
      • Return $\hat{\theta} = \arg\min_{\vec{\theta}_i} f(\vec{\theta}_i)$.
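
A minimal Python sketch of projected gradient descent (illustrative; the constraint set, objective, and parameter values are assumptions for the demo, not from the lecture). For the unit ball, the projection simply rescales points that lie outside it:

```python
import numpy as np

def project_unit_ball(y):
    """Projection onto S = {theta : ||theta||_2 <= 1}: rescale y if it lies outside the ball."""
    return y / max(1.0, np.linalg.norm(y))

def projected_gradient_descent(grad_f, f, project, theta_1, G, R, t):
    """Projected GD: take a gradient step, then project back onto the constraint set S."""
    eta = R / (G * np.sqrt(t))
    theta = theta_1.copy()
    iterates = [theta.copy()]
    for _ in range(t - 1):
        theta_out = theta - eta * grad_f(theta)   # unconstrained gradient step
        theta = project(theta_out)                # theta_{i+1} = P_S(theta_out)
        iterates.append(theta.copy())
    return min(iterates, key=f)

# Assumed demo: minimize the convex, 2-Lipschitz linear function f(theta) = <c, theta>
# with c = (2, 0) over the unit ball; the constrained minimum is at (-1, 0).
c = np.array([2.0, 0.0])
f = lambda th: c @ th
grad_f = lambda th: c
theta_hat = projected_gradient_descent(grad_f, f, project_unit_ball,
                                       theta_1=np.array([1.0, 0.0]), G=2.0, R=2.0, t=1600)
print(theta_hat)  # should be close to (-1, 0)
```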

  12. Convex Projections.
      Projected gradient descent can be analyzed identically to gradient descent!
      Theorem – Projection to a Convex Set: For any convex set $S \subseteq \mathbb{R}^d$, $\vec{y} \in \mathbb{R}^d$, and $\vec{\theta} \in S$:
        $\|P_S(\vec{y}) - \vec{\theta}\|_2 \le \|\vec{y} - \vec{\theta}\|_2$.
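
As an illustration (not from the slides), a quick numerical check of this contraction property for the unit-ball projection used in the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.normal(size=3) * 5                       # arbitrary point, possibly outside the ball
    theta = rng.normal(size=3)
    theta = theta / max(1.0, np.linalg.norm(theta))  # a point inside S = unit ball
    p_y = y / max(1.0, np.linalg.norm(y))            # P_S(y): projection onto the unit ball
    assert np.linalg.norm(p_y - theta) <= np.linalg.norm(y - theta) + 1e-12
print("projection never increases distance to any point in S")
```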

  13. Projected Gradient Descent Analysis.
      Theorem – Projected GD: For convex $G$-Lipschitz function $f$ and convex set $S$, Projected GD run with $t \ge \frac{R^2 G^2}{\epsilon^2}$ iterations, $\eta = \frac{R}{G\sqrt{t}}$, and starting point within radius $R$ of $\vec{\theta}^*$, outputs $\hat{\theta}$ satisfying:
        $f(\hat{\theta}) \le f(\vec{\theta}^*) + \epsilon = \min_{\vec{\theta} \in S} f(\vec{\theta}) + \epsilon$.
      Recall: $\vec{\theta}^{(out)}_{i+1} = \vec{\theta}_i - \eta \cdot \vec{\nabla} f(\vec{\theta}_i)$ and $\vec{\theta}_{i+1} = P_S(\vec{\theta}^{(out)}_{i+1})$.
      Step 1: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}^{(out)}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$.
      Step 1.a: For all $i$, $f(\vec{\theta}_i) - f(\vec{\theta}^*) \le \frac{\|\vec{\theta}_i - \vec{\theta}^*\|_2^2 - \|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2^2}{2\eta} + \frac{\eta G^2}{2}$.
      (Step 1 implies Step 1.a since, by the projection theorem on slide 12, $\|\vec{\theta}_{i+1} - \vec{\theta}^*\|_2 = \|P_S(\vec{\theta}^{(out)}_{i+1}) - \vec{\theta}^*\|_2 \le \|\vec{\theta}^{(out)}_{i+1} - \vec{\theta}^*\|_2$.)
      Step 2: $\frac{1}{t}\sum_{i=1}^{t}\big[f(\vec{\theta}_i) - f(\vec{\theta}^*)\big] \le \frac{R^2}{2\eta t} + \frac{\eta G^2}{2} \implies$ Theorem.
