Convex Optimization (EE227A: UC Berkeley) Lecture 25 (Newton, quasi-Newton) - PowerPoint PPT Presentation


  1. Convex Optimization (EE227A: UC Berkeley) Lecture 25 (Newton, quasi-Newton), 23 Apr 2013 ◦ Suvrit Sra

  2. Admin ♠ Project poster presentations: Soda 306 HP Auditorium, Fri May 10, 2013, 4pm – 8pm ♠ HW5 due on May 02, 2013; will be released today. 2 / 25

  3. Newton method ◮ Recall numerical analysis: Newton method for solving equations g(x) = 0, x ∈ R. ◮ Key idea: linear approximation. ◮ Suppose we are at some x close to x∗ (the root): g(x + ∆x) = g(x) + g′(x)∆x + o(|∆x|). ◮ The equation g(x + ∆x) = 0 is approximated by g(x) + g′(x)∆x = 0, i.e., ∆x = −g(x)/g′(x). ◮ If x is close to x∗, we can expect ∆x ≈ ∆x∗ = x∗ − x. ◮ Thus, we may write x∗ ≈ x − g(x)/g′(x). ◮ Which suggests the iterative process x_{k+1} ← x_k − g(x_k)/g′(x_k). 3 / 25
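The following is a minimal sketch of the scalar update above; the test equation g(x) = x² − 2 and the tolerance are illustrative choices, not from the lecture.

```python
# Scalar Newton iteration: x_{k+1} = x_k - g(x_k)/g'(x_k).
def newton_1d(g, dg, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:   # stop once the Newton step is negligible
            break
    return x

# Illustrative example: root of g(x) = x^2 - 2, i.e. sqrt(2).
print(newton_1d(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))
```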

  4. Newton method ◮ Suppose we have a system of nonlinear equations G: R^n → R^n, G(x) = 0. ◮ Again, arguing as above we arrive at the Newton system G(x) + G′(x)∆x = 0, where G′(x) is the Jacobian. ◮ Assuming G′(x) is non-degenerate (invertible), we obtain x_{k+1} = x_k − [G′(x_k)]^{−1} G(x_k). ◮ This is Newton's method for solving nonlinear equations. 4 / 25
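A sketch of the multivariate update, assuming a user-supplied Jacobian; as is standard practice, it solves the Newton system G′(x_k)∆x = −G(x_k) rather than forming the inverse. The 2×2 test system is illustrative only.

```python
import numpy as np

def newton_system(G, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for G(x) = 0, where J(x) returns the Jacobian G'(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve the Newton system G'(x) dx = -G(x) instead of inverting G'(x).
        dx = np.linalg.solve(J(x), -G(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Illustrative system: x0^2 + x1^2 - 1 = 0 and x0 - x1 = 0.
G = lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(newton_system(G, J, [1.0, 0.5]))  # ~ [0.7071, 0.7071]
```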

  5. Newton method min_{x ∈ R^n} f(x). ∇f(x) = 0 is necessary for optimality. Newton system: ∇f(x) + ∇²f(x)∆x = 0, which leads to x_{k+1} = x_k − [∇²f(x_k)]^{−1} ∇f(x_k), the Newton method for optimization. 5 / 25
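A sketch of the resulting pure Newton iteration for minimization; the quadratic test function below is a stand-in chosen so the answer is easy to check (a single Newton step solves it exactly).

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Newton step: solve hess(x) * dx = -grad(x).
        dx = np.linalg.solve(hess(x), -grad(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Illustrative smooth convex function f(x) = 0.5 x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hess = lambda x: A
print(newton_minimize(grad, hess, [5.0, -3.0]))  # ~ [0.2, 0.4]
```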

  6. Newton method – remarks ◮ Newton method for equations is more general than minimizing f(x) by finding roots of ∇f(x) = 0. ◮ Reason: not every function G: R^n → R^n is a derivative! Example: consider the linear system Ax − b = 0. Unless A is symmetric, it does not correspond to a derivative (why?). ◮ If it were a derivative, its own derivative would be a Hessian, and Hessians must be symmetric, QED. 6 / 25

  7. Newton method – remarks ◮ In general, the Newton method is highly nontrivial to analyze. Example: consider the iteration x_{k+1} = x_k − 1/x_k, x_0 = 2. It may be viewed as the Newton iteration for e^{x²/2} = 0 (which has no real solution). It is unknown whether this iteration generates a bounded sequence! Newton fractals (complex dynamics): z³ − 2z + 2, x⁸ + 15x⁴ − 16. 7 / 25
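Reproducing the example's iteration numerically takes only a few lines (this just runs the recursion; it does not settle the boundedness question, which the slide notes is open):

```python
# x_{k+1} = x_k - 1/x_k with x_0 = 2, i.e. Newton applied to e^{x^2/2} = 0.
# The equation has no real solution, so the iterates cannot converge.
x = 2.0
for k in range(20):
    x -= 1.0 / x
    print(k + 1, x)
```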

  8. Newton method – alternative view Quadratic approximation: φ(x) := f(x_k) + ⟨∇f(x_k), x − x_k⟩ + (1/2)⟨∇²f(x_k)(x − x_k), x − x_k⟩. Assuming ∇²f(x_k) ≻ 0, choose x_{k+1} as the argmin of φ(x): φ′(x_{k+1}) = ∇f(x_k) + ∇²f(x_k)(x_{k+1} − x_k) = 0. 8 / 25
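A quick numerical check of this view, with made-up values standing in for ∇f(x_k) and a positive definite ∇²f(x_k): the minimizer of the quadratic model φ is exactly the Newton iterate.

```python
import numpy as np

xk = np.array([1.0, 2.0])
g = np.array([0.5, -1.0])               # stands in for grad f(x_k)
H = np.array([[2.0, 0.5], [0.5, 1.0]])  # stands in for hess f(x_k), H > 0

# Minimizing phi(x) = f(x_k) + <g, x - x_k> + 0.5 <H (x - x_k), x - x_k>
# means solving H (x - x_k) = -g, which is the Newton step.
x_next = xk + np.linalg.solve(H, -g)
newton_step = xk - np.linalg.inv(H) @ g
print(np.allclose(x_next, newton_step))  # True
```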

  9. Newton method – convergence ◮ Method breaks down if ∇²f(x_k) ⊁ 0. ◮ Only locally convergent. Example: find the root of g(x) = x/√(1 + x²). Clearly, x∗ = 0. Exercise: analyze the behavior of the Newton method for this problem. Hint: consider the cases |x_0| < 1, x_0 = ±1, and |x_0| > 1. Damped Newton method: x_{k+1} = x_k − α_k [∇²f(x_k)]^{−1} ∇f(x_k). 9 / 25
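A sketch for experimenting with the exercise, together with a damped variant; the fixed step size α used here is purely for illustration (in practice α_k would come from a line search).

```python
import math

g = lambda x: x / math.sqrt(1 + x**2)
dg = lambda x: (1 + x**2) ** (-1.5)

def newton(x, alpha=1.0, iters=6):
    xs = [x]
    for _ in range(iters):
        x = x - alpha * g(x) / dg(x)   # damped Newton step (alpha = 1: pure Newton)
        xs.append(x)
    return xs

# The hint's three regimes for pure Newton: |x0| < 1, x0 = 1, |x0| > 1.
for x0 in (0.5, 1.0, 1.1):
    print(x0, newton(x0))

# A damped step keeps the iterates under control even when |x0| > 1.
print(newton(2.0, alpha=0.3))
```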

  10. Newton – local convergence rate ◮ Suppose the method generates a sequence {x_k} → x∗, ◮ where x∗ is a local min, i.e., ∇f(x∗) = 0 and ∇²f(x∗) ≻ 0. ◮ Let g(x_k) ≡ ∇f(x_k); Taylor's theorem: 0 = g(x∗) = g(x_k) + ⟨∇g(x_k), x∗ − x_k⟩ + o(‖x_k − x∗‖). ◮ Multiply by [∇g(x_k)]^{−1} to obtain x_k − x∗ − [∇g(x_k)]^{−1} g(x_k) = o(‖x_k − x∗‖). 10 / 25
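A small numerical illustration of the local rate this argument leads to: near a minimizer with a positive definite Hessian, the error is roughly squared at each step. The test function f(x) = e^x − x (minimizer x∗ = 0) is illustrative, not from the lecture.

```python
import math

# Newton's method for minimizing f(x) = e^x - x:
#   f'(x) = e^x - 1,  f''(x) = e^x,  so the step is (e^x - 1)/e^x.
x = 1.0
for k in range(6):
    x -= (math.exp(x) - 1.0) / math.exp(x)
    print(k + 1, abs(x))   # error |x_k - x*| shrinks roughly quadratically
```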
