
In case you missed it - Who am I? Name: Sébastien Gros. Nationality: Swiss. Residence: Göteborg, Sweden. Affiliation: Chalmers University of Technology. Department: Signals & Systems. Position: Assistant Professor. Email:


Core idea

Goal: solve r(w) = 0... how?!?

Key idea: guess w, then iterate on the linear model:
    r(w + ∆w) ≈ r(w) + ∇r(w)⊤∆w = 0

Algorithm: Newton method
    Input: w, tol
    while ‖r(w)‖∞ ≥ tol do
        Compute r(w) and ∇r(w)
        Compute the Newton direction: ∇r(w)⊤∆w = −r(w)
        Newton step, t ∈ ]0, 1]: w ← w + t∆w
    return w

[Figure: the Newton iterates shown on a plot of r(w) against w.]

With t = 1 this is a full-step Newton iteration. Reduced steps (t < 1) are often needed.
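For reference, the algorithm above transcribes almost line for line into Python/NumPy. This is only a sketch of the full-step variant (t = 1); the circle/line test system at the bottom is an assumed example, not from the lecture, and `jac` returns the Jacobian ∂r/∂w, i.e. ∇r(w)⊤ in the slides' notation.

    # Minimal NumPy sketch of the Newton method above (full steps, t = 1).
    import numpy as np

    def newton(r, jac, w, tol=1e-10, max_iter=50):
        for _ in range(max_iter):
            rw = r(w)
            if np.linalg.norm(rw, np.inf) < tol:   # termination test from the slide
                break
            dw = np.linalg.solve(jac(w), -rw)      # Newton direction
            w = w + dw                             # full step (t = 1)
        return w

    # Assumed test system (not from the lecture): unit circle intersected with a line.
    r   = lambda w: np.array([w[0]**2 + w[1]**2 - 1.0, w[0] - w[1]])
    jac = lambda w: np.array([[2.0 * w[0], 2.0 * w[1]], [1.0, -1.0]])
    print(newton(r, jac, np.array([1.0, 0.2])))    # -> [0.7071..., 0.7071...]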

Why reduced steps?

Newton step with t ∈ ]0, 1]:
    ∇r(w)⊤∆w = −r(w)
    w ← w + t∆w

[Figure: Newton iterates on a plot of r(w) against w; with t = 1 the iterates jump back and forth without settling, with t = 0.8 they approach the root.]

The full-step Newton iteration can be unstable!! While the reduced-step Newton iteration is stable...
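The residual plotted on this slide behaves like arctan; assuming r(w) = arctan(w) as the test function, the contrast between t = 1 and t = 0.8 is easy to reproduce numerically in a few lines:

    import numpy as np

    def newton_arctan(w, t, n=8):
        # Newton step for r(w) = arctan(w): dw = -r(w)/r'(w) = -arctan(w) * (1 + w^2)
        for _ in range(n):
            w = w + t * (-np.arctan(w) * (1.0 + w * w))
        return w

    print(newton_arctan(1.5, t=1.0))   # ~1e27: full steps overshoot and diverge
    print(newton_arctan(1.5, t=0.8))   # ~1.5e-5: reduced steps converge to the root 0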

Does Newton always work?

Is the Newton step ∆w always providing a direction "improving" r(w)? I.e., is there always a t > 0 s.t. ‖r(w + t∆w)‖ < ‖r(w)‖? Yes... but.

Proof: ‖r(w + t∆w)‖ < ‖r(w)‖ holds for some t > 0 if

    d/dt ‖r(w + t∆w)‖² |t=0 < 0,

with ‖r(w)‖² differentiable. I.e.

    2 r(w)⊤ d/dt r(w + t∆w) |t=0 < 0.

We have

    d/dt r(w + t∆w) |t=0 = ∇r(w)⊤∆w = −∇r(w)⊤∇r(w)^{−⊤} r(w) = −r(w).

Then

    d/dt ‖r(w + t∆w)‖² |t=0 = −2 ‖r(w)‖² < 0.

How to select the step size t ∈ ]0, 1]? Globalization...
    Line search: reduce t until some criterion of progress on ‖r‖ is met
    Trust region: confine the step ∆w to a region where ∇r(w) provides a good model of r(w)
    Filter techniques: monitor progress on specific components of r(w) separately
... which ensures that progress is made in one way or another. Note: most of these techniques are specific to optimization.
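As a concrete instance of the line-search option, here is a minimal backtracking sketch on ‖r‖; the proof above guarantees the loop terminates for exact Newton directions. The sufficient-decrease constant 1e-4 and the halving factor 0.5 are conventional choices, not prescribed by the slides:

    import numpy as np

    def newton_linesearch(r, jac, w, tol=1e-10, max_iter=100):
        for _ in range(max_iter):
            rw = r(w)
            if np.linalg.norm(rw, np.inf) < tol:
                break
            dw = np.linalg.solve(jac(w), -rw)      # exact Newton direction
            t = 1.0
            # backtrack until sufficient decrease of ||r|| is achieved
            while np.linalg.norm(r(w + t * dw)) > (1.0 - 1e-4 * t) * np.linalg.norm(rw):
                t *= 0.5
                if t < 1e-8:                       # step has collapsed: give up
                    return w
            w = w + t * dw
        return w

With the arctan residual from the previous slide, this routine converges even from starting points where the full-step iteration diverges (e.g. w = 1.5).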

But still, Newton can fail...

Solve r(w) = 0.

[Figure: damped Newton iterates on a plot of r(w) against w, stalling away from the root.]

Newton stops with r(w) ≠ 0 and ∇r(w) singular, i.e. the Newton direction ∆w given by

    ∇r(w)⊤∆w = −r(w)

is undefined...

This is a common failure mode for Newton-based solvers when tackling very non-linear r and starting from a poor initial guess!!
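This failure mode can be reproduced with an assumed residual r(w) = w² + 1/4 (not from the lecture): it has no real root and its slope vanishes at w = 0, so a damped Newton iteration stalls near w = 0 with r(w) ≈ 1/4 and ∇r(w) ≈ 0, exactly the situation sketched above:

    r  = lambda w: w * w + 0.25   # assumed example: r has no real root...
    dr = lambda w: 2.0 * w        # ...and its slope vanishes at w = 0

    w = 1.0
    for _ in range(50):
        if abs(dr(w)) < 1e-12:
            break                  # Newton direction -r(w)/dr(w) is undefined
        dw = -r(w) / dr(w)
        t = 1.0
        while abs(r(w + t * dw)) >= abs(r(w)) and t > 1e-12:
            t *= 0.5               # damp the step to enforce progress on |r|
        if t <= 1e-12:
            break                  # no step size makes progress anymore
        w += t * dw
    print(f"stopped at w = {w:.2e} with r(w) = {r(w):.4f} and r'(w) = {dr(w):.2e}")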

Convergence of full-step Newton methods

Newton method:
    ∇r(w)⊤∆w = −r(w), w ← w + ∆w
yields the iteration, k = 0, 1, ...:
    w_{k+1} ← w_k − ∇r(w_k)^{−⊤} r(w_k)

Newton-type method (Jacobian approximation):
    M∆w = −r(w), w ← w + ∆w
yields the iteration, k = 0, 1, ...:
    w_{k+1} ← w_k − M_k^{−1} r(w_k)

Theorem: assume
    Nonlinearity of r: ‖M_k^{−1}(∇r(w)⊤ − ∇r(w*)⊤)‖ ≤ ω ‖w − w*‖, for w ∈ [w_k, w*]
    Jacobian approximation error: ‖M_k^{−1}(∇r(w_k)⊤ − M_k)‖ ≤ κ_k < 1
    Good initial guess: ‖w_0 − w*‖ ≤ (2/ω)(1 − max{κ_k})
Then w_k → w* with the following linear-quadratic contraction in each iteration:
    ‖w_{k+1} − w*‖ ≤ (κ_k + (ω/2) ‖w_k − w*‖) ‖w_k − w*‖.

What about reduced steps? Slow convergence while t < 1 (damped phase); once full steps become feasible, fast convergence to the solution.
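Both contraction regimes are easy to observe numerically. The sketch below uses an assumed scalar residual r(w) = eʷ − 1 (root w* = 0) and compares the exact Newton iteration (M_k = ∇r(w_k)⊤) with a Newton-type iteration that freezes the Jacobian at the initial guess: the error is roughly squared per iteration in the first case, and shrinks by a roughly constant factor in the second.

    import numpy as np

    # Assumed scalar test residual: r(w) = exp(w) - 1, with root w* = 0.
    r, dr = lambda w: np.exp(w) - 1.0, lambda w: np.exp(w)

    for frozen in (False, True):
        w, M0 = 0.5, dr(0.5)          # M_k = dr(w_0) when the Jacobian is frozen
        print("frozen Jacobian" if frozen else "exact Newton")
        for _ in range(5):
            M = M0 if frozen else dr(w)
            w = w - r(w) / M
            print(f"   |w_k - w*| = {abs(w):.2e}")   # quadratic vs. linear decay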

Newton methods - Short Survival Guide

Exact Newton method:
    ∇r(w)⊤∆w = −r(w), w ← w + t∆w
Newton-type method:
    M∆w = −r(w), w ← w + t∆w

The exact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1]
The inexact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1] if M > 0
Exact full (t = 1) Newton steps converge quadratically if close enough to the solution
Inexact full (t = 1) Newton steps converge linearly if close enough to the solution and if the Jacobian approximation is "sufficiently good"
The Newton iteration fails if ∇r becomes singular
Newton methods with globalization converge in two phases: a damped (slow) phase while reduced steps (t < 1) are needed, then quadratic/linear convergence once full steps are possible.

Outline
1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect

Core idea

A vast majority of solvers try to find a KKT point (w, µ, λ), i.e.:
    Primal feasibility: g(w) = 0, h(w) ≤ 0
    Dual feasibility: ∇_w L(w, µ, λ) = 0, µ ≥ 0
    Complementarity slackness: µ_i h_i(w) = 0, i = 1, ...
where L = Φ(w) + λ⊤g(w) + µ⊤h(w).

Let's consider for now equality-constrained problems, i.e. find w, λ s.t.:
    ∇_w L(w, λ) = 0
    g(w) = 0

Idea: apply the Newton method to the KKT conditions, i.e. solve

    r(w, λ) = [∇_w L(w, λ); g(w)] = 0

by iterating

    ∇r(w, λ)⊤ [∆w; ∆λ] = −r(w, λ).

Newton method on the KKT conditions

KKT conditions:
    r(w, λ) = [∇_w L(w, λ); g(w)] = 0
Newton direction:
    ∇r(w, λ)⊤ [∆w; ∆λ] = −r(w, λ)

Given by:
    ∇²_w L(w, λ)∆w + ∇_{w,λ} L(w, λ)∆λ = −∇_w L(w, λ)
    ∇g(w)⊤∆w = −g(w)

Using ∇_w L(w, λ) = ∇Φ(w) + ∇g(w)λ, the first equation becomes
    ∇²_w L(w, λ)∆w + ∇g(w)∆λ = −∇Φ(w) − ∇g(w)λ,
i.e.
    ∇²_w L(w, λ)∆w + ∇g(w)(λ + ∆λ) = −∇Φ(w).

The Newton direction on the KKT conditions:

    [ H(w, λ)    ∇g(w) ] [ ∆w ]     [ ∇Φ(w) ]
    [ ∇g(w)⊤       0   ] [ λ⁺ ] = − [ g(w)  ]

where H(w, λ) = ∇²_w L(w, λ) is the Hessian of the problem, and the matrix on the left is the KKT matrix (symmetric indefinite).

Note:
    the update of the dual variable is λ⁺ = λ + ∆λ
    ∇_w L(w, λ) is not needed for computing the Newton step
    the updated dual variables λ⁺ are readily provided!
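In code, one such step is a single linear solve with the KKT matrix. The following helper is a sketch (the function name and argument layout are mine); it returns the primal step ∆w together with the updated multipliers λ⁺, as noted above:

    import numpy as np

    def kkt_step(H, grad_phi, G, g):
        # H: Hessian of the Lagrangian (n x n)
        # G: nabla g(w), an n x m matrix whose columns are the constraint gradients
        # g: constraint residual (length m)
        n, m = H.shape[0], G.shape[1]
        kkt = np.block([[H, G], [G.T, np.zeros((m, m))]])   # symmetric indefinite
        rhs = -np.concatenate([grad_phi, g])
        sol = np.linalg.solve(kkt, rhs)
        return sol[:n], sol[n:]   # primal step dw, updated multipliers lam_plus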

Newton Iteration for Optimization - Example

    min_w  (1/2) w⊤ [2 1; 1 4] w + [1; 0]⊤ w
    s.t.   g(w) = w⊤w − 1 = 0

[Figure: level curves of the objective and the unit circle g(w) = 0 in the (w₁, w₂) plane, with the Newton iterates for the guess λ = 0 and step t = 1.]

Iterate:

    [ H    ∇g ] [ ∆w ]     [ ∇Φ ]
    [ ∇g⊤   0 ] [ λ⁺ ] = − [ g  ]

with:
    ∇g(w) = 2w = [2w₁; 2w₂]
    L(w, λ) = Φ(w) + λ g(w)
    ∇_w L(w, λ) = [2 1; 1 4] w + [1; 0] + 2λw
    H(w, λ) = [2 + 2λ, 1; 1, 4 + 2λ]
    ∇Φ(w) = [2w₁ + w₂ + 1; w₁ + 4w₂]

Algorithm: Newton method
    Input: guess w, λ
    while ‖∇L‖ or ‖g‖ ≥ tol do
        Compute H(w, λ), ∇g(w), ∇Φ(w), g(w)
        Compute the Newton direction:
            [H ∇g; ∇g⊤ 0] [∆w; λ⁺] = −[∇Φ; g]
            ∆λ = λ⁺ − λ
        Newton step, t ∈ ]0, 1]: w ← w + t∆w, λ ← λ + t∆λ
    return w, λ
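Since every ingredient of this example is given explicitly above, the full-step loop can be written out directly. A sketch (tolerance, iteration cap, and starting point are arbitrary choices); with λ = 0 and t = 1 it converges to a KKT point on the unit circle, roughly w ≈ [0.63, 0.78], λ ≈ −2.41:

    import numpy as np

    B = np.array([[2.0, 1.0], [1.0, 4.0]])       # objective Hessian from the slide
    c = np.array([1.0, 0.0])

    w, lam, t = np.array([1.0, 1.0]), 0.0, 1.0   # guess lambda = 0, full steps
    for _ in range(50):
        grad_phi = B @ w + c                      # [2 w1 + w2 + 1, w1 + 4 w2]
        g        = w @ w - 1.0
        G        = (2.0 * w).reshape(2, 1)        # nabla g(w) = 2 w
        H        = B + 2.0 * lam * np.eye(2)      # Hessian of the Lagrangian
        grad_L   = grad_phi + 2.0 * lam * w
        if max(np.linalg.norm(grad_L, np.inf), abs(g)) < 1e-10:
            break
        kkt = np.block([[H, G], [G.T, np.zeros((1, 1))]])
        sol = np.linalg.solve(kkt, -np.concatenate([grad_phi, [g]]))
        w, lam = w + t * sol[:2], lam + t * (sol[2] - lam)   # dlam = lam_plus - lam
    print(w, lam)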
