  1. Metti 5 – Optimization for nonlinear parameter estimation and function estimation. Lecture 7, Roscoff, June 13-18, 2011.

  2. Objectives. Direct problem: input (BC, IC, parameters) → Model → state solution. Notation: R(u, ψ) = 0 is the model, u the state, ψ the unknown. Inverse problem: from state measurements u_d, find the unknown ψ that minimizes j(ψ) := J(u).

  3. Examples. Thermal conductivity: λ = constant; λ(u ≡ T) ⇒ λ = Σ λ_i ξ_i(T); λ(x) ⇒ λ = Σ λ_i ξ_i(x); model S(u, ψ) = 0, unknown ψ ← λ(x). Heat transfer coefficient (BC): h = constant; h(u ≡ T); h(x); model S(u, ψ) = 0, unknown ψ ← h(x).

  4. Inverse problem: from state measurements u_d, find the unknown ψ that minimizes j(ψ) := J(u), where R(u, ψ) = 0 defines the map ψ ↦ u. Contents: 1. n-D optimization; 2. Gradient computation; 3. An example of heat transfer coefficient identification.

  5. Non-linear optimization. Direct methods of the kind seen in Lecture 2 are usable for linear estimation and when dim ψ is "low". We need specific, iterative algorithms for nonlinear parameter estimation and for function estimation, i.e. when dim ψ is "high". Function → parameters: ψ ← ψ(s) = Σ ψ_i ξ_i(s).
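
A minimal sketch of the function-to-parameters reduction ψ(s) = Σ ψ_i ξ_i(s) (the piecewise-linear "hat" basis and the example coefficients are illustrative assumptions, not prescribed by the slides):

import numpy as np

def hat_basis(s, nodes, i):
    """Piecewise-linear 'hat' function xi_i(s) equal to 1 at nodes[i], 0 at the other nodes."""
    return np.interp(s, nodes, np.eye(len(nodes))[i])

def psi_of_s(s, psi_coeffs, nodes):
    """Finite-dimensional representation psi(s) = sum_i psi_i * xi_i(s)."""
    return sum(p * hat_basis(s, nodes, i) for i, p in enumerate(psi_coeffs))

# Example: represent an unknown function on [0, 1] with 5 parameters psi_i.
nodes = np.linspace(0.0, 1.0, 5)
psi_coeffs = np.array([1.0, 1.2, 0.9, 1.1, 1.0])   # the psi_i to be estimated
s = np.linspace(0.0, 1.0, 101)
psi_s = psi_of_s(s, psi_coeffs, nodes)

Any other finite basis (polynomials, B-splines, ...) fits the same pattern; only the ξ_i change.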

  6. Optimization. We search ψ̄ = arg min_{ψ ∈ K ⊂ V} j(ψ). Methods (quite a lot...): n-D optimization methods divide into gradient-free methods (deterministic: Simplex; stochastic: PSO, GA) and gradient-based methods (order 1: steepest descent, conjugate gradients; between order 1 and 2: Levenberg, DFP, BFGS; order 2: Newton), and much more than that! For gradient-free methods, see [Onwubolu, G.C. and Babu, B.V., New Optimization Techniques in Engineering, Springer, 2003].

  7. Gradient-type methods

  8. Gradient-type methods: Steepest method, first iteration. (Figure: the gradient ∇j(ψ) and the descent direction d.)

  9. Gradient-type methods: Steepest method, second iteration. (Figure: the gradients ∇j(ψ^0), ∇j(ψ^1) and the descent directions d^0, d^1.)

  10. Gradient-type methods: Steepest method. Successive displacements: orthogonality → zig-zag.

  11. Gradient-type methods: Steepest method. Algorithm 1: Steepest descent. While (stopping criterion not satisfied) do (we are at the point ψ^p, iteration p): compute the gradient ∇j(ψ^p); set the descent direction d^p = −∇j(ψ^p); line search: find ᾱ = arg min_{α > 0} g(α) = j(ψ^p + α d^p); update ψ^{p+1} = ψ^p + ᾱ d^p.
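
A minimal Python sketch of Algorithm 1 (assumptions beyond the slide: the gradient is approximated by finite differences and the exact line search is replaced by a crude backtracking rule; the quadratic test function is arbitrary):

import numpy as np

def num_grad(j, psi, h=1e-6):
    """Forward-difference approximation of the gradient of j at psi."""
    g = np.zeros_like(psi)
    for i in range(psi.size):
        e = np.zeros_like(psi); e[i] = h
        g[i] = (j(psi + e) - j(psi)) / h
    return g

def backtracking(j, psi, d, g, alpha=1.0, rho=0.5, c=1e-4):
    """Crude line search: shrink alpha until a sufficient-decrease condition holds."""
    while j(psi + alpha * d) > j(psi) + c * alpha * g.dot(d):
        alpha *= rho
    return alpha

def steepest_descent(j, psi0, tol=1e-6, max_iter=5000):
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        g = num_grad(j, psi)
        if np.linalg.norm(g) <= tol:          # stopping criterion on the gradient
            break
        d = -g                                # descent direction d^p = -grad j(psi^p)
        alpha = backtracking(j, psi, d, g)    # line search
        psi = psi + alpha * d                 # update psi^{p+1} = psi^p + alpha d^p
    return psi

# Example on a simple quadratic bowl with minimizer (1, -2)
j = lambda p: (p[0] - 1.0)**2 + 10.0 * (p[1] + 2.0)**2
print(steepest_descent(j, [0.0, 0.0]))

The same loop structure carries over to the conjugate-gradient and quasi-Newton variants below; only the choice of d^p changes.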

  12. Stopping criterion: ‖∇j(ψ^p)‖_2 or ‖∇j(ψ^p)‖_∞ ≤ ε, |j(ψ^p) − j(ψ^{p−1})| ≤ ε, ‖ψ^p − ψ^{p−1}‖ ≤ ε, or j(ψ^p) ≤ ε.
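
These criteria translate directly into a small helper; a sketch (the single ε, the tolerance value, and the 'or' combination are illustrative choices):

import numpy as np

def stop(grad, psi, psi_prev, j_val, j_prev, eps=1e-6):
    """Combine the slide's stopping tests; any subset may be used instead."""
    return (np.linalg.norm(grad) <= eps                  # ||grad j(psi^p)|| <= eps
            or abs(j_val - j_prev) <= eps                # |j(psi^p) - j(psi^{p-1})| <= eps
            or np.linalg.norm(psi - psi_prev) <= eps     # ||psi^p - psi^{p-1}|| <= eps
            or j_val <= eps)                             # j(psi^p) <= eps (when the attainable minimum is ~0)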

  13. Gradient-type methods: Steepest method. Successive displacements: orthogonality → zig-zag. Why such zig-zagging? At step p, the descent direction is d^p = −∇j(ψ^p) and the line search finds ᾱ = arg min_{α > 0} g(α) = j(ψ^p + α d^p). So g′(α^p) = 0 = (d^p, ∇j(ψ^p + α^p d^p)) = (d^p, ∇j(ψ^{p+1})), hence (d^p, d^{p+1}) = 0.

  14. Gradient-type methods: admissible descent directions d are those satisfying (∇j(ψ), d) < 0. (Figure: the gradient ∇j(ψ) and the set of admissible directions.)

  15. Conjugate directions. Consider two parallel lines ℓ_1(α) = x_1 + αp and ℓ_2(α) = x_2 + αp, with minimizers x̄_1 and x̄_2 along them ⇒ the vector x̄_1 − x̄_2 is conjugate to the direction p.

  16. Conjugate directions. (Figure: starting point x̄_0, coordinate directions e_1, e_2, the minimizer x̄_1 along e_1, and a point z.) ⇒ The vector z − x̄_1 is conjugate to the direction e_1.

  17. Conjugate directions for n-D. Algorithm. Let the quadratic cost be j(ψ) = ½ (Aψ, ψ). First iteration: d^0 = −∇j(ψ^0). Then, from gradient orthogonality: (d^0, ∇j(ψ^1)) = 0 = (d^0, Aψ^1) = (d^0, A(ψ^0 + α^0 d^0)) = (d^0, Aψ^0) + α^0 (d^0, A d^0). So we have the step length α^0 = −(d^0, Aψ^0) / (d^0, A d^0).

  18. Conjugate directions. Algorithm, step p. The direction d^p = −∇j(ψ^p) + β^p d^{p−1} is chosen A-conjugate to d^{p−1}: (d^p, A d^{p−1}) = −(∇j(ψ^p), A d^{p−1}) + β^p (d^{p−1}, A d^{p−1}) = 0. So: β^p = (∇j(ψ^p), A d^{p−1}) / (d^{p−1}, A d^{p−1}).

  19. Conjugate directions. Algorithm 2: The conjugate gradient algorithm applied to quadratic functions. Let p = 0 and ψ^0 be the starting point. Compute the gradient and the descent direction d^0 = −∇j(ψ^0), and the step size α^0 = −(d^0, Aψ^0) / (d^0, A d^0). While (stopping criterion not satisfied) do: at step p, we are at the point ψ^p; define ψ^{p+1} = ψ^p + α^p d^p with the step size α^p = −(d^p, ∇j(ψ^p)) / (d^p, A d^p), the direction d^p = −∇j(ψ^p) + β^p d^{p−1}, and the coefficient needed for conjugate directions β^p = (∇j(ψ^p), A d^{p−1}) / (d^{p−1}, A d^{p−1}).
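
A sketch of Algorithm 2 (assumption: a linear term b is added, i.e. j(ψ) = ½(Aψ, ψ) − (b, ψ) with ∇j(ψ) = Aψ − b, so that the minimizer is nonzero; with b = 0 this reduces to the slide's quadratic):

import numpy as np

def cg_quadratic(A, b, psi0, tol=1e-10):
    """Conjugate gradient for j(psi) = 0.5*(A psi, psi) - (b, psi), grad j(psi) = A psi - b."""
    psi = np.asarray(psi0, dtype=float)
    g = A @ psi - b                          # gradient at psi^0
    if np.linalg.norm(g) <= tol:
        return psi
    d = -g                                   # d^0 = -grad j(psi^0)
    for _ in range(len(b)):                  # converges in at most n steps for SPD A
        alpha = -d.dot(g) / d.dot(A @ d)     # alpha^p = -(d^p, grad j(psi^p)) / (d^p, A d^p)
        psi = psi + alpha * d
        g = A @ psi - b                      # new gradient
        if np.linalg.norm(g) <= tol:
            break
        beta = g.dot(A @ d) / d.dot(A @ d)   # beta = (grad j(psi^{p+1}), A d^p) / (d^p, A d^p)
        d = -g + beta * d                    # next A-conjugate direction
    return psi

# Example: a small symmetric positive-definite system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_quadratic(A, b, np.zeros(2)))       # approaches the minimizer A^{-1} b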

  20. Gradient-type methods. (Figure: the gradients ∇j(ψ^0), ∇j(ψ^1) and the descent directions d^0, d^1.)

  21. Conjugate gradients for non-quadratic functions. We use ∇j(ψ^p) − ∇j(ψ^{p−1}) = A(ψ^p − ψ^{p−1}) = A(ψ^{p−1} + α^{p−1} d^{p−1} − ψ^{p−1}) = α^{p−1} A d^{p−1}, and combine with the previously seen relationships to get β^p through either Polak and Ribière's method, β^p = (∇j(ψ^p), ∇j(ψ^p) − ∇j(ψ^{p−1})) / (∇j(ψ^{p−1}), ∇j(ψ^{p−1})), or Fletcher and Reeves' method, β^p = (∇j(ψ^p), ∇j(ψ^p)) / (∇j(ψ^{p−1}), ∇j(ψ^{p−1})).

  22. Conjugate gradients for non-quadratic functions. Algorithm 3: The conjugate gradient algorithm applied to arbitrary functions. Let p = 0, ψ^0 be the starting point, d^0 = −∇j(ψ^0); perform the line search. While (stopping criterion not satisfied) do: at step p, we are at the point ψ^p; define ψ^{p+1} = ψ^p + α^p d^p with the step size α^p = arg min_{α > 0} g(α) = j(ψ^p + α d^p), the direction d^p = −∇j(ψ^p) + β^p d^{p−1}, where the conjugacy parameter β^p is given by either Polak and Ribière's or Fletcher and Reeves' formula.
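
A sketch of Algorithm 3 (assumptions: an analytic gradient is supplied, the exact line search is replaced by backtracking, and a steepest-descent restart is added as a safeguard, which the slide does not mention):

import numpy as np

def nonlinear_cg(j, grad, psi0, variant="PR", tol=1e-6, max_iter=2000):
    psi = np.asarray(psi0, dtype=float)
    g = grad(psi)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        if g.dot(d) >= 0:          # safeguard: restart if d is not a descent direction
            d = -g
        alpha = 1.0                # crude backtracking stands in for the exact line search
        while j(psi + alpha * d) > j(psi) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5
        psi = psi + alpha * d
        g_new = grad(psi)
        if variant == "PR":        # Polak-Ribiere
            beta = g_new.dot(g_new - g) / g.dot(g)
        else:                      # Fletcher-Reeves
            beta = g_new.dot(g_new) / g.dot(g)
        d = -g_new + beta * d
        g = g_new
    return psi

# Example on the Rosenbrock function used as the test case on slide 26
rosen = lambda p: (1 - p[0])**2 + 100 * (p[1] - p[0]**2)**2
rosen_grad = lambda p: np.array([-2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0]**2),
                                 200 * (p[1] - p[0]**2)])
print(nonlinear_cg(rosen, rosen_grad, [-1.0, 1.0]))   # approaches the optimum (1, 1)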

  23. Newton. Assume that j(ψ) is twice continuously differentiable (so second derivatives exist). Approximate j(ψ) by its quadratic model: set ∇j(ψ^{p+1}) = ∇j(ψ^p) + [∇²j(ψ^p)] δψ^p + O(‖δψ^p‖²) to zero, so that δψ^p = −[∇²j(ψ^p)]^{−1} ∇j(ψ^p), with ψ^{p+1} = ψ^p + δψ^p. Convergence rate: quadratic. But ∇²j is difficult and expensive to compute, and convergence is ensured only if ∇²j is positive definite.
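
A Newton-step sketch (assumptions: the Hessian ∇²j is built by finite differences of a supplied gradient, and the full step is taken with no globalization, so the remarks above about cost and positive definiteness apply):

import numpy as np

def num_hessian(grad, psi, h=1e-5):
    """Finite-difference Hessian assembled column by column from the gradient."""
    n = psi.size
    H = np.zeros((n, n))
    g0 = grad(psi)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[:, i] = (grad(psi + e) - g0) / h
    return 0.5 * (H + H.T)                   # symmetrize

def newton(grad, psi0, tol=1e-10, max_iter=50):
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        g = grad(psi)
        if np.linalg.norm(g) <= tol:
            break
        H = num_hessian(grad, psi)
        delta = np.linalg.solve(H, -g)       # [grad^2 j(psi^p)] delta psi^p = -grad j(psi^p)
        psi = psi + delta                    # psi^{p+1} = psi^p + delta psi^p
    return psi

# Quadratic example j = (x-1)^2 + 10(y+2)^2: Newton converges in one step
grad = lambda p: np.array([2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)])
print(newton(grad, [5.0, 5.0]))              # -> approximately (1, -2)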

  24. Quasi-Newton. Newton: ψ^{p+1} = ψ^p − [∇²j(ψ^p)]^{−1} ∇j(ψ^p). Idea: replace [∇²j(ψ^p)]^{−1} by an approximation H^p, updated as H^{p+1} = H^p + Λ^p. Imposed condition: H [∇j(ψ^p) − ∇j(ψ^{p−1})] = ψ^p − ψ^{p−1}. Different methods exist for the correction Λ^p.

  25. Quasi-Newton. Different methods for the correction Λ^p. Set δ^p = ψ^{p+1} − ψ^p and γ^p = ∇j(ψ^{p+1}) − ∇j(ψ^p). Davidon–Fletcher–Powell (DFP): H^{p+1} = H^p + δ^p (δ^p)^t / ((δ^p)^t γ^p) − H^p γ^p (γ^p)^t H^p / ((γ^p)^t H^p γ^p). Broyden–Fletcher–Goldfarb–Shanno (BFGS): H^{p+1} = H^p + [1 + (γ^p)^t H^p γ^p / ((δ^p)^t γ^p)] δ^p (δ^p)^t / ((δ^p)^t γ^p) − [δ^p (γ^p)^t H^p + H^p γ^p (δ^p)^t] / ((δ^p)^t γ^p). Convergence rate: superlinear. Remark: BFGS is less sensitive than DFP to line-search inaccuracy.
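
A sketch of a quasi-Newton iteration using the slide's BFGS update of the inverse-Hessian approximation H (assumptions: H^0 = I, a backtracking line search, and a curvature check that simply skips the update when (δ^p)^t γ^p is not positive; DFP could be swapped in at the same place):

import numpy as np

def bfgs(j, grad, psi0, tol=1e-6, max_iter=500):
    psi = np.asarray(psi0, dtype=float)
    H = np.eye(psi.size)                     # initial approximation of the inverse Hessian
    g = grad(psi)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -H @ g                           # quasi-Newton direction
        alpha = 1.0                          # crude backtracking line search
        while j(psi + alpha * d) > j(psi) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5
        psi_new = psi + alpha * d
        g_new = grad(psi_new)
        delta, gamma = psi_new - psi, g_new - g          # delta^p and gamma^p
        dg = delta.dot(gamma)
        if dg > 1e-12:                       # curvature check keeps H positive definite
            Hg = H @ gamma
            # BFGS update of H, written as on the slide
            H = (H
                 + (1.0 + gamma.dot(Hg) / dg) * np.outer(delta, delta) / dg
                 - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg)
        psi, g = psi_new, g_new
    return psi

rosen = lambda p: (1 - p[0])**2 + 100 * (p[1] - p[0]**2)**2
rosen_grad = lambda p: np.array([-2 * (1 - p[0]) - 400 * p[0] * (p[1] - p[0]**2),
                                 200 * (p[1] - p[0]**2)])
print(bfgs(rosen, rosen_grad, [-1.0, 1.0]))   # approaches the optimum (1, 1)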

  26. Test: the Rosenbrock function. Guess: (−1, 1); optimum: (1, 1). (Figure: surface plot of the Rosenbrock function.)

  27. PSO. (Figure: particle swarm optimization iterates on the Rosenbrock test, axes from −1 to 2.) Source: http://clerc.maurice.free.fr/pso/

  28. Steepest descent. (Figure: steepest-descent iterates on the same test, axes from −1 to 2.) Source: GSL Library.

  29. Conjugate gradient. (Figure: conjugate-gradient iterates on the same test, axes from −1 to 2.) Source: GSL Library.

  30. BFGS. (Figure: BFGS iterates on the same test, axes from −1 to 2.) Source: GSL Library.
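
To reproduce the gradient-based comparisons without GSL, SciPy's optimizers can stand in (an assumption: SciPy ships CG and BFGS but no plain steepest descent or PSO, and its iterates will not match the original figures):

from scipy.optimize import minimize, rosen, rosen_der

x0 = [-1.0, 1.0]                             # same starting guess as on slide 26
for method in ("CG", "BFGS"):
    res = minimize(rosen, x0, jac=rosen_der, method=method)
    print(method, res.x, res.nit)            # minimizer estimate and iteration count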

  31. Algorithms based on the cost gradient. The previous algorithms rely only on the cost gradient ∇j(ψ): steepest descent, conjugate gradients, DFP, BFGS. Other methods also use the "sensitivity" of the state with respect to the parameters (cf. Lecture 2), for costs of the kind j(ψ) := J(u) = ∫_S (u − u_d)² ds.

  32. Definition (directional derivative). u′(ψ; δψ) is the derivative of the state u(ψ) at the point ψ in the direction δψ: u′(ψ; δψ) := lim_{ε→0} [u(ψ + ε δψ) − u(ψ)] / ε. The directional derivative of the cost function then writes j′(ψ; δψ) = (J′(u), u′(ψ; δψ)), where j′(ψ; δψ) = (∇j(ψ), δψ).
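
A finite-difference check of the directional derivative (a sketch; the algebraic state u(ψ) below is a toy stand-in for a PDE model):

import numpy as np

def directional_derivative(f, psi, dpsi, eps=1e-6):
    """Finite-difference approximation of f'(psi; dpsi) = lim_{eps->0} [f(psi + eps*dpsi) - f(psi)] / eps."""
    return (f(psi + eps * np.asarray(dpsi)) - f(psi)) / eps

# Toy example: j(psi) = J(u(psi)) with u(psi) = psi_0^2 + sin(psi_1) and J(u) = 0.5*u^2
u = lambda p: p[0]**2 + np.sin(p[1])
j = lambda p: 0.5 * u(p)**2

psi = np.array([1.0, 0.5])
dpsi = np.array([0.3, -0.2])
print(directional_derivative(j, psi, dpsi))
print(np.dot([2 * psi[0] * u(psi), np.cos(psi[1]) * u(psi)], dpsi))   # exact (grad j, dpsi) for comparison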

  33. Gauss–Newton. Second derivative: the second derivative of j(ψ) at the point ψ in the directions δψ and δφ is j″(ψ; δψ, δφ) = (J′(u), u″(ψ; δψ, δφ)) + (J″(u) u′(ψ; δψ), u′(ψ; δφ)). Neglecting the first, second-order term (this is actually the Gauss–Newton approach), we have j″(ψ; δψ, δφ) ≈ (J″(u) u′(ψ; δψ), u′(ψ; δφ)). Gauss–Newton: S^t S δψ^k = −∇j(ψ^k). The matrix S^t S is usually badly conditioned.
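
A Gauss–Newton sketch for a discrete least-squares cost j(ψ) = ½‖u(ψ) − u_d‖² (assumptions: the sensitivity matrix S = ∂u/∂ψ is formed by finite differences, the normal equations S^t S δψ = −∇j are solved directly despite the conditioning warning above, and the exponential-decay model is a toy example):

import numpy as np

def sensitivity(u, psi, h=1e-6):
    """Finite-difference sensitivity matrix S = du/dpsi (one column per parameter)."""
    u0 = u(psi)
    return np.column_stack([(u(psi + h * np.eye(psi.size)[i]) - u0) / h
                            for i in range(psi.size)])

def gauss_newton(u, u_d, psi0, tol=1e-10, max_iter=50):
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iter):
        r = u(psi) - u_d                     # residual u(psi) - u_d
        S = sensitivity(u, psi)
        g = S.T @ r                          # grad j(psi) = S^t r for j = 0.5 ||r||^2
        if np.linalg.norm(g) <= tol:
            break
        dpsi = np.linalg.solve(S.T @ S, -g)  # S^t S dpsi^k = -grad j(psi^k)
        psi = psi + dpsi
    return psi

# Toy exponential-decay model u_i(psi) = psi_0 * exp(-psi_1 * t_i)
t = np.linspace(0.0, 2.0, 10)
u = lambda p: p[0] * np.exp(-p[1] * t)
u_d = 2.0 * np.exp(-1.5 * t)                 # synthetic, noise-free "measurements"
print(gauss_newton(u, u_d, [1.0, 1.0]))      # should approach (2.0, 1.5)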

  34. Damp the system: Levenberg–Marquardt. [S^t S + ℓ I] δψ^k = −∇j(ψ^k), or better: [S^t S + ℓ diag(S^t S)] δψ^k = −∇j(ψ^k). Remark: ℓ → 0 yields the Gauss–Newton algorithm, while a larger ℓ gives an approximation of the steepest-descent algorithm. In practice, the parameter ℓ may be adjusted at each iteration. Remark: when dim ψ is high, prefer the methods based only on the cost gradient.
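
The Levenberg–Marquardt damping changes only the linear solve of the Gauss–Newton sketch; a minimal version (the ×10 / ÷10 adjustment schedule for ℓ and the toy model are illustrative assumptions):

import numpy as np

def sensitivity(u, psi, h=1e-6):
    """Same finite-difference sensitivity helper as in the Gauss-Newton sketch above."""
    u0 = u(psi)
    return np.column_stack([(u(psi + h * np.eye(psi.size)[i]) - u0) / h
                            for i in range(psi.size)])

def levenberg_marquardt(u, u_d, psi0, ell=1e-2, tol=1e-10, max_iter=100):
    psi = np.asarray(psi0, dtype=float)
    cost = lambda p: 0.5 * np.sum((u(p) - u_d)**2)
    for _ in range(max_iter):
        r = u(psi) - u_d
        S = sensitivity(u, psi)
        g = S.T @ r                                      # grad j(psi)
        if np.linalg.norm(g) <= tol:
            break
        A = S.T @ S
        # damped system [S^t S + ell * diag(S^t S)] dpsi^k = -grad j(psi^k)
        dpsi = np.linalg.solve(A + ell * np.diag(np.diag(A)), -g)
        if cost(psi + dpsi) < cost(psi):
            psi = psi + dpsi
            ell /= 10.0                                  # accepted step: move toward Gauss-Newton
        else:
            ell *= 10.0                                  # rejected step: move toward steepest descent
    return psi

# Same toy exponential-decay model as in the Gauss-Newton sketch
t = np.linspace(0.0, 2.0, 10)
u = lambda p: p[0] * np.exp(-p[1] * t)
u_d = 2.0 * np.exp(-1.5 * t)
print(levenberg_marquardt(u, u_d, np.array([1.0, 1.0])))   # should approach (2.0, 1.5)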
