
Trust Region with a Cubic Model - PowerPoint PPT Presentation



  1. Slide 1: Trust Region with a Cubic Model
Trond Steihaug, Department of Informatics, University of Bergen, Norway.
Humboldt Universität zu Berlin.
Workshop on Automatic Differentiation, Nice, April 15-15, 2005.

Slide 2: Outline
Higher order is commonly used both for convergence and for derivatives in optimization. First-order methods are gradient based and have Q-order 1, or Q-superlinear (for quasi-Newton methods), rate of convergence. Second-order methods use the Hessian and have Q-order 2 rate of convergence. Rate of convergence (Q-order) and the degree of the derivatives will not match for 'difficult' problems.
• Regularization ⇒ Trust-Region Subproblem (TRS)
• Trust-region methods in unconstrained optimization → TRS
• AD can give higher order
• Higher-order TRS

Slide 3: Linear Least Squares (LLS)
Given an m × n matrix A and b ∈ ℝ^m, where m ≥ n, compute x ∈ ℝ^n so that
    min ½ ‖Ax − b‖².
Let A = U Σ V^T be the singular value decomposition and let
    Σ† = diag(1/σ_1, …, 1/σ_r, 0, …, 0),   r = rank(A).
Define A† = V Σ† U^T. The solution is
    x = A† b = ∑_{i=1}^{r} (u_i^T b / σ_i) v_i,
where U = [u_1 ⋯ u_m] and V = [v_1 ⋯ v_n].

Slide 4: Singular Values σ_i for a Rank-Deficient Problem
[Figure: plot of the singular values σ_i against i, i = 1, …, 30, for a rank-deficient problem.]
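As a concrete companion to Slides 3-4, here is a minimal NumPy sketch (not part of the slides) of the truncated-SVD solution x = A†b, summing (u_i^T b / σ_i) v_i over the numerically nonzero singular values. The function name pinv_solve, the relative tolerance tol, and the synthetic rank-deficient test problem are illustrative assumptions.

```python
import numpy as np

def pinv_solve(A, b, tol=1e-12):
    """Minimum-norm solution of min (1/2)||Ax - b||^2 via the SVD,
    keeping only singular values above a relative tolerance."""
    # NumPy's convention: A = U @ diag(sigma) @ Vt
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    r = np.sum(sigma > tol * sigma[0])           # numerical rank
    coeff = (U[:, :r].T @ b) / sigma[:r]         # u_i^T b / sigma_i
    return Vt[:r, :].T @ coeff                   # sum_i (u_i^T b / sigma_i) v_i

# Illustrative rank-deficient test problem (rank <= 5)
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10)) @ np.diag(np.r_[np.ones(5), np.zeros(5)]) @ rng.standard_normal((10, 10))
b = rng.standard_normal(30)
x = pinv_solve(A, b)
```

Where to cut the sum is harmless for a genuinely rank-deficient problem, but for the discrete ill-posed problems on the next slides the small σ_i are nonzero and amplify noise, which is what motivates the regularization slide.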

  2. Slide 5: Singular Values σ_i for a Discrete Ill-Posed Problem
[Figure: "Singular values for a Discrete Ill-Posed Problem. Problem: ill-cond. heat, n = 50"; plot of σ_i against i, i = 1, …, 50.]

Slide 6: Discrete Picard Condition
A and b come from the discretization of an ill-posed problem. All σ_i > 0, so formally
    x = A† b = ∑_{i=1}^{n} (u_i^T b / σ_i) v_i.
However,
    u_i^T b / σ_i ↘ 0 as i increases (the discrete Picard condition).
Introduce noise in the problem: b = b̃ + ε.

Slide 7: Coefficients u_i^T b / σ_i for Exact Data and Noisy Data
[Figure: "Coefficients of right singular vectors in LS solution. Problem: deriv2, n = 50"; semilog plot of the coefficients for exact data (*) and noisy data (o), i = 1, …, 50.]

Slide 8: One Solution to the Noisy Problem: Regularization
The following three problems are equivalent and make the 'noisy' problem smooth:
• Given μ ≥ 0, solve min ½ ‖Ax − b‖² + μ ‖x‖².
• Given λ ≥ 0, solve (A^T A + λI) x = A^T b.
• Given Δ ≥ 0, solve min_{‖x‖≤Δ} ½ ‖Ax − b‖².   (TRS)
The equivalence follows from the Karush-Kuhn-Tucker conditions. (There exist open intervals for the three parameters μ, λ, Δ so that x is the solution to all three problems.)
Where is AD?
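To make the equivalence on Slide 8 concrete, the sketch below (assuming NumPy; the function names and the value of lam are illustrative, not from the slides) solves the penalized problem through its normal equations (A^T A + λI)x = A^T b and, equivalently, through SVD filter factors σ_i/(σ_i² + λ), which damp exactly the small-σ_i terms that the discrete Picard condition identifies as noise-dominated.

```python
import numpy as np

def tikhonov_normal_eq(A, b, lam):
    """Solve (A^T A + lam*I) x = A^T b directly."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def tikhonov_svd(A, b, lam):
    """Same solution written with SVD filter factors sigma_i / (sigma_i^2 + lam)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    filt = sigma / (sigma**2 + lam)        # ~1/sigma_i for large sigma_i, ~0 for tiny ones
    return Vt.T @ (filt * (U.T @ b))

# The two routes agree to rounding error on an illustrative problem
rng = np.random.default_rng(1)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
lam = 1e-3                                  # illustrative regularization parameter
assert np.allclose(tikhonov_normal_eq(A, b, lam), tikhonov_svd(A, b, lam))
```

The third, trust-region form on Slide 8 is taken up again on the TRS slides below.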

  3. Slide 9: Gauss-Newton and Nonlinear Least Squares
Given a nonlinear function F: ℝ^n → ℝ^m. Gauss-Newton is based on a first-order approximation of F at x, i.e.
    F(x + s) ≈ M_1(s) = F(x) + F′(x) s,
and solves for the step s
    min_{s ∈ ℝ^n} ½ ‖M(s)‖².
F′(x) is the m × n Jacobian matrix at x.
Inexact Gauss-Newton Method:
    Given x_0
    while not converged do
        Compute F′(x_i)
        Find an approximate solution s_i of min_{s ∈ ℝ^n} ½ ‖F′(x_i) s + F(x_i)‖²
        Update x_{i+1} = x_i + s_i
    end-while
(A code sketch of this loop follows after this item.)

Slide 10: Higher Order Model Function
Finding the approximate solution s_i by constraining ‖s‖ ≤ Δ leads to Levenberg-Marquardt methods. These are trust-region methods that use the linear model
    M(s) = F′(x_i) s + F(x_i)
of F(x_i + s) at x_i, with approximate solution
    min_{‖s‖≤Δ} ½ ‖M(s)‖².
Noise is inherent in the LLS problem, unless F and F′ are computed to high accuracy. Use the more accurate model
    M_2(s) = F(x_i) + F′(x_i) s + ½ (T s) s,   T = F″(x_i).

Slide 11: The Basic Trust Region Method
Given x_0 and Δ_0 (0 ≤ γ_2 < γ_1 < 1, 0 ≤ γ_4 ≤ γ_5 < 1 ≤ γ_3)
    while not converged do
        Compute the model m_i(s).
        Compute an approximate solution s_i of the TRS: min_{‖s‖≤Δ} m_i(s).
        Compute f(x_i + s_i), m_i(s_i) and ρ_i = actual / predicted = (f(x_i) − f(x_i + s_i)) / (f(x_i) − m_i(s_i)).
        Update x_{i+1} = x_i + s_i if ρ_i ≥ γ_2, and x_{i+1} = x_i otherwise.
        Update Δ_{i+1}: ‖s_i‖ ≤ Δ_{i+1} ≤ γ_3 ‖s_i‖ if ρ_i ≥ γ_1; γ_4 ‖s_i‖ ≤ Δ_{i+1} ≤ γ_5 ‖s_i‖ if ρ_i < γ_1.
    end-while
(A code sketch of this method follows after this item.)

Slide 12: Higher Order Model Function (2)
Let m(s) ≈ f(x + s) = F(x + s)^T F(x + s) and solve
    min_{‖s‖≤Δ} m(s),
where
    m_2(s) = f(x) + ∇f(x)^T s + ½ s^T ∇²f(x) s,
    m_3(s) = f(x) + ∇f(x)^T s + ½ s^T ∇²f(x) s + ⅙ s^T (T s) s,   T = ∇³f(x).
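A minimal sketch, not from the slides, of the inexact Gauss-Newton loop on Slide 9, assuming NumPy: each linearized subproblem min_s ½‖F′(x_i)s + F(x_i)‖² is solved with a dense least-squares call. The names gauss_newton, resid, jac, the stopping test, and the exponential-fitting example are illustrative assumptions.

```python
import numpy as np

def gauss_newton(resid, jac, x0, tol=1e-8, max_iter=50):
    """Inexact Gauss-Newton: at each iterate solve the linearized LLS
    min_s (1/2)||F'(x_i) s + F(x_i)||^2 and set x_{i+1} = x_i + s_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        F, J = resid(x), jac(x)
        s, *_ = np.linalg.lstsq(J, -F, rcond=None)   # Gauss-Newton step
        x = x + s
        if np.linalg.norm(s) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x

# Illustrative one-parameter fit: y ~ exp(c*t), true c = 0.7
t = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * t)
resid = lambda c: np.exp(c[0] * t) - y                  # F(c), an m-vector
jac = lambda c: (t * np.exp(c[0] * t)).reshape(-1, 1)   # F'(c), m x 1 Jacobian
c_hat = gauss_newton(resid, jac, np.array([0.0]))       # approaches 0.7
```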

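The basic trust-region method of Slide 11 can be sketched as below, again assuming NumPy. The TRS is solved only approximately, using the quadratic-model analogue of the step-constrained Cauchy point from Slide 14; the particular γ values satisfy the inequalities stated on the slide but are otherwise arbitrary choices, and the Rosenbrock test function is an illustrative example, not from the slides.

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Approximate TRS solution: minimize the quadratic model along -g
    inside the ball ||s|| <= delta (step-constrained Cauchy point)."""
    gnorm = np.linalg.norm(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0.0 else min(1.0, gnorm**3 / (delta * gHg))
    return -tau * (delta / gnorm) * g

def trust_region(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=500,
                 gamma1=0.75, gamma2=0.1, gamma3=2.0, gamma4=0.25, gamma5=0.5):
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        s = cauchy_step(g, H, delta)
        predicted = -(g @ s + 0.5 * s @ H @ s)       # f(x_i) - m_i(s_i) > 0 for g != 0
        rho = (f(x) - f(x + s)) / predicted          # actual / predicted
        if rho >= gamma2:                            # accept the step
            x = x + s
        if rho >= gamma1:                            # good model: new radius in [||s||, gamma3*||s||]
            delta = gamma3 * np.linalg.norm(s)
        else:                                        # poor model: new radius in [gamma4*||s||, gamma5*||s||]
            delta = gamma5 * np.linalg.norm(s)
    return x

# Illustrative run on the Rosenbrock function; the pure Cauchy step makes
# slow progress, which is why accurate TRS solutions (next slides) matter.
f = lambda x: (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2.0*(1.0 - x[0]) - 400.0*x[0]*(x[1] - x[0]**2),
                           200.0*(x[1] - x[0]**2)])
hess = lambda x: np.array([[2.0 - 400.0*x[1] + 1200.0*x[0]**2, -400.0*x[0]],
                           [-400.0*x[0], 200.0]])
x_end = trust_region(f, grad, hess, np.array([-1.2, 1.0]))
```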
  4. Slide 13: Properties
    m_1(s) = f(x) + ∇f(x)^T s   (linear model)
    m_2(s) = m_1(s) + ½ s^T ∇²f(x) s   (quadratic model)
    m_3(s) = m_2(s) + ⅙ s^T (T s) s   (cubic model)
Under 'reasonable' conditions the basic trust region algorithm is globally convergent, i.e. for given ε > 0 and any x_0 there exists an index i so that ‖∇f(x_i)‖ ≤ ε. We need to understand the Trust Region Subproblem (TRS)
    min_{‖s‖≤Δ} m(s).

Slide 14: Exact Solution of TRS, m_i(s), i = 1, 2, 3
The trust region subproblem with m_1,
    min_{‖s‖≤Δ} f + g^T s,
gives the step-constrained Cauchy point s̃ for the models:
    s̃ = −(Δ/‖g‖) g.

Slide 15: Exact Solution of TRS, m_2(s)
    min_{‖s‖≤Δ} f + g^T s + ½ s^T H s.
s is a solution with Lagrange multiplier δ if and only if
    (i) (H + δI) s + g = 0,
    (ii) H + δI is positive semidefinite,
    (iii) δ ≥ 0 and δ (‖s‖ − Δ) = 0.
(Gay (1981) and Sorensen (1982))
The solution is of the form s(δ) = −(H + δI)⁻¹ g, provided H + δI is positive definite and ‖s(δ)‖ = Δ (i.e. a small Δ gives a large δ). For H + δI positive semidefinite there are two cases: g is orthogonal to the null space of H + δI, the so-called 'hard case', or g is not orthogonal, in which case we have a smooth solution.

Slide 16: H Positive Definite
[Figure: contour plot of the quadratic model for a positive definite H, with the trust-region constraint.]
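The characterization on Slide 15 translates directly into a simple solver for the quadratic-model TRS when H is positive definite, so that the hard case cannot occur: either the unconstrained Newton step is interior (δ = 0), or the multiplier follows from ‖s(δ)‖ = Δ, where ‖s(δ)‖ = ‖(H + δI)⁻¹g‖ decreases monotonically in δ and can be bracketed and bisected. This NumPy sketch only illustrates conditions (i)-(iii); it is not the algorithm of Gay or Sorensen, and the function name and test data are assumptions.

```python
import numpy as np

def trs_quadratic(g, H, Delta, tol=1e-10):
    """Exact TRS step for m_2 when H is positive definite: return (s, delta)
    satisfying (H + delta*I)s = -g with delta = 0 and ||s|| <= Delta (interior)
    or delta > 0 and ||s|| = Delta (boundary)."""
    n = len(g)
    s = np.linalg.solve(H, -g)                   # unconstrained Newton step
    if np.linalg.norm(s) <= Delta:
        return s, 0.0
    lo, hi = 0.0, 1.0
    while np.linalg.norm(np.linalg.solve(H + hi * np.eye(n), -g)) > Delta:
        hi *= 2.0                                # grow hi until ||s(hi)|| <= Delta
    while hi - lo > tol * max(1.0, hi):          # bisection: ||s(delta)|| is monotone
        delta = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(H + delta * np.eye(n), -g)) > Delta:
            lo = delta
        else:
            hi = delta
    delta = 0.5 * (lo + hi)
    return np.linalg.solve(H + delta * np.eye(n), -g), delta

# Small check of conditions (i)-(iii): (H + delta*I)s + g ~ 0, delta >= 0, ||s|| ~ Delta
H = np.array([[10.0, 0.0], [0.0, 1.0]])
g = np.array([1.0, 1.0])
s, delta = trs_quadratic(g, H, Delta=0.5)
```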

  5. Slide 17: H Semidefinite
[Figure: contour plot of the quadratic model for a positive semidefinite H, with the trust-region constraint.]

Slide 18: Exact Solution of TRS for LLS
    min_{‖s‖≤Δ} f + g^T s + ½ s^T H s.
Let S_1 = N(H + λI), where N denotes the null space. We have the hard case when g ⊥ S_1. For LLS recall that H = A^T A and g = −A^T b (so λ = 0 for the hard case):
    g^T v_j = −σ_k b^T u_j,   1 ≤ j ≤ m_k,
where u_j, v_j are associated with the singular value σ_k of multiplicity m_k.
(Rojas-Sorensen (2002))

Slide 19: The Hard Case is the Normal
[Figure: semilog plot of |g^T v_i| against i, i = 1, …, 300; the values range from about 10² down to 10⁻¹⁸.]
Note that g^T v_j = 0 is the (exact) hard case and g^T v_j = −σ_j u_j^T b.

Slide 20: A Major Challenge: The Cubic Model
    min_{‖s‖≤Δ} m_3(s).
• We can characterize (if and only if) the (local) solution of the TRS.
• We can compute the local minimizers. In a way.
• What do we know about the (global) solution path? In the general case it bifurcates, stops, and is not continuous.
• The solution path we want consists of local and global solutions.
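The quantity plotted on Slide 19 is easy to reproduce: from the SVD one gets |g^T v_i| = σ_i |u_i^T b| for the LLS trust-region subproblem, and values near machine precision signal the (near-)hard case at λ = 0. The sketch below assumes NumPy; the function name and the random test data are illustrative, not a discrete ill-posed test problem from the slides.

```python
import numpy as np

def hard_case_coefficients(A, b):
    """For the LLS TRS (H = A^T A, g = -A^T b), return |g^T v_i| = sigma_i * |u_i^T b|
    for every right singular vector v_i; tiny values indicate the (near-)hard case."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return sigma * np.abs(U.T @ b)

# Sanity check of the identity g^T v_i = -sigma_i u_i^T b quoted on Slide 19
rng = np.random.default_rng(2)
A, b = rng.standard_normal((40, 15)), rng.standard_normal(40)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
g = -(A.T @ b)
assert np.allclose(np.abs(Vt @ g), hard_case_coefficients(A, b))
```

For a discrete ill-posed problem these coefficients decay over many orders of magnitude, which is the sense in which the slide's title says the hard case is the normal situation in floating point.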
