Generalized Derivatives: Automatic Evaluation & Implications for Algorithms. Paul I. Barton, Kamil A. Khan & Harry A. J. Watson, Process Systems Engineering Laboratory, Massachusetts Institute of Technology.


  1. Generalized Derivatives: Automatic Evaluation & Implications for Algorithms
  Paul I. Barton, Kamil A. Khan & Harry A. J. Watson
  Process Systems Engineering Laboratory
  Massachusetts Institute of Technology

  2. Nonsmooth Equation Solving
  ◆ Semismooth Newton method: solve G(x^k)(x − x^k) = −f(x^k)
  ◆ Linear programming (LP) Newton method:
    min_{γ,x} γ
    s.t. ‖f(x^k) + G(x^k)(x − x^k)‖_∞ ≤ γ ‖f(x^k)‖_∞²
         ‖x − x^k‖_∞ ≤ γ ‖f(x^k)‖_∞
         x ∈ X (a polyhedral set)
  ◆ G(x^k) is some element of a generalized derivative
  Kojima & Shindo (1986), Qi & Sun (1993), Facchinei, Fischer & Herrich (2014).
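A minimal numerical sketch of the semismooth Newton iteration above (an assumed toy example, not from the slides): solve f(x) = x + |x| − 1 = 0, whose root is x* = 0.5, taking G(x^k) as an element of the B-subdifferential.

```python
# Semismooth Newton sketch on f(x) = x + |x| - 1 (assumed toy problem).
# G(x) returns an element of the B-subdifferential of f at x.

def f(x):
    return x + abs(x) - 1.0

def G(x):
    # B-subdifferential element of x + |x|: 2 for x > 0, 0 for x < 0;
    # at x = 0 either one-sided limit (0 or 2) is valid -- pick 2.
    return 2.0 if x >= 0 else 0.0

x = 3.0
for k in range(50):
    gx = G(x)
    if gx == 0.0:          # singular generalized derivative: perturb and retry
        x += 1.0
        continue
    x += -f(x) / gx        # Newton step G(x_k)(x - x_k) = -f(x_k)
    if abs(f(x)) < 1e-12:
        break

print(x)  # converges to 0.5 (in one step here, since f is piecewise linear)
```

Because f is piecewise linear, the iteration lands exactly on the root once it enters the correct piece, illustrating the finite/fast local convergence the slides discuss.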

  3. Generalized Derivatives
  ◆ Suppose f is locally Lipschitz ⇒ differentiable on a (full-measure) set S
  ◆ B-subdifferential: ∂_B f(x) := { H : H = lim_{i→∞} Jf(x^(i)), x = lim_{i→∞} x^(i), x^(i) ∈ S }
  ◆ Clarke Jacobian: ∂f(x) := conv ∂_B f(x)
  ◆ Example: f(x) = |x| has ∂f(x) = {1} for x > 0, ∂f(x) = {−1} for x < 0, and ∂_B f(0) = {−1, 1}, ∂f(0) = [−1, 1]
  ◆ Useful properties of ∂f(x):
  Ø Nonempty, convex, and compact
  Ø Satisfies the mean-value theorem and implicit/inverse function theorems
  Ø Reduces to the subdifferential/derivative when f is convex/strictly differentiable
  Clarke (1973).
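The |x| example above can be checked numerically. This sketch (assumed example) approximates ∂_B f(0) by sampling f′ along sequences x^(i) → 0 from each side, where f is differentiable:

```python
# Approximate the B-subdifferential of f(x) = |x| at 0 by evaluating the
# derivative along sequences approaching 0 from the right and the left.

def fprime(x):          # derivative of |x|, valid everywhere except x = 0
    return 1.0 if x > 0 else -1.0

right = [fprime(10.0 ** -i) for i in range(1, 6)]    # x^(i) -> 0+
left  = [fprime(-10.0 ** -i) for i in range(1, 6)]   # x^(i) -> 0-

B_subdiff = {right[-1], left[-1]}    # limiting Jacobians from each side
print(sorted(B_subdiff))   # [-1.0, 1.0]; Clarke subdiff. = conv = [-1, 1]
```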

  4. Convergence Properties
  ◆ Suppose the generalized derivative contains no singular matrices at the solution
  ◆ Semismooth Newton method with G(x^k) ∈ ∂f(x^k):
  Ø local Q-superlinear convergence if f is semismooth
  Ø local Q-quadratic convergence if f is strongly semismooth
  ◆ Semismooth Newton & LP-Newton methods for PC¹ or strongly semismooth functions:
  Ø local Q-quadratic convergence if G(x^k) ∈ ∂_B f(x^k)
  ◆ Automatic/Algorithmic Differentiation (AD):
  Ø automatic methods for computing derivatives in complex settings
  Ø an automatic method for computing elements of generalized derivatives?
  Ø which generalized derivatives are computationally relevant?

  5. All generalized derivatives are equal… but some are more equal than others.

  6. Obstacles to Automatic Generalized Derivative Evaluation 1
  ◆ Automatically evaluating Clarke Jacobian elements is difficult
  ◆ Lack of sharp calculus rules: with g(x) = max{0, x}, h(x) = min{0, x}, and f(x) = g(x) + h(x) = x,
    ∂g(0) = [0, 1], ∂h(0) = [0, 1], ∂f(0) = {1}
    so 0 ∈ ∂g(0) and 0 ∈ ∂h(0), yet 0 + 0 ∉ ∂f(0): the inclusion ∂f(0) ⊂ ∂g(0) + ∂h(0) is strict
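The failed sum rule can be made concrete (a sketch under the slide's definitions; the interval arithmetic here is hand-rolled, with intervals as (lo, hi) pairs):

```python
# g(x) = max{0, x}, h(x) = min{0, x}, f = g + h = identity.
# The interval sum of Clarke subdifferentials at 0 strictly
# overestimates the Clarke subdifferential of the sum.

dg0 = (0.0, 1.0)    # Clarke subdifferential of g at 0: [0, 1]
dh0 = (0.0, 1.0)    # Clarke subdifferential of h at 0: [0, 1]

# interval addition: [a, b] + [c, d] = [a + c, b + d]
interval_sum = (dg0[0] + dh0[0], dg0[1] + dh0[1])    # [0, 2]

df0 = (1.0, 1.0)    # but f(x) = x, so its subdifferential at 0 is {1}

print(interval_sum, df0)   # {1} is a strict subset of [0, 2]
```

This is exactly why calculus rules for the Clarke Jacobian yield enclosures rather than elements: the element 0 + 0 = 0 of the sum is not a valid generalized derivative of f.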

  7. Directional Derivatives & PC¹ Functions
  ◆ Directional derivative: f′(x; d) = lim_{t→0⁺} [f(x + td) − f(x)] / t
  ◆ Sharp chain rule for locally Lipschitz functions: [f ∘ g]′(x; d) = f′(g(x); g′(x; d))
  ◆ Forward AD gives the directional derivative
  ◆ PC¹ functions: there is a finite collection F_f(x) of C¹ functions such that f(y) ∈ { φ(y) : φ ∈ F_f(x) } for all y in a neighborhood N(x)
  ◆ The 2-norm is not PC¹
  Griewank (1994), Scholtes (2012).
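The sharp chain rule can be illustrated with one-sided difference quotients (an assumed toy composition, g(x) = |x| and f(y) = 2y, evaluated at the kink x = 0):

```python
# Verify [f o g]'(x; d) = f'(g(x); g'(x; d)) numerically at a kink.

def ddir(fun, x, d, t=1e-8):
    # one-sided difference quotient approximating fun'(x; d)
    return (fun(x + t * d) - fun(x)) / t

g = abs
f = lambda y: 2.0 * y
comp = lambda x: f(g(x))

d = -3.0
lhs = ddir(comp, 0.0, d)                   # [f o g]'(0; d)
rhs = ddir(f, g(0.0), ddir(g, 0.0, d))     # f'(g(0); g'(0; d))
print(lhs, rhs)   # both approximately 2|d| = 6
```

Note the chain rule composes *directional* derivatives, not subgradients; that is what keeps it sharp at nonsmooth points.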

  8. Obstacles 2
  ◆ PC¹ functions have piecewise linear directional derivatives
  [Figure: direction space partitioned into cones, with f′(x; d) = B^(1)d, B^(2)d, or B^(3)d depending on which cone contains d]

  9. Obstacles 2
  ◆ PC¹ functions have piecewise linear directional derivatives
  ◆ Directional derivatives in the coordinate directions do not necessarily give B-subdifferential elements
  ◆ This also defeats finite differences
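A small check (assumed example f(x₁, x₂) = |x₁ − x₂|, not from the slides) of why coordinate-direction probes fail: the B-subdifferential at 0 is {[1, −1], [−1, 1]}, but probing along e₁ and e₂ assembles the row [1, 1], which belongs to neither.

```python
# Coordinate-direction directional derivatives of f(x1, x2) = |x1 - x2| at 0.

def ddir(x, d, t=1e-8):
    f = lambda y: abs(y[0] - y[1])
    # one-sided difference quotient approximating f'(x; d)
    return (f((x[0] + t * d[0], x[1] + t * d[1])) - f(x)) / t

row = [ddir((0.0, 0.0), (1.0, 0.0)),    # along e1: |t| / t = 1
       ddir((0.0, 0.0), (0.0, 1.0))]    # along e2: |-t| / t = 1
print(row)   # [1.0, 1.0] -- not an element of {[1, -1], [-1, 1]}
```

The same computation is what one-sided finite differences would produce, which is the sense in which coordinate probing "defeats finite differences" here.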

  10. Obstacles 3
  ◆ ∂f(x) may be a strict subset of ∏_{i=1}^{m} ∂f_i(x)
  ◆ Example: f : (x₁, x₂) ↦ (x₁ + |x₂|, x₁ − |x₂|) has
    ∂f(0) = { [1, 2s−1; 1, 1−2s] : s ∈ [0, 1] }
    ∂f₁(0) × ∂f₂(0) = { [1, 2s₁−1; 1, 2s₂−1] : (s₁, s₂) ∈ [0, 1]² }
  ◆ [Figure: the projection π₂ ∂f(0) is a diagonal segment strictly inside the square π₂(∂f₁(0) × ∂f₂(0))]

  11. L-smooth Functions f : X ⊂ ℝⁿ → ℝᵐ
  ◆ The following functions are L-smooth:
  Ø Continuously differentiable functions
  Ø Convex functions (e.g. abs, 2-norm)
  Ø PC¹ functions
  Ø Compositions of L-smooth functions: x ↦ h(g(x))
  Ø Integrals of L-smooth functions: x ↦ ∫ₐᵇ g(t, x) dt
  Ø Solutions of ODEs with L-smooth right-hand sides: c ↦ x(b, c), where dx/dt (t, c) = g(t, x(t, c)), x(0, c) = c
  Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).

  12. Lexicographic Derivatives
  ◆ L-subdifferential: ∂_L f(x) = { J_L f(x; M) : det M ≠ 0 }
  Ø contains the L-derivatives J_L f(x; M) in all nonsingular direction matrices M
  ◆ Useful properties:
  Ø L-derivatives equal the classical derivative wherever f is strictly differentiable
  Ø L-derivatives are elements of the Clarke gradient
  Ø contains only subgradients when f is convex
  Ø contained in the plenary hull of the Clarke Jacobian, and can be used in place of the Clarke Jacobian in numerical methods: { Ad : A ∈ ∂_L f(x) } ⊂ { Ad : A ∈ ∂f(x) } for each d ∈ ℝⁿ
  Ø for PC¹ functions, L-derivatives are elements of the B-subdifferential
  Ø satisfies a sharp chain rule, expressed naturally using LD-derivatives
  Nesterov (1987), Khan and Barton (2014), Khan and Barton (2015).

  13. Lexicographic Directional (LD-)Derivatives
  ◆ Extension of the classical directional derivative
  ◆ LD-derivative: for any M := [m^(1) ⋯ m^(p)] ∈ ℝ^{n×p},
    f′(x; M) = [ f^(0)_{x,M}(m^(1)) ⋯ f^(p−1)_{x,M}(m^(p)) ]
  ◆ If M is square and nonsingular: f′(x; M) = J_L f(x; M) M
  ◆ If f is differentiable at x: f′(x; M) = Jf(x) M
  ◆ Sharp LD-derivative chain rule: [f ∘ g]′(x; M) = f′(g(x); g′(x; M))
  Khan and Barton (2015).
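The staged structure of the LD-derivative can be mimicked numerically (assumed example f(x₁, x₂) = |x₁ − x₂| with M = I; one-sided difference quotients stand in for the exact lexicographic rules). Each stage differentiates the previous stage's positively homogeneous directional-derivative function:

```python
# LD-derivative stages for f(x1, x2) = |x1 - x2| at x = 0 with M = I:
#   f^(0) = f'(x; .),  f^(1) = [f^(0)]'(m1; .),  ...

f = lambda x: abs(x[0] - x[1])

def ddir(fun, x, d, t=1e-7):
    # one-sided difference quotient approximating fun'(x; d)
    return (fun([x[0] + t * d[0], x[1] + t * d[1]]) - fun(x)) / t

x0 = [0.0, 0.0]
f0 = lambda d: ddir(f, x0, d)             # f^(0)(d) = f'(0; d) = |d1 - d2|
col1 = f0([1.0, 0.0])                     # first column: f^(0)(m1) = 1
f1 = lambda d: ddir(f0, [1.0, 0.0], d)    # f^(1)(d) = [f^(0)]'(m1; d)
col2 = f1([0.0, 1.0])                     # second column: f^(1)(m2) = -1

print([col1, col2])   # approximately [1, -1], an element of d_B f(0) (M = I)
```

The point of the second stage: f^(0) is the kinked function |d₁ − d₂|, but at m^(1) = e₁ it is locally linear, so differentiating it there resolves the kink consistently, which is what makes the assembled matrix a B-subdifferential element for PC¹ functions.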

  14. Vector Forward AD Mode for LD-derivatives
  ◆ The sharp chain rule immediately implies that, given the "seed directions" M, forward-mode AD can compute f′(x; M)
  ◆ Need calculus rules for "elementary functions":
  Ø abs, min, max, mid, ‖·‖₂, etc.
  Ø an algorithm for "elemental PC¹ functions"
  Ø linear programs and lexicographic linear programs parameterized by their RHSs
  Ø implicit functions: if h(w(z), z) = 0, then w′(ẑ; M) is the unique solution N of h′((ŷ, ẑ); (N, M)) = 0, where ŷ = w(ẑ)
  Khan and Barton (2015), Khan and Barton (2013), Hoeffner et al. (2015).
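A simplified sketch of the vector forward mode (assumed, minimal operator-overloading design; only +, − and abs are implemented). Each variable carries its value plus p directional derivatives, and the abs rule picks its sign lexicographically: the first nonzero entry of (value, dot₁, …, dot_p) decides, which is what keeps the rule sharp at kinks:

```python
# Forward-mode propagation of LD-derivative information by operator
# overloading (minimal sketch; not the authors' implementation).

class LD:
    def __init__(self, val, dots):
        self.val, self.dots = val, list(dots)   # value + p directional derivs
    def __add__(self, o):
        return LD(self.val + o.val, [a + b for a, b in zip(self.dots, o.dots)])
    def __sub__(self, o):
        return LD(self.val - o.val, [a - b for a, b in zip(self.dots, o.dots)])

def ld_abs(u):
    # lexicographic sign rule for abs: first nonzero of (val, dots...) decides
    for v in [u.val] + u.dots:
        if v != 0.0:
            s = 1.0 if v > 0.0 else -1.0
            break
    else:
        s = 1.0     # everything zero: any sign is consistent
    return LD(s * u.val, [s * d for d in u.dots])

# f(x1, x2) = |x1 - x2| at x = 0 with seed matrix M = I
x1 = LD(0.0, [1.0, 0.0])    # dots = row 1 of M
x2 = LD(0.0, [0.0, 1.0])    # dots = row 2 of M
y = ld_abs(x1 - x2)
print(y.dots)   # [1.0, -1.0]: f'(0; I), a B-subdifferential element
```

At the kink, val = 0 and the first dot (+1) breaks the tie, so abs resolves to the +1 branch for every subsequent direction; an ordinary subgradient rule would have no principled way to make that choice.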

  15. Semismooth Inexact Newton Method
  ◆ Inexact Newton method: solve the Newton system iteratively, building up trial directions d^i, i = 1, 2, …
  ◆ Solve iteratively: J_L f(x; M) Δx = −f(x)
  ◆ But the directional derivative is not a linear function of the directions…
  ◆ Let M = [d^1, d^2, …] with M nonsingular. Then f′(x; M) = J_L f(x; M) M
  ◆ But M is not known in advance
  ◆ Compute the columns of f′(x; M) one at a time:
  Ø the computation of a column affects subsequent columns
  Ø the automatic code can be "locked" to record the influence of earlier columns
  ◆ Local Q-superlinear & Q-quadratic convergence rates can be achieved

  16. Approximation of LD-derivatives using FDs
  ◆ LD-derivative: for M := [m^(1) ⋯ m^(p)] ∈ ℝ^{n×p},
    f′(x; M) = [ f^(0)_{x,M}(m^(1)) ⋯ f^(p−1)_{x,M}(m^(p)) ]
  ◆ FD approximation of f′(x; M) using p+1 function evaluations:
    f^(0)_{x,M}(m^(1)) ≈ α^{−1}[ f(x + α m^(1)) − f(x) ] =: D_{αm^(1)}[f](x)
    f^(1)_{x,M}(m^(2)) ≈ D_{αm^(2)}[f^(0)_{x,M}](m^(1)) = D_{αm^(2)} D_{αm^(1)}[f](x)
    ⋮
    f^(p−1)_{x,M}(m^(p)) ≈ D_{αm^(p)}[f^(p−2)_{x,M}](m^(p−1)) = D_{αm^(p)} ⋯ D_{αm^(2)} D_{αm^(1)}[f](x)
  ◆ For p = 2, the evaluation points are x, x + α m^(1), and x + α m^(1) + α² m^(2):
    f^(0)_{x,M}(m^(1)) ≈ α^{−1}[ f(x + α m^(1)) − f(x) ]
    f^(1)_{x,M}(m^(2)) ≈ α^{−2}[ f(x + α m^(1) + α² m^(2)) − f(x + α m^(1)) ]
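The p = 2 formulas above can be transcribed directly (assumed example f(x₁, x₂) = |x₁ − x₂| with M = I; note stage k uses step α^k, so p+1 = 3 function evaluations suffice):

```python
# Nested divided-difference approximation of the LD-derivative f'(x; M)
# for f(x1, x2) = |x1 - x2| at x = 0 with M = I.

f = lambda x: abs(x[0] - x[1])
alpha = 1e-4
x = [0.0, 0.0]
m1, m2 = [1.0, 0.0], [0.0, 1.0]

p1 = [x[0] + alpha * m1[0], x[1] + alpha * m1[1]]            # x + a*m1
p2 = [p1[0] + alpha**2 * m2[0], p1[1] + alpha**2 * m2[1]]    # x + a*m1 + a^2*m2

col1 = (f(p1) - f(x)) / alpha         # ~ f^(0)(m1) = 1
col2 = (f(p2) - f(p1)) / alpha**2     # ~ f^(1)(m2) = -1
print([col1, col2])
```

The geometrically shrinking steps keep the second perturbation inside the cone on which the first stage is linear, mirroring the lexicographic ordering of the exact LD-derivative.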

  17. Sparse Accumulation for L-derivatives
  ◆ The cost of AD can be reduced when the Jacobian is sparse:
  Ø find structurally orthogonal columns
  Ø perform the vector forward pass with a seed matrix M ∈ ℝ^{n×p} rather than I ∈ ℝ^{n×n}, e.g.
    J = [a b 0 0; c 0 d 0; 0 e 0 f; 0 0 g h],  M = [1 0; 0 1; 0 1; 1 0]
  ◆ AD for LD-derivatives → the order of the directions matters
  Ø corresponding to M is an uncompressed (permutation) matrix Q, e.g.
    Q = [1 0 0 0; 0 0 1 0; 0 0 0 1; 0 1 0 0]
    » M = QD for some matrix D
  Ø procedure:
    » identify matrices Q, D, and M
    » perform the vector forward pass to calculate f′(x; Q)
    » copy the entries of f′(x; Q) into a sparse data structure for J_L f(x; Q) = f′(x; Q)Q^{−1} (i.e. by sparse permutation)
    » calculate f′(x; M) = f′(x; Q)D
  ◆ This is done based on the assumption that f′(x; M) = f′(x; Q)D
  Ø which is not true in general
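The compression idea can be illustrated with a smooth stand-in for the slide's 4×4 pattern (an assumed example; the coefficients and base point are arbitrary). Columns {1, 4} and {2, 3} are structurally orthogonal, so the n×2 seed matrix recovers all eight nonzeros from two directional sweeps instead of four:

```python
# Sparse column compression for the pattern [[a,b,0,0],[c,0,d,0],
# [0,e,0,f],[0,0,g,h]] using finite differences as a stand-in for the
# vector forward AD pass.

a, b, c, d, e, ff, g, h = 2., 3., 5., 7., 11., 13., 17., 19.   # sample entries

def F(x):
    return [a*x[0] + b*x[1], c*x[0] + d*x[2], e*x[1] + ff*x[3], g*x[2] + h*x[3]]

M = [[1., 0.], [0., 1.], [0., 1.], [1., 0.]]   # seed: groups {x1,x4}, {x2,x3}
t = 1e-6
x0 = [0.1, 0.2, 0.3, 0.4]
Fx0 = F(x0)

cols = []
for j in range(2):                              # one sweep per column group
    xp = [x0[i] + t * M[i][j] for i in range(4)]
    Fxp = F(xp)
    cols.append([(Fxp[r] - Fx0[r]) / t for r in range(4)])

# uncompress: within a group, each row has at most one nonzero column
J14 = cols[0]   # recovers [a, c, ff, h] (columns 1 and 4)
J23 = cols[1]   # recovers [b, d, e, g]  (columns 2 and 3)
print(J14, J23)
```

For LD-derivatives the slide's caveat applies: the sweeps are order-dependent, so the compressed result only matches f′(x; M) under the stated (generally false) assumption f′(x; M) = f′(x; Q)D; the smooth example here is the case where that assumption does hold.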

  18. Generalized Derivatives of Algorithms: MHEX Model
  [Figure: a multistream heat exchanger (MHEX) with hot streams (F_i, T_i^in → T_i^out), i ∈ H, and cold streams (f_j, t_j^in → t_j^out), j ∈ C]
  ◆ Energy balance: ∑_{i∈H} F_i (T_i^in − T_i^out) = ∑_{j∈C} f_j (t_j^out − t_j^in)
  ◆ Pinch condition: min_{p∈P} ( EBP_C^p − EBP_H^p ) = 0
  ◆ Area equation: UA − ∑_{k∈K, k≠|K|} ΔQ_k / ΔT_{LM,k} = 0
  Watson et al. (2015).
