Adjoint Derivative Computation – Moritz Diehl and Carlo Savorgnan


  1. Adjoint Derivative Computation – Moritz Diehl and Carlo Savorgnan

  2. There are several methods for calculating derivatives: (1) by hand, (2) symbolic differentiation, (3) numerical differentiation, (4) the “imaginary trick” in MATLAB, (5) automatic differentiation, in forward mode or in adjoint (also called backward or reverse) mode.

  3. Calculating derivatives by hand: time consuming and error prone.

  4-5. Symbolic differentiation: We can obtain an expression for the derivatives we need with Mathematica, Maple, ... Often this results in very long code which is expensive to evaluate.

  6-7. Numerical differentiation 1/2: Consider a function f : R^n → R. Then
      ∇f(x)^T p ≈ (f(x + t·p) − f(x)) / t.
      Really easy to implement. Problem: how should we choose t?

  8. Numerical differentiation 2/2: Problem: how should we choose t? A rule of thumb: set t = √ε, where ε is the machine precision (or the precision of f). The accuracy of the derivative is then approximately √ε.
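As a concrete illustration of this rule of thumb (not part of the slides), here is a minimal Python sketch of forward differences with t = √ε; the test function and test point are made up for the example.

```python
import numpy as np

def fd_directional_derivative(f, x, p, eps=np.finfo(float).eps):
    """Approximate the directional derivative ∇f(x)^T p by forward differences.

    Uses the rule of thumb t = sqrt(eps), which balances truncation and
    round-off error and gives roughly sqrt(eps) accuracy.
    """
    t = np.sqrt(eps)
    return (f(x + t * p) - f(x)) / t

# Hypothetical test function: f(x) = x1^2 + 3*x2, with gradient (2*x1, 3).
f = lambda x: x[0]**2 + 3.0 * x[1]
x = np.array([1.0, 2.0])
p = np.array([1.0, 0.0])
print(fd_directional_derivative(f, x, p))  # ≈ 2.0, accurate to about 1e-8
```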

  9. “Imaginary trick” in MATLAB: Consider an analytic function f : R^n → R and set t = 10^−100. Then
      ∇f(x)^T p = Im( f(x + i·t·p) ) / t,
      so ∇f(x)^T p can be calculated up to machine precision!
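The same trick works in any language with complex arithmetic; below is a minimal Python sketch (not from the slides), using a made-up analytic test function.

```python
import numpy as np

def complex_step_directional_derivative(f, x, p, t=1e-100):
    """Compute ∇f(x)^T p via the complex-step ("imaginary trick") method.

    No subtraction of nearly equal numbers occurs, so the result is accurate
    to machine precision for analytic f.
    """
    return np.imag(f(x + 1j * t * p)) / t

# Hypothetical analytic test function: f(x) = sin(x1*x2) + exp(x1*x2*x3).
f = lambda x: np.sin(x[0] * x[1]) + np.exp(x[0] * x[1] * x[2])
x = np.array([1.0, 2.0, 0.5])
p = np.array([0.0, 1.0, 0.0])
print(complex_step_directional_derivative(f, x, p))  # ∂f/∂x2 at x
```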

  10-11. Automatic differentiation: Consider a function f : R^n → R defined by using m elementary operations φ_i.
      Function evaluation
      Input: x_1, x_2, ..., x_n
      Output: x_{n+m}
      for i = n+1 to n+m
          x_i ← φ_i(x_1, ..., x_{i−1})
      end for
      Example: f(x_1, x_2, x_3) = sin(x_1 x_2) + exp(x_1 x_2 x_3). Evaluation code (for m = 5 elementary operations):
      x_4 ← x_1 x_2;   x_5 ← sin(x_4);   x_6 ← x_4 x_3;   x_7 ← exp(x_6);   x_8 ← x_5 + x_7.
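A literal transcription of this evaluation trace into Python (nothing beyond the slide's example is assumed) could look as follows.

```python
import numpy as np

def f_trace(x1, x2, x3):
    """Evaluate f(x1, x2, x3) = sin(x1*x2) + exp(x1*x2*x3) as a sequence of
    m = 5 elementary operations, mirroring the slide's evaluation code."""
    x4 = x1 * x2        # φ_4
    x5 = np.sin(x4)     # φ_5
    x6 = x4 * x3        # φ_6
    x7 = np.exp(x6)     # φ_7
    x8 = x5 + x7        # φ_8
    return x8

print(f_trace(1.0, 2.0, 0.5))  # same value as sin(2) + exp(1)
```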

  12-13. Automatic differentiation, forward mode: Assume x(t) and f(x(t)), and write ẋ = dx/dt, ḟ = df/dt = J_f(x)·ẋ. For i = 1, ..., m:
      dx_{n+i}/dt = Σ_{j=1..n+i−1} (∂φ_{n+i}/∂x_j) · dx_j/dt.
      Forward automatic differentiation
      Input: ẋ_1, ẋ_2, ..., ẋ_n (and all partial derivatives ∂φ_{n+i}/∂x_j)
      Output: ẋ_{n+m}
      for i = 1 to m
          ẋ_{n+i} ← Σ_{j=1..n+i−1} (∂φ_{n+i}/∂x_j) · ẋ_j
      end for
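Applied to the example trace above, forward mode simply propagates a "dotted" value ẋ_i alongside each elementary operation. The following sketch (not from the slides) computes the directional derivative ∇f(x)^T p for the running example.

```python
import numpy as np

def f_forward(x1, x2, x3, dx1, dx2, dx3):
    """Forward-mode AD for f(x1,x2,x3) = sin(x1*x2) + exp(x1*x2*x3):
    each intermediate xi carries its derivative dxi in the direction (dx1,dx2,dx3)."""
    x4, dx4 = x1 * x2,    dx1 * x2 + x1 * dx2
    x5, dx5 = np.sin(x4), np.cos(x4) * dx4
    x6, dx6 = x4 * x3,    dx4 * x3 + x4 * dx3
    x7, dx7 = np.exp(x6), np.exp(x6) * dx6
    x8, dx8 = x5 + x7,    dx5 + dx7
    return x8, dx8

# Directional derivative in the direction p = (0, 1, 0), i.e. ∂f/∂x2:
val, dval = f_forward(1.0, 2.0, 0.5, 0.0, 1.0, 0.0)
print(val, dval)
```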

  14. Automatic differentiation, reverse mode:
      Reverse automatic differentiation
      Input: all partial derivatives ∂φ_i/∂x_j
      Output: x̄_1, ..., x̄_n
      x̄_1, ..., x̄_{n+m−1} ← 0
      x̄_{n+m} ← 1
      for j = n+m down to n+1
          for all i = 1, 2, ..., j−1
              x̄_i ← x̄_i + x̄_j · ∂φ_j/∂x_i
          end for
      end for
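For the same running example, the reverse sweep first stores the intermediate values and then accumulates the adjoints x̄_i backwards, yielding the full gradient in a single pass. A hand-written sketch (not from the slides):

```python
import numpy as np

def f_reverse(x1, x2, x3):
    """Reverse-mode AD for f(x1,x2,x3) = sin(x1*x2) + exp(x1*x2*x3).
    Forward sweep: store intermediates.  Backward sweep: accumulate adjoints."""
    # Forward sweep (same trace as before).
    x4 = x1 * x2
    x5 = np.sin(x4)
    x6 = x4 * x3
    x7 = np.exp(x6)
    x8 = x5 + x7

    # Backward sweep: x̄_8 = 1, all other adjoints start at 0.
    x8b = 1.0
    x5b = x8b * 1.0           # ∂x8/∂x5
    x7b = x8b * 1.0           # ∂x8/∂x7
    x6b = x7b * np.exp(x6)    # ∂x7/∂x6
    x4b = x5b * np.cos(x4)    # ∂x5/∂x4 ...
    x3b = x6b * x4            # ∂x6/∂x3
    x4b += x6b * x3           # ... plus ∂x6/∂x4
    x1b = x4b * x2            # ∂x4/∂x1
    x2b = x4b * x1            # ∂x4/∂x2
    return x8, np.array([x1b, x2b, x3b])

val, grad = f_reverse(1.0, 2.0, 0.5)
print(val, grad)  # grad[1] matches the forward-mode directional derivative above
```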

  15-17. Automatic differentiation, summary so far: For f : R^n → R,
      cost of forward mode per directional derivative: cost(∇f^T p) ≤ 2 cost(f); for the full gradient ∇f one therefore needs about 2n cost(f)!
      Cost of reverse mode for the full gradient: cost(∇f) ≤ 3 cost(f), independent of n!
      Only drawback: a large amount of memory is needed to store all intermediate values.
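For illustration with made-up numbers (not on the slides): for f : R^(10^6) → R, forming the full gradient costs about 2·10^6 · cost(f) with forward mode but at most 3 · cost(f) with reverse mode, at the price of storing roughly as many intermediate values as there are elementary operations.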

  18-20. Automatic differentiation, summary: Automatic differentiation can be used for any f : R^n → R^m.
      Cost of forward mode per forward direction p ∈ R^n: cost(J_f p) ≤ 2 cost(f).
      Cost of reverse mode per reverse direction p ∈ R^m: cost(p^T J_f) ≤ 3 cost(f).
      For the computation of the full Jacobian J_f, the choice of the best mode depends on the sizes of n and m.
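As a rough guide with made-up numbers (not on the slides): the full Jacobian costs about 2n·cost(f) via n forward sweeps and about 3m·cost(f) via m reverse sweeps, so for f : R^2 → R^1000 forward mode is far cheaper, while for f : R^1000 → R^2 reverse mode wins.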

  21. Derivation of Adjoint Mode 1/3: Regard the function code as the computation of a vector which is “growing” at every iteration:
      x̃_1 = Φ_1(x) = (x_1, x_2, x_3, ..., x_n, φ_{n+1}(x_1, x_2, x_3, ..., x_n))^T,
      ...
      x̃_m = Φ_m(x̃_{m−1}) = (x_1, x_2, x_3, ..., x_{n+m−1}, φ_{n+m}(x_1, x_2, x_3, ..., x_{n+m−1}))^T.
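To make the "growing vector" idea concrete, here is a tiny Python sketch (not from the slides) of the first map Φ_1 for the running example f(x_1, x_2, x_3) = sin(x_1 x_2) + exp(x_1 x_2 x_3).

```python
import numpy as np

def Phi_1(x):
    """First 'growing' map for the running example: append x4 = x1*x2.

    Maps (x1, x2, x3) to (x1, x2, x3, x1*x2); the later maps Phi_2, ..., Phi_5
    each append one more intermediate value in the same way.
    """
    x1, x2, x3 = x
    return np.append(x, x1 * x2)

print(Phi_1(np.array([1.0, 2.0, 0.5])))  # [1.  2.  0.5 2. ]
```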

  22. Derivation of Adjoint Mode 2/3: The evaluation of f : R^n → R^q can then be written as
      f(x) = Q Φ_m(Φ_{m−1}( ... Φ_2(Φ_1(x)) ... ))
      with Q ∈ R^{q×(n+m)} a 0-1 matrix selecting the output variables, e.g. for q = 1, Q = (0 0 0 ... 0 1).
      The full Jacobian is then given by
      J_f(x) = Q · J_{Φ_m}(x̃_{m−1}) · J_{Φ_{m−1}}(x̃_{m−2}) · ... · J_{Φ_1}(x),
      where the Jacobian of Φ_i is the (n+i) × (n+i−1) matrix whose first n+i−1 rows form the identity and whose last row is
      (∂φ_{n+i}/∂x_1, ∂φ_{n+i}/∂x_2, ∂φ_{n+i}/∂x_3, ..., ∂φ_{n+i}/∂x_{n+i−1}).

  23. Derivation of Adjoint Mode 3/3:
      Forward mode:  J_f p = Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} p = Q ( J_{Φ_m} ( J_{Φ_{m−1}} ... ( J_{Φ_1} p ) ) ).
      Adjoint mode:  p^T J_f = p^T Q J_{Φ_m} J_{Φ_{m−1}} ... J_{Φ_1} = ( ( ( p^T Q ) J_{Φ_m} ) J_{Φ_{m−1}} ) ... J_{Φ_1}.
      The adjoint mode corresponds just to the efficient evaluation of the vector-matrix product p^T J_f!
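To make the two orderings concrete, here is a small numpy sketch (not in the slides) that builds the Jacobians J_{Φ_i} for the running example and evaluates p^T J_f left-to-right (adjoint mode) and J_f p right-to-left (forward mode).

```python
import numpy as np

def growing_jacobians(x1, x2, x3):
    """Build the Jacobians J_{Φ_i} of the 'growing vector' maps for the example
    f(x1,x2,x3) = sin(x1*x2) + exp(x1*x2*x3)  (n = 3, m = 5).
    Each J_{Φ_i} is the identity on the existing variables plus one new gradient row."""
    # Recompute the intermediate values of the trace.
    x4 = x1 * x2
    x5 = np.sin(x4)
    x6 = x4 * x3
    # Gradient rows of the elementary operations w.r.t. all previous variables.
    grad_rows = [
        [x2, x1, 0.0],                          # x4 = x1*x2
        [0.0, 0.0, 0.0, np.cos(x4)],            # x5 = sin(x4)
        [0.0, 0.0, x4, x3, 0.0],                # x6 = x4*x3
        [0.0, 0.0, 0.0, 0.0, 0.0, np.exp(x6)],  # x7 = exp(x6)
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0],    # x8 = x5 + x7
    ]
    Js = []
    for row in grad_rows:
        k = len(row)                       # number of existing variables
        J = np.vstack([np.eye(k), row])    # (k+1) x k
        Js.append(J)
    return Js

Js = growing_jacobians(1.0, 2.0, 0.5)
Q = np.zeros((1, 8)); Q[0, -1] = 1.0       # select the output variable x8

# Adjoint (reverse) mode: vector-matrix products from the left.
pT = Q.copy()
for J in reversed(Js):
    pT = pT @ J
print(pT)                                  # full gradient, matches f_reverse above

# Forward mode: matrix-vector products from the right, one direction at a time.
p = np.array([0.0, 1.0, 0.0])
v = p
for J in Js:
    v = J @ v
print(Q @ v)                               # ∂f/∂x2, matches f_forward above
```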

  24. Software for Adjoint Derivatives:
      Generic tools to differentiate code: ADOL-C for C/C++, using operator overloading (open source); ADIC / ADIFOR for C/FORTRAN, using source code transformation (open source); TAPENADE, CppAD (open source), ...
      Differential algebraic equation solvers with adjoints: SUNDIALS suite CVODES / IDAS (Sandia, open source); DAESOL-II (Uni Heidelberg); ACADO Integrators (Leuven, open source).
