SLIDE 1

Adjoint Derivative Computation

Moritz Diehl and Carlo Savorgnan

Adjoint Derivative Computation – Moritz Diehl and Carlo Savorgnan 1/16

SLIDE 2

There are several methods for calculating derivatives:

1. By hand
2. Symbolic differentiation
3. Numerical differentiation
4. "Imaginary trick" in MATLAB
5. Automatic differentiation
   - Forward mode
   - Adjoint (or backward or reverse) mode

SLIDE 3

Calculating derivatives by hand

Time-consuming and error-prone.

SLIDE 5

Symbolic differentiation

We can obtain an expression for the derivatives we need with Mathematica, Maple, ... Often this results in very long code that is expensive to evaluate.

SLIDE 7

Numerical differentiation 1/2

Consider a function f : Rⁿ → R:

∇f(x)ᵀ p ≈ (f(x + t p) − f(x)) / t

Really easy to implement.

Problem

How should we choose t?

SLIDE 8

Numerical differentiation 2/2

Problem

How should we choose t?

A rule of thumb

Set t = √ε, where ε is the machine precision or the precision of f. The accuracy of the derivative is then approximately √ε.
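A minimal sketch of this rule of thumb (the function and helper names are illustrative, not from the slides): a forward-difference directional derivative with t = √ε.

```python
import math
import sys

def fd_directional(f, x, p, t=math.sqrt(sys.float_info.epsilon)):
    """Forward-difference estimate of grad f(x)^T p using the t = sqrt(eps) rule."""
    x_shift = [xi + t * pi for xi, pi in zip(x, p)]
    return (f(x_shift) - f(x)) / t

# Example: f(x) = x1^2 + 3*x2 has gradient (2*x1, 3), so the directional
# derivative at x = (1, 2) along p = (1, 0) is exactly 2.
f = lambda x: x[0] ** 2 + 3.0 * x[1]
est = fd_directional(f, [1.0, 2.0], [1.0, 0.0])
# est agrees with 2.0 only to roughly sqrt(eps) ~ 1e-8, as the slide predicts
```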

SLIDE 9

“Imaginary trick” in MATLAB

Consider an analytic function f : Rⁿ → R and set t = 10⁻¹⁰⁰. Then

∇f(x)ᵀ p ≈ Im(f(x + i t p)) / t

and ∇f(x)ᵀ p can be calculated up to machine precision!
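A minimal sketch of the trick in Python rather than MATLAB (names are illustrative); it relies on f being implemented with operations that accept complex arguments:

```python
import cmath
import math

def complex_step(f, x, p, t=1e-100):
    """Directional derivative via the imaginary trick: Im(f(x + i*t*p)) / t."""
    xc = [xi + 1j * t * pi for xi, pi in zip(x, p)]
    return f(xc).imag / t

# Example: f(x) = sin(x1) * x2; the directional derivative at x = (1, 2)
# along p = (1, 0) is 2*cos(1).
f = lambda x: cmath.sin(x[0]) * x[1]
d = complex_step(f, [1.0, 2.0], [1.0, 0.0])
# unlike finite differences, d matches 2*cos(1) to machine precision,
# because no subtraction of nearly equal numbers occurs
```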

SLIDE 11

Automatic differentiation

Consider a function f : Rⁿ → R defined using m elementary operations φ_i.

Function evaluation

Input: x_1, x_2, . . . , x_n
Output: x_{n+m}
for i = n + 1 to n + m
    x_i ← φ_i(x_1, . . . , x_{i−1})
end for

Example

f(x_1, x_2, x_3) = sin(x_1 x_2) + exp(x_1 x_2 x_3)

Evaluation code (for m = 5 elementary operations):
x_4 ← x_1 x_2
x_5 ← sin(x_4)
x_6 ← x_4 x_3
x_7 ← exp(x_6)
x_8 ← x_5 + x_7
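The evaluation code above transcribes directly into a Python function (the name `f_trace` is ours):

```python
import math

def f_trace(x1, x2, x3):
    """Evaluate f = sin(x1*x2) + exp(x1*x2*x3) via m = 5 elementary operations."""
    x4 = x1 * x2
    x5 = math.sin(x4)
    x6 = x4 * x3
    x7 = math.exp(x6)
    x8 = x5 + x7
    return x8

val = f_trace(1.0, 0.5, 2.0)
# identical to evaluating the closed-form expression directly
```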

SLIDE 13

Automatic differentiation: forward mode

Assume x(t) and f(x(t)), and write ẋ = dx/dt and ḟ = df/dt = J_f(x) ẋ. For i = 1, . . . , m,

dx_{n+i}/dt = Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) (dx_j/dt)

Forward automatic differentiation

Input: ẋ_1, ẋ_2, . . . , ẋ_n (and all partial derivatives ∂φ_{n+i}/∂x_j)
Output: ẋ_{n+m}
for i = 1 to m
    ẋ_{n+i} ← Σ_{j=1}^{n+i−1} (∂φ_{n+i}/∂x_j) ẋ_j
end for
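Applied to the running example f = sin(x1 x2) + exp(x1 x2 x3), the forward sweep propagates a tangent ẋ alongside every intermediate value (a hand-written sketch of what an AD tool would generate; the names are ours):

```python
import math

def f_forward(x, xdot):
    """Forward-mode AD sweep: returns (f(x), grad f(x)^T xdot)."""
    x1, x2, x3 = x
    d1, d2, d3 = xdot
    x4 = x1 * x2;        d4 = d1 * x2 + x1 * d2   # product rule
    x5 = math.sin(x4);   d5 = math.cos(x4) * d4   # chain rule
    x6 = x4 * x3;        d6 = d4 * x3 + x4 * d3
    x7 = math.exp(x6);   d7 = x7 * d6             # d exp(x6)/dx6 = exp(x6) = x7
    x8 = x5 + x7;        d8 = d5 + d7
    return x8, d8

# Seeding xdot = e1 yields the partial derivative df/dx1.
val, dfdx1 = f_forward([1.0, 0.5, 0.2], [1.0, 0.0, 0.0])
```

One full gradient requires n such sweeps (one per unit direction), which is where the 2n cost(f) figure on the next slides comes from.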

SLIDE 14

Automatic differentiation: reverse mode

Reverse automatic differentiation

Input: all partial derivatives ∂φ_j/∂x_i
Output: x̄_1, . . . , x̄_n
x̄_1, . . . , x̄_{n+m−1} ← 0
x̄_{n+m} ← 1
for j = n + m down to n + 1
    for all i = 1, 2, . . . , j − 1
        x̄_i ← x̄_i + x̄_j (∂φ_j/∂x_i)
    end for
end for
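For the same running example, the reverse sweep first records all intermediate values, then accumulates the adjoints x̄ from the output back to the inputs, delivering the full gradient in one backward pass (again a hand-written sketch; the names are ours):

```python
import math

def f_reverse(x):
    """Reverse-mode AD: returns (f(x), full gradient of f) in one backward sweep."""
    x1, x2, x3 = x
    # forward sweep: evaluate and store all intermediates
    x4 = x1 * x2
    x5 = math.sin(x4)
    x6 = x4 * x3
    x7 = math.exp(x6)
    x8 = x5 + x7
    # backward sweep: b_i holds xbar_i = d x8 / d x_i
    b8 = 1.0
    b5 = b8                    # x8 = x5 + x7
    b7 = b8
    b6 = b7 * x7               # x7 = exp(x6), so d x7/d x6 = x7
    b4 = b6 * x3               # x6 = x4 * x3
    b3 = b6 * x4
    b4 += b5 * math.cos(x4)    # x5 = sin(x4)
    b1 = b4 * x2               # x4 = x1 * x2
    b2 = b4 * x1
    return x8, (b1, b2, b3)

val, grad = f_reverse([1.0, 0.5, 0.2])
# all three partial derivatives from a single sweep, independent of n;
# note the forward-sweep values (x4, x6, x7) must be kept in memory
```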

SLIDE 17

Automatic differentiation summary so far

f : Rn → R

Cost of forward mode per directional derivative

cost(∇fᵀ p) ≤ 2 cost(f)

For the full gradient ∇f, we need 2n cost(f)!

Cost of reverse mode: full gradient

cost(∇f) ≤ 3 cost(f)

Independent of n! Only drawback: large memory is needed to store all intermediate values.

SLIDE 20

Automatic differentiation: summary

Automatic differentiation can be used for any f : Rn → Rm.

Cost of forward mode for forward direction p ∈ Rn

cost(Jf p) ≤ 2 cost(f)

Cost of reverse mode per reverse direction p ∈ Rm

cost(pᵀ Jf) ≤ 3 cost(f)

For computation of the full Jacobian Jf, the choice of the best mode depends on the sizes of n and m.

SLIDE 21

Derivation of Adjoint Mode 1/3

Regard the function code as the computation of a vector which is "growing" at every iteration:

x̃_1 = (x_1, x_2, . . . , x_n, x_{n+1})ᵀ = Φ_1((x_1, x_2, . . . , x_n)ᵀ) = (x_1, x_2, . . . , x_n, φ_{n+1}(x_1, x_2, . . . , x_n))ᵀ

. . .

x̃_m = (x_1, x_2, . . . , x_{n+m})ᵀ = Φ_m((x_1, x_2, . . . , x_{n+m−1})ᵀ) = (x_1, x_2, . . . , x_{n+m−1}, φ_{n+m}(x_1, x_2, . . . , x_{n+m−1}))ᵀ

SLIDE 22

Derivation of Adjoint Mode 2/3

Evaluation of f : Rⁿ → R^q can then be written as

f(x) = Q Φ_m(Φ_{m−1}(. . . Φ_2(Φ_1(x)) . . .))

with Q ∈ R^{q×(n+m)} a 0-1 matrix selecting the output variables, e.g. for q = 1, Q = (0 . . . 0 1).

Then the full Jacobian is given by

J_f(x) = Q J_{Φ_m}(x̃_m) J_{Φ_{m−1}}(x̃_{m−1}) . . . J_{Φ_1}(x)

where each Jacobian J_{Φ_i} is an identity matrix with one extra row appended, containing the partial derivatives of φ_{n+i}:

last row of J_{Φ_i} = (∂φ_{n+i}/∂x_1, ∂φ_{n+i}/∂x_2, ∂φ_{n+i}/∂x_3, . . . , ∂φ_{n+i}/∂x_{n+i−1})

SLIDE 23

Derivation of Adjoint Mode 3/3

Forward mode:

J_f p = Q J_{Φ_m} J_{Φ_{m−1}} . . . J_{Φ_1} p = Q (J_{Φ_m} (J_{Φ_{m−1}} . . . (J_{Φ_1} p)))

Adjoint mode:

pᵀ J_f = pᵀ Q J_{Φ_m} J_{Φ_{m−1}} . . . J_{Φ_1} = (((pᵀ Q) J_{Φ_m}) J_{Φ_{m−1}}) . . . J_{Φ_1}

The adjoint mode corresponds simply to the efficient evaluation order for the vector-matrix product pᵀ J_f!
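The two bracketings can be checked numerically. The sketch below uses small hypothetical 2×2 factors in place of the J_Φi; both modes use only matrix-vector (or vector-matrix) products, never a matrix-matrix product, and the identity qᵀ(J p) = (qᵀ J) p ties them together:

```python
def matvec(A, v):
    """A @ v: one step of the forward bracketing."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vecmat(v, A):
    """v^T @ A: one step of the adjoint bracketing."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

# hypothetical stand-ins for J_Phi_1, J_Phi_2, J_Phi_3 (2x2 for simplicity)
J1 = [[1.0, 2.0], [0.0, 1.0]]
J2 = [[3.0, 0.0], [1.0, 1.0]]
J3 = [[1.0, 1.0], [2.0, 0.0]]
p = [1.0, -1.0]   # forward direction
q = [0.5, 2.0]    # adjoint direction

fwd = matvec(J3, matvec(J2, matvec(J1, p)))   # J3 (J2 (J1 p))
adj = vecmat(vecmat(vecmat(q, J3), J2), J1)   # ((q^T J3) J2) J1

lhs = sum(qi * fi for qi, fi in zip(q, fwd))  # q^T (J p)
rhs = sum(ai * pi for ai, pi in zip(adj, p))  # (q^T J) p
```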

SLIDE 24

Software for Adjoint Derivatives

Generic Tools to Differentiate Code

ADOL-C for C/C++, using operator overloading (open source)
ADIC / ADIFOR for C/FORTRAN, using source code transformation (open source)
TAPENADE, CppAD (open source), ...

Differential Algebraic Equation Solvers with Adjoints

SUNDIALS Suite: CVODES / IDAS (Sandia, open source)
DAESOL-II (Uni Heidelberg)
ACADO Integrators (Leuven, open source)
