
Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction. Laurent Hascoët, Mariano Vázquez, Alain Dervieux. Laurent.Hascoet@sophia.inria.fr. Tropics Project, INRIA Sophia-Antipolis. AD Workshop, Cranfield, June 5-6, 2003


  1. Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction. Laurent Hascoët, Mariano Vázquez, Alain Dervieux. Laurent.Hascoet@sophia.inria.fr. Tropics Project, INRIA Sophia-Antipolis. AD Workshop, Cranfield, June 5-6, 2003

  2. 1 PLAN: • Part 1: A gradient-based shape optimization to reduce the Sonic Boom • Part 2: Utilization and improvements of reverse AD to compute the Adjoint • Conclusion

  3. 2 PART 1: GRADIENT-BASED SONIC BOOM REDUCTION • The Sonic Boom optimization problem • A mixed Adjoint/AD strategy • Resolution of the Adjoint and Gradient • Numerical results

  4. 3 The Sonic Boom Problem. Control points γ ⇒ Geometry Ω(γ) ⇒ Euler flow W(γ) ⇒ Pressure shock ∇p(W) ⇒ Cost function j(γ)

  5. 4 Gradient-based Optimization. j(γ) models the strength of the downwards Sonic Boom ⇒ compute the gradient j′(γ) and use it in an optimization loop! ⇒ use reverse-mode AD to compute this gradient

  6. 5 Differentiate the iterative solver? W(γ) is defined implicitly through the Euler equations Ψ(γ, W) = 0 ⇒ the program uses an iterative solver ⇒ brute-force reverse AD differentiates the whole iterative process • Does it make sense? • Is it efficient?

  7. 6 A mixed Adjoint/AD strategy. Back to the mathematical adjoint system:
  Ψ(γ, W) = 0   (state system)
  ∂J/∂W(γ, W) − (∂Ψ/∂W(γ, W))ᵗ · Π = 0   (adjoint system)
  j′(γ) = ∂J/∂γ(γ, W) − (∂Ψ/∂γ(γ, W))ᵗ · Π   (optimality condition)
  Lower level (the transposed-Jacobian products) ⇒ reverse AD, cf. Part 2. Upper level (the solve for Π) ⇒ hand-coded specific solver

  8. 7 Upper level. Solve (∂Ψ/∂W(γ, W))ᵗ · Π = ∂J/∂W(γ, W) ⇒ use a matrix-free solver. Preconditioning: "defect correction" ⇒ use the inverse of the 1st-order ∂Ψ/∂W to precondition the 2nd-order ∂Ψ/∂W
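A minimal sketch of this defect-correction iteration, with random dense matrices standing in for the transposed 1st- and 2nd-order operators (all names, sizes, and matrices below are illustrative assumptions, not the Euler operators):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Stand-ins: A1 plays the simpler 1st-order dPsi/dW, A2 the accurate
# 2nd-order one; both are random assumptions for illustration only.
A1 = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
A2 = A1 + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)        # plays the right-hand side dJ/dW(gamma, W)

def apply_A2T(v):
    # Matrix-free product (dPsi/dW)^t . v; in the real code this is the
    # only access to the 2nd-order operator (assembled via reverse AD).
    return A2.T @ v

precond = np.linalg.inv(A1.T)     # 1st-order inverse used as preconditioner

# Defect correction: repeatedly correct Pi by the preconditioned residual
# of the 2nd-order system.
pi = np.zeros(n)
for _ in range(100):
    defect = b - apply_A2T(pi)
    if np.linalg.norm(defect) < 1e-12:
        break
    pi = pi + precond @ defect
```

Because the 1st-order operator is close to the 2nd-order one, the correction contracts at each step without ever forming the 2nd-order matrix explicitly, which is the point of the matrix-free approach.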

  9. 8 Overall Optimization Loop. Loop: • compute Π with a matrix-free solver • use Π to compute j′(γ) • 1-D search in the direction j′(γ) • update γ (using transpiration conditions). Performance: assembling (∂Ψ/∂W(γ, W))ᵗ · Π takes about 7 times as long as assembling the state residual Ψ(γ, W) ⇒ solving for Π takes about 4 times as long as solving for W.
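The loop above can be sketched with a toy cost standing in for the full chain (the quartic j, its gradient, and the crude backtracking 1-D search below are all illustrative assumptions; in the real loop the gradient comes from Π and the update goes through transpiration conditions):

```python
import numpy as np

def j(gamma):
    # Toy smooth cost standing in for the sonic-boom functional j(gamma)
    return float(np.sum((gamma - 1.0) ** 2 + 0.1 * gamma ** 4))

def grad_j(gamma):
    # Exact gradient of the toy cost (plays the adjoint-computed j'(gamma))
    return 2.0 * (gamma - 1.0) + 0.4 * gamma ** 3

gamma = np.zeros(4)               # control points, illustrative size
for cycle in range(8):            # 8 optimization cycles, as in the results slide
    d = grad_j(gamma)             # descent direction j'(gamma)
    step = 1.0                    # backtracking 1-D search along -d
    while j(gamma - step * d) >= j(gamma):
        step *= 0.5
    gamma = gamma - step * d      # update the control points
```

Each cycle performs one (cheap, toy) gradient evaluation where the real loop performs one adjoint solve for Π.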

  11. 9 Numerical results 1. Gradient j′(γ) on the skin of the plane: [figure]

  12. 10 Numerical results 2. Improvement of the sonic boom after 8 optimization cycles: [figure]

  13. 11 PART 2: REVERSE AD TO COMPUTE THE ADJOINT • Some principles of Reverse AD • The "Tape" memory problem, the "Checkpointing" tactic • Impact of Data-Dependence Analysis • Impact of In-Out Data-Flow Analysis

  14. 12 Principles of reverse AD. AD rewrites source programs to make them compute derivatives.
  Consider: P : { I_1 ; I_2 ; ... ; I_p } implementing f : IR^m → IR^n
  Identify with: f = f_p ∘ f_{p−1} ∘ ... ∘ f_1
  Name: x_0 = x and x_k = f_k(x_{k−1})
  Chain rule: f′(x) = f′_p(x_{p−1}) · f′_{p−1}(x_{p−2}) · ... · f′_1(x_0)
  f′(x) is generally too large and expensive ⇒ take useful views!
  ẏ = f′(x) · ẋ = f′_p(x_{p−1}) · ... · f′_1(x_0) · ẋ   (tangent AD)
  x̄ = f′*(x) · ȳ = f′*_1(x_0) · ... · f′*_{p−1}(x_{p−2}) · f′*_p(x_{p−1}) · ȳ   (reverse AD)
  Evaluate both from right to left!
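The two bracketings can be checked numerically on a toy three-step composition; the random 3×3 matrices below stand in for the elementary Jacobians f′_k (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Play the roles of f'_1(x_0), f'_2(x_1), f'_3(x_2)
J1, J2, J3 = (rng.standard_normal((3, 3)) for _ in range(3))

xdot = rng.standard_normal(3)   # input direction, tangent mode
ybar = rng.standard_normal(3)   # output weight vector, reverse mode

# Tangent AD: ydot = f'_3 . (f'_2 . (f'_1 . xdot)), evaluated right to left
ydot = J3 @ (J2 @ (J1 @ xdot))

# Reverse AD: xbar = f'*_1 . (f'*_2 . (f'*_3 . ybar)), also right to left
xbar = J1.T @ (J2.T @ (J3.T @ ybar))

# Both are views of the same Jacobian: <ybar, f'(x).xdot> = <f'*(x).ybar, xdot>
assert np.isclose(float(ybar @ ydot), float(xbar @ xdot))
```

Evaluating right to left keeps every step a matrix-vector product; the full Jacobian is never formed.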

  18. 13 Reverse AD is more tricky than Tangent AD.
  x̄ = f′*(x) · ȳ = f′*_1(x_0) · ... · f′*_{p−1}(x_{p−2}) · f′*_p(x_{p−1}) · ȳ
  Forward sweep, storing the intermediate states:
  x_0 ; x_1 = f_1(x_0) ; ... ; x_{p−2} = f_{p−2}(x_{p−3}) ; x_{p−1} = f_{p−1}(x_{p−2})
  Backward sweep, retrieving each stored x_k just before f′*_{k+1}(x_k) is applied:
  ȳ ; f′*_p(x_{p−1}) · ; f′*_{p−1}(x_{p−2}) · ; ... ; x̄ = f′*_1(x_0) · ...
  [Figure: time runs downward; the forward sweep descends on the left, the backward sweep ascends on the right, with "retrieve" arrows linking each stored x_k to the adjoint step that consumes it.]
  Memory usage (the "Tape") is the bottleneck!

  28. 14 Example. Program fragment:
  ...
  v2 = 2*v1 + 5
  v4 = v2 + p1*v3/v2
  ...
  Corresponding transposed partial Jacobians (identity apart from the entries around the overwritten variable): the first instruction contributes ∂v2/∂v1 = 2; the second contributes ∂v4/∂v2 = 1 − p1*v3/v2^2 and ∂v4/∂v3 = p1/v2. f′*(x) is the product of these transposed elementary Jacobians.

  30. 15 Reverse mode on the example.
  Forward sweep:
  ...
  Push(v2)
  v2 = 2*v1 + 5
  Push(v4)
  v4 = v2 + p1*v3/v2
  ...
  Backward sweep:
  ...
  Pop(v4)
  v̄2 = v̄2 + v̄4*(1 − p1*v3/v2^2)
  v̄3 = v̄3 + v̄4*p1/v2
  v̄4 = 0
  Pop(v2)
  v̄1 = v̄1 + 2*v̄2
  v̄2 = 0
  ...
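The instrumented fragment can be executed directly; the Python transliteration below is an illustration (v̄4 is seeded to 1 so the adjoints come out as the partials of v4), checked against the hand-derived gradient:

```python
import math

def fragment_with_adjoint(v1, v3, p1):
    """The slide's fragment with the Push/Pop tape written out by hand,
    the way reverse AD would emit it."""
    stack = []
    v2 = 0.0; v4 = 0.0                           # arbitrary prior values

    # forward sweep: save each variable before overwriting it
    stack.append(v2); v2 = 2 * v1 + 5            # Push(v2)
    stack.append(v4); v4 = v2 + p1 * v3 / v2     # Push(v4)

    # backward sweep, seeded with v4_bar = 1
    v1b = v2b = v3b = 0.0; v4b = 1.0
    v4 = stack.pop()                             # Pop(v4)
    v2b = v2b + v4b * (1 - p1 * v3 / v2**2)
    v3b = v3b + v4b * p1 / v2
    v4b = 0.0
    v2 = stack.pop()                             # Pop(v2)
    v1b = v1b + 2 * v2b
    v2b = 0.0
    return v1b, v3b

# Check against the analytic gradient of v4 = (2*v1+5) + p1*v3/(2*v1+5)
v1, v3, p1 = 1.5, 2.0, 0.7
v1b, v3b = fragment_with_adjoint(v1, v3, p1)
v2 = 2 * v1 + 5
assert math.isclose(v3b, p1 / v2)
assert math.isclose(v1b, 2 * (1 - p1 * v3 / v2**2))
```

Note that the adjoint of the second instruction reads v2's value from after the forward sweep, which is why Pop(v2) must come only after those adjoint statements.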

  33. 16 Reverse mode on our application. From subroutine Psi: γ, W ↦ Ψ(γ, W), use reverse AD to build its adjoint: γ, W, Ψ̄ ↦ (∂Ψ/∂W(γ, W))ᵗ · Ψ̄
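What such a reverse-differentiated routine looks like can be sketched on a tiny stand-in residual (the 2-equation Ψ below and the name Psib are illustrative assumptions, not the Euler residual or the actual AD-generated code):

```python
import numpy as np

def Psi(gamma, W):
    # Tiny 2-equation stand-in for the flow residual Psi(gamma, W)
    return np.array([W[0]**2 + gamma * W[1] - 1.0,
                     W[0] * W[1] - gamma])

def Psib(gamma, W, Psibar):
    """(dPsi/dW)^t . Psibar, written in the accumulate-adjoints style
    that reverse AD would emit for the statements of Psi."""
    Wb = np.zeros(2)
    # adjoint contributions of equation 0
    Wb[0] += Psibar[0] * 2 * W[0]
    Wb[1] += Psibar[0] * gamma
    # adjoint contributions of equation 1
    Wb[0] += Psibar[1] * W[1]
    Wb[1] += Psibar[1] * W[0]
    return Wb

# Verify against the explicit transposed Jacobian of Psi w.r.t. W
gamma, W = 0.3, np.array([0.8, 1.2])
Jac = np.array([[2 * W[0], gamma],
                [W[1],     W[0]]])
Psibar = np.array([1.0, -2.0])
assert np.allclose(Psib(gamma, W, Psibar), Jac.T @ Psibar)
```

This transposed-Jacobian-vector product is exactly the operation the matrix-free adjoint solver of Part 1 calls at each iteration.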
