Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction
Laurent Hascoët, Mariano Vázquez, Alain Dervieux
Laurent.Hascoet@sophia.inria.fr
Tropics Project, INRIA Sophia-Antipolis
AD Workshop, Cranfield, June 5-6, 2003


SLIDE 1

Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction

Laurent Hascoët, Mariano Vázquez, Alain Dervieux
Laurent.Hascoet@sophia.inria.fr
Tropics Project, INRIA Sophia-Antipolis
AD Workshop, Cranfield, June 5-6, 2003

SLIDE 2

PLAN:

  • Part 1: A gradient-based shape optimization to reduce the Sonic Boom
  • Part 2: Utilization and improvements of reverse AD to compute the Adjoint
  • Conclusion
SLIDE 3

PART 1:

GRADIENT-BASED SONIC BOOM REDUCTION

  • The Sonic Boom optimization problem
  • A mixed Adjoint/AD strategy
  • Resolution of the Adjoint and Gradient
  • Numerical results
SLIDE 4

The Sonic Boom Problem

Control points γ ⇒ Geometry Ω(γ) ⇒ Euler flow W(γ) ⇒ Pressure shock ∇p(W) ⇒ Cost function j(γ)

SLIDE 5

Gradient-based Optimization

j(γ) models the strength of the downwards Sonic Boom
⇒ Compute the gradient j′(γ) and use it in an optimization loop!
⇒ Use reverse-mode AD to compute this gradient

SLIDE 6

Differentiate the iterative solver?

W(γ) is defined implicitly through the Euler equations Ψ(γ, W) = 0
⇒ The program uses an iterative solver
⇒ Brute-force reverse AD differentiates the whole iterative process

  • Does it make sense?
  • Is it efficient?
SLIDE 7

A mixed Adjoint/AD strategy

Back to the mathematical adjoint system:

    Ψ(γ, W) = 0                                     (state system)
    ∂J/∂W(γ, W) − (∂Ψ/∂W(γ, W))ᵗ · Π = 0            (adjoint system)
    j′(γ) = ∂J/∂γ(γ, W) − (∂Ψ/∂γ(γ, W))ᵗ · Π        (optimality condition)

Lower level ⇒ reverse AD (cf. Part 2)
Upper level ⇒ hand-coded specific solver for Π
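As an illustration (not from the slides), the three levels can be exercised on a hypothetical linear state system Ψ(γ, W) = A·W − γ·b with cost J = ½‖W‖², checking the adjoint gradient against finite differences; all matrices here are toy stand-ins, not the Euler operators:

```python
import numpy as np

# Toy instance of the adjoint recipe above (hypothetical model, NOT the
# Euler solver): Psi(gamma, W) = A.W - gamma*b = 0, J(gamma, W) = 0.5*||W||^2,
# so dPsi/dW = A, dPsi/dgamma = -b, dJ/dW = W, dJ/dgamma = 0.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned dPsi/dW
b = rng.standard_normal(n)
gamma = 0.7

W = np.linalg.solve(A, gamma * b)      # state system: Psi(gamma, W) = 0
Pi = np.linalg.solve(A.T, W)           # adjoint system: (dPsi/dW)^t . Pi = dJ/dW
j_prime = 0.0 - (-b) @ Pi              # optimality: dJ/dgamma - (dPsi/dgamma)^t . Pi

def j(g):                              # reference cost for finite differences
    Wg = np.linalg.solve(A, g * b)
    return 0.5 * Wg @ Wg

eps = 1e-6
fd = (j(gamma + eps) - j(gamma - eps)) / (2 * eps)
assert abs(j_prime - fd) < 1e-6
```

The adjoint solve costs one extra linear system with Aᵗ, independently of how many design parameters γ has.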

SLIDE 8

Upper level

Solve (∂Ψ/∂W(γ, W))ᵗ · Π = ∂J/∂W(γ, W)
⇒ Use a matrix-free solver

Preconditioning: “defect correction” ⇒ use the inverse of the 1st-order ∂Ψ/∂W to precondition the 2nd-order ∂Ψ/∂W
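A minimal numpy sketch of the defect-correction idea (dense toy stand-ins for the 1st- and 2nd-order operators; in the real code only matrix-vector products with the 2nd-order operator are available):

```python
import numpy as np

# Defect correction: solve A2.x = rhs using only matvecs with A2
# (matrix-free), with the easily inverted 1st-order operator A1 as
# preconditioner. A1 and A2 are hypothetical small matrices.
rng = np.random.default_rng(1)
n = 8
A1 = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # 1st-order operator
A2 = A1 + 0.3 * rng.standard_normal((n, n))               # 2nd-order operator
rhs = rng.standard_normal(n)

def matvec_A2(x):
    return A2 @ x            # the only access to A2 (e.g. produced by reverse AD)

x = np.zeros(n)
for _ in range(50):          # x <- x + A1^{-1} (rhs - A2.x)
    x += np.linalg.solve(A1, rhs - matvec_A2(x))

assert np.allclose(matvec_A2(x), rhs, atol=1e-8)
```

The iteration converges because A1 approximates A2 well enough that the iteration matrix I − A1⁻¹A2 is contractive.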

SLIDE 10

Overall Optimization Loop

Loop:

  • compute Π with a matrix-free solver
  • use Π to compute j′(γ)
  • 1-D search in the direction j′(γ)
  • update γ (using transpiration conditions)

Performance: assembling (∂Ψ/∂W(γ, W))ᵗ · Π takes about 7 times as long as assembling the state residual Ψ(γ, W) ⇒ solving for Π takes about 4 times as long as solving for W.
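The loop can be sketched end to end on a deliberately simple toy model (one design parameter, a fixed linear state operator; every name and number below is illustrative, not from the CFD code):

```python
import numpy as np

# Toy optimization loop: Psi(gamma, W) = A.W - (b0 + gamma*b1) = 0,
# j(gamma) = 0.5*||W(gamma) - W_target||^2, gamma a single parameter.
n = 6
A = 2.0 * np.eye(n)                          # stand-in for dPsi/dW
b0 = np.linspace(1.0, 2.0, n)
b1 = np.ones(n)
W_target = np.linspace(0.0, 1.0, n)

def state(g): return np.linalg.solve(A, b0 + g * b1)
def cost(g):  return 0.5 * np.sum((state(g) - W_target) ** 2)

gamma = 0.0
for _ in range(40):
    W = state(gamma)
    Pi = np.linalg.solve(A.T, W - W_target)  # compute Pi (adjoint solve)
    grad = b1 @ Pi                           # j'(gamma) = -(dPsi/dgamma)^t . Pi
    step = 1.0                               # 1-D backtracking search
    while cost(gamma - step * grad) > cost(gamma) - 1e-4 * step * grad * grad:
        step *= 0.5
    gamma -= step * grad                     # update gamma

assert abs(grad) < 1e-8                      # stationary point reached
```

The update of γ stands in for the slides' transpiration-condition geometry update, which is specific to the CFD setting.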

SLIDE 11

Numerical results 1

Gradient j′(γ) on the skin of the plane:

SLIDE 12

Numerical results 2

Improvement of the sonic boom after 8 optimization cycles:

SLIDE 13

PART 2:

REVERSE AD TO COMPUTE THE ADJOINT

  • Some principles of Reverse AD
  • The “Tape” memory problem, the “Checkpointing” tactic
  • Impact of Data-Dependence Analysis
  • Impact of In-Out Data-Flow Analysis
SLIDE 17

Principles of reverse AD

AD rewrites source programs to make them compute derivatives.

Consider: P : {I1; I2; . . . Ip; } implementing f : ℝᵐ → ℝⁿ
Identify with: f = fp ∘ fp−1 ∘ · · · ∘ f1
Name: x0 = x and xk = fk(xk−1)
Chain rule: f′(x) = f′p(xp−1) · f′p−1(xp−2) · . . . · f′1(x0)

f′(x) is generally too large and expensive ⇒ take useful views!

Tangent AD:  ẏ = f′(x) · ẋ = f′p(xp−1) · f′p−1(xp−2) · . . . · f′1(x0) · ẋ
Reverse AD:  x̄ = f′*(x) · ȳ = f′*1(x0) · . . . · f′*p−1(xp−2) · f′*p(xp−1) · ȳ

Evaluate both from right to left!
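The two "views" can be made concrete with small random elementary Jacobians (a hand-built toy, not output of an AD tool): tangent and reverse modes are two cheap right-to-left ways of multiplying out the same chain-rule product.

```python
import numpy as np

# Three elementary Jacobians J1, J2, J3 standing in for f'1, f'2, f'3.
rng = np.random.default_rng(3)
J1, J2, J3 = (rng.standard_normal((4, 4)) for _ in range(3))
J = J3 @ J2 @ J1                        # full f'(x): too large in practice

xdot = rng.standard_normal(4)
ydot = J3 @ (J2 @ (J1 @ xdot))          # tangent AD: f'(x).xdot, right to left
assert np.allclose(ydot, J @ xdot)

ybar = rng.standard_normal(4)
xbar = J1.T @ (J2.T @ (J3.T @ ybar))    # reverse AD: f'*(x).ybar, right to left
assert np.allclose(xbar, J.T @ ybar)
```

Both evaluations cost only matrix-vector products; the full product J is never formed.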

SLIDE 27

Reverse AD is more tricky than Tangent AD

x̄ = f′*(x) · ȳ = f′*1(x0) · . . . · f′*p−1(xp−2) · f′*p(xp−1) · ȳ

Evaluating from right to left needs the values xp−1, xp−2, . . . , x0 in reverse order:

Forward sweep:  x1 = f1(x0); x2 = f2(x1); . . . ; xp−1 = fp−1(xp−2)   (store the xk)
Backward sweep: multiply ȳ by f′*p(xp−1), then f′*p−1(xp−2), . . . , down to x̄ = f′*1(x0) · (. . .), retrieving each xk from storage

[Diagram: each value stored during the forward sweep is retrieved by the matching step of the backward sweep.]

Memory usage (“Tape”) is the bottleneck!
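A minimal tape recorder in Python makes the two sweeps tangible (a toy illustration, not Tapenade's source transformation):

```python
import math

# Forward sweep of f(x1, x2) = sin(x1*x2), recording on a tape what the
# backward sweep will retrieve; the tape length is what grows with p.
tape = []
x1, x2 = 0.5, 1.5

t = x1 * x2
tape.append((x1, x2))          # store the inputs of the product
y = math.sin(t)
tape.append(t)                 # store the input of sin

# Backward sweep: retrieve in reverse order, seeded with ybar = 1.
ybar = 1.0
t_saved = tape.pop()
tbar = ybar * math.cos(t_saved)          # adjoint of sin
x1_saved, x2_saved = tape.pop()
x1bar = tbar * x2_saved                  # adjoint of the product
x2bar = tbar * x1_saved

# analytic check: d/dx1 sin(x1*x2) = x2*cos(x1*x2), and symmetrically for x2
assert abs(x1bar - x2 * math.cos(x1 * x2)) < 1e-12
assert abs(x2bar - x1 * math.cos(x1 * x2)) < 1e-12
```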

SLIDE 29

Example:

Program fragment:

    ...
    v2 = 2*v1 + 5
    v4 = v2 + p1*v3/v2
    ...

Corresponding transposed partial Jacobians (variables v1, v2, v3, v4):

    f′*(x) = . . . ·
      ⎛ 1  2  0  0 ⎞   ⎛ 1  0  0  0              ⎞
      ⎜ 0  0  0  0 ⎟ · ⎜ 0  1  0  1 − p1·v3/v2²  ⎟ · . . .
      ⎜ 0  0  1  0 ⎟   ⎜ 0  0  1  p1/v2          ⎟
      ⎝ 0  0  0  1 ⎠   ⎝ 0  0  0  0              ⎠

SLIDE 32

Reverse mode on the example

Forward sweep (records the Tape):

    ...
    Push(v2)
    v2 = 2*v1 + 5
    Push(v4)
    v4 = v2 + p1*v3/v2
    ...

Backward sweep (restores from the Tape):

    ...
    Pop(v4)
    v̄2 = v̄2 + v̄4*(1 − p1*v3/v2²)
    v̄3 = v̄3 + v̄4*p1/v2
    v̄4 = 0
    Pop(v2)
    v̄1 = v̄1 + 2*v̄2
    v̄2 = 0
    ...
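The fragment can be transcribed literally into Python, with a list playing the role of the Tape (the initial values of v2 and v4 are made up so the overwrites are visible); the computed adjoints match the analytic derivatives of v4:

```python
tape = []

# forward sweep
v1, v3, p1 = 1.0, 2.0, 3.0
v2, v4 = 9.9, 8.8                 # arbitrary previous values, soon overwritten
tape.append(v2)                   # Push(v2)
v2 = 2 * v1 + 5
tape.append(v4)                   # Push(v4)
v4 = v2 + p1 * v3 / v2

# backward sweep, seeded with v4bar = 1
v1bar = v2bar = v3bar = 0.0
v4bar = 1.0
v4 = tape.pop()                   # Pop(v4): restore the overwritten value
v2bar += v4bar * (1 - p1 * v3 / v2**2)
v3bar += v4bar * p1 / v2
v4bar = 0.0
v1bar += 2 * v2bar
v2bar = 0.0
v2 = tape.pop()                   # Pop(v2)

# analytic check: v4 = (2*v1+5) + p1*v3/(2*v1+5)
assert abs(v1bar - (2 - 2 * p1 * v3 / (2 * v1 + 5) ** 2)) < 1e-12
assert abs(v3bar - p1 / (2 * v1 + 5)) < 1e-12
```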

SLIDE 34

Reverse mode on our application

From subroutine Psi:   γ, W → Ψ(γ, W),
use reverse AD to build the adjoint subroutine Psī:   γ, W, Ψ̄ → (∂Ψ/∂W(γ, W))ᵗ · Ψ̄
But the Tape grows too large on large meshes!

SLIDE 36

The Checkpointing mechanism

A storage/recomputation tradeoff.
Tapenade applies it on the Call Graph: the Tape size reaches 4 local maxima.
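A toy two-level checkpoint shows the tradeoff (this is an illustration of the mechanism, not Tapenade's actual call-graph strategy): store one snapshot at the midpoint, tape only half the steps at a time, and recompute the first half when its turn comes.

```python
import math

N = 8
x0 = 0.3

def step(x):            return math.sin(x) + 1.1     # one hypothetical "instruction" fk
def step_bar(x, xbar):  return xbar * math.cos(x)    # its adjoint: f'k(x)^t applied to xbar

def run(x, lo, hi, tape=None):
    for _ in range(lo, hi):
        if tape is not None:
            tape.append(x)       # forward sweep stores values on the Tape
        x = step(x)
    return x

x_mid = run(x0, 0, N // 2)               # no tape: only x_mid (the snapshot) is kept
tape = []
run(x_mid, N // 2, N, tape)              # tape the second half only
xbar = 1.0
for x in reversed(tape):                 # backward sweep over the second half
    xbar = step_bar(x, xbar)
tape = []
run(x0, 0, N // 2, tape)                 # recompute the first half from x0
for x in reversed(tape):                 # backward sweep over the first half
    xbar = step_bar(x, xbar)

# finite-difference check of d(step^N)/dx at x0
eps = 1e-7
fd = (run(x0 + eps, 0, N) - run(x0 - eps, 0, N)) / (2 * eps)
assert abs(xbar - fd) < 1e-5
```

Peak tape size drops from N values to N/2 plus one snapshot, at the price of re-running the first half once.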

SLIDE 37

Tape size maxima

Tape local maximum #    1       2       3       4
Tape size:              12.40   12.37   13.60   9.66

773 R*8/node: (still) too expensive in memory ⇒ Use Data Flow analysis

SLIDE 38

Data-Flow “to the rescue” (1)

Data Dependence Graphs of P and P̄ are isomorphic, so we can improve Independent-Iterations loops (“II-loops”):

Original (parallel) loop:

    do // i = 1,N
      body(i)
    end

Standard reverse AD:             Improved (II-loops):

    do i = 1,N                       do i = 1,N
      body(i)                          body(i)
    end                                bodȳ(i)
    do i = N,1,-1                    end
      bodȳ(i)
    end
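The benefit of the fused schedule can be sketched in Python (made-up loop body; the point is when the per-iteration tape is consumed): since iterations are independent, the adjoint of iteration i may run right after iteration i, so the tape holds one iteration instead of N.

```python
import math

N = 100
a = [0.01 * i for i in range(N)]

# Standard schedule: the forward loop tapes all N iterations,
# then a backward loop pops them in reverse order.
tape = []
abar_std = [0.0] * N
for i in range(N):
    tape.append(a[i])                    # tape grows to N entries
    _y = math.sin(a[i])                  # body(i)
for i in reversed(range(N)):
    abar_std[i] = math.cos(tape.pop())   # adjoint body, seeded with ybar = 1

# Improved II-loop schedule: body(i) immediately followed by its adjoint.
abar_ii = [0.0] * N
peak = 0
for i in range(N):
    saved = a[i]                         # per-iteration tape: one entry
    _y = math.sin(a[i])                  # body(i)
    abar_ii[i] = math.cos(saved)         # adjoint body(i)
    peak = max(peak, 1)

assert abar_std == abar_ii and peak == 1
```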

SLIDE 39

Tape local maximum #       1       2       3       4
No modification:           12.40   12.37   13.60   9.66
II-loops improvement:      12.38    7.98    4.10   0.02

Improvement on (4), but hidden by (1)!

SLIDE 40

Data-Flow “to the rescue” (2)

Analyze the size of Snapshots:

    Snapshot = IN(checkpoint) ∩ OUT(checkpoint and after)
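The rule reads as a plain set intersection; with made-up variable names for one hypothetical checkpointed call:

```python
# Only variables that the checkpointed part needs (IN) AND that may be
# overwritten (OUT of the checkpoint and of what follows) must be saved.
IN_checkpoint = {"W", "gamma", "mesh"}
OUT_checkpoint_and_after = {"W", "residual", "tmp"}

snapshot = IN_checkpoint & OUT_checkpoint_and_after
assert snapshot == {"W"}   # "gamma" and "mesh" survive unchanged: no need to save them
```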

SLIDE 41

Tape local maximum #          1       2       3       4
No modification:              12.40   12.37   13.60   9.66
Only snapshot reduction:       1.02    0.85    9.70   9.33
Only II-loops improvement:    12.38    7.98    4.10   0.02
Both improvements:             1.02    0.61    0.22   0.02

58 R*8/node: quite acceptable!

SLIDE 42

CONCLUSION:

  • Part 1: Brute-force reverse AD, including on the iterative solver, is a hazardous strategy ⇒ define a manual strategy before AD.
  • Part 1: ... but the matrix-free solver proves a delicate step.
  • Part 2: Reverse AD can use reasonable memory space, thanks to data-flow analyses.
  • Advertisement! Tapenade: an AD tool on the web http://tapenade.inria.fr:8080/tapenade, or alternatively FTP from our web site http://www.inria.fr/tropics.