Adjoint approach to optimization




  1. Adjoint approach to optimization
Praveen. C
praveen@math.tifrbng.res.in
Tata Institute of Fundamental Research
Centre for Applicable Mathematics, Bangalore 560065
http://math.tifrbng.res.in
Health, Safety and Environment Group, BARC, 6-7 October, 2010
Praveen. C (TIFR-CAM) Optimization BARC, 6 Oct 2010 1 / 58

  2. Outline
• Minimum of a function
• Constrained minimization
• Finite difference approach
• Adjoint approach
• Automatic differentiation
• Example

  3. Minimum of a function
[figure: graph of f(x) with a minimum at x = x_0, where f'(x_0) = 0]

  4. Steepest descent method
[figure: contours in the (x_1, x_2) plane with the gradient direction ∇f(x_1, x_2)]
x_{n+1} = x_n − s_n ∇f(x_n)
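The update rule above can be sketched in a few lines of Python. The quadratic objective and the constant step size are assumptions for illustration, not from the slides:

```python
# Minimal steepest-descent sketch: minimize the hypothetical objective
# f(x1, x2) = (x1 - 1)^2 + 10*(x2 + 2)^2 with a fixed step size s.
import numpy as np

def grad_f(x):
    # Analytic gradient of the assumed quadratic objective above.
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

x = np.array([0.0, 0.0])
s = 0.04                      # assumed constant step size s_n = s
for n in range(200):
    x = x - s * grad_f(x)     # x_{n+1} = x_n - s_n * grad f(x_n)

print(x)  # approaches the minimizer (1, -2)
```

In practice s_n is chosen by a line search rather than held fixed; a too-large step makes the iteration diverge.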

  5. Objectives and controls
• Objective function J(α) = J(α, u): mathematical representation of system performance
• Control variables α
◮ Parametric controls: α ∈ R^n
◮ Infinite-dimensional controls: α : X → Y
◮ Shape: α ∈ set of admissible shapes
• State variable u: solution of an ODE or PDE
R(α, u) = 0 ⇒ u = u(α)

  6. Gradient-based minimization: blackbox/FD approach
min_{β ∈ R^N} I(β, Q(β))
• Initialize β^0, n = 0
• For n = 0, ..., N_iter
◮ Solve R(β^n, Q^n) = 0
◮ For j = 1, ..., N
⋆ β^{n(j)} = [β^n_1, ..., β^n_j + Δβ_j, ..., β^n_N]^⊤
⋆ Solve R(β^{n(j)}, Q^{n(j)}) = 0
⋆ dI/dβ_j ≈ [I(β^{n(j)}, Q^{n(j)}) − I(β^n, Q^n)] / Δβ_j
◮ Steepest-descent step: β^{n+1} = β^n − s^n (dI/dβ)(β^n)
Cost of FD-based steepest descent:
Cost = O(N+1) · N_iter = O(N+1) · O(N) = O(N^2)
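The loop above can be sketched on a toy problem. The linear state equation R(β, Q) = Q − Aβ = 0 and the tracking objective are assumptions chosen so the state "solve" is trivial; the point is the N+1 state solves per gradient:

```python
# Blackbox/FD sketch on an assumed toy problem:
# state equation R(beta, Q) = Q - A @ beta = 0, objective I = ||Q - q*||^2.
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
q_star = np.array([1.0, 1.0])

def solve_state(beta):          # stands in for "solve R(beta, Q) = 0"
    return A @ beta

def I(beta):
    Q = solve_state(beta)
    return np.sum((Q - q_star) ** 2)

def fd_gradient(beta, dbeta=1e-6):
    # One extra state solve per design variable: O(N+1) solves per gradient.
    g = np.zeros_like(beta)
    I0 = I(beta)
    for j in range(beta.size):
        pert = beta.copy()
        pert[j] += dbeta        # perturb only the j-th control
        g[j] = (I(pert) - I0) / dbeta
    return g

beta = np.zeros(2)
for n in range(100):            # steepest-descent iterations
    beta = beta - 0.05 * fd_gradient(beta)
print(beta)                     # approaches the solution of A @ beta = q*
```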

  7. Accuracy of FD: choice of step size
df/dx(x_0) = [f(x_0 + δ) − f(x_0)]/δ + O(δ)
In principle, choosing a small δ reduces the truncation error. But computers have finite precision: instead of f(x_0) the computer gives f(x_0) + O(ε), where ε = machine precision. Then
[f(x_0 + δ) + O(ε)]/δ − [f(x_0) + O(ε)]/δ = [f(x_0 + δ) − f(x_0)]/δ + C_1 ε/δ
= df/dx(x_0) + C_2 δ + C_1 ε/δ
where C_1, C_2 depend on f, x_0 and the precision, and C_2 δ + C_1 ε/δ is the total error. The total error is least when
δ = δ_opt = C_3 √ε,  C_3 = √(C_1/C_2)
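A quick numerical illustration of the δ_opt ≈ √ε trade-off (the test function exp(x) and the point x_0 = 1 are assumptions):

```python
# Forward-difference error for f(x) = exp(x) at x0 = 1, where the exact
# derivative is also exp(1). Error = |C2*delta (truncation) + C1*eps/delta|.
import math

x0, exact = 1.0, math.exp(1.0)
for delta in (1e-4, 1e-8, 1e-12):
    fd = (math.exp(x0 + delta) - math.exp(x0)) / delta
    print(delta, abs(fd - exact))
# In double precision (eps ~ 1e-16) the error is smallest near
# delta ~ sqrt(eps) ~ 1e-8; at 1e-12 the roundoff term typically dominates.
```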

  8. Accuracy of FD: choice of step size
Precision   ε        δ_opt
Single      10^-8    10^-4
Double      10^-16   10^-8
[figure: total error vs δ; the roundoff term C_1 ε/δ dominates for small δ, the truncation term C_2 δ for large δ, with the minimum at δ_opt]
See: Brian J. McCartin, "Seven Deadly Sins of Numerical Computation", The American Mathematical Monthly, Vol. 105, No. 10 (Dec. 1998), pp. 929-941.

  9. Drag gradient using FD (Samareh)
[figure: % error in finite-difference drag-gradient approximations vs scaled step size (10^-9 to 10^-1) for design variables such as mid/tip chord, LE sweep, and root/mid/tip twist; roundoff error dominates at small step sizes and truncation error at large ones]

  10. Iterative problems
I(β, Q), where R(β, Q) = 0
• Q is implicitly defined and requires an iterative solution method
• Assume a Q^0 and iterate Q^n → Q^{n+1} until ||R(β, Q^n)|| ≤ TOL
• If TOL is too small, too many iterations are needed
• In many problems we cannot reduce ||R(β, Q^n)|| to small values
• This means that the numerical value of I is noisy; a finite difference then contains too much error and is useless
Example: RAE5243 airfoil, Mach = 0.68, Re = 19 million, AOA = 2.5 deg
iter   Lift               Drag
41496  0.824485788042416  1.627593747613790E-002
41497  0.824485782714867  1.627593516695762E-002
41498  0.824485777387834  1.627593285794193E-002
41499  0.824485772061306  1.627593054909022E-002
41500  0.824485766735297  1.627592824040324E-002

  11. Complex variable method
f(x_0 + iδ) = f(x_0) + iδ f'(x_0) + O(δ^2) + iO(δ^3)
f'(x_0) = (1/δ) imag[f(x_0 + iδ)] + O(δ^2)
• No roundoff error (no subtraction of nearly equal numbers)
• We can take δ very small, e.g. δ = 10^-20
• Easily implemented
◮ fortran: redeclare real variables as complex
◮ matlab: no change
• Iterative problems: β → β + iΔβ
◮ Obtain Q̃ = Q(β + iΔβ) by solving R(β + iΔβ, Q̃) = 0
◮ Then the gradient is I'(β) ≈ (1/Δβ) imag[I(β + iΔβ, Q(β + iΔβ))]
• Computational cost is O(N^2) or higher (due to complex arithmetic)
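In Python the method works just like in matlab: the same code runs on complex inputs unchanged. A sketch on an assumed smooth test function:

```python
# Complex-step derivative: f'(x0) ~ imag(f(x0 + i*delta)) / delta.
# The test function below is an assumption for illustration.
import cmath

def f(x):
    return cmath.exp(x) * cmath.sin(x)   # f'(x) = exp(x)*(sin(x) + cos(x))

x0, delta = 0.7, 1e-20                   # delta can be tiny: no cancellation
deriv = f(x0 + 1j * delta).imag / delta
exact = (cmath.exp(x0) * (cmath.sin(x0) + cmath.cos(x0))).real
print(deriv, exact)                      # agree to machine precision
```

Unlike the forward difference, there is no subtraction of nearly equal numbers, so δ = 10^-20 causes no loss of accuracy.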

  12. Objectives and controls
• Objective function J(α) = J(α, u): mathematical representation of system performance
• Control variables α
◮ Parametric controls: α ∈ R^n
◮ Infinite-dimensional controls: α : X → Y
◮ Shape: α ∈ set of admissible shapes
• State variable u: solution of an ODE or PDE, R(α, u) = 0

  13. Mathematical formulation
• Constrained minimization problem:
min_α J(α, u) subject to R(α, u) = 0
• Find δα such that δJ < 0:
δJ = (∂J/∂α) δα + (∂J/∂u) δu
= (∂J/∂α) δα + (∂J/∂u)(∂u/∂α) δα
= [∂J/∂α + (∂J/∂u)(∂u/∂α)] δα =: G δα
• Steepest descent: δα = −ε G^⊤, so δJ = −ε G G^⊤ = −ε ||G||^2 < 0


  15. Sensitivity approach
• Linearized state equation: from R(α, u) = 0,
(∂R/∂α) δα + (∂R/∂u) δu = 0, or (∂R/∂u)(∂u/∂α) = −∂R/∂α
• Solve the sensitivity equation iteratively, e.g.
∂/∂t (∂u/∂α) + (∂R/∂u)(∂u/∂α) = −∂R/∂α
• Gradient: dJ/dα = ∂J/∂α + (∂J/∂u)(∂u/∂α)

  16. Sensitivity approach: computational cost
• n design variables: α = (α_1, ..., α_n)
• Solve the primal problem R(α, u) = 0 to get u(α)
• For i = 1, ..., n
◮ Solve the sensitivity equation wrt α_i: (∂R/∂u)(∂u/∂α_i) = −∂R/∂α_i
◮ Compute the derivative wrt α_i: dJ/dα_i = ∂J/∂α_i + (∂J/∂u)(∂u/∂α_i)
• One primal equation, n sensitivity equations
Computational cost = n + 1
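The n sensitivity solves can be made concrete on an assumed linear toy problem, R(α, u) = A u − B α = 0 with J = ||u − u*||² (so ∂J/∂α_i = 0 explicitly and only the ∂u/∂α_i terms contribute):

```python
# Sensitivity-approach sketch: one linear solve per design variable.
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])             # dR/du
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])   # n = 3 design variables
u_star = np.array([1.0, -1.0])

alpha = np.array([0.5, 0.2, -0.3])
u = np.linalg.solve(A, B @ alpha)        # primal solve: R(alpha, u) = 0
dJdu = 2.0 * (u - u_star)                # dJ/du for J = ||u - u*||^2

grad = np.zeros(3)
for i in range(3):                        # one sensitivity solve per alpha_i
    # (dR/du) du/dalpha_i = -dR/dalpha_i, and dR/dalpha = -B here
    du_dai = np.linalg.solve(A, B[:, i])
    grad[i] = dJdu @ du_dai
print(grad)
```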

  17. Adjoint approach
• We have δJ = (∂J/∂α) δα + (∂J/∂u) δu and (∂R/∂α) δα + (∂R/∂u) δu = 0
• Introduce a new unknown v:
δJ = (∂J/∂α) δα + (∂J/∂u) δu + v^⊤ [(∂R/∂α) δα + (∂R/∂u) δu]
= [∂J/∂α + v^⊤ ∂R/∂α] δα + [∂J/∂u + v^⊤ ∂R/∂u] δu
• Adjoint equation (choose v so the δu term vanishes):
(∂R/∂u)^⊤ v = −(∂J/∂u)^⊤
• Iterative solution: ∂v/∂t + (∂R/∂u)^⊤ v = −(∂J/∂u)^⊤

  18. Adjoint approach: computational cost
• n design variables: α = (α_1, ..., α_n)
• Solve the primal problem R(α, u) = 0 to get u(α)
• Solve the adjoint problem (∂R/∂u)^⊤ v = −(∂J/∂u)^⊤
• For i = 1, ..., n
◮ Compute the derivative wrt α_i: dJ/dα_i = ∂J/∂α_i + v^⊤ ∂R/∂α_i
• One primal equation, one adjoint equation
Computational cost = 2, independent of n
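On an assumed linear toy problem, R(α, u) = A u − B α = 0 with J = ||u − u*||², the adjoint recipe takes exactly one extra solve, however many design variables there are:

```python
# Adjoint-approach sketch: one primal solve + one adjoint solve gives the
# whole gradient, independent of the number n of design variables.
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])             # dR/du
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])   # n = 3 design variables
u_star = np.array([1.0, -1.0])

alpha = np.array([0.5, 0.2, -0.3])
u = np.linalg.solve(A, B @ alpha)              # primal solve
# Adjoint equation (dR/du)^T v = -(dJ/du)^T, here A.T v = -2 (u - u*)
v = np.linalg.solve(A.T, -2.0 * (u - u_star))
# dJ/dalpha_i = dJ/dalpha_i (explicit, zero here) + v^T dR/dalpha_i,
# with dR/dalpha = -B for this toy problem
grad = v @ (-B)
print(grad)
```

The gradient agrees with what the sensitivity approach produces, but the n per-variable linear solves are replaced by cheap matrix-vector products.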

  19. Adjoint: two approaches
• Continuous (differentiate, then discretize): PDE → adjoint PDE → discrete adjoint
• Discrete (discretize, then differentiate): PDE → discrete PDE → discrete adjoint

  20. Techniques for computing gradients
• Hand differentiation
• Finite difference method
• Complex variable method
• Automatic Differentiation (AD)
◮ Takes computer code to compute J(α, u) and R(α, u)
◮ Applies the chain rule of differentiation
◮ Generates a code to compute derivatives
◮ Tools: ADIFOR, ADOLC, ODYSEE, TAMC, TAF, TAPENADE; see http://www.autodiff.org
[diagram: code for J(α, u) → AD tool → code for ∂J/∂α]
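A toy illustration of the chain-rule idea behind AD (this hand-rolled dual-number class is an assumption for exposition, not one of the tools listed):

```python
# Forward-mode AD with dual numbers: each value carries its derivative,
# and every elementary operation applies the chain rule to both parts.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # Product rule, applied once per elementary multiplication.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x):
    return x * x * x + x          # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

x = Dual(2.0, 1.0)                # seed the derivative dx/dx = 1
y = f(x)
print(y.val, y.dot)               # 10.0 13.0, i.e. f(2) and f'(2)
```

The AD tools listed above do the same propagation, either by operator overloading (ADOLC) or by source-to-source transformation (TAPENADE, TAF).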

  21. Derivatives
• Given a program P computing a function F : R^m → R^n, X → Y
• Build a program that computes derivatives of F
• X: independent variables
• Y: dependent variables

  22. Derivatives
• Jacobian matrix: J = [∂Y_j/∂X_i]
• Directional or tangent derivative: Ẏ = J Ẋ
• Adjoint mode: X̄ = J^⊤ Ȳ
• Gradients (n = 1 output): J = [∂Y/∂X_i] = ∇Y
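The two modes can be shown side by side on an assumed explicit map F : R² → R² with a known Jacobian:

```python
# Tangent mode propagates Xdot forward (Ydot = J @ Xdot); adjoint mode
# propagates Ybar backward (Xbar = J.T @ Ybar). Map F is an assumption.
import numpy as np

def F(x):
    return np.array([x[0] * x[1], x[0] + np.sin(x[1])])

def jacobian(x):                  # analytic J = [dY_j/dX_i]
    return np.array([[x[1], x[0]],
                     [1.0, np.cos(x[1])]])

x = np.array([2.0, 0.5])
J = jacobian(x)

xdot = np.array([1.0, 0.0])       # tangent seed: derivative wrt X_1
print(J @ xdot)                   # one column of J per tangent sweep

ybar = np.array([0.0, 1.0])       # adjoint seed: weight on output Y_2
print(J.T @ ybar)                 # gradient of Y_2 in one adjoint sweep
```

With n = 1 output, a single adjoint sweep yields the full gradient ∇Y, whereas tangent mode needs one sweep per input: the same n-vs-2 trade-off as the sensitivity and adjoint approaches above.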
