The Hessian Module: Current status and future perspectives

Reference: J. Abate, C. Bischof, A. Carle, L. Roh: Algorithms and Design for a Second-Order Automatic Differentiation Module, Int. Symposium on Symbolic and Algebraic Computing (ISSAC), 1997


slide-1
SLIDE 1

The Hessian Module

  • Current status and future perspectives
  • Presentation by: Arno Rasch, RWTH Aachen University
  • Reference: J. Abate, C. Bischof, A. Carle, L. Roh: Algorithms and Design for a Second-Order Automatic Differentiation Module, Int. Symposium on Symbolic and Algebraic Computing (ISSAC), 1997

slide-2
SLIDE 2

The Hessian Module

  • C or Fortran source code is processed by ADIC 1.1b4 or ADIFOR 3.0, which produce the AIF representation consumed by the Hessian Module
  • Differentiated source: 1st and 2nd derivatives via library functions
  • Optional: C macros for standard tasks, e.g., seeding and extraction of results
  • Automatically generated, easy-to-use drivers (ADIFOR 3.0)
  • C and Fortran libraries for computing 1st & 2nd order derivatives are linked into the final EXE

slide-3
SLIDE 3

(1) Global forward mode

  • Break up statements into elementary unary/binary operations
  • Compute ∇z and ∇²z in a special subroutine
  • n = # independent variables, q = n·(n+1)/2
  • Hessians are stored in the LAPACK packed symmetric scheme (size: q)

slide-4
SLIDE 4

(1) Global forward mode

  • Break up statements into elementary unary/binary operations
  • Compute ∇z and ∇²z in a special subroutine
  • n = # independent variables, q = n·(n+1)/2
  • Hessians are stored in the LAPACK packed symmetric scheme (size: q)
  • A smart implementation requires (6n+4q) FP-mult and (3n+3q) FP-add
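The propagation rule for a binary multiplication with packed symmetric Hessian storage can be sketched as follows. This is a minimal C illustration; `packed_idx` and `fm_mult` are hypothetical names, not the module's actual routines, and as written the loop uses 2n+4q multiplies and n+3q additions (the slide's tighter multiply count assumes further reuse).

```c
#include <stddef.h>

/* Index of entry (i,j), i <= j, in LAPACK upper-packed symmetric storage
   (q = n*(n+1)/2 entries). Hypothetical helper name. */
size_t packed_idx(size_t i, size_t j) {
    return i + j * (j + 1) / 2;
}

/* Forward-mode rule for z = x*y with both arguments active:
     grad z = y*grad x + x*grad y
     hess z = y*hess x + x*hess y + grad x (grad y)^T + grad y (grad x)^T */
void fm_mult(size_t n, double x, const double *gx, const double *hx,
             double y, const double *gy, const double *hy,
             double *z, double *gz, double *hz) {
    *z = x * y;
    for (size_t i = 0; i < n; ++i)
        gz[i] = y * gx[i] + x * gy[i];          /* gradient chain rule */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i <= j; ++i) {       /* upper triangle only */
            size_t p = packed_idx(i, j);
            hz[p] = y * hx[p] + x * hy[p]
                  + gx[i] * gy[j] + gy[i] * gx[j];
        }
}
```

For example, with n = 2 and x, y themselves independent (∇x = e₁, ∇y = e₂, zero Hessians), the result is ∇z = (y, x) and the single non-zero packed Hessian entry ∂²z/∂x∂y = 1.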

slide-5
SLIDE 5

(1) Global forward mode

  • Depending on the elementary function f, fewer operations are often necessary.
  • Example: f = „multiplication“, z = x·y, with both variables active:

    ∂²z/∂x² = ∂²z/∂y² = 0 ,  ∂²z/∂x∂y = 1

  • This case needs only (n+4q) FP-mult and (n+3q) FP-add.
  • Depending on the activity information of the arguments (i.e., 1st active / 2nd active / 1st and 2nd active), special routines for computing gradients and Hessians of the binary functions f ∈ { + , – , * , / } are used.

slide-6
SLIDE 6

(2) Preaccumulation

  • Preaccumulation at statement level: z = f(x1, ..., xk)
  • Compute local gradients and Hessians with respect to x1, ..., xk, i.e.,

    ∂z/∂xi , ∂²z/∂xi² , ∂²z/∂xi∂xj   for i = 1, ..., k and j > i

  • In practice: k ≤ 5
  • Computation of global gradients/Hessians („recombination“) follows by the chain rule.

slide-7
SLIDE 7

(2) Preaccumulation

  • z = f(x1, ..., xk)
  • In practice k ≤ 5, so the local derivative objects are at most a length-5 gradient and a 5×5 Hessian
  • Preaccumulation of local derivatives is done either in forward mode or by propagation of Taylor coefficients
  • Preaccumulation and recombination are performed by a special sequence of subroutine calls, depending on the sparsity pattern of the local 5×5 Hessian
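Recombining preaccumulated local derivatives into the global gradient and packed Hessian can be sketched as below. The names `pidx` and `recombine` are hypothetical; the module instead emits a tuned sequence of calls to its accumhg*/accumh* routines that exploits the local sparsity pattern.

```c
#include <stddef.h>

size_t pidx(size_t i, size_t j) { return i + j * (j + 1) / 2; } /* i <= j */

/* Recombination after statement-level preaccumulation of z = f(x1..xk):
     grad z  = sum_i (dz/dxi) grad xi
     hess z  = sum_i (dz/dxi) hess xi
             + sum_{i,j} (d2z/dxi dxj) grad xi (grad xj)^T            */
void recombine(size_t n, size_t k,
               const double *lg,   /* local gradient, length k            */
               const double *lh,   /* local Hessian, k x k, row-major     */
               const double *gx,   /* k global gradients, row i = grad xi */
               const double *hx,   /* k global packed Hessians, length q  */
               double *gz, double *hz) {
    size_t q = n * (n + 1) / 2;
    for (size_t a = 0; a < n; ++a) gz[a] = 0.0;
    for (size_t p = 0; p < q; ++p) hz[p] = 0.0;
    for (size_t i = 0; i < k; ++i) {
        for (size_t a = 0; a < n; ++a)
            gz[a] += lg[i] * gx[i * n + a];     /* first-order chain rule */
        for (size_t p = 0; p < q; ++p)
            hz[p] += lg[i] * hx[i * q + p];     /* (dz/dxi) * hess xi     */
        for (size_t j = 0; j < k; ++j)          /* symmetric rank-1 terms */
            for (size_t b = 0; b < n; ++b)
                for (size_t a = 0; a <= b; ++a)
                    hz[pidx(a, b)] += lh[i * k + j]
                                    * gx[i * n + a] * gx[j * n + b];
    }
}
```

With z = x1·x2 at x1 = 3, x2 = 5 and x1, x2 independent, the local data are lg = (5, 3) and the 2×2 local Hessian with ones off the diagonal; recombination reproduces the global forward-mode result.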

slide-8
SLIDE 8

FM preaccumulation – example 1

Original statement was:

  • Performed by fpinit: special subroutine for initializing local gradients
  • Performed by fpmula3: special preaccumulation routine for multiplication where the local Hessians of both arguments are known to be zero
slide-9
SLIDE 9

FM preaccumulation – example 1

Original statement was:

  • Performed by fpmula1: special preaccumulation routine for multiplication where the local Hessian of the 2nd argument is known to be zero

slide-10
SLIDE 10

FM preaccumulation – example 1

Original statement was:

  • Performed by accumhg1: update of the global Hessian using one local gradient. For updating the global Hessian due to 2, 3, 4, or 5 local gradients, the routines accumhg[2,3,4,5] are used

slide-11
SLIDE 11

FM preaccumulation – example 1

Original statement was:

  • Performed by accumh1: update of one diagonal entry in the global Hessian due to the local Hessian
  • Sparsity information of the local Hessian is known → use it in the update of the global Hessian
  • Two extreme possibilities:
    1. One single accumulation routine for the whole local Hessian → many checks at runtime
    2. One subroutine call per single non-zero entry → more subroutine calls and memory accesses
  • The current implementation makes a compromise by generating a sequence of routines from the set accumh[1,...,8], where at most 3 runtime checks per subroutine are performed

slide-12
SLIDE 12

FM preaccumulation – example 1

Original statement was:

  • For large n, these operations are expensive
  • Costs of the individual update steps: 1q Mult; 2q Mult, 1q Add; 3q Mult, 1q Add
  • Without preaccumulation, computing ∇²f would cost: 8q Mult, 6q Add (n = # independent variables, q = n·(n+1)/2)

slide-13
SLIDE 13

FM preaccumulation – example 2

Original statement was:

slide-14
SLIDE 14

FM preaccumulation – example 2

Original statement was:

  • Performed by: accumhg3, accumh5, accumh4

slide-15
SLIDE 15

FM preaccumulation – example 2

Original statement was:

  • Costs of the individual steps: 3q Mult, 2q Add; 3q Mult, 2q Add; 5q Mult, 3q Add; in total 11q Mult, 7q Add
  • Without preaccumulation, computing ∇²f would cost only 8q Mult, 6q Add, so preaccumulation does not pay off here

slide-16
SLIDE 16

When to use preaccumulation?

  • Several switching strategies for preaccumulation, globally controlled by the user
  • Switch to preaccumulation if
    – there is more than one operator [OG1]
    – the number of operators is greater than the number of active variables on the RHS [OPVAR]
    – the number of operators is greater than the number of active variables on the RHS plus 1 [OPVAR1]
    – there are three or more active variables on the RHS [VG3]
  • Performance model based on flop & memory access count
  • Actual machine-specific timing information (currently broken)
  • Switch off preaccumulation if
    – the number of active variables on the RHS is greater than five
    – intrinsic functions (other than + – * /) are used
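The rules above can be condensed into a small predicate. This is an illustration only; `use_preaccum` is a hypothetical name, and the real module selects the strategy through a global user-controlled setting.

```c
/* Decide whether to preaccumulate a statement under the OPVAR strategy:
   preaccumulate when there are more operators than active RHS variables,
   but never when more than five active variables appear (the local
   Hessian is limited to 5x5) or a non-arithmetic intrinsic is used. */
int use_preaccum(int n_ops, int n_active_rhs, int has_other_intrinsic) {
    if (n_active_rhs > 5 || has_other_intrinsic)
        return 0;                    /* switch-off conditions */
    return n_ops > n_active_rhs;     /* OPVAR rule */
}
```

The other strategies (OG1, OPVAR1, VG3) differ only in the comparison on the last line.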

slide-17
SLIDE 17

Preaccumulation & ADIC 1.1b4

  • Interface to ADIC 1.1b4 and ADIFOR 3.0 is AIF
  • The Hessian Module identifies variables by name
  • In ADIC 1.1b4, every occurrence of a RHS variable gets a new name:

    f = a*a*a   →(ADIC 1.1b4)→   f = a1*a2*a3

  • The Hessian Module can‘t recognize that a1, a2, a3 are all the same variable

slide-18
SLIDE 18

(3) Hessian-vector product (H·v)

  • For every active variable x, propagate:
    – g_x = (∇x , vᵀ·∇x): array of length n+k (k = # columns of v)
    – h_x = ∇²x·v: 2D array of dimension (n,k)
  • n = # independent variables, k = # columns of v
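The module propagates h_x = ∇²x·v directly and never forms the full Hessian. For intuition, multiplying an already-formed packed symmetric Hessian by a single vector (k = 1) would look like this; `hessvec` is a hypothetical helper name.

```c
#include <stddef.h>

/* y = H*v for an n x n symmetric H stored in LAPACK upper-packed order:
   entry (i,j) with i <= j lives at hp[i + j*(j+1)/2]. */
void hessvec(size_t n, const double *hp, const double *v, double *y) {
    for (size_t a = 0; a < n; ++a) y[a] = 0.0;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i <= j; ++i) {
            double hij = hp[i + j * (j + 1) / 2];
            y[i] += hij * v[j];
            if (i != j) y[j] += hij * v[i];  /* mirror off-diagonal entry */
        }
}
```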

slide-19
SLIDE 19

c

  • mputing

cientific

  • Symmetric projected Hessian (vT·H·v , v∈Rn×k, H∈Rn×n):

– use the same code as in standard forward mode – for every active variable x, propagate g_x = vT·∇x and h_x = vT · ∇2x · v – symmetric storage scheme can be employed for h_x – size for g_x is: k , size for h_x is: (k+1)*k / 2

  • Unsymmetric projected Hessian (vT·H·w, w∈Rn×m )

– use the same code as for the Hessian-vector product – for every active variable x, propagate

  • g_x = (vT·∇x ,∇x · w )

array of length (k+m)

  • h_x = vT·H·w

2D-array of size (k,m)

(4) Projected Hessians

slide-20
SLIDE 20

c

  • mputing

cientific

  • Propagate 1st and 2nd order Taylor coefficients

,

  • Rules similar to Global forward mode:
  • For length-n gradients and sparse n x n – Hessians with

s off-diagonal entries, k=n+s univariate Taylor series are required

  • , are k-vectors

(5) Global Taylor mode

Very efficient for sparse Hessians Sparsity pattern needed
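Propagating first- and second-order Taylor coefficients through a multiplication can be sketched as a truncated Cauchy product. This is illustrative only; for a series expanded along direction d, the second coefficient of the result carries ½·dᵀ∇²f·d.

```c
/* Value plus first two Taylor coefficients of a truncated series. */
typedef struct { double c0, c1, c2; } taylor2;

/* Multiplication rule: Cauchy product truncated at order 2. */
taylor2 t2_mul(taylor2 a, taylor2 b) {
    taylor2 r;
    r.c0 = a.c0 * b.c0;
    r.c1 = a.c1 * b.c0 + a.c0 * b.c1;
    r.c2 = a.c2 * b.c0 + a.c1 * b.c1 + a.c0 * b.c2;
    return r;
}
```

For f(x,y) = x·y at (3,5) along d = (1,1): x(t) = 3+t and y(t) = 5+t give the product series (15, 8, 1), so dᵀ∇²f·d = 2·1 = 2, which matches ∇²f = [[0,1],[1,0]].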

slide-21
SLIDE 21

Future perspectives I

  • Parallel computation of (dense) Hessians using OpenMP:
    – work on Hessian objects (1D packed arrays) is explicitly distributed, schedule static
    – redundant computation of the original function and gradients
    – OpenMP‘s orphaning concept allows parallel constructs outside the lexical scope of a parallel region
  • → automatically generated wrapper with OpenMP directives (done)
  • → additional OpenMP directives in the libraries that actually compute the 1st and 2nd order derivatives (done)
  • → experimental implementation with ADIFOR 3.0: the user invokes the parallel AD code the same way as the serial code
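A minimal sketch of the static distribution over a packed Hessian object, assuming OpenMP; `axpy_packed` is a hypothetical name, and without -fopenmp the pragma is ignored and the loop simply runs serially.

```c
/* Update a packed Hessian object of length q = n*(n+1)/2. With
   schedule(static), each thread owns a contiguous chunk of the
   1D packed array, so no synchronization is needed here. */
void axpy_packed(long q, double alpha, const double *x, double *y) {
#pragma omp parallel for schedule(static)
    for (long p = 0; p < q; ++p)
        y[p] += alpha * x[p];
}
```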

slide-22
SLIDE 22

Future perspectives II

  • Partially separable functions:

    f(x) = Σᵢ₌₁ᵐ fᵢ(x)

  • Element functions fᵢ(x) have Hessians of rank < n
  • The symmetric projection vᵀ·∇²fᵢ(x)·v can be used to compute the Hessians of the element functions
  • Parallelism: synchronization is needed when updating the global Hessian of f
  • Sparsity patterns of ∇²fᵢ(x) must be available
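Assembling the global Hessian from the element Hessians amounts to scattering each small dense block into the packed global array, sketched below with hypothetical names; in a parallel setting this accumulation is exactly where the synchronization mentioned above is required.

```c
#include <stddef.h>

/* Add the ki x ki Hessian of element function f_i (dense, row-major),
   which touches the global variables idx[0..ki-1], into the packed
   global Hessian hglob (LAPACK upper-packed, size n*(n+1)/2). */
void scatter_elem(double *hglob, size_t ki, const size_t *idx,
                  const double *helem) {
    for (size_t b = 0; b < ki; ++b)
        for (size_t a = 0; a < ki; ++a) {
            size_t gi = idx[a], gj = idx[b];
            if (gi <= gj)            /* store the upper triangle once */
                hglob[gi + gj * (gj + 1) / 2] += helem[a + b * ki];
        }
}
```

For example, an element touching global variables {0, 2} of a 3-variable problem contributes only to the packed entries for (0,0), (0,2), and (2,2).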

slide-23
SLIDE 23

Future perspectives III

  • Sparsity pattern of the Hessian not available
  • Use SparsLinC (Bischof, Khademi, Bouaricha, Carle) for computing sparse Hessians
  • Since gradients and Hessians are both stored in 1D arrays, the standard SparsLinC operation

    w := w + Σᵢ₌₁ᴺ aᵢ·vᵢ   (with sparse vectors w and vᵢ)

    can be used
  • A new „SparsLinC-like“ routine for the sparse symmetric outer product is needed
  • The appropriate sequence of SparsLinC calls is generated by the Hessian Module
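The sparse linear combination named above can be sketched with index/value vectors; this is a toy stand-in for SparsLinC's dynamically managed sparse vectors, and the fixed capacity NMAX is for illustration only.

```c
#include <stddef.h>

#define NMAX 64  /* capacity of the dense scratch accumulator (toy limit) */

/* Sparse vector: nnz index/value pairs. */
typedef struct { size_t nnz; size_t idx[NMAX]; double val[NMAX]; } spvec;

/* w := w + sum_{i=1..N} a[i] * v[i], accumulated through a dense
   scratch array and compressed back to index/value form. */
void sp_lincomb(spvec *w, size_t N, const double *a, const spvec *v) {
    double dense[NMAX] = {0};
    int    mark[NMAX]  = {0};
    for (size_t p = 0; p < w->nnz; ++p) {       /* scatter w */
        dense[w->idx[p]] = w->val[p];
        mark[w->idx[p]]  = 1;
    }
    for (size_t i = 0; i < N; ++i)              /* accumulate a[i]*v[i] */
        for (size_t p = 0; p < v[i].nnz; ++p) {
            size_t j = v[i].idx[p];
            dense[j] += a[i] * v[i].val[p];
            mark[j]   = 1;
        }
    w->nnz = 0;                                 /* gather the result */
    for (size_t j = 0; j < NMAX; ++j)
        if (mark[j]) {
            w->idx[w->nnz] = j;
            w->val[w->nnz] = dense[j];
            w->nnz++;
        }
}
```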

slide-24
SLIDE 24

Concluding Remarks

  • Several strategies for computing Hessians:
    – global forward mode and global Taylor mode
      • without preaccumulation
      • with forward-mode preaccumulation
      • with Taylor-mode preaccumulation
    – different switching strategies for preaccumulation
  • Some ideas for future directions, e.g., parallel computation of Hessians
  • Successfully computed 2nd derivatives for the Aachen CFD code TFS („The Flow Solver“) using ADIFOR 3.0 and the Hessian Module
  • TFS consists of ~24,000 lines of (mostly) Fortran code, in 220 subroutines