Towards Polyhedral Automatic Differentiation

Jan Hückelheim1,2  Navjot Kukreja1

December 14, 2019

1Imperial College London, UK 2Argonne National Laboratory, USA


Recap: Automatic differentiation (AD)

AD modes

Andreas Griewank, Andrea Walther: Evaluating Derivatives

Forward or reverse?

  • There are infinitely many ways to implement the primal, tangent, and gradient
  • Some of them are more useful than others
  • Success story of AD: take inspiration from the given program, which is hopefully a reasonable implementation of F
  • In this work: can we derive an efficient gradient/tangent from an efficient primal?


Motivation: Efficiency

"End of Moore's law"

  • Serial performance has not grown for the last decade
  • Code no longer gets faster just by waiting a few years

How to compute more?

  • Adapt to different processors (GPU, TPU, ...)
  • Expose and use parallelism
  • Use cache hierarchy well, e.g. tiling, cache blocking
  • Minimize passes over memory, e.g. fusion


AD approaches

There are many ways to categorize AD tools, for example by their level of abstraction:

High level

  • ML frameworks, Halide, BLAS: define high-level operations, hide implementation details under the hood
  • AD operates on a high level of abstraction
  • Problem: limited expressiveness; someone needs to write gradient operators, and composition of existing blocks is not always efficient

Low level

  • Directly operate on a low-level language (e.g. C)
  • Very expressive and general
  • Performance optimizations are not abstracted away, but mixed into the computation


Question:

How should Automatic Differentiation respond?

  • Can we maintain correctness?
  • Can we maintain performance?



Something we cannot do:

  • Just re-use the primal parallelization

[Figure: iteration diagrams for Loop 1 (arrays b, c) and Loop 2 (arrays a, b, c)]

  • In reverse-mode AD, shared read (ok) becomes shared write (not ok)


Another thing we cannot do:

  • Run AD first, then hand everything to an optimizing compiler


Idea: Generate fast code inspired by primal

  • Out of all possible codes, generate the one that closely mimics the primal
  • Get as much information as possible from the primal, to parallelize the derivative


Example: AD on a Stencil

Figure 1: AD on a gather produces a scatter


1D Stencil Example

The stencil is originally a gather operation:

#pragma omp parallel for private(i)
for (i = 1; i <= n - 1; i++) {
    r[i] = c[i] * (2.0 * u[i-1] - 3.0 * u[i] + 4.0 * u[i+1]);
}


1D Stencil Example

AD converts it to a scatter

for (i = 1; i <= n - 1; i++) {
    ub[i-1] += 2.0 * c[i] * rb[i];
    ub[i]   -= 3.0 * c[i] * rb[i];
    ub[i+1] += 4.0 * c[i] * rb[i];
}


Can we auto-optimize?

for (i = 1; i <= n - 1; i++) {
    ub[i-1] += 2.0 * c[i] * rb[i];
    ub[i]   -= 3.0 * c[i] * rb[i];
    ub[i+1] += 4.0 * c[i] * rb[i];
}

Looked at in isolation, this loop poses several challenges for an optimizing compiler:

  • Is the trip count large enough to make parallelization profitable?
  • Are ub, c, rb aliased?
  • There are many ways to transform this loop; which one is best?
  • Would tiling help? What parameters are optimal?


PerforAD

  • Prototype that generates gradient code that looks like the primal code
  • https://github.com/jhueckelheim/PerforAD
  • Primal and gradient performance end up being similar
  • Views loops in terms of their iteration space and statements
  • We are free to restructure the code, as long as each statement is applied to the same overall iteration space


1D Stencil Example

The scatter can be split into individual updates

for (i = 1; i <= n - 1; i++) { ub[i-1] += 2.0 * c[i] * rb[i]; }
for (i = 1; i <= n - 1; i++) { ub[i]   -= 3.0 * c[i] * rb[i]; }
for (i = 1; i <= n - 1; i++) { ub[i+1] += 4.0 * c[i] * rb[i]; }


1D Stencil Example

Shift the indices so that each loop writes to the element indexed by its own loop counter:

for (j = 0; j <= n - 2; j++) { ub[j] += 2.0 * c[j+1] * rb[j+1]; }
for (j = 1; j <= n - 1; j++) { ub[j] -= 3.0 * c[j]   * rb[j];   }
for (j = 2; j <= n;     j++) { ub[j] += 4.0 * c[j-1] * rb[j-1]; }


1D Stencil Example

#pragma omp parallel for private(j)
for (j = 2; j <= n - 2; j++) {
    ub[j] += 2.0 * c[j+1] * rb[j+1];
    ub[j] -= 3.0 * c[j]   * rb[j];
    ub[j] += 4.0 * c[j-1] * rb[j-1];
}
ub[0] += 2.0 * c[1] * rb[1];
// ... other remainders: ub[1], ub[n-1], ub[n]


Higher dimensions

In higher dimensions, we need remainders for edges and corners


Performance Results - Scalability


Figure 2: Speedups for the wave equation solver on a Broadwell processor, using up to 12 threads. The conventional adjoint code with manual parallelisation does not scale at all. The primal and the PerforAD-generated adjoint benefit from using all 12 cores.


Other optimizations

  • Good block sizes for the primal and the gradient are related; this should be leveraged


Conclusion, Future Work

  • We can automatically borrow ideas from the primal to speed up the gradient
  • This can also be used for reproducibility and round-off behaviour
  • We have a paper: https://dl.acm.org/citation.cfm?doid=3337821.3337906
  • Future work:
      • Try this with more examples
      • Try this with more diverse transformations
      • A better API is needed to make this useful for more people


Thank you

Questions?
