Progress with MATLAB Source Transformation AD (MSAD) - Rahul Kharche - PowerPoint PPT Presentation


SLIDE 1

Progress with MATLAB Source transformation AD

MSAD

Rahul Kharche
Cranfield University, Shrivenham

R.V.Kharche@Cranfield.ac.uk

AD Fest 2005, Nice, 14th-15th April 2005


SLIDE 2

Agenda

• Project Goals
• Previous work on MSAD
• Further developments
• Test results from:
    MATLAB ODE examples
    MINPACK optimisation problems
    bvp4cAD
• Summary
• Future Directions
• References


SLIDE 3

Project Goals

• Enhance performance by eliminating overheads introduced by operator overloading in MAD [For04]
• Explore MATLAB* source analysis and transformation techniques to aid AD
• Create a portable tool that easily integrates with MATLAB-based solvers
• Provide control over the readability of generated code
• Provide an array of selectable AD-specific optimisations

* MATLAB is a trademark of The MathWorks, Inc.


SLIDE 4

Previous work on MSAD

• Was shown to successfully compute the gradient/Jacobian of MATLAB programs involving vector-valued functions using the forward mode of AD and source transformation [Kha04]
• Augmented code generated by inlining the fmad class operations from MAD (see the sketch below):
    the derivvec class continued to hold the derivatives and perform derivative combinations
    resulted in a hybrid approach analogous to [Veh01]
• Simple forward-dependence-based activity analysis
• Active independent variables and supplementary shape/size information can be provided through user directives inserted in the code
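To make the inlining concrete, here is a minimal hypothetical sketch of such augmented forward-mode code for f(x) = x .* sin(x). The name f_ad and the d_* variables are illustrative, not MSAD's actual generated output, and a single tangent direction is propagated for simplicity:

    function [y, d_y] = f_ad(x, d_x)
    % Forward-mode augmentation of f(x) = x .* sin(x).
    % x and d_x are n-by-1; d_x holds one tangent (directional derivative).
    t   = sin(x);               % original statement
    d_t = cos(x) .* d_x;        % inlined derivative of sin
    y   = x .* t;               % original statement
    d_y = d_x .* t + x .* d_t;  % inlined product rule
    end

Propagating the n Cartesian tangents one column at a time recovers the full (here diagonal) Jacobian; in the hybrid approach above the derivvec class performed these combinations through overloading, whereas here they are stated explicitly with native arrays.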


SLIDE 5

Previous work on MSAD (contd.)

• Rudimentary size (shape) and type (constant, real, imaginary) inference
• Thus removed one level of overheads encountered in MAD:
    giving discernible savings over MAD for small problem sizes
    but these savings grew insignificant as the problem size was increased


SLIDE 6

Further developments

• Now uses size and type inference to specialise and further inline derivvec class operations
• Optionally generates code for holding and propagating sparse derivatives
• Incorporated sparsity inference (propagating MATLAB sparse types for derivative variables); if S denotes a sparse operand and F a full one, rules such as the following are applied (checked in the sketch below):
    S + F → F,   S * F → F
    S .* F → S,  S & F → S
    T = S(i, j) → T is sparse, if i, j are vectors
    T(i, j) = S → T retains its full or sparse storage type
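These rules mirror MATLAB's own sparse-storage semantics, so they can be checked directly at the prompt. A minimal sanity check (illustrative session, not MSAD output):

    S = sparse([1 0 2; 0 3 0; 0 0 4]);  % sparse operand
    F = ones(3);                        % full operand
    issparse(S + F)                     % 0: S + F -> F
    issparse(S * F)                     % 0: S * F -> F
    issparse(S .* F)                    % 1: S .* F -> S
    issparse(S & F)                     % 1: S & F -> S
    T = S([1 2], [1 3]);                % read with vector subscripts
    issparse(T)                         % 1: T is sparse
    U = ones(3);
    U([1 2], [1 3]) = S([1 2], [1 3]);  % assign into a full matrix
    issparse(U)                         % 0: U retains its full storage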


SLIDE 7

Further developments (contd.)

Run-times were obtained using MATLAB 7.0 on a Linux machine with a 2.8 GHz Pentium 4 processor and 512 MB of RAM.


SLIDES 8-9

Previous results - Brusselator ODE

[Figure: Brusselator ODE, CPU(JF)/CPU(F) vs n, log-log; NUMJAC, MSAD and MAD in full, compressed (comp) and sparse modes]

[Figure: Jacobian sparsity of brussode, n = 32, nz = 124]

• 30% improvement over MAD for small n, down to 4% for large n, with compression (see the compression sketch after this slide)
• performance matches that of finite differencing, numjac(vec), only asymptotically
• using sparse derivatives, performance converges asymptotically to that of MAD
• almost exponentially increasing savings over full evaluation with increasing n

[Figure: zoom on the compressed and sparse curves, CPU(JF)/CPU(F) vs n, linear scale]
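For reference, compression groups structurally orthogonal columns of the Jacobian so that one forward-mode sweep recovers several columns at once. A minimal sketch, assuming a square Jacobian with logical sparsity pattern Jpat and a hypothetical helper f_tangent(x, v) returning the directional derivative J(x)*v; the greedy grouping below illustrates the idea and is not necessarily the colouring used by MSAD or numjac:

    function J = comp_jac(x, Jpat, f_tangent)
    % Compressed Jacobian evaluation via greedy column grouping.
    n = size(Jpat, 2);
    group = zeros(1, n);                    % colour assigned to each column
    for j = 1:n                             % greedy colouring: smallest free colour
        clash = any(Jpat(:, 1:j-1) & repmat(Jpat(:, j), 1, j-1), 1);
        g = group(1:j-1);  taken = g(clash);
        c = 1;
        while any(taken == c), c = c + 1; end
        group(j) = c;
    end
    p  = max(group);                        % number of sweeps (p << n for banded J)
    Jc = zeros(n, p);                       % compressed Jacobian J * seed
    for c = 1:p
        v = double(group(:) == c);          % seed: sum of unit vectors in colour c
        Jc(:, c) = f_tangent(x, v);         % one forward-mode sweep
    end
    J = spalloc(n, n, nnz(Jpat));           % recover the sparse Jacobian
    for j = 1:n
        rows = find(Jpat(:, j));
        J(rows, j) = Jc(rows, group(j));    % uncompress column j
    end
    end

With a banded pattern such as the Brusselator's, the number of sweeps p is independent of n, which is why the compressed curves flatten out.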

SLIDES 10-11

Results - Brusselator ODE

[Figure: Brusselator ODE, CPU(JF)/CPU(F) vs n, log-log; NUMJAC (comp), MSAD (comp), MAD (comp), MSAD (sparse), MAD (sparse)]

[Figure: Jacobian sparsity of brussode, n = 32, nz = 124]

• 91% → 30% speedup over MAD with increasing n, using compression
• outperforms numjac(vec) from n = 640 onward, with gains of up to 25%
• decreasing relative speedup, but a small constant saving, over MAD using sparse derivatives

[Figure: zoom on the compressed and sparse curves, CPU(JF)/CPU(F) vs n, linear scale]

SLIDES 12-13

Results - Burgers ODE (burgersode)

[Figure: Burgers ODE, CPU(JF)/CPU(F) vs n, log-log; NUMJAC (comp), MSAD (comp), MAD (comp), MSAD (sparse), MAD (sparse)]

[Figure: Jacobian sparsity of burgersode, n = 32, nz = 340]

• 87% → 37% speedup over MAD with increasing n, using compression
• outperforms numjac from n = 64 onward, with gains between 28% and 45%
• decreasing relative speedup, but a small constant saving, over MAD using sparse derivatives

[Figure: zoom on the compressed and sparse curves, CPU(JF)/CPU(F) vs n, linear scale]

SLIDES 14-15

Results - Data Fitting problem

[Figure: Data-Fitting, CPU(JF)/CPU(F) vs n (m = 4), log scale; NUMJAC (full), MSAD (full), MAD (full), MSAD (sparse), MAD (sparse)]

• outperforms both MAD and numjac in direct evaluation of the Jacobian by > 60%
• taking note of the sparsity in the Jacobian of the intermediate Vandermonde matrix [For04] and using sparse derivatives gives an order of magnitude improvement over numjac, but a decreasing relative improvement over MAD (see the sketch after this slide)

[Figure: Vandermonde matrix Jacobian sparsity - DataFit (n = 10, m = 4), nz = 30]
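To see where the sparsity arises: the Vandermonde intermediate has V(i,k) = x_i^(k-1), so each entry depends on a single x_i and the Jacobian of vec(V) with respect to x is extremely sparse. A small sketch under assumed problem structure (the coefficients c, data y and residual form are hypothetical stand-ins for the data-fitting example in [For04]):

    n = 10;  m = 4;
    x = linspace(0.1, 1, n).';            % fitting abscissae (active variables)
    V  = zeros(n, m);                     % Vandermonde matrix
    dV = spalloc(n*m, n, n*(m-1));        % sparse Jacobian of vec(V) w.r.t. x
    for k = 1:m
        V(:, k) = x.^(k-1);
        rows = (k-1)*n + (1:n);           % positions of column k within vec(V)
        dV(rows, :) = spdiags((k-1)*x.^max(k-2, 0), 0, n, n);
    end
    c = ones(m, 1);  y = zeros(n, 1);     % hypothetical coefficients and data
    r  = V*c - y;                         % residual
    Jr = kron(c.', speye(n)) * dV;        % chain rule: Jacobian of r w.r.t. x
    spy(Jr)                               % diagonal: sparsity worth preserving

Each column of V contributes only a diagonal block, so dV has at most n*(m-1) nonzeros out of its n^2*m entries.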

SLIDES 16-17

Observations

• Significantly better performance using Jacobian compression compared to numjac, MAD and the previous approach, even for large n
• MSAD using full evaluation of the Jacobian performs well compared to MAD and numjac in full mode
• When using the full or the compressed mode, the generated code contains only native data types, qualifying it for any MATLAB JIT acceleration
• Decrease in relative performance with increasing n when using sparse derivatives; this can be attributed to the larger overheads of manipulating the internal sparse representation of a matrix, which make any savings relatively small

SLIDES 18-20

Results - MINPACK problems

[Figure: MINPACK - DGL2 (2-D Ginzburg-Landau), CPU(gf)/CPU(f) vs n, log-log; NUMJAC (full), MSAD (full), MAD (full), MSAD (sparse), MAD (sparse)]

[Figure: MINPACK - DSSC (steady-state combustion), CPU(gf)/CPU(f) vs n, log-log; same curves]

• Results from the 2-D Ginzburg-Landau and steady-state combustion problems using full derivatives to evaluate the gradient show an 80% → 50% improvement over MAD, and outperform numjac by a similar margin over medium and large n
• using sparse derivatives shows a vast improvement over MAD here, 75% → 85%
• caused by redundant computations involving some inactive intermediates treated as active in MAD [For04, pp. 7-8] (see the sketch after this slide)
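For illustration, a hypothetical fragment showing what sharper activity analysis buys: a forward dependence sweep marks a and b inactive because they never depend on the active input x, so no derivative statements are generated for them, while an overloaded tool may end up carrying (zero) derivatives for such intermediates:

    function y = g(x)            % x: the active independent variable
    a = pi / 4;                  % constant: inactive
    b = cos(a);                  % inactive intermediate, no d_b generated
    y = b * x.^2;                % the only statement needing derivative code
    end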

SLIDE 21

Results - Smaller problems

Ratio CPU(Jf)/CPU(f):

Problem                              n    numjac    MSAD     MAD
Coating Thickness Standardization  134    256.28   49.65  107.87
Pollution ODE                       20     10.86*   9.84  113.37
Combustion of Propane - Full        11     22.22   35.03  394.29
Human Heart Dipole                   8     23.12   53.08  737.16
Chemical Akzo Nobel                  6     16.24   17.22  252.71
Combustion of Propane - Reduced      5     20.67   64.94  921.80
Amplifier DAE                        5     13.86   15.08  170.11
Enzyme Reaction                      4     18.69    9.51  111.89
Robertson ODE                        3     11.05   10.48  124.22

• Smaller-sized problems from MINPACK, the Test Set for IVPs and the MATLAB ODE examples
• almost all cases show an order of magnitude speedup over MAD
• performance is fairly close to that of finite differencing (numjac); in four cases better

* function vectorised to the advantage of numjac


SLIDE 22

Results - bvp4cAD

[Figure: absolute run-times (s) to obtain solutions of BVP problems I-VIII; bvp4c - FD, bvp4c - MSAD, bvp4c - MAD]

[Figure: previous results, CPU time (s) on test problems 1-8; bvp4cFD, bvp4cMSAD, bvp4cMAD]

• Results from bvp4c using MSAD (previous results on the right)
• significant speedup over the previously adopted hybrid approach in MSAD
• performance better than using numjac* in six of eight cases, and comparable otherwise (see the usage sketch below)

* note the improved speed using numjac compared to earlier results (previously MATLAB 6.5 was used)
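For context, this is how an AD-generated Jacobian slots into bvp4c: the standard bvpset option 'FJacobian' replaces the solver's internal finite differencing with a user-supplied function. A hedged sketch using MATLAB's twoode/twobc example; the hand-written odejac merely stands in for what MSAD would generate (place it in its own file or as a subfunction):

    solinit = bvpinit(linspace(0, 4, 5), [1; 0]);  % initial mesh and guess
    options = bvpset('FJacobian', @odejac);        % AD Jacobian instead of FD
    sol = bvp4c(@twoode, @twobc, solinit, options);

    function dfdy = odejac(x, y)
    % Jacobian of twoode's right-hand side f(x, y) = [y(2); -abs(y(1))]
    dfdy = [0, 1; -sign(y(1)), 0];
    end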


SLIDE 23

Summary

• MSAD shows a definite improvement in full and compressed Jacobian evaluation over MAD and numjac:
    order of magnitude speedup in small and medium-sized test cases
• In problems with sparsity in the derivatives of results or intermediates, using sparse derivatives in MAD and MSAD shows a large saving over the full evaluation of gradients/Jacobians
• In general, MSAD shows only a constant saving over MAD using sparse derivatives; in certain cases larger gains may be obtained
• Use of only native data types in the output code allows the MATLAB JIT to perform some run-time optimisations


SLIDE 24

Future Directions

• Feature enhancement:
    Support for branching constructs involving active variables
    Handle cells and structures
    Incorporate exception handling to trap non-differentiability and syntactic errors
• Improving performance:
    Optimise generated code using dependency analysis (CFG, call graphs)
    Use more refined shape-inference techniques
    Apply constant folding (illustrated below)
• Testing:
    Include a mechanism for systematic testing
    Construct a comprehensive test suite
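A trivial example of the constant-folding opportunity in generated derivative code (hypothetical statement, assuming one operand is inactive so its derivative terms are literal zeros and ones):

    x   = rand(5, 1);
    d_x = ones(5, 1);
    d_y = 1 .* d_x + x .* 0;   % naive generated derivative statement
    d_y = d_x;                 % the same statement after constant folding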


SLIDE 25

References

[ASU86] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1986.

[Eat02] J.W. Eaton. GNU Octave - a high-level language for numerical computations. http://www.octave.org, 2002.

[For04] S.A. Forth. An efficient overloaded implementation of forward mode automatic differentiation in MATLAB. Technical report, Cranfield University (RMCS Shrivenham), Swindon, UK, June 2004.

[Gri00] A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM, Philadelphia, 2000.

[Kha04] R. Kharche. Source transformation for AD in MATLAB. Master's thesis, AMOR, Cranfield University, Shrivenham, UK, 2004.

[SKF03] L.F. Shampine, R. Ketzscher, and S.A. Forth. Using AD to solve BVPs in MATLAB. Technical report, Cranfield University (RMCS Shrivenham), Swindon, UK, 2003.

[Veh01] A. Vehreschild. Semantic augmentation of MATLAB programs to compute derivatives. Diploma thesis, Institute for Scientific Computing, Aachen University, Germany, 2001.

[Ver98] A. Verma. Structured Automatic Differentiation. PhD thesis, Cornell University, May 1998.
