Mixed Integer Programming for Call Tree Reversal


  1. Mixed Integer Programming for Call Tree Reversal
     J. Lotz, U. Naumann, S. Mitra
     LuFG Informatik 12: Software and Tools for Computational Engineering (STCE), RWTH Aachen University, and Chemical Engineering, Carnegie Mellon University
     [CSC16, Albuquerque, NM, Oct. 10–13, 2016]

  2. Outline: Motivation; Call Tree Reversal; Conclusion and Outlook

  4. Motivation: Checkpointing of Adjoint Simulations
     Consider the solution $x_\tau = x(\tau)$ of the initial value problem (IVP)
     $$\frac{dx}{dt} = f(t, x), \quad t \ge 0, \quad x = x(t) \in \mathbb{R}^n, \quad x(0) = x_0$$
     at target time $\tau > 0$. Gradient-based calibration of the initial condition $x_0$ benefits from the adjoint
     $$\bar{x}_0 = \left(\frac{dx_\tau}{dx_0}\right)^T \cdot \bar{x}_\tau \in \mathbb{R}^n,$$
     where $\bar{x}_\tau \in \mathbb{R}^n$ is the gradient of, e.g., a least-squares objective matching the solution $x_\tau$ to given observations. Computation of $\bar{x}_0$ amounts to the solution of the adjoint IVP
     $$-\frac{d\bar{x}}{dt} = \left(\frac{df}{dx}(t, x)\right)^T \cdot \bar{x}, \quad \tau \ge t \ge 0, \quad \bar{x} = \bar{x}(t) \in \mathbb{R}^n, \quad \bar{x}(\tau) = \bar{x}_\tau.$$

  5. Motivation: Data Flow Reversal
     W.l.o.g., explicit Euler time stepping yields for $\Delta t = \tau/m$
     ◮ primal: $x_{i+1} = x_i + \Delta t \cdot f(x_i), \quad i = 0, \ldots, m-1$
     ◮ algorithmic adjoint: $\bar{x}_i = \bar{x}_{i+1} + \Delta t \cdot \left(\frac{df}{dx}(x_i)\right)^T \cdot \bar{x}_{i+1}, \quad i = m-1, \ldots, 0$
     ◮ symbolic adjoint: $\bar{x}_i = \bar{x}_{i+1} + \Delta t \cdot \left(\frac{df}{dx}(x_{i+1})\right)^T \cdot \bar{x}_{i+1}, \quad i = m-1, \ldots, 0$
     Note the use of primal iterates in reverse order!
     ◮ A. Griewank: Achieving Logarithmic Growth of Temporal and Spatial Complexity in Reverse Automatic Differentiation, Opt. Meth. Softw. 1, 35-54 (1992).
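The explicit Euler primal and its algorithmic adjoint can be sketched in a few lines. This is a minimal store-all illustration, not the authors' code: the pendulum right-hand side `f`, the hand-written transposed-Jacobian product `dfdx_T`, and the function names `primal` and `adjoint` are all assumptions introduced for the example.

```python
import math

def f(x):
    # Hypothetical right-hand side (pendulum): x = (angle, angular velocity).
    return [x[1], -math.sin(x[0])]

def dfdx_T(x, v):
    # Product (df/dx(x))^T * v for the hypothetical f above,
    # with df/dx = [[0, 1], [-cos(x0), 0]].
    return [-math.cos(x[0]) * v[1], v[0]]

def primal(x0, tau, m):
    # Explicit Euler: x_{i+1} = x_i + dt * f(x_i); store all iterates.
    dt = tau / m
    xs = [list(x0)]
    for _ in range(m):
        x = xs[-1]
        fx = f(x)
        xs.append([x[k] + dt * fx[k] for k in range(len(x))])
    return xs

def adjoint(xs, xbar_tau, tau):
    # Algorithmic adjoint: xbar_i = xbar_{i+1} + dt * (df/dx(x_i))^T * xbar_{i+1},
    # consuming the stored primal iterates in reverse order.
    m = len(xs) - 1
    dt = tau / m
    xbar = list(xbar_tau)
    for i in range(m - 1, -1, -1):
        jtv = dfdx_T(xs[i], xbar)
        xbar = [xbar[k] + dt * jtv[k] for k in range(len(xbar))]
    return xbar

xs = primal([0.3, -0.2], tau=1.0, m=40)
xbar0 = adjoint(xs, [1.0, 2.0], tau=1.0)
```

Storing every iterate in `xs` is the store-all extreme; checkpointing schemes trade part of this memory for recomputation of primal iterates.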

  6. Motivation: DAG Reversal
     [Figure: example DAG $G = (V, E)$ with vertices $v_{-1}, \ldots, v_5$; $n = 2$; $q = 5$]
     Problem: Recovery of $|V| = n + q$ non-persistent (vertex) values in reverse order $(v_5, \ldots, v_{-1})$; extreme cases: store-all, recompute-all
     Objective: Minimization of Primal Reevaluation Cost (PRC) for given upper bound $M$ on Persistent Memory Requirement (PMR)
     Complexity: Fixed Cost $(n + q)$ Minimum Memory Data Flow Reversal: NP-completeness by reduction from Vertex Cover; solvable by $O(n + q)$ instances of Fixed Memory Minimum Cost Data Flow Reversal
     ◮ U.N.: DAG Reversal is NP-Complete, J. Disc. Alg. 7(4), 402-410 (2009).

  8. Call Tree Reversal: Problem Description and Computational Complexity
     [Figure: annotated call tree f → g → h; legend: primal, augmented primal, adjoint, store arguments, restore arguments; node annotations as extracted: f: 1, 10, 10; g: 1, 100, 100; h: 1, 1000]
     Objective: Reversal scheme $R : E \to \{0, 1\}^{|E|}$ minimizing PRC for upper bound $M$ on PMR and given annotated call tree $T = (V, E)$
     Extreme Cases: $R = 0$ (fully split; checkpoint none); $R = 1$ (fully joint; checkpoint all)
     ◮ U.N.: Call Tree Reversal is NP-Complete, LNCSE 64, 13-22 (2008).

  9. Call Tree Reversal: Example: Let MEM = 1110 ...
     [Figure: reversal of the call tree f → g → h for two reversal schemes]
     ◮ $R = (0, 0)$: MEM=1220, OPS=0
     ◮ $R = (1, 1)$: MEM=1110, OPS=2200
     Notation: Set $S$ of subprogram calls; number $n[i]$ of subprogram calls in $i \in S$; set $\chi(i)$ of callees; PMR $m[i] = \sum_{j=0}^{n[i]} m[i]_j$; reversal scheme $R = (r[i])_{i \in S}$; PMR $\overrightarrow{M}[i]$ after augmented primal run; PMR $\overleftarrow{M}[i]$ prior to adjoint run; PMR $\overset{\downarrow}{M}[i]$ of argument checkpoint

  10. Call Tree Reversal: Example: Let MEM = 1110 ...
     [Figure: reversal of the call tree f → g → h for two further reversal schemes]
     ◮ $R = (0, 1)$: MEM=1110, OPS=1000
     ◮ $R = (1, 0)$: MEM=1120, OPS=1200
     Greedy Heuristics
     ◮ Smallest Memory Increase starts from $R = 1$ and yields ...
     ◮ Largest Memory Decrease (LMD) starts from $R = 0$ and yields ...
     ◮ Largest Memory Increase (LMI) remains at $R = 1$ as $R = (1, 0)$ is infeasible

  11. Call Tree Reversal: Mixed Integer Programming Formulation (Objective)
     We aim to balance the PRC $\sum_{f \in S} r[f] \cdot \bar{c}[f]$, the cost induced by writing/reading checkpoints (memory traffic) $\tilde{c} \cdot \sum_{f \in S} r[f]$, and the implementation effort due to the number of checkpointed routines $\hat{c} \cdot \sum_{f \in S} \hat{r}[f]$:
     $$c(R) = \sum_{f \in S} r[f] \cdot \bar{c}[f] + \tilde{c} \cdot \sum_{f \in S} r[f] + \hat{c} \cdot \sum_{f \in S} \hat{r}[f],$$
     where $\hat{r}[f] \in \{0, 1\}$ vanishes unless at least one call of $f$ is checkpointed, $\bar{c}[f]$ denotes the PRC of $f$, and $\tilde{c}$ and $\hat{c}$ are used to weigh impacts due to memory traffic and implementation effort, respectively.

  12. Call Tree Reversal: Mixed Integer Programming Formulation (Constraints)
     We define the PMR for each subprogram (here $g$) after execution of its augmented primal
     $$\overrightarrow{M}[g] = m[g] + \sum_{h \in \chi[g]} \left( (1 - r[h]) \cdot \overrightarrow{M}[h] + r[h] \cdot \overset{\downarrow}{M}[h] \right)$$
     and prior to execution of its adjoint, e.g.,
     $$\overleftarrow{M}[g]_{n[g]} = \overleftarrow{M}[f]_i - m[f]_i + r[g] \left( \overrightarrow{M}[g] - \overset{\downarrow}{M}[g] \right).$$
     The maximum PMR is reached prior to execution of adjoint subprograms; hence, $\max_{g \in S} \overleftarrow{M}[g]_{n[g]}$ must not exceed the upper bound $M$.
     ◮ J. Lotz, U.N., S. Mitra: Mixed Integer Programming for Call Tree Reversal, SIAM CSC (2016).
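To make the two PMR recursions concrete, the following sketch evaluates them by walking a call tree in reverse. It is a simplified model under stated assumptions, not the MIP formulation itself: the `Sub` data structure and the function names are hypothetical, the segment sizes (10, 10), (100, 100), (1000) and the checkpoint size 1 are read off the f → g → h example, and the cost of a segment is assumed to equal its tape size.

```python
from dataclasses import dataclass, field

@dataclass
class Sub:
    name: str
    seg: list                                   # tape sizes m[i]_0 .. m[i]_n
    calls: list = field(default_factory=list)   # callee k runs after segment k
    ckpt: int = 1                               # size of argument checkpoint

def tape(s, r):
    # PMR after the augmented primal run of s: own tape plus, per callee,
    # either its tape (split, r = 0) or its argument checkpoint (joint, r = 1).
    return sum(s.seg) + sum(h.ckpt if r[h.name] else tape(h, r) for h in s.calls)

def _reverse(s, r, level):
    # level: PMR prior to the adjoint of the last segment of s.
    # Returns (peak PMR seen, PMR after s is fully reversed).
    top = level
    for k in range(len(s.seg) - 1, -1, -1):
        level -= s.seg[k]                       # adjoint of segment k frees its tape
        if k > 0:
            h = s.calls[k - 1]
            if r[h.name]:                       # restore arguments, rerun augmented h
                level += tape(h, r) - h.ckpt
            peak, level = _reverse(h, r, level)
            top = max(top, peak)
    return top, level

def max_pmr(root, r):
    return _reverse(root, r, tape(root, r))[0]

def primal_cost(s):
    # Assumed cost of one primal run of s, including its callees.
    return sum(s.seg) + sum(primal_cost(h) for h in s.calls)

def ops(s, r):
    # PRC: every joint call is rerun once.
    return sum((primal_cost(h) if r[h.name] else 0) + ops(h, r) for h in s.calls)

h = Sub("h", [1000])
g = Sub("g", [100, 100], [h])
f = Sub("f", [10, 10], [g])
for scheme in ({"g": 0, "h": 0}, {"g": 1, "h": 1}, {"g": 0, "h": 1}):
    print(scheme, max_pmr(f, scheme), ops(f, scheme))
```

Under these assumptions the sketch gives MEM/OPS of 1220/0, 1110/2200, and 1110/1000 for R = (0,0), (1,1), and (0,1). The memory bookkeeping for other schemes depends on modeling details that are not recoverable from the slides, so it is not asserted here.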

  13. Call Tree Reversal: Numerical Results (Toy)
     [Figure: annotated call tree with subprograms f, g_0, g_1, h, i]
     For $M = 125$ we get

              MIP        LMI        LMD
     R        (1,0,0,1)  (0,0,1,0)  (1,1,0,0)
     PMR      95         90         125
     c(R)     70         120        80

     Costs: $\bar{c}[f] = 200$, $\bar{c}[h] = 120$, $\bar{c}[i] = 30$, and $\bar{c}[g] = 40$. The cost $\bar{c}[h]$ is chosen to be much higher than the corresponding memory requirement to mimic some time-consuming operation in $h$ (e.g., file I/O) without effect on PMR.
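The objective $c(R)$ can be evaluated directly for the toy example. The sketch below assumes that the entries of R are ordered $(g_0, g_1, h, i)$ and that both weights $\tilde{c}$ and $\hat{c}$ are zero; neither is stated on the slide, but together they reproduce the reported values of $c(R)$. The function name `objective` and the `ROUTINE` map are hypothetical.

```python
# Calls in the toy tree and the routine each call belongs to (assumed ordering).
CALLS = ["g0", "g1", "h", "i"]
ROUTINE = {"g0": "g", "g1": "g", "h": "h", "i": "i"}
CBAR = {"f": 200, "g": 40, "h": 120, "i": 30}   # PRC per routine, from the slide

def objective(R, c_tilde=0.0, c_hat=0.0):
    checkpointed = [call for call, r in zip(CALLS, R) if r]
    prc = sum(CBAR[ROUTINE[call]] for call in checkpointed)
    traffic = c_tilde * len(checkpointed)        # one checkpoint write/read per call
    routines = {ROUTINE[call] for call in checkpointed}
    effort = c_hat * len(routines)               # r-hat: routine counted once
    return prc + traffic + effort

for name, R in [("MIP", (1, 0, 0, 1)), ("LMI", (0, 0, 1, 0)), ("LMD", (1, 1, 0, 0))]:
    print(name, objective(R))
```

With nonzero weights the third term distinguishes calls from routines: for R = (1,1,0,0), two calls are checkpointed but only the single routine g needs a checkpointing implementation.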

  14. Call Tree Reversal: Numerical Results (Towards Reality)
     We consider the solution of an elliptic boundary value problem with PETSc v3.3. A discrete adjoint version was generated using dco/c++ and AMPI. Bars visualize the overhead due to PRC.
     Statistics: 1500 source files; 2717 subprograms instrumented based on dco/c++ count mode; 443k subprogram calls; PMR for $R = 0$ equal to 94GB; runtime of MIP analysis/optimization equal to 40s
     [Figure: bar chart of the overhead due to PRC for LMI, LMD, and CPLEX; horizontal axis: Memory Bound in MB (2,000; 8,000; 64,000); vertical axis ticks: 0 to 600]
     ◮ J. Lotz, U.N., M. Schanen: Discrete Adjoints of PETSc through dco/c++ and Adjoint MPI, Euro-Par 2013 Parallel Processing, 497-507.

  16. Conclusion: Technical Challenges
     ◮ automated instrumentation of large code bases (→ clang)
     ◮ application to call graphs requires conservatism and abstract interpretation for annotation
     ◮ combination with static data flow analysis desirable
     ◮ definition of corresponding adjoint code design patterns is work in progress
     ◮ U.N.: Adjoint Code Design Patterns, AD2016.
     ◮ L. Hascoët, U.N., V. Pascual: “To Be Recorded” Analysis in Reverse-Mode Automatic Differentiation, Future Generation Computer Systems 21(8):1401–1417, Elsevier (2005).

  17. Outlook: Result Checkpointing
     [Figure: result checkpointing applied to the call tree f → g → h; MEM=1110, OPS=1200]
     ◮ U.N.: The Art of Differentiating Computer Programs, SIAM (2012).

  18. Outlook: Binomial Checkpointing
     [Figure: recursive splitting of the step range 0:9 into subranges (0:3 and 4:9, then 4:6 and 7:9, 0:0 and 1:3, ...) with annotated recomputation costs]
     ◮ recursive bisection (dynamic programming)
     ◮ local recompute all
     ◮ repeated accesses to checkpoints
     ◮ A. Griewank, A. Walther: Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation, ACM Transactions on Mathematical Software 26(1):19–45, ACM (2000).
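The dynamic-programming view of checkpointing a sequence of time steps can be sketched with the standard recurrence for the minimal number of recomputed forward steps. This is an illustration under assumptions rather than the revolve implementation: `t(l, s)` and `best_split` are hypothetical names, `l` is the number of steps, `s` the number of available checkpoints (including the one holding the initial state), and every step is assumed to have unit cost.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t(l, s):
    # Minimal number of forward steps recomputed when reversing l steps
    # with s checkpoints.
    if l == 1:
        return 0                      # the required state is already available
    if s == 1:
        return l * (l - 1) // 2       # recompute-all from the single checkpoint
    # Advance k steps, checkpoint there, reverse the right part with s - 1
    # checkpoints, then reverse the left part with all s checkpoints.
    return min(k + t(l - k, s - 1) + t(k, s) for k in range(1, l))

def best_split(l, s):
    # Position k of the first checkpoint that attains the minimum above.
    return min(range(1, l), key=lambda k: k + t(l - k, s - 1) + t(k, s))

print(t(10, 3), best_split(10, 3))
```

For 10 steps and an assumed budget of 3 checkpoints the recurrence places the first checkpoint after 4 steps, which matches the first split 0:3 | 4:9 shown on the slide; whether the slide uses the same checkpoint budget is an assumption.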
