
Adjoint Data-Flow analyses applied to checkpointing - Tradeoff between snapshots and TBR

Benjamin Dauvergne
Tropics Project, INRIA Sophia-Antipolis

Why checkpoints?

- Instead of recording the tape of the execution, you want to re-execute some part of your code.
- To do this you need to restore the variables used by this part to the values they carried at the time of the first execution.
- "Used" here means read before written. It is a classical data-flow analysis notion, like Def:

  $\mathrm{Use}(I_1, \dots, I_n) = \mathrm{Use}(I_1) \cup \left( \mathrm{Use}(I_2, \dots, I_n) \setminus \mathrm{Def}(I_1) \right)$
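The recursive definition of Use for a sequence can be sketched in a few lines of Python. This is only an illustration: the `Instr` type and the function name `use_of_sequence` are hypothetical, and each instruction is assumed to come with its own Use (read before written) and Def (written) sets already computed.

```python
from typing import FrozenSet, NamedTuple, Sequence


class Instr(NamedTuple):
    """Hypothetical instruction summary: variables read before written, and written."""
    use: FrozenSet[str]
    defs: FrozenSet[str]


def use_of_sequence(instrs: Sequence[Instr]) -> FrozenSet[str]:
    """Use(I1..In) = Use(I1) | (Use(I2..In) - Def(I1)), evaluated from the last
    instruction back to the first, starting from the empty sequence (Use = {})."""
    result: FrozenSet[str] = frozenset()
    for instr in reversed(instrs):
        result = instr.use | (result - instr.defs)
    return result


# a = b + c ; d = a * b
program = [
    Instr(use=frozenset({"b", "c"}), defs=frozenset({"a"})),
    Instr(use=frozenset({"a", "b"}), defs=frozenset({"d"})),
]
print(sorted(use_of_sequence(program)))
```

In this example `a` is read by the second instruction but written by the first, so it is not "used" by the sequence as a whole: only `b` and `c` would have to be restored before re-executing it.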

Usual way of doing checkpoints

- By hand: we know the code, we know that there is something called the state, and that it is read and written between checkpoints. We create a procedure which saves it on the tape and we provide it to the AD tool.
- Automatically: when you write a source-to-source AD tool you don't know what the input code is doing, so you need data-flow analysis to find out which variables are used and whether they will be overwritten.

What should we save?

Data-flow notation from a previous paper by L. Hascoët and M. Araya.

$X = [I_1, \dots, I_n]$ is a sequence of instructions, and $\emptyset \vdash \overline{X}$, where $\overline{X}$ is the adjoint program of $X$.

\[
\mathrm{TBR} \vdash \overline{I; D} \;=\;
\begin{array}{l}
\mathrm{PUSH}\big(\mathrm{Def}(I) \cap (\mathrm{TBR} \cup \mathrm{Use}(I'))\big) \\
I \\
(\mathrm{TBR} \cup \mathrm{Use}(I')) \setminus \mathrm{Def}(I) \vdash \overline{D} \\
\mathrm{POP}\big(\mathrm{Def}(I) \cap (\mathrm{TBR} \cup \mathrm{Use}(I'))\big) \\
I'
\end{array}
\]

- $I'$ is the adjoint code associated with a single instruction $I$. When you differentiate, you have a context: the save set TBR.
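As a rough illustration of the rule above, the sets involved can be propagated through a straight-line sequence with plain set operations. This is a sketch under assumptions, not Tapenade's implementation: the names `tbr_step` and `push_sets` are hypothetical, and the Use sets of the adjoint instructions are supplied by hand.

```python
from typing import FrozenSet, List, Sequence, Tuple

Vars = FrozenSet[str]


def tbr_step(tbr: Vars, defs: Vars, use_adj: Vars) -> Tuple[Vars, Vars]:
    """One application of the rule for an instruction I followed by D.

    defs    stands for Def(I), use_adj for Use(I').
    Returns (saved, tbr_for_d) where
      saved     = Def(I) & (TBR | Use(I'))   # PUSHed before I, POPped before I'
      tbr_for_d = (TBR | Use(I')) - Def(I)   # context in which D is differentiated
    """
    needed = tbr | use_adj
    return defs & needed, needed - defs


def push_sets(instrs: Sequence[Tuple[Vars, Vars]]) -> List[Vars]:
    """Propagate the context from the empty set through (Def(I), Use(I')) pairs."""
    tbr: Vars = frozenset()
    saved_per_instr = []
    for defs, use_adj in instrs:
        saved, tbr = tbr_step(tbr, defs, use_adj)
        saved_per_instr.append(saved)
    return saved_per_instr


# y = x * x   (its adjoint is assumed to read x)
# x = sin(y)  (its adjoint is assumed to read y)
program = [
    (frozenset({"y"}), frozenset({"x"})),
    (frozenset({"x"}), frozenset({"y"})),
]
print([sorted(s) for s in push_sets(program)])
```

In this example nothing needs to be saved before the first instruction, while `x` has to be pushed before the second one, because it is overwritten there and the adjoint of the first instruction still needs its old value.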

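The TBR-snapshot trade-off

Bigger TBR:

\[
\mathrm{TBR} \vdash \overline{C; D} \;=\;
\begin{array}{l}
\mathrm{PUSH}\big(\mathrm{Def}(C) \cap \mathrm{TBR}\big) \\
\mathrm{PUSH}\big(\mathrm{Def}(C) \cap \mathrm{Use}_C\big) \\
C \\
(\mathrm{TBR} \cup \mathrm{Use}_C) \setminus \mathrm{Def}(C) \vdash \overline{D} \\
\mathrm{POP}\big(\mathrm{Def}(C) \cap \mathrm{Use}_C\big) \\
\emptyset \vdash \overline{C} \\
\mathrm{POP}\big(\mathrm{Def}(C) \cap \mathrm{TBR}\big)
\end{array}
\]

Bigger snapshot:

\[
\mathrm{TBR} \vdash \overline{C; D} \;=\;
\begin{array}{l}
\mathrm{PUSH}\big(\mathrm{Def}(C) \cap \mathrm{TBR}\big) \\
\mathrm{PUSH}\big(\mathrm{Def}(C; D) \cap \mathrm{Use}_C\big) \\
C \\
\mathrm{TBR} \setminus (\mathrm{Def}(C) \cup \mathrm{Snap}) \vdash \overline{D} \\
\mathrm{POP}\big(\mathrm{Def}(C; D) \cap \mathrm{Use}_C\big) \\
\emptyset \vdash \overline{C} \\
\mathrm{POP}\big(\mathrm{Def}(C) \cap \mathrm{TBR}\big)
\end{array}
\]

Here Snap denotes the snapshot, i.e. the set saved by the second PUSH.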

A code where «big snapshots» are bad

    Loop
        proc1(Use state, Def A)
        proc2(Use state, Def B)
        proc3(Use state, Def C)
        proc4(Use A, B, C, Def state)

In Tapenade we checkpoint all calls, so this example is interesting.

A code where «big snapshots» are bad

The forward sweep of the preceding code using «big snapshots»:

    Loop
        PUSH(state)
        proc1(Use state, Def A)
        PUSH(state)
        proc2(Use state, Def B)
        PUSH(state)
        proc3(Use state, Def C)
        PUSH(A, B, C)
        proc4(Use A, B, C, Def state)

This is not really good: each time we save state, we save the same values.

A code where «big snapshots» are bad

The forward sweep of the preceding code using «big TBR»:

    Loop
        PUSH(A)
        proc1(Use state, Def A)
        PUSH(B)
        proc2(Use state, Def B)
        PUSH(C)
        proc3(Use state, Def C)
        PUSH(state)
        proc4(Use A, B, C, Def state)

Now the redundant PUSH operations are gone: state is saved only once per iteration.
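The PUSH lists of this loop example can be reproduced with a small Python sketch of the two schemes. Several things here are assumptions made for illustration only: the names `Call`, `big_tbr`, `big_snapshot` and `steady_state_pushes` are hypothetical, the Use set of a checkpointed piece is taken to be the Use set of the call, the loop is assumed to run more than once (so what follows any call may overwrite every variable written in the body), and the context is iterated until it is the same at the top of two successive iterations. The variable named C in the example is unrelated to the checkpointed piece called C in the rules.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Vars = FrozenSet[str]


class Call(NamedTuple):
    name: str
    use: Vars   # variables read before written by the call
    defs: Vars  # variables overwritten by the call


def big_tbr(tbr: Vars, c: Call) -> Tuple[List[str], Vars]:
    """«big TBR»: small snapshot Def(C) & Use(C); the rest of Use(C) joins TBR."""
    snap = c.defs & c.use
    pushed = sorted(c.defs & tbr) + sorted(snap)
    return pushed, (tbr | c.use) - c.defs


def big_snapshot(tbr: Vars, c: Call, def_after: Vars) -> Tuple[List[str], Vars]:
    """«big snapshots»: snapshot Def(C; D) & Use(C); TBR for D shrinks instead."""
    snap = (c.defs | def_after) & c.use
    pushed = sorted(c.defs & tbr) + sorted(snap)
    return pushed, tbr - (c.defs | snap)


def steady_state_pushes(body: List[Call], scheme: str, max_iter: int = 10) -> List[List[str]]:
    """PUSHes emitted before each call, once TBR at the loop head has stabilised."""
    all_defs: Vars = frozenset().union(*(c.defs for c in body))
    tbr: Vars = frozenset()
    for _ in range(max_iter):
        start, pushes = tbr, []
        for c in body:
            if scheme == "big_tbr":
                pushed, tbr = big_tbr(tbr, c)
            else:
                pushed, tbr = big_snapshot(tbr, c, all_defs)
            pushes.append(pushed)
        if tbr == start:
            return pushes
    raise RuntimeError("TBR set did not stabilise")


state, abc = frozenset({"state"}), frozenset({"A", "B", "C"})
body = [
    Call("proc1", state, frozenset({"A"})),
    Call("proc2", state, frozenset({"B"})),
    Call("proc3", state, frozenset({"C"})),
    Call("proc4", abc, state),
]
for scheme in ("big_snapshot", "big_tbr"):
    print(scheme, steady_state_pushes(body, scheme))
```

Under these assumptions the «big snapshots» scheme pushes state three times per iteration plus A, B, C, while the «big TBR» scheme pushes each of A, B, C and state exactly once, which matches the two forward sweeps shown on the slides.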

A code where «big TBR» is bad

    proc1(use = array A)
    a gather/scatter loop on A

- The forward sweep of the preceding code using «big TBR»:

      proc1(use = array A)
      a gather/scatter loop on A, full of PUSH(A(i))

  The number of PUSH operations exceeds sizeof(A).

- The forward sweep of the preceding code using «big snapshots»:

      PUSH(A)
      proc1(use = array A)
      a gather/scatter loop on A, with fewer PUSH operations

Numerical results

On one of our test codes, using the «big snapshots» scheme:

    Time of original function:    2.269999962300062
    Time of tangent AD function:  7.000000000000000
    Time of reverse AD function:  25.48999786376953
    Max stack size:               15876 blocks of 16384 bytes

With the «big TBR» scheme used everywhere:

    Time of original function:    2.289999943226576
    Time of tangent AD function:  7.090000152587891
    Time of reverse AD function:  22.73000049591064
    Max stack size:               11815 blocks of 16384 bytes

That is a 26% gain in memory and an 11% gain in CPU time for the reverse AD function, without even knowing the code.
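The two percentages can be rechecked from the raw figures above. A minimal sketch, where `relative_gain` is just a hypothetical helper name and the gains are measured relative to the «big snapshots» run:

```python
def relative_gain(before: float, after: float) -> float:
    """Fraction saved when going from `before` to `after`."""
    return (before - after) / before


# Max stack size in blocks, and time of the reverse AD function, from the slide.
stack_gain = relative_gain(15876, 11815)
cpu_gain = relative_gain(25.48999786376953, 22.73000049591064)

print(f"memory gain: {stack_gain:.1%}, CPU gain: {cpu_gain:.1%}")
```

This gives about 25.6% for the stack size and about 10.8% for the reverse AD time, which the slide rounds to 26% and 11%.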

Conclusion

- It is important to look at how you compute your snapshots.
- «big TBR» is the scheme which gives the better results in general.
- If a static analysis can infer that an array is going to be completely overwritten, once or more, just afterwards, «big snapshots» seems to be appropriate.

Further work

- Find more easily detectable code patterns where one or the other scheme is better.
- How could flow-dependent data-flow information help us? I.e. specialization at run time, or using profiling.
- Array region analysis.
- The placement of checkpoints in big call graphs/flow graphs.
