Improving Performance of Iterative Methods by Lossy Checkpointing
Dingwen Tao (University of California, Riverside), Sheng Di (Argonne National Laboratory), Xin Liang (University of California, Riverside), Zizhong Chen (University of California, Riverside), Franck Cappello (Argonne National Laboratory)
June 2018

Outline
Ø Introduction
• Why do we need to checkpoint iterative methods?
Ø Background
• Traditional checkpointing for iterative methods
• Performance model of traditional checkpointing
Ø Our Designs
• Lossy checkpointing for iterative methods
• Performance model of our new checkpointing
Ø Theoretical Analysis
• Impact of lossy checkpointing for different methods
• Expected fault tolerance overhead
Ø Experimental Evaluation

Why Do We Need to Checkpoint Iterative Methods?
Ø Iterative methods are used for solving large, sparse linear systems
• "Gaia" mission by the European Space Agency (ESA)
  • Producing a 5-parameter astrometric catalogue at microarcsecond accuracy for 1 billion stars in the Galaxy
  • Resulting in a very large, sparse linear system of 72 billion equations
  • Scientists use the LSQR iterative algorithm
  • Takes more than 54 hours on 2,048 BlueGene/Q nodes
• Largest symmetric indefinite sparse matrix from the UFL sparse matrix collection (KKT240 with 28 million linear equations)
  • 2,048 cores / 64 nodes on the Bebop cluster at Argonne
  • GMRES solver implemented in PETSc
  • Relative convergence tolerance of 10^-6, execution time > 1 hour
• MTBF of the Sunway TaihuLight supercomputer can be one hour or less
[Figure: execution time (seconds) and number of iterations vs. number of processes (256, 512, 1024, 2048)]

Importance of Improving Checkpointing Performance of Iterative Methods
Ø Scientific simulations involving PDEs
• Solve linear systems within each timestep
• Sparse linear systems include most of the variables
• E.g., 3D CFD problems from Navier-Stokes equations
• Semi-Implicit Method for Pressure-Linked Equations (SIMPLE) algorithm
• 5 out of 9 fluid-flow variables need to be checkpointed in the iterative method
Ø Significantly improving the checkpointing performance of iterative methods significantly improves application performance

State-of-the-Art: Failure-Stop Failure
[Diagram: computing processes 1 to k save their process states P_1 to P_k to stable storage]
Checkpoint/Restart Model
• Periodic checkpointing to the file system is expensive
• Difficult to scale up due to the I/O bandwidth bottleneck

State-of-the-Art: Failure-Stop Failure
[Diagram: computing processes 1 to k hold process states P_1 to P_k and local checkpoints C_1 to C_k; a checkpoint process holds the checkpoint encoding C; stable storage is shown alongside]
Diskless checkpoint (J. Plank), XOR encoding: C_1 + ... + C_n = C
2 steps:
1. Checkpointing the state of each application processor in memory
2. Encoding these in-memory checkpoints and storing the encodings in checkpointing processors
• More scalable (pros)
• 2X or more memory overhead (cons) → reduces usable memory and problem size
• Only able to tolerate partial failures, not a whole-system failure (cons)
• Requires spare nodes and dedicated processors (cons)
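The XOR encoding on this slide can be illustrated with a minimal sketch. It assumes equal-length in-memory checkpoints and a single lost process; the function names (`xor_bytes`, `encode`, `recover`) and the byte strings are hypothetical and only stand in for real process state.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def encode(checkpoints):
    """Step 2 of the slide: C = C_1 XOR ... XOR C_n, kept on a checkpoint process."""
    return reduce(xor_bytes, checkpoints)

def recover(encoding, survivors):
    """Rebuild the one missing checkpoint from the encoding and the survivors."""
    return reduce(xor_bytes, survivors, encoding)

# Step 1: each application process keeps its checkpoint in memory (toy data).
local = [b"state-of-P1", b"state-of-P2", b"state-of-P3"]
encoding = encode(local)

# Simulate losing process 2 (index 1) and rebuilding its checkpoint.
lost = 1
survivors = local[:lost] + local[lost + 1:]
restored = recover(encoding, survivors)
```

Because a single XOR encoding can rebuild only one missing checkpoint, the sketch also shows why this scheme tolerates partial failures but not a whole-system failure.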

Failures and Checkpointing
Optimized techniques to improve scalability of checkpointing
• Diskless checkpoint
• Multi-level checkpoint
• Asynchronous checkpoint
• Lossless-compressed checkpoint
• ……
Question: Can we use lossy compression to (1) reduce checkpointing size and overhead and (2) improve the performance and scalability?

Failures and Checkpointing
Question: Can we use lossy compression to (1) reduce checkpointing size and overhead and (2) improve the performance and scalability?
Lossy checkpointing
Two important questions:
(1) What is the impact of the lossy checkpointing data on the execution performance?
(2) Can lossy checkpointing actually improve the overall performance (including C/R and lossy compression) in the context of restarting with altered data?

Traditional Checkpointing for Iterative Methods
Ø Checkpoint
1. Checkpoint static variables (e.g., A, M) at the beginning
2. Checkpoint dynamic variables (e.g., i, ρ, p, x) every several iterations
Ø Recovery
1. Recover a correct computational environment
2. Recover static variables
3. Recover dynamic variables
4. Recover recomputed variables (e.g., r)
Ø C/R cost dominated by dynamic variables
• Static variables are not checkpointed along iterations (at most once)
• Static variables: linear system matrix A and preconditioner M
• A usually has 1x ~ 10x as many nonzeros (nnz) as the dynamic variables' size (i.e., vector size)
• M is much sparser than A, e.g., block Jacobi, ILU
• Checkpoint frequency is usually much higher than failure rate
• MTTI = 4 hrs., T_ckpt = 18 s → checkpoint interval (Young's formula) = 12 mins
• Checkpoint frequency is 30x higher than recovery frequency
Ø Focus on reducing C/R overhead of dynamic variables in iterative methods by lossy compressors.
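The checkpoint and recovery steps listed on this slide can be sketched for conjugate gradient. This is an illustration under stated assumptions, not the authors' implementation: it uses unpreconditioned CG (M = I), keeps the checkpoint in memory rather than on a file system, and injects one failure through the hypothetical `fail_at` parameter. All function and parameter names are made up for the example.

```python
def matvec(A, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_with_checkpoints(A, b, tol=1e-10, max_iter=200, ckpt_every=2, fail_at=None):
    """Unpreconditioned CG that checkpoints the dynamic variables (i, rho, p, x)."""
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A x with x = 0
    p = list(r)
    rho = dot(r, r)
    i = 0
    ckpt = (i, rho, list(p), list(x))
    restarts = 0
    while i < max_iter and rho ** 0.5 > tol:
        if fail_at is not None and i == fail_at:
            fail_at = None           # inject a single failure only
            # Recovery: restore dynamic variables, then recompute r = b - A x.
            i, rho, p_saved, x_saved = ckpt
            p, x = list(p_saved), list(x_saved)
            r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
            restarts += 1
            continue
        q = matvec(A, p)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
        i += 1
        if i % ckpt_every == 0:      # checkpoint every several iterations
            ckpt = (i, rho, list(p), list(x))
    return x, i, restarts

# Toy SPD system: 1D Laplacian (static variable A is never re-checkpointed).
n = 8
A = [[2.0 if row == col else -1.0 if abs(row - col) == 1 else 0.0
      for col in range(n)] for row in range(n)]
b = [float(j + 1) for j in range(n)]
x, iters, restarts = cg_with_checkpoints(A, b, fail_at=5)
residual = max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x)))
```

Only the vectors p and x plus two scalars are saved per checkpoint, which is why the slide treats the dynamic variables as the part of C/R cost worth compressing.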

Theoretical Analysis of Checkpointing Overhead for Iterative Methods
• Overall execution time = iteration time + checkpoint time + recovery/rollback time
• Based on Young's formula and the expected mean time of a rollback, $T_{rb} = k\,T_{it}/2$, the overall time can be simplified to [equation not recovered from the slide]
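As a worked example of the two ingredients named on this slide, the sketch below evaluates Young's formula with the numbers from the earlier slide (MTTI = 4 hrs, checkpoint time = 18 s) and then the expected rollback time. It assumes the rollback expression reads T_rb = k·T_it/2, with k taken to be the number of iterations between checkpoints; the per-iteration time of 0.5 s and the function names are hypothetical and chosen only for illustration.

```python
import math

def young_interval(t_ckpt: float, mtti: float) -> float:
    """Young's formula: checkpoint interval = sqrt(2 * T_ckpt * MTTI), in seconds."""
    return math.sqrt(2.0 * t_ckpt * mtti)

def expected_rollback(k: int, t_it: float) -> float:
    """Assumed reading of the slide: T_rb = k * T_it / 2 (half an interval on average)."""
    return k * t_it / 2.0

# Numbers from the earlier slide: MTTI = 4 hours, checkpoint time = 18 s.
interval_s = young_interval(18.0, 4 * 3600.0)   # 720 s, i.e. 12 minutes

# Hypothetical per-iteration time, used only to turn the interval into iterations.
t_it = 0.5
k = int(interval_s / t_it)                       # iterations between checkpoints
t_rb = expected_rollback(k, t_it)                # expected lost work per failure
```

With these inputs the interval matches the 12 minutes quoted on the earlier slide, and the expected rollback is half of that interval.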
