Self-stabilizing Iterative Solvers


Self-stabilizing Iterative Solvers Piyush Sao, Richard Vuduc School of Computational Science & Engineering Georgia Institute of Technology SIAM PP-14 P. Sao, R. Vuduc (Georgia Tech) Self-stabilizing Iterative Solvers SIAM PP-14 1 / 21


  1. Self-stabilizing Iterative Solvers. Piyush Sao, Richard Vuduc. School of Computational Science & Engineering, Georgia Institute of Technology. SIAM PP-14.

  2. Introduction: Self-stabilization. Informally, self-stabilization (Dijkstra 1974) is a property of a system that guarantees it will enter a valid state no matter what its initial state is. We describe a self-stabilizing version of the conjugate gradient method that is resilient to transient soft faults.

  3. Introduction: Fault-tolerant Iterative Algorithms. Can an iterative algorithm still converge if a fault has occurred?

  4. Introduction: Iterative Algorithms. [Figure: flowchart of an iterative algorithm. Start from the state <x_0, y_0, z_0, ...>; each Update step maps <x_k, y_k, z_k, ...> to <x_{k+1}, y_{k+1}, z_{k+1}, ...> using intermediate variables; check for convergence; return <x_s, y_s, z_s, ...>.]

  5. Introduction: Self-stabilizing Algorithms. An algorithm is self-stabilizing if, starting from any state (valid or invalid), it returns to a valid state within a finite number of "steps". [Figure: state diagram with regions labeled Invalid States, Valid States, and Solution States; labeled points Start <Xs, Ys, Zs, ...>, <X1, Y1, Z1, ...>, <Xk, Yk, Zk, ...>, <Xi, Yi, Zi, ...>, <Xs', Ys', Zs', ...>, and <Xf, Yf, Zf, ...>; and a path labeled "faulty execution".]

  6. Introduction: Making an Algorithm Self-stabilizing. Options: naturally self-stabilizing methods (e.g., Newton, SOR, Jacobi); restart from a checkpoint; restart (such as restarted GMRES); our strategy: a correction step.
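
As a small illustration of the "naturally self-stabilizing" case, the sketch below runs Jacobi iteration on a strictly diagonally dominant system and overwrites the iterate with garbage partway through. The matrix, the right-hand side, and the `jacobi` helper with its `corrupt_at` parameter are made up for this example; none of them come from the slides.

```python
def jacobi(A, b, x, iters, corrupt_at=None):
    """Run `iters` Jacobi sweeps; optionally overwrite x with garbage once."""
    n = len(b)
    for k in range(iters):
        if k == corrupt_at:
            # Simulated transient fault: the whole iterate becomes invalid.
            x = [1e6 * (-1) ** i for i in range(n)]
        # One Jacobi sweep: every component is rebuilt from the previous iterate.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Hypothetical strictly diagonally dominant system with exact solution [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]

clean = jacobi(A, b, [0.0] * 3, 80)
faulty = jacobi(A, b, [0.0] * 3, 80, corrupt_at=10)
print(clean, faulty)
```

Because the Jacobi update contracts the error from any starting point for such a matrix, the corrupted run needs no detection or repair logic: the fault only costs extra sweeps.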

  7. Introduction: Periodic Correction Step. The correction step restores sufficient conditions for convergence, is mathematically "equivalent" to the original algorithm in a fault-free execution, and eliminates the need to detect faults. Executing the correction step periodically ensures that correct behavior resumes within a finite number of steps.

  8. Introduction: Self-stabilizing Conjugate Gradient. Conjugate Gradient Algorithm: solve $Ax = b$ for $x$, where $A$ is symmetric positive definite (SPD). Equivalent quadratic optimization problem: minimize $F(x) = \frac{1}{2} x^T A x - x^T b$. $F(x)$ represents an $N$-dimensional paraboloid. CG finds the optimum by taking appropriately constructed steps.
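
To see why minimizing $F$ solves the linear system, it helps to write out the gradient. The short derivation below uses only the definition of $F$ and the symmetry of $A$; the symbols $r$ (residual), $p$ (a search direction), and $\alpha$ (a step length) are introduced here for illustration.

```latex
\[
  \nabla F(x) = \tfrac{1}{2}\,(A + A^T)\,x - b = Ax - b
  \quad\Longrightarrow\quad
  \nabla F(x^\ast) = 0 \iff A x^\ast = b .
\]
% The residual is the negative gradient, i.e. the steepest-descent direction:
\[
  r = b - Ax = -\nabla F(x) .
\]
% Exact line search along a direction p: minimize F(x + \alpha p) over \alpha.
\[
  \frac{d}{d\alpha} F(x + \alpha p) = \alpha\, p^T A p - p^T r = 0
  \quad\Longrightarrow\quad
  \alpha = \frac{p^T r}{p^T A p} .
\]
```

Since $A$ is SPD, $p^T A p > 0$ for any nonzero $p$, so this step length is well defined and is the unique minimizer along $p$.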

  9. Introduction: Self-stabilizing Conjugate Gradient. Conjugate Gradient (CG) Algorithm. State variables: $x_k$ = present estimate; $p_k$ = search direction; $r_k = b - A x_k$ = direction of steepest descent. Transition function: (1) $q_k \leftarrow A p_k$; (2) $\alpha_k \leftarrow \frac{r_k^T r_k}{p_k^T q_k}$; (3) $x_{k+1} \leftarrow x_k + \alpha_k p_k$; (4) $r_{k+1} \leftarrow r_k - \alpha_k q_k$; (5) $\beta_k \leftarrow \frac{\|r_{k+1}\|^2}{\|r_k\|^2}$; (6) $p_{k+1} \leftarrow r_{k+1} + \beta_k p_k$.
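
The six-step transition function can be sketched in plain Python as follows. This is a minimal dense-matrix version for illustration only: the helper names `matvec`, `dot`, and `cg`, the stopping rule on the recurrence residual, and the 2-by-2 test system are assumptions of this example, not part of the slides.

```python
def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, x0, tol=1e-12, max_iter=100):
    """Textbook conjugate gradient for SPD A; returns (x, iterations used)."""
    x = list(x0)
    r = [bi - qi for bi, qi in zip(b, matvec(A, x))]    # r_0 = b - A x_0
    p = list(r)                                         # p_0 = r_0
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 <= tol:
            return x, k
        q = matvec(A, p)                                # 1. q_k = A p_k
        alpha = rr / dot(p, q)                          # 2. alpha_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]   # 3. x_{k+1}
        r = [ri - alpha * qi for ri, qi in zip(r, q)]   # 4. r_{k+1}
        rr_new = dot(r, r)
        beta = rr_new / rr                              # 5. beta_k
        p = [ri + beta * pi for ri, pi in zip(r, p)]    # 6. p_{k+1}
        rr = rr_new
    return x, max_iter

# Hypothetical SPD system whose exact solution is [1/11, 7/11].
x, iters = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
print(x, iters)
```

Note that the residual in step 4 is updated by recurrence and never recomputed from $x$, which is why a single corrupted $r_k$ or $p_k$ can silently invalidate every later iterate.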

  10. Introduction: Self-stabilizing Conjugate Gradient. CG is a Krylov subspace method: $\{p_k\}$ and $\{r_k\}$ span the Krylov subspace $K(A, r_0, m) = \mathrm{span}\{r_0, A r_0, \ldots, A^{m-1} r_0\}$. Global orthogonality properties: $p_i^T A p_j = 0$ if $i \neq j$; $r_i^T r_j = 0$ if $i \neq j$; and $r_i^T p_j = 0$ if $i > j$. Finite termination in exact arithmetic.

  11. Introduction: Self-stabilizing Conjugate Gradient. Effects of Faults on Conjugate Gradient. In general, most of the Krylov subspace properties are lost. Faults have multiple potential outcomes: (1) error in $r_k$ ⇒ convergence to an incorrect value; (2) error in $p_k$ ⇒ divergence, stagnation, or slow convergence. It is difficult to detect whether a state is valid.

  12. Introduction: Self-stabilizing Conjugate Gradient. We identify the following relations, which are sufficient to guarantee convergence (a corollary to the Zoutendijk condition). Residual condition: $r_k = b - A x_k$. Optimal step length: $\alpha_k = \frac{r_k^T p_k}{p_k^T A p_k}$. Correct search direction: $\frac{p_k^T r_k}{\|p_k\| \, \|r_k\|} > c_1$. Local orthogonality relation: $p_{k+1}^T A p_k = 0$.
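
One way to restore these four relations is sketched below. It is an illustration built directly from the conditions on this slide, not necessarily the authors' exact correction step: every `period` iterations the residual is recomputed from $x$, the step length uses $r^T p / p^T A p$, and $\beta$ is chosen so that $p_{k+1}^T A p_k = 0$. The names `cg_ss`, `period`, and `fault_at`, the additive fault model, and the tridiagonal test matrix are all assumptions of this example.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def cg_ss(A, b, x0, period=10, tol=1e-8, max_iter=500, fault_at=None):
    """CG with a periodic correction step; returns (x, iterations used)."""
    x = list(x0)
    r = axpy(-1.0, matvec(A, x), b)
    p = list(r)
    for k in range(max_iter):
        if k == fault_at:
            r = list(r)
            r[0] += 100.0                      # simulated transient fault in r
        q = matvec(A, p)
        pq = dot(p, q)
        if k % period == 0 or dot(r, r) ** 0.5 <= tol:
            # Correction step (assumed to run reliably).
            r = axpy(-1.0, matvec(A, x), b)    # residual condition
            if dot(r, r) ** 0.5 <= tol:        # converged per the true residual
                return x, k
            alpha = dot(r, p) / pq             # optimal step length
            x = axpy(alpha, p, x)
            r = axpy(-alpha, q, r)
            beta = -dot(r, q) / pq             # enforces p_{k+1}^T A p_k = 0
        else:
            # Ordinary (possibly unreliable) CG iteration.
            rr = dot(r, r)
            alpha = rr / pq
            x = axpy(alpha, p, x)
            r = axpy(-alpha, q, r)
            beta = dot(r, r) / rr
        p = axpy(beta, p, r)                   # p_{k+1} = r_{k+1} + beta * p_k
    return x, max_iter

# Hypothetical SPD test problem: tridiag(-1, 3, -1), exact solution all ones.
n = 10
A = [[3.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = matvec(A, [1.0] * n)
x_clean, k_clean = cg_ss(A, b, [0.0] * n)
x_faulty, k_faulty = cg_ss(A, b, [0.0] * n, fault_at=2)
print(k_clean, k_faulty)
```

With the optimal step length, the new residual is orthogonal to the old direction, so the corrected direction satisfies $p_{k+1}^T r_{k+1} = \|r_{k+1}\|^2 > 0$ and is a descent direction. The sketch also routes the convergence test through the recomputed residual, so a corrupted recurrence residual cannot trigger a false exit.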

  13. Experiments. Assume a selective reliability mode, i.e., the correction step can be done reliably. Inject faults into the sparse matrix-vector (SpMV) product by flipping bits in matrix entries at a specified rate: (1) bit flips in mantissa and sign bits only: 40 bit flips in every unreliable SpMV; (2) bit flips can occur anywhere (including the exponent): 4 bit flips in every unreliable SpMV.
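
A fault injector of this kind might look like the sketch below, which flips bits in the IEEE 754 double-precision representation of stored matrix entries during a CSR SpMV. The function names, the CSR argument layout, and the choice to corrupt a temporary copy (so the faults are transient rather than persistent) are assumptions of this example; the slides do not give these details.

```python
import random
import struct

def flip_bit(value, bit):
    """Flip one bit of a float64 (bits 0-51 mantissa, 52-62 exponent, 63 sign)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped

MANTISSA_AND_SIGN = list(range(52)) + [63]   # "bounded" errors
ANY_BIT = list(range(64))                    # "unbounded" errors (exponent too)

def faulty_spmv(values, col_idx, row_ptr, x, n_flips, bits, rng):
    """CSR SpMV computed from a corrupted copy of the stored entries."""
    vals = list(values)                      # the original matrix stays intact
    for _ in range(n_flips):
        i = rng.randrange(len(vals))
        vals[i] = flip_bit(vals[i], rng.choice(bits))
    return [sum(vals[j] * x[col_idx[j]]
                for j in range(row_ptr[row], row_ptr[row + 1]))
            for row in range(len(row_ptr) - 1)]

# Hypothetical 2x2 matrix [[2, 1], [0, 3]] in CSR form.
values, col_idx, row_ptr = [2.0, 1.0, 3.0], [0, 1, 1], [0, 2, 3]
y = faulty_spmv(values, col_idx, row_ptr, [1.0, 2.0], 1,
                MANTISSA_AND_SIGN, random.Random(0))
print(y)
```

A mantissa or sign flip changes an entry by at most twice its magnitude, while an exponent flip can turn a finite entry into an enormous value or infinity, which is presumably why the two settings are labeled bounded and unbounded.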

  14. Experiments: Problems. We test self-stabilizing CG (CG-SS) on three problems with different convergence profiles and conditioning.

      Name       N       NNZ      κ(A)     Convergence profile
      K3D        27000   183600   646      Quadratic
      DIAG       10000   10000    990100   Linear
      THERMAL1   82654   574458   496250   Sub-linear

      Table: Different problems used for experimentation.

  15. Experiments: Solvers. We compare the performance of CG-SS against the following solvers: (1) Reliable-CG: all computations are done reliably; (2) CG-SS: self-stabilizing CG with a correction every 10th iteration; (3) CG-RES: restarted CG, with a restart every 10th iteration; (4) FT-GMRES: fault-tolerant GMRES based on inner-outer iteration, where the outer iteration is done reliably.

  16. Experiments: K3D (Quadratic Convergence). In the presence of faults, only linear convergence is observed. [Figure: Convergence History, two panels plotted against Number of Iterations, comparing Error Free CG, CG-SS, CG-RES, and FT-GMRES. (a) Bounded errors (mantissa and sign bit flips only). (b) Unbounded errors (including exponent).]

  17. Experiments: THERMAL1 (Sub-linear Convergence). The convergence rate for CG-SS and CG-RES does not change by much; FT-GMRES shows better convergence due to preconditioning. [Figure: Convergence History, two panels plotted against Number of Iterations, comparing Error Free CG, CG-SS, CG-RES, and FT-GMRES. (a) Bounded errors (mantissa and sign bit flips only). (b) Unbounded errors (including exponent).]

  18. Experiments: DIAG (Linear Convergence). Linear convergence is maintained; however, a slight slow-down in convergence is observed. [Figure: Convergence History, two panels plotted against Number of Iterations, comparing Error Free CG, CG-SS, CG-RES, and FT-GMRES. (a) Bounded errors (mantissa and sign bit flips only). (b) Unbounded errors (including exponent).]

  19. Experiments: Amount of Reliable Computation Required. Compared to Reliable-CG, CG-SS requires < 30% reliable SpMV to reach the same error tolerance. [Figure: Error Tolerance versus # Reliable SpMV, two panels plotting Fraction of Reliable SpMV against Error tolerance, comparing Error Free CG, CG-SS, CG-RES, and FT-GMRES. (a) Bounded errors (mantissa and sign bit flips only). (b) Unbounded errors (including exponent).]

  20. Analysis. [Figure labels: Observed Convergence in Presence of Faults; Ideal Convergence in Finite Precision.] Can show that if $\kappa(A)$ is the condition number and $\eta$ is the rate of bit flips per SpMV, then
