Spare Node Substitution for Failure Nodes Kazumi Yoshinaga RIKEN - PowerPoint PPT Presentation

Spare Node Substitution for Failure Nodes Kazumi Yoshinaga RIKEN AICS

Background
• In the exaflops era, faults could happen more frequently than ever
  → System MTBF becomes shorter
• Important issue: recovery from faults
• Conventional method: system-level checkpoint-restart
  – Requires massive I/O
• Many mechanisms to survive failures have been proposed and investigated
  – Less I/O size
  – One of the mechanisms is ULFM (User-Level Failure Mitigation)
    • User program handles failures
    • The program can survive failures and continue its execution
• But there has been no discussion of how a job should survive node failures

Purpose of this Research
• What is the best way to survive node failures?
  – Assuming a job can survive a node failure by using existing fault mitigation software
  – The goal is not to propose a new fault mitigation mechanism
  – The goal is to propose a recovery strategy

Survival from Node Failure
• Applications with dynamic load balancing
  – e.g. Distributed Master-Worker model
  – Avoiding failure nodes method
  – Applications continue their execution only with healthy nodes after a failure
• How about applications without dynamic load balancing?
  – e.g. Stencil Computation

Avoiding Failure Node(s) for Stencil Computation
• Stencil computation characteristics
  – Communication pattern is fixed
  – Load can be balanced
• When a recovery happens, the above stencil computation characteristics must be preserved
• However,
  – Hard to balance loads
  – Impossible to preserve communication pattern
  – Every time a new failure happens, communication pattern can differ
• Hard to program !!!
• Using spare nodes to solve these problems
[Figure: stencil grid avoiding a failed node; labels: "Failure", "x1.5 computation", "New comm. pattern"]

Using Spare Nodes
• An application runs with spare nodes
• If a node failure happens, migrate the task running on the failed node to a spare node
  – Loads are balanced (continues with the same # procs.)
  – Logical communication pattern is preserved
  – No change in the kernel part of the application
  – Some penalties

Spare Node Penalty-1 -System Utilization Degradation-
• Spare node allocation
• System utilization is decreased
[Figure: % spare nodes (0 to 14) versus # nodes (1,000 to 1,000,000) for the allocations 3D(3,1), 3D(2,1), 3D(1,1), 2D(2,1) and 2D(1,1)]
Notation nD(α,β): n = dimensions of the network, α = # dimensions of spare nodes, β = spare node width
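The nD(α,β) notation can be turned into a spare-node count with a few lines of arithmetic. The sketch below is one reading of the notation, not the author's code: it assumes the spares occupy a slab of width β along each of the first α dimensions, so the compute region shrinks by β in those dimensions. The function names are hypothetical. This reading agrees with the two machine configurations given in the evaluation slides (12x12x12 with 11x11x12 used for calculation, and 16x8x8 with 15x7x8).

```python
from math import prod


def spare_node_count(shape, alpha, beta):
    """Number of spare nodes in an nD(alpha, beta) allocation.

    shape: network size per dimension, e.g. (12, 12, 12)
    alpha: number of dimensions along which spares are reserved
    beta:  width of the reserved slab in each of those dimensions
    Assumption: the first `alpha` dimensions are the ones that shrink.
    """
    if not 0 <= alpha <= len(shape):
        raise ValueError("alpha must be between 0 and the number of dimensions")
    # Compute region: each of the first alpha dimensions loses beta nodes.
    compute_shape = [s - beta if i < alpha else s for i, s in enumerate(shape)]
    if min(compute_shape) <= 0:
        raise ValueError("beta is too large for this network shape")
    return prod(shape) - prod(compute_shape)


def spare_node_percent(shape, alpha, beta):
    """Spare nodes as a percentage of all nodes in the network."""
    return 100.0 * spare_node_count(shape, alpha, beta) / prod(shape)


if __name__ == "__main__":
    print(spare_node_count((12, 12, 12), 2, 1))  # 1728 - 11*11*12 = 276
    print(spare_node_count((16, 8, 8), 2, 1))    # 1024 - 15*7*8 = 184
```

For a fixed (α,β) the percentage falls as the network grows, because the slab is a surface while the compute region is a volume.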

Spare Node Penalty-2 -Communication Performance Degradation-
• Logical communication pattern can be preserved by creating a new MPI communicator to exclude the failed node and include a spare node
• However, the physical communication pattern is not the same, and communication performance (CP) can be degraded
  – Larger hop counts (latency), and
  – Possible message collisions
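The hop-count penalty can be illustrated with a small calculation. The sketch below assumes a 2D mesh without wraparound and XY routing, where the hop count between two nodes is their Manhattan distance. The grid size, the coordinates and all function names are hypothetical, chosen only for illustration.

```python
def xy_hops(src, dst):
    """Hop count between two (x, y) nodes on a 2D mesh with XY routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def stencil_neighbors(node, width, height):
    """Nodes adjacent to `node` in a 5-point stencil on a width x height grid."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def hops_after_replacement(failed, spare, width, height):
    """Hop count from each logical neighbor of the failed node to the spare."""
    return {nb: xy_hops(nb, spare)
            for nb in stencil_neighbors(failed, width, height)}


if __name__ == "__main__":
    # Hypothetical layout: 6 x 5 compute nodes, spare row at y = 5.
    # Before the failure every stencil neighbor is exactly 1 hop away.
    print(hops_after_replacement(failed=(2, 3), spare=(2, 5), width=6, height=5))
```

In this layout three of the four neighbors end up 3 hops from the replacement node instead of 1, and their messages now cross links that other stencil messages also use, which is where collisions can arise.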

Ex. CP Degradation of Spare Node Substitution
• Nodes on the topmost row work as spare nodes
• Up to 5 possible collisions after 1 node failure
  – Independent of the # nodes
[Figure: 5-point stencil computation on a 2D Cartesian network topology (XY routing)]
How should faulty nodes be replaced by spare nodes?

Sliding Substitution (1)
• We proposed "Sliding Substitution" methods
  – 0D Sliding (simple replace)
    • Failed rank is continued on an alternative node
  – 1D Sliding
    • Processes between the failure node and the spare node are shifted
  – 2D Sliding
    • All processes between the failure node's row (column) and the spare node's row (column) are shifted
  – 3D Sliding, 4D, 5D…
[Figure: rank-to-node layouts on a 2D grid (ranks 0 to 35) after 0D Sliding, 1D Sliding and 2D Sliding]
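The difference between 0D and 1D sliding can be sketched as a rank-to-node remapping. This is an interpretation of the slide, not the author's implementation. It assumes a 2D mesh with XY routing, a single failure, spares on the topmost row, and that the spare in the failed node's own column is the one used. All names are hypothetical.

```python
def xy_hops(src, dst):
    """Hop count between two (x, y) nodes on a 2D mesh with XY routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def initial_mapping(width, height):
    """Rank r starts on node (r % width, r // width); row y == height holds spares."""
    return {r: (r % width, r // width) for r in range(width * height)}


def slide_0d(mapping, failed_rank, height):
    """0D sliding: only the failed rank moves, to the spare in its column."""
    x, _ = mapping[failed_rank]
    new_mapping = dict(mapping)
    new_mapping[failed_rank] = (x, height)
    return new_mapping


def slide_1d(mapping, failed_rank):
    """1D sliding: ranks from the failed node up to the spare row shift by one."""
    fx, fy = mapping[failed_rank]
    return {r: (x, y + 1) if x == fx and y >= fy else (x, y)
            for r, (x, y) in mapping.items()}


def max_stencil_hops(mapping, width, height):
    """Largest hop count between logically adjacent ranks (5-point stencil)."""
    worst = 0
    for rank, node in mapping.items():
        col, row = rank % width, rank // width
        partners = []
        if col + 1 < width:
            partners.append(rank + 1)      # logical right neighbor
        if row + 1 < height:
            partners.append(rank + width)  # logical upper neighbor
        for other in partners:
            worst = max(worst, xy_hops(node, mapping[other]))
    return worst


if __name__ == "__main__":
    width, height = 6, 5  # 30 ranks; spare row at y = 5
    base = initial_mapping(width, height)
    print(max_stencil_hops(base, width, height))                        # 1
    print(max_stencil_hops(slide_0d(base, 8, height), width, height))   # 5
    print(max_stencil_hops(slide_1d(base, 8), width, height))           # 2
```

In this toy layout 1D sliding moves four ranks instead of one, but the longest stencil message drops from 5 hops to 2, which matches the general direction of the slide: shifting more processes keeps logical neighbors physically closer.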

Preliminary Evaluation -5P-Stencil on 2D Network-
• Spare allocation
  – 2D(2,1) > 2D(1,1)
• Max. failures
  – 0D: up to # spares
  – 1D: 3 (or more)
  – 2D: up to 2 (2D Cart. topo.)
• Comm. perf.
  – 2D > 1D > 0D
[Figure: max. collisions versus # failed nodes on mesh and torus networks; panels: 0D : 2D(1,1), 0D : 2D(2,1), 1D : 2D(2,1), 2D : 2D(2,1)]

Sliding Substitution (2)
• The higher the dimension
  – The better the performance
  – The smaller the number of failure nodes it can handle
• 2D or higher dimension Sliding
  – Migrate tasks running on healthy nodes
  – Free nodes work as new spare nodes
• Hybrid Sliding
  – 3D → 2D → 1D → 0D (on 3D network)
[Figure: 3D Sliding; the freed nodes work as new spare nodes]

Evaluation: 7P-Stencil on the K and BG/Q (Hybrid, 3D(2,1), 4MiB)
[Figure: relative latency (smaller is better) versus # failed nodes; series: Sim. Avg., Sim. Worst, Sim. Best, Exp. Worst; panels: The K Computer, 12x12x12 nodes (calc. 11x11x12), and BG/Q, 16x8x8 nodes (calc. 15x7x8)]
• K computer: up to 8 times slower
• BG/Q: up to 12 times slower

Evaluation: Collectives on the K and BG/Q (Hybrid, 3D(2,1))
[Figure: worst-case relative latency (smaller is better) versus # failed nodes; panels: Allreduce(K), Barrier(K), Allreduce(BG/Q), Barrier(BG/Q); BG/Q results based on 16x8x8]
• On the K and BG/Q, collective operations are optimized for their network
• Having spare nodes makes the optimization very difficult
• BG/Q's optimization works only with MPI_COMM_WORLD

Summary
• We proposed and compared "Sliding Substitution" methods
• Communication performance degradation is observed
  – 7P-Stencil:
    • Simulation results: up to 40 collisions
    • Experimental results: up to 12 times larger latency
  – Collective communications:
    • Up to 12 times larger latency (BG/Q, Barrier)

Future Work
• Evaluations with real applications
• Node-rank re-mapping algorithms, or better substitution methods
• Discussion of other network topologies
  – Experiments using Tsubame 2.5 (Fat-tree) are scheduled
