SLIDE 1

Sliding Substitution of Failed Nodes

Atsushi Hori, Kazumi Yoshinaga, Yutaka Ishikawa (RIKEN AICS)
Thomas Herault, Aurélien Bouteiller, George Bosilca (University of Tennessee, ICL)

Friday, October 2, 2015

SLIDE 2

EuroMPI 2015, Bordeaux

Motivation

  • Having a set of spare nodes seems to be the last resort
  • "In such a case, a spare node can be used."
  • Having spare nodes is not the answer, but a new research issue

SLIDE 3

Fault Resilience

  • Fault tolerance in the exaflops era
      • High failure rate
      • High I/O bandwidth requirement
  • User-level fault resilience
      • Less I/O bandwidth required
      • e.g., ULFM (User-Level Failure Mitigation)
  • We need a recovery strategy!

SLIDE 4

Survival from Node Failure

  • Jobs with dynamic load balancing
      • e.g., task bag, PIC, ...
      • The job shrinks to exclude failed nodes
      • Tasks running on failed node(s) are migrated to live nodes
  • Jobs without dynamic load balancing
      • e.g., stencil computation, ...
      • Very difficult to rebalance the load
      • Having spare nodes seems to be the answer ...

SLIDE 5

Stencil Computation

  • Survival from a node failure
  • Load balancing
  • Preserving communication pattern
  • Less code modification

[Figure: shifting the load onto healthy nodes results in a new, complex communication pattern]

SLIDE 6

Spare Node

  • In an error handler (of ULFM, for example):
      • create a new MPI communicator that
          • excludes the failed node, and
          • includes a spare node;
      • then migrate the task running on the failed node to the spare node
  • No change in the kernel part of the application
  • However, at the network level, the regular stencil communication pattern can be lost!
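The substitution step described on this slide can be modeled without any MPI calls. The sketch below is only an illustration: it represents a communicator as a plain list mapping rank to physical node id, and the function name `substitute_failed_node` and the node ids are hypothetical. It is not ULFM code and does not perform task migration.

```python
def substitute_failed_node(rank_to_node, spare_nodes, failed_node):
    """Return (new rank-to-node list, remaining spares).

    rank_to_node: list where the index is the rank and the value is the
                  physical node id hosting that rank (illustrative model).
    spare_nodes:  list of unused spare node ids; it is not modified.
    """
    if failed_node not in rank_to_node:
        raise ValueError("failed node is not part of the job")
    if not spare_nodes:
        raise RuntimeError("no spare node left")
    spare = spare_nodes[0]
    # Ranks stay the same; only the node behind the failed rank changes,
    # which is why the application kernel needs no modification.
    new_map = [spare if node == failed_node else node for node in rank_to_node]
    return new_map, spare_nodes[1:]


rank_to_node = [0, 1, 2, 3]
new_map, remaining = substitute_failed_node(rank_to_node, [4, 5], failed_node=2)
print(new_map)    # [0, 1, 4, 3]
print(remaining)  # [5]
```

The rank numbering is preserved, so logical neighbors are unchanged. What changes is the physical distance between them, which is the network-level effect the last bullet warns about.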

SLIDE 7

Are spare nodes really the answer?

  • Our scope
      • Is there any penalty? If so, how much?
      • How should spare nodes be allocated?
      • How many spare nodes should be allocated?
      • How should failed nodes be substituted by spare nodes?
  • Out of scope
      • How (soft/hard) errors are detected
      • How checkpoints are taken
      • How tasks are migrated

SLIDE 8

Spare Node Penalty (1)

  • Spare node allocation and node utilization

[Figure: node grids illustrating the 2D(1,1), 2D(2,1), and 2D(2,2) spare node allocations]

[Plot: % spare nodes versus number of nodes (10,000 to 1,000,000) for 2D(1,1), 2D(2,1), 3D(1,1), 3D(2,1), and 3D(3,1)]
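The trade-off between spare allocation and node utilization can be estimated with a few lines of arithmetic. The sketch below rests on an assumption that the slides do not state explicitly: that kD(s, w) denotes a k-dimensional Cartesian layout with equal side lengths in which spare nodes occupy a slab of width w on s of the k sides. The function name `spare_fraction` is hypothetical.

```python
def spare_fraction(total_nodes, dims, spare_sides, width=1):
    """Fraction of nodes reserved as spares under the assumed kD(s, w) layout.

    Assumption (not from the slides): the machine is a `dims`-dimensional
    cube with equal sides, and spares fill a slab of `width` nodes on
    `spare_sides` of its sides.
    """
    side = total_nodes ** (1.0 / dims)
    if width >= side:
        raise ValueError("spare slab is wider than the machine")
    # Compute nodes shrink by `width` along each dimension that has spares.
    return 1.0 - ((side - width) / side) ** spare_sides


for nodes in (10_000, 100_000, 1_000_000):
    for dims, sides in ((2, 1), (2, 2), (3, 1), (3, 2), (3, 3)):
        pct = 100.0 * spare_fraction(nodes, dims, sides)
        print(f"{nodes:>9} nodes  {dims}D({sides},1): {pct:5.2f}% spare")
```

Under this reading, the spare percentage falls as the machine grows and rises with the number of sides that carry spares.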

SLIDE 9

How many spare nodes?

  • MTBF of a node
      • 50,000 h ≈ 5 years
  • MTBF of an exascale system (10^6 nodes)
      • 0.05 h = 3 min
  • #Spare = 10,000 (1%, i.e., 10^4 out of 10^6)
      • The spares last 500 h ≈ 20 days (10,000 failures at one per 0.05 h)

[Plot: system MTBF in hours (50,000 h per node) versus number of nodes (10,000 to 1,000,000) for 2D(1,1), 2D(2,1), 3D(1,1), 3D(2,1), and 3D(3,1)]
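The numbers on this slide follow from simple arithmetic, reproduced below as a sketch. It assumes independent node failures at a constant rate, so that the system MTBF is the node MTBF divided by the node count, and it ignores the slight change in node count as spares are consumed. The function names are hypothetical.

```python
def system_mtbf_hours(node_mtbf_hours, num_nodes):
    """System MTBF assuming independent, constant-rate node failures."""
    return node_mtbf_hours / num_nodes


def spare_lifetime_hours(node_mtbf_hours, num_nodes, num_spares):
    """Expected time until every spare has been consumed by a failure."""
    return num_spares * system_mtbf_hours(node_mtbf_hours, num_nodes)


mtbf = system_mtbf_hours(50_000, 10**6)
lifetime = spare_lifetime_hours(50_000, 10**6, 10**4)
print(f"system MTBF: {mtbf:.2f} h = {mtbf * 60:.0f} min")
print(f"spares last: {lifetime:.0f} h = {lifetime / 24:.1f} days")
```

With 50,000 h per node and 10^6 nodes this gives 0.05 h (3 min) between failures, and 10^4 spares are exhausted after 500 h, or roughly 20 days.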

SLIDE 10

Spare Node Allocation

  • Change the spare node allocation method according to the number of nodes

[Plots: % spare nodes versus number of nodes (10,000 to 1,000,000) for 2D(1,1), 2D(2,1), 3D(1,1), 3D(2,1), and 3D(3,1); one panel with a y-axis from 1 to 5%, one from 2 to 14%]

SLIDE 11

Spare Node Penalty (2)

  • Possibility of communication performance degradation
  • 5-point (5P) stencil communication pattern

[Figure: 2D Cartesian network with XY routing, showing the 5P stencil communication in the normal case and after failed node F is substituted by spare node S]
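The degradation shown in the figure can be reproduced with a small link-load count. The sketch below is an illustration under stated assumptions, not the authors' simulator: the network is a 2D mesh without wraparound, routing is dimension-ordered (X first, then Y), every rank sends one message to each of its four logical neighbors at the same time, and the spare nodes sit in an extra column. All names (`xy_route`, `max_link_load`) are hypothetical.

```python
from collections import Counter


def xy_route(src, dst):
    """Return the directed links used by XY routing (X first, then Y)."""
    x, y = src
    dst_x, dst_y = dst
    links = []
    while x != dst_x:
        nxt = x + (1 if dst_x > x else -1)
        links.append(((x, y), (nxt, y)))
        x = nxt
    while y != dst_y:
        nxt = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, nxt)))
        y = nxt
    return links


def max_link_load(placement):
    """Largest number of stencil messages that share one directed link.

    placement maps a logical grid position (col, row) to the physical
    node (x, y) that hosts it.
    """
    load = Counter()
    for (col, row), src in placement.items():
        for d_col, d_row in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = (col + d_col, row + d_row)
            if neighbor in placement:  # non-periodic boundary
                load.update(xy_route(src, placement[neighbor]))
    return max(load.values())


# 4x4 compute grid; physical column x = 4 is assumed to hold the spares.
normal = {(c, r): (c, r) for c in range(4) for r in range(4)}
after = dict(normal)
after[(1, 1)] = (4, 1)  # node (1, 1) failed; its rank now runs on a spare

print("normal:", max_link_load(normal))
print("after substitution:", max_link_load(after))
```

In the normal placement every directed link carries at most one message. After the substitution, the messages to and from the relocated rank cross the row toward the spare column and share links with the regular traffic, so the maximum link load rises above one.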

SLIDE 12

Sliding Substitution

  • 0D Sliding
  • 1D Sliding
  • 2D Sliding
  • 3D, 4D, ... Sliding

[Figure: node grids (nodes 1 to 35) showing the node placement after 0D, 1D, and 2D sliding, with the spare nodes marked. In the example, node 21 fails.]
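The slide names the sliding variants without defining them in text, so the sketch below encodes one plausible reading and should be treated as an assumption: in 0D sliding only the failed node's rank moves to a spare in the same row; in 1D sliding the ranks of that row from the failed node onward shift one position toward the spare; in 2D sliding every column from the failed node's column onward shifts by one. It handles a single failure and a single spare column at x = `spare_x`. The function name `slide` is hypothetical.

```python
def slide(placement, failed, spare_x, mode):
    """Return a new logical-to-physical placement that avoids `failed`.

    placement maps a logical position (col, row) to a physical node (x, y).
    The meaning of each mode is an assumed reading of the slide.
    """
    failed_x, failed_y = failed
    new_placement = {}
    for logical, (x, y) in placement.items():
        if mode == "0D":
            shifted = (x, y) == failed
            target = (spare_x, y)
        elif mode == "1D":
            shifted = y == failed_y and x >= failed_x
            target = (x + 1, y)
        elif mode == "2D":
            shifted = x >= failed_x
            target = (x + 1, y)
        else:
            raise ValueError(f"unknown mode: {mode}")
        new_placement[logical] = target if shifted else (x, y)
    return new_placement


# 3x2 compute grid; physical column x = 3 is assumed to hold the spares.
grid = {(c, r): (c, r) for c in range(3) for r in range(2)}
for mode in ("0D", "1D", "2D"):
    result = slide(grid, failed=(1, 0), spare_x=3, mode=mode)
    moved = {k: v for k, v in result.items() if v != grid[k]}
    print(mode, moved)
```

Under this reading, higher-dimensional sliding moves more ranks but keeps logical neighbors physically closer, at the cost of consuming a whole spare column in the 2D case.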

SLIDE 13

5P Stencil on 2D Network

  • Simulated results
  • Spare allocation
      • 2D(2,1) > 2D(1,1)
  • Max. failures
      • 0D: up to #Spare
      • 1D: 3 (or more)
      • 2D: up to 2 (2D Cartesian topology)
  • Comm. perf.
      • 2D > 1D > 0D

[Plots: maximum number of message collisions versus number of failures, for mesh and torus networks. Plot annotations: "combinatorial explosion"; "no message collision, up to 2 failures"; "up to 3 failures in worst case"]

SLIDE 14

5P Stencil Comm. Perf.

[Plots: relative latency of 5P stencil communication for message sizes of 256 KiB, 1 MiB, and 4 MiB, on the K computer and on BG/Q. Smaller is better.]

SLIDE 15

Collective Performance

[Plots: relative performance of collective operations under 0D, 1D, 1D+, and 2D sliding, relative to baselines of 15x31, 16x32, and 23x23]
  • On K and BG/Q, collective operations are optimized for their network.
  • Having spare nodes makes the optimization very difficult.
  • BG/Q's optimization works only with MPI_COMM_WORLD

[Plot annotation: smaller is better]

SLIDE 16

Summary

  • The study of spare node substitution has just begun
  • Communication performance degradation is observed
      • 5P stencil:
          • Simulation: up to 100 times larger latency
          • Experiment: < 20 times larger latency
      • Collective: up to 12 times larger latency

SLIDE 17

Current and Future Work

  • Evaluations with real applications
  • Node-rank re-mapping algorithms, or better substitution methods
  • Dragonfly and/or fat-tree networks?
      • Experiments using Tsubame 2.5 (fat-tree) are scheduled
  • At this moment, it is still unclear whether having spare nodes is a promising technique

SLIDE 18

Acknowledgement

Thanks to Dr. Norbert Attig at Jülich Supercomputing Centre for giving us the opportunity to use JUQUEEN.
