Robustness of Interconnection Networks
3rd JLESC Summer School
Atsushi Hori
RIKEN AICS
Tuesday, June 28, 2016
3rd JLESC SS@Lyon 2016
Self Introduction
Atsushi Hori - System Software Researcher
RIKEN: the oldest and largest governmental research institute in Japan, since 1917
Advanced Institute for Computational Science (AICS), since 2010
Running the K computer, the largest supercomputer in Japan (ranked 5th in the Top500, June 2016); involved in the Flagship2020 project to develop the post-K computer
DISCLAIMER: The contents of this talk are based
independent from the Flagship2020 project
The colored slides are supplements
The venue of the next JLESC: Kobe, in December
Network Basics
Routing, Topology, Implementation, Fault Resilience
+ my personal opinion
Node: where packets are sent and received; may include a switch (see below)
Link: connects nodes and switches
Packet: a unit of transfer
[Figure: example topologies built from nodes, links, and switches: Mesh, Torus, FatTree, and “SkinnyTree”]
Infiniband, Aries, Cray Gemini, Tianhe
Ethernet
IBM Power 775
[Figure: Topology vs. rank in Top500 (ranks 1 to 500) as of Nov. 2015; categories: FatTree, Torus/Mesh, SkinnyTree, Misc.]
Hypercube: CM-2, nCUBE in the 90s
Dragonfly: Cray XC series
and many others (ring, star, butterfly, to name a few)
[Figure: a hypercube of nodes and links, and a dragonfly of switches (Sw.) with attached nodes]
[Figure: nodes Ni and Nj on a mesh]
must be deadlock free
[Figure: a switch-to-switch (Sw.) link with 1 channel vs. 2 (virtual) channels]
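The deadlock-freedom requirement can be made concrete with a textbook example that is not taken from the slides: dimension-order (XY) routing on a 2D mesh. A packet first corrects its X coordinate and only then its Y coordinate, so it never turns from Y back to X and no cyclic channel dependency can form. The function name `xy_route` and the coordinate convention are assumptions made for this sketch. On a torus the wraparound links reintroduce cycles, which is one common reason for adding virtual channels.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst on a 2D mesh.

    Hypothetical helper for illustration: the X coordinate is corrected
    first, then the Y coordinate (dimension-order routing).
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along Y until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because the route depends only on source and destination, this is a deterministic routing; it cannot bypass a failed link on its path.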
[Figure: two allocations of Jobs A, B, C, and D on the nodes of a 2D torus]
Jobs B, C, and D can interfere with the others
The 2D torus turns into a 2D mesh
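A rough way to see what "torus turns into mesh" costs is to compare average hop counts with and without the wraparound links. The sketch below assumes an n x n partition, shortest-path routing, and an average over all ordered node pairs (including a node with itself); the function names are made up for this illustration.

```python
from itertools import product


def hops_1d(a, b, n, wraparound):
    """Hop count between positions a and b in one dimension of size n."""
    d = abs(a - b)
    return min(d, n - d) if wraparound else d


def avg_hops_2d(n, wraparound):
    """Average shortest-path hops over all ordered node pairs of an n x n grid."""
    nodes = list(product(range(n), repeat=2))
    total = sum(
        hops_1d(ax, bx, n, wraparound) + hops_1d(ay, by, n, wraparound)
        for (ax, ay), (bx, by) in product(nodes, nodes)
    )
    return total / len(nodes) ** 2


if __name__ == "__main__":
    print("4x4 torus:", avg_hops_2d(4, True))   # 2.0
    print("4x4 mesh: ", avg_hops_2d(4, False))  # 2.5
```

For a 4 x 4 partition the average grows from 2.0 to 2.5 hops once the wraparound links are unavailable to the job.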
path
the state of the network
Oscillation in Adaptive Routing
Two roads to the same destination
into the other road
messages consisting of multiple packets
Sending Order = Receiving Order
  sent: P0 P1 P2 P3 P4 P5 P6 P7 …
  Recvbuf 0, Recvbuf 1: P0 P1 P2 P3 P4 P5 P6 P7
Sending Order ≠ Receiving Order
  arriving: P0 P5 P3 P2 P4 P7 P9 P6 …
  Recvbuf 0, Recvbuf 1: P0 P2 P3 P4 P5 P6 P7
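When the receiving order differs from the sending order, the receiver cannot simply append each packet to the receive buffer. A minimal sketch of one way to handle this, assuming every packet carries a sequence number (the `(seq, payload)` tuple format and the function name are assumptions for illustration, not a description of any particular network):

```python
def reassemble(arrivals, num_packets):
    """Rebuild a message from (seq, payload) packets arriving in any order."""
    slots = [None] * num_packets
    received = 0
    for seq, payload in arrivals:
        if slots[seq] is None:       # ignore duplicates when counting
            received += 1
        slots[seq] = payload         # place by sequence number, not arrival order
    if received != num_packets:
        raise ValueError("message incomplete")
    return b"".join(slots)


if __name__ == "__main__":
    order = (0, 5, 3, 2, 4, 7, 1, 6)                    # an out-of-order arrival
    arrivals = [(s, f"P{s} ".encode()) for s in order]
    print(reassemble(arrivals, 8))
```

The price is extra header information and bookkeeping at the receiver, which in-order delivery avoids.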
Machine/Network   Direct/Indirect
the K (Tofu)      Direct
BG/Q              Direct
Infiniband        Indirect
Ethernet          Indirect
Note: In many books, whether a network is direct or indirect is categorized as an aspect of topology
lengths
connected
C: Network interface (card) of a node
S: Switch
L: Cable
possible applications, and
network component failure
failed part(s)
part(s) can be automatically bypassed
and rebalance load
2D Jacobi iteration: V'(i,j) = A * ( V(i-1,j) + V(i+1,j) + V(i,j-1) + V(i,j+1) )
2D array V(N,M)
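The update rule above can be written out as a short serial sketch. The default coefficient `a=0.25` (a plain four-point average) and the choice to keep boundary values fixed are assumptions made here for illustration; the slide specifies neither.

```python
def jacobi_step(v, a=0.25):
    """One Jacobi sweep over the interior of a 2D array given as a list of rows."""
    n, m = len(v), len(v[0])
    new = [row[:] for row in v]          # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Each point depends only on its four nearest neighbors.
            new[i][j] = a * (v[i - 1][j] + v[i + 1][j] + v[i][j - 1] + v[i][j + 1])
    return new


if __name__ == "__main__":
    v = [[1.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 1.0]]
    print(jacobi_step(v)[1][1])  # 1.0
```

When V(N,M) is block-distributed over a 2D grid of processes, the four-neighbor dependency means each process exchanges boundary rows and columns only with its four neighboring processes.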
large number of packet collisions
[Figure: migration from a failed node (F) to a spare node (S); cases with no spare, 4 possible collisions, and 5 possible collisions]
utilization
packet collisions
[Figure: 0D, 1D, and 2D sliding substitution on a 2D array of nodes 1 to 35 with spare nodes, for the case where node 21 fails]
degree method first
degree method
[Figure: FX100 CPU block diagram: 32 cores plus 2 assistant cores, Tofu2 interface and controller, HMC interfaces, L2 caches, PCI interface and controller, MACs]
Key technologies (translated from the Japanese labels in the figure): enhanced superscalar; assistant cores that handle the communication daemons; optical connections between cabinets (main units)
The K Computer (2011): CPU with 8 cores and ICC (Tofu network); 1 system board with 4 CPUs and 4 ICCs (32 cores); 24 chassis (768 cores)
FX100 (2015): CPU with 32+2 cores + Tofu; 3 CPUs (nodes); 4 system boards (384 cores); 18 chassis (6,912 cores)
http://accc.riken.jp/wp-content/uploads/2015/06/chiba.pdf
unit
[Figure 14 of the white paper: connection topology in the main unit, showing a Tofu unit and the A, B, and C axes (user's view)]
White paper: FUJITSU Supercomputer PRIMEHPC FX100 – Evolution to the Next Generation
https://www.fujitsu.com/global/Images/primehpc-fx100-hard-en.pdf
The Tofu circuit is on the same die as the CPU; however, it can keep running while the CPU cores are shut down and powered off.
be aborted
accept jobs
network because XYZ connections for I/O are lost
density increases
The K’s case: Every morning, SEs replace the failed nodes
Apparent failure
[Figure: two timelines of the number of failed components, marking operation and repair time, the apparent failures, and the average]
failures
➡ There are always one or more failed components
My Personal Opinion
failure happens unexpectedly and unusually
and algorithms
algorithms
design HPC systems with failures in mind?
degradation due to the failures
Stencil and Cartesian Topology
is revisited
fits with Cartesian topology very very well
substitutions take place, then the fitness is gone and performance degrades
[Figure: 5P-Stencil Communication Performance Degradation over the Number of Failed Nodes [7]; x-axis: # Node Failures, y-axis: # Collisions; series: Best, Average, Worst]
topology
support are NOT met, then general protocol takes place
conditions
[Figure: Slowdown vs. # Node Failures; series: K-Barrier, K-Allreduce, BGQ-Barrier, BGQ-Allreduce, BGQ-Barrier*, BGQ-Allreduce*]
Regular topology turns into random topology
as the number of failed links increases
[Figure: Dragonfly topologies of nodes and switches (Sw.), labeled (Full), 22/28, and 16/28]
Qualitative Change Quantitative Change
Randomness may be an answer
broken by failures ?
Qualitative change: Hard to imagine
Quantitative change: Easier to imagine
from the beginning, forget about failures in regular systems
➡ Random Topology ➡ Random Network
Good Point of Random
– low diameter and low average path hops
100 times improvement
[Figure: 1,024-node network with (a) Non-random Shortcuts and (b) Random Shortcuts; switch degree ≈ number of shortcuts]
Michihiro Koibuchi, http://research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf
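The effect of random shortcuts on diameter can be tried out with a small experiment. This is only a sketch under assumptions chosen for brevity: a ring as the base topology, a fixed number of uniformly random shortcut links per node, and a 64-node network rather than the 1,024 nodes of the cited study. It makes no attempt to reproduce the cited results.

```python
import random
from collections import deque


def ring(n):
    """Adjacency sets of an n-node ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_random_shortcuts(adj, per_node, rng):
    """Add up to per_node bidirectional links from every node to random nodes."""
    n = len(adj)
    for i in range(n):
        for _ in range(per_node):
            j = rng.randrange(n)
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def diameter(adj):
    """Largest shortest-path hop count, via breadth-first search from every node."""
    worst = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make runs repeatable
    print("ring:", diameter(ring(64)))
    print("ring + random shortcuts:", diameter(add_random_shortcuts(ring(64), 2, rng)))
```

Adding links can never increase the diameter, so the second number is at most 32; how much smaller it gets depends on the seed and on the number of shortcuts per node.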
Two Approaches to Quasi-randomness
[Figure: a randomness scale from Low (not random) to High (fully random), showing where Method A and Method B start and the region of quasi-random topologies]
Michihiro Koibuchi, http://research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf
Random Routing in Hypercube
at least $2^{n/2}/2$ steps (exponential in n)
with high probability (i.e., using more than O(n) steps has a vanishing probability converging to 0, as n → ∞)
use bit-fixing routing from i to r(i)
worst case configuration from deterministic routing
is the worst case is very low, and is vanishing for large n
Random bit-fixing routing: a two-stage configuration
i     r(i)  d(i)
0000  0000  0000
0001  0001  0100
0010  1000  1000
0011  0101  1100
0100  0001  0001
0101  1110  0101
0111  1101  1101
1110  0000  1011
1111  1110  1111
Sid C-K Chau, https://www.cl.cam.ac.uk/teaching/1011/CompSysMod/RandBits_Lec2V2.pdf
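The two-stage scheme can be sketched in a few lines: each packet is first sent by bit-fixing to a randomly chosen intermediate node r(i), then by bit-fixing again to its real destination d(i). Node addresses are n-bit integers; the function names and the choice to fix bits from the most significant end are assumptions of this sketch, and contention between packets is not modeled.

```python
import random


def bit_fixing_path(src, dst, n):
    """Nodes visited when differing bits are fixed from the most significant down."""
    path = [src]
    cur = src
    for bit in reversed(range(n)):
        mask = 1 << bit
        if (cur ^ dst) & mask:   # this address bit still differs
            cur ^= mask          # traverse the link in that dimension
            path.append(cur)
    return path


def two_stage_route(src, dst, n, rng):
    """Route src -> random intermediate r -> dst, each stage by bit fixing."""
    mid = rng.randrange(1 << n)  # r(i): uniform over all 2**n nodes
    first = bit_fixing_path(src, mid, n)
    second = bit_fixing_path(mid, dst, n)
    return first + second[1:]    # do not repeat the intermediate node


if __name__ == "__main__":
    # One row of the table: i = 0011, d(i) = 1100, without the random stage.
    print([format(x, "04b") for x in bit_fixing_path(0b0011, 0b1100, 4)])
    # The same pair with a random intermediate node.
    rng = random.Random(1)
    print([format(x, "04b") for x in two_stage_route(0b0011, 0b1100, 4, rng)])
```

Each stage takes at most n hops, so a packet travels at most 2n hops; the random first stage is what breaks up the adversarial permutations that hurt deterministic bit fixing.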
Dynamic routing vs. Random routing
packet to go through
(≠ Brownian motion)
“An eye for an eye, a tooth for a tooth”
Randomness for randomness
Randomness MAY save future supercomputers (not yet proven)
Thank you
1) High-radix router: Microarchitecture of a High-Radix Router, John Kim, William J. Dally, et al., ISCA '05.
2) Tofu network: The Tofu Interconnect, Yuichiro Ajima, et al., Hot Interconnects, 2012.
3) Dragonfly network: Technology-Driven, Highly-Scalable Dragonfly Topology, John Kim, William J. Dally, et al., ISCA '08.
4) Routing algorithms: A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms, J. Flich et al., in IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 3,
5) Shortest path finding algorithm (Dijkstra Algorithm): A note on two problems in connexion with graphs, E. W. Dijkstra, in Numerische Mathematik, 1959.
6) Adaptive routing in Infiniband: Fail-in-place Network Design: Interaction Between Topology, Routing Algorithm and Failures, J. Domke, T. Hoefler, and S. Matsuoka, SC '14, 2014.
7) Spare node substitution: Sliding Substitution of Failed Nodes, Atsushi Hori, et al., In Proceedings
8) Random algorithms including the random routing: Randomized Algorithms, Rajeev Motwani and Prabhakar Raghavan, Cambridge University Press, 1995.
9) Random network: A Case for Random Shortcut Topologies for HPC Interconnects, Michihiro Koibuchi, et al., ISCA '12.
10) Another view on HPC network robustness: Robustness Attributes of Interconnection Networks for Parallel Processing, Behrooz Parhami, Keynote lecture, 1st Int'l Supercomputing Conf. (ISUM-2010), March 4, 2010. (https://www.ece.ucsb.edu/~parhami/pres_folder/parh10-isum-robustness-int-nets.ppt)