
Robustness of Interconnection Networks, 3rd JLESC Summer School



  1. Robustness of Interconnection Networks, 3rd JLESC Summer School. Atsushi Hori, RIKEN AICS. Tuesday, June 28, 2016

  2. Self Introduction: Atsushi Hori, System Software Researcher • RIKEN: the oldest and largest governmental research institute in Japan, since 1917 • Advanced Institute for Computational Science (AICS), since 2010 • Running the K computer, the largest supercomputer in Japan (ranked 5th in the Top500, June 2016) • Involved in the Flagship2020 project to develop the post-K computer. 3rd JLESC SS@Lyon 2016, Tuesday, June 28, 2016


  4. Self Introduction, continued • DISCLAIMER: The contents of this talk are based on my personal experiences and are independent of the Flagship2020 project

  5. Self Introduction, continued • Note: the colored slides are supplements

  6. Self Introduction, continued • The venue of the next JLESC: Kobe, in December

  7. HPC Network • Low latency and high bandwidth: higher performance than silicon disks (SSDs) • High bisection bandwidth • Low possibility of congestion (hopefully) • Very reliable: no errors, no loss • Dense: packed into a computer room, whereas the Internet covers the whole earth • Packet switching: no circuit switching (as in the old telephone network)

  8. Outline • Network Basics: Topology, Routing, Implementation • Fault Resilience • + my personal opinion

  9. Glossary • A network consists of: • Nodes, where packets are sent and received (a node may include a switch, see below) • Switches (routers) • Links connecting nodes and switches • Data transfer: • Packet: a unit of transfer • Message: consists of multiple packets

  10. Topology

  11. Network Topologies (1) • Mesh • Torus • FatTree • “SkinnyTree” [Figure: Mesh and Torus built from nodes and links; FatTree and “SkinnyTree” built from nodes attached to a tree of switches]

  12. Network Topology in Top500 • Topologies in the Top500 (http://www.top500.org): • Torus/Mesh: BG/Q, the K computer (Tofu) • FatTree: InfiniBand, Aries, Cray Gemini, Tianhe • SkinnyTree: Ethernet • Misc.: IBM Power 775 [Figure: interconnect topology (Torus/Mesh, FatTree, SkinnyTree, Misc.) of each system plotted against its rank in the Top500 as of Nov. 2015]

  13. Network Topologies (2) • Hypercube: CM-2, nCUBE in the 90s • Dragonfly: Cray XC series • And many others (ring, star, butterfly, to name a few) [Figure: Hypercube (nodes and links) and Dragonfly (groups of nodes and switches)]
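A hypercube of 2^d nodes labels each node with a d-bit address and links it to the d nodes whose address differs in exactly one bit. The sketch below assumes integer node labels. It shows this neighbor rule and the textbook e-cube (dimension-order) route, which corrects differing bits from the lowest dimension upward. It illustrates the idea only and does not claim to describe how CM-2 or nCUBE actually routed; the function names are hypothetical.

```python
def hypercube_neighbors(node: int, dim: int) -> list:
    """Nodes one link away: flip each of the dim address bits in turn."""
    return [node ^ (1 << d) for d in range(dim)]


def ecube_route(src: int, dst: int, dim: int) -> list:
    """E-cube routing: correct differing address bits from lowest to highest dimension."""
    path = [src]
    cur = src
    for d in range(dim):
        if (cur ^ dst) & (1 << d):  # still differs in dimension d
            cur ^= 1 << d           # traverse the link in dimension d
            path.append(cur)
    return path


print(hypercube_neighbors(0, 3))  # [1, 2, 4]
print(ecube_route(0, 7, 3))       # [0, 1, 3, 7]
```

The route takes one hop per differing address bit, so the hop count is the Hamming distance between source and destination, and the diameter is d.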

  14. Routing

  15. Routing • Find a path from a sender node to a receiver node • Ex) X-Y (dimension-order) routing in a 2D mesh [Figure: X-Y route between nodes N_i and N_j]
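X-Y routing first moves a packet along the X dimension until the column matches, then along Y. This is a minimal sketch, assuming nodes are addressed by integer (x, y) coordinates in a mesh without wraparound links; `xy_route` is a hypothetical name.

```python
def xy_route(src: tuple, dst: tuple) -> list:
    """Nodes visited from src to dst with X-Y (dimension-order) routing in a 2D mesh."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # first correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The hop count is always |dx| + |dy|. Every packet finishes X before Y, so no route ever turns from Y back to X. This is what keeps dimension-order routing on a mesh free of cyclic channel dependencies.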

  16. Deadlock • A routing algorithm on a network topology must be deadlock-free • A cyclic path can cause deadlock • Deadlock can be avoided by providing a bypass, e.g. virtual channels [Figure: two switches connected by 1 physical channel carrying 2 virtual channels]
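The "cyclic path" condition can be checked mechanically. Build a channel dependency graph with an edge from channel a to channel b whenever some route uses b right after a. For deterministic routing like this, an acyclic graph means the routing is deadlock-free (the classic Dally and Seitz condition). The sketch assumes a unidirectional ring where every packet travels clockwise. It compares one virtual channel per link with a common "dateline" scheme, in which packets switch to virtual channel 1 after crossing the wraparound link from node n-1 to node 0. The function names are illustrative.

```python
from itertools import product


def ring_dependencies(n: int, dateline: bool) -> set:
    """Channel dependency edges for clockwise routing on a unidirectional n-node ring.

    A channel is (link, vc); link i carries packets from node i to node (i + 1) % n.
    """
    edges = set()
    for src, dst in product(range(n), repeat=2):
        if src == dst:
            continue
        channels = []
        node, vc = src, 0
        while node != dst:
            channels.append((node, vc))
            if dateline and node == n - 1:
                vc = 1  # crossed the wraparound link: use VC 1 from here on
            node = (node + 1) % n
        # consecutive channels on one route create a dependency edge
        edges.update(zip(channels, channels[1:]))
    return edges


def has_cycle(edges: set) -> bool:
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    state = {}  # missing = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u) -> bool:
        state[u] = 1
        for v in graph.get(u, []):
            if state.get(v) == 1:
                return True  # back edge: cycle found
            if v not in state and visit(v):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))


for dateline in (False, True):
    print(f"dateline={dateline}: cycle={has_cycle(ring_dependencies(4, dateline))}")
```

With one virtual channel, the four links of the ring depend on each other in a circle, so `has_cycle` reports True. With the dateline, no dependency leads from VC 1 back to VC 0, and the graph is acyclic.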


  18. Hot Spot • Hot spot: a place where packet congestion occurs • 2D mesh: hot spot at the center • 2D torus: no hot spots
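One way to see the hot spot is to count how many source/destination pairs use each link when every node sends to every other node. The sketch makes these assumptions:
- uniform all-to-all traffic;
- X-then-Y dimension-order routing;
- on the torus, the shorter way around each ring, with ties broken in the positive direction.

All names are hypothetical.

```python
from collections import Counter
from itertools import product


def step_toward(a: int, b: int, k: int, torus: bool) -> int:
    """Next coordinate when moving from a toward b along one dimension of size k."""
    if not torus:
        return a + 1 if b > a else a - 1
    forward = (b - a) % k
    # shorter way around the ring; a tie (distance k/2) goes forward
    return (a + 1) % k if forward <= k - forward else (a - 1) % k


def link_loads(k: int, torus: bool) -> Counter:
    """Number of routes crossing each directed link of a k x k mesh or torus."""
    loads = Counter()
    nodes = list(product(range(k), repeat=2))
    for (x, y), (dx, dy) in product(nodes, repeat=2):
        while x != dx:  # X dimension first
            nx = step_toward(x, dx, k, torus)
            loads[((x, y), (nx, y))] += 1
            x = nx
        while y != dy:  # then Y dimension
            ny = step_toward(y, dy, k, torus)
            loads[((x, y), (x, ny))] += 1
            y = ny
    return loads


mesh_loads = link_loads(4, torus=False)
torus_loads = link_loads(4, torus=True)
print("mesh  max load:", max(mesh_loads.values()))
print("torus max load:", max(torus_loads.values()))
```

For k = 4, the mesh links next to the center (e.g. from (1, 0) to (2, 0)) carry 16 routes, while edge links such as (0, 0) to (1, 0) carry 12. On the torus every positive-direction link carries 12 routes and every negative-direction link carries 4, regardless of position.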

  19. Partitioning • Multiple jobs can run on a big machine • The node space is partitioned among them • Partitioning may change the topology seen by a job • Jobs may interfere with one another [Figure: Jobs A to D placed on partitions of a 2D node space; a 2D torus turns into a 2D mesh, and Jobs B, C and D can interfere with the others]
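The torus-to-mesh effect can be quantified by the diameter of the block a job receives. This is a minimal sketch, assuming each job's packets are routed only inside its own rectangular block. A dimension keeps its wraparound links only when the block spans the whole machine in that dimension; otherwise that dimension behaves like a mesh. `partition_diameter` is a hypothetical helper.

```python
def partition_diameter(extent: tuple, machine: tuple) -> int:
    """Maximum minimal hop count inside a block allocated from a torus machine.

    extent[i] is the block size and machine[i] the machine size in dimension i.
    """
    hops = 0
    for e, m in zip(extent, machine):
        if e == m:
            hops += e // 2  # full ring: wraparound links are inside the block
        else:
            hops += e - 1   # cut ring: only a line (mesh dimension) remains
    return hops


print(partition_diameter((8, 8), (8, 8)))  # 8: the whole 8 x 8 torus
print(partition_diameter((4, 8), (8, 8)))  # 7: half of it
print(partition_diameter((4, 4), (8, 8)))  # 6: a quarter
```

A 4 x 4 block therefore has diameter 6, whereas a genuine 4 x 4 torus would have diameter 4.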

  20. Dynamic (Adaptive) Routing • Static routing: once a path is fixed, packets follow that path • Dynamic (adaptive) routing: paths can change dynamically according to the state of the network • Issues: • Algorithm: how, who, when? • Deadlock freedom • Route-changing latency and H/W resources • Stability (see next slide) • Packet order is not preserved (see the slide after next)
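Here is one possible answer to "how": minimal adaptive routing in a 2D mesh. A switch may forward a packet through any output port that brings it closer to the destination, and it picks the port with the shortest queue. The port names and the `queue_len` map are hypothetical. This selection rule alone does not guarantee deadlock freedom; real designs add escape virtual channels or turn restrictions.

```python
def productive_ports(cur: tuple, dst: tuple) -> list:
    """Output ports that reduce the distance to dst in a 2D mesh."""
    ports = []
    if dst[0] != cur[0]:
        ports.append("+x" if dst[0] > cur[0] else "-x")
    if dst[1] != cur[1]:
        ports.append("+y" if dst[1] > cur[1] else "-y")
    return ports


def choose_port(cur: tuple, dst: tuple, queue_len: dict):
    """Adaptive choice: the least-occupied productive port, or None on arrival."""
    ports = productive_ports(cur, dst)
    if not ports:
        return None
    return min(ports, key=lambda p: queue_len[p])  # ties go to the X port


queues = {"+x": 5, "-x": 0, "+y": 1, "-y": 0}
print(choose_port((0, 0), (2, 3), queues))  # +y (static X-Y routing would take +x)
```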

  21. Oscillation in Adaptive Routing: two roads lead to the same destination 1. One is very crowded 2. The radio says the other is empty 3. Everybody rushes onto the other road 4. (repeat 1-3)
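The road analogy can be reproduced with a tiny simulation. The setup is hypothetical: 100 drivers all pick whichever road the radio reported as less crowded in the previous round. Under that assumption, the load never settles.

```python
def simulate(rounds: int, drivers: int = 100, on_a: int = 100) -> list:
    """Drivers on road A per round when everybody follows last round's report."""
    history = []
    for _ in range(rounds):
        history.append(on_a)
        on_b = drivers - on_a
        # everyone rushes to the road reported emptier (a tie sends everyone to B)
        on_a = drivers if on_a < on_b else 0
    return history


print(simulate(6))  # [100, 0, 100, 0, 100, 0]
```

The same feedback loop appears when many switches react to the same congestion information at the same time. This is why stability is listed as an issue for adaptive routing.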

  22. Packet Order • Adaptive routing cannot preserve packet ordering • This can be problematic when receiving large messages that consist of multiple packets [Figure: packets P0 to P7 filling Recvbuf 0 and Recvbuf 1; left, sending order = receiving order; right, sending order ≠ receiving order, with packets arriving as P0, P5, P3, P2, P4, P7, P9, P6, …]
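A common way to tolerate reordering is to carry a sequence number in each packet and let the receiver place the payload by offset instead of by arrival order. The sketch assumes such a sequence number and a fixed packet size. `packetize` and `reassemble` are hypothetical helpers, not the interface of any particular NIC or MPI library.

```python
def packetize(message: bytes, packet_size: int) -> list:
    """Split a message into (sequence number, payload) packets."""
    return [(seq, message[off:off + packet_size])
            for seq, off in enumerate(range(0, len(message), packet_size))]


def reassemble(packets, packet_size: int, message_size: int) -> bytes:
    """Write each payload at seq * packet_size, whatever order the packets arrive in."""
    expected = -(-message_size // packet_size)  # number of packets (ceiling division)
    buf = bytearray(message_size)
    received = set()
    for seq, payload in packets:
        start = seq * packet_size
        if not 0 <= seq < expected or start + len(payload) > message_size:
            raise ValueError(f"bad packet {seq}")
        buf[start:start + len(payload)] = payload
        received.add(seq)
    if len(received) != expected:
        raise ValueError("message incomplete")
    return bytes(buf)


message = b"abcdefghij"
packets = packetize(message, 3)
arrived = [packets[i] for i in (2, 0, 3, 1)]  # out-of-order arrival
print(b"".join(p for _, p in arrived))        # b'ghiabcjdef': naive concatenation
print(reassemble(arrived, 3, len(message)))   # b'abcdefghij'
```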

  23. Metrics • Topology • Network diameter • High-radix or low-radix: the higher the radix, the smaller the network diameter • Performance • Whole network: bisection bandwidth • P2P: bandwidth and latency, hop count • Collective operations (barrier, and so on): latency
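For the k-ary n-cube family (k nodes per dimension, n dimensions), the topology metrics have closed forms. The sketch assumes even k and counts bisection width in links rather than bandwidth. It treats k = 2 (the hypercube) separately, because there the wraparound and direct links coincide. Holding the node count at 4096 illustrates the slide's point that a higher router radix gives a smaller diameter. `kary_ncube_metrics` is a hypothetical helper.

```python
def kary_ncube_metrics(k: int, n: int, torus: bool) -> tuple:
    """Return (nodes, max router degree, diameter, bisection width in links)."""
    if k % 2:
        raise ValueError("this sketch assumes an even number of nodes per dimension")
    nodes = k ** n
    if torus and k > 2:
        degree = 2 * n                # two neighbors per dimension
        diameter = n * (k // 2)       # at most halfway around each ring
        bisection = 2 * k ** (n - 1)  # the cut crosses each ring twice
    else:                             # mesh, or k == 2 (hypercube)
        degree = 2 * n if k > 2 else n
        diameter = n * (k - 1)
        bisection = k ** (n - 1)
    return nodes, degree, diameter, bisection


for k, n in [(64, 2), (16, 3), (8, 4), (4, 6), (2, 12)]:
    print(f"{k}-ary {n}-cube torus:", kary_ncube_metrics(k, n, torus=True))
```

At 4096 nodes, raising the router degree from 4 (a 64 x 64 torus) to 12 (a 4-ary 6-cube) shrinks the diameter from 64 to 12 hops. It also raises the bisection width from 128 to 2048 links.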

  24. Implementation

  25. Installation of the K Computer

  26. Direct/Indirect Network • Direct network: every node has a switch inside • Indirect network: nodes have no switch • Note: in many books, direct or indirect network is categorized as an aspect of topology • Examples (machine/network: direct/indirect): the K (Tofu): Direct; BG/Q: Direct; InfiniBand: Indirect; Ethernet: Indirect
