HiRy: An Advanced Theory on Design of Deadlock-free Adaptive Routing


slide-1
SLIDE 1

HiRy: An Advanced Theory on Design of Deadlock-free Adaptive Routing for Arbitrary Topologies

2017/12/17

Ryuta Kawano (Keio Univ., Japan)
Ryota Yasudo (Keio Univ., Japan)
Hiroki Matsutani (Keio Univ., Japan)
Michihiro Koibuchi (NII, Japan)
Hideharu Amano (Keio Univ., Japan)

slide-2
SLIDE 2

Outline

  • Low-latency Network Topologies for HPC systems
  • Conventional Deadlock-free Routing Methods
  • EbDa: A Generalized Theorem to Design Adaptive Routing for Mesh and Torus
  • HiRy: An Advanced Theorem to Design Adaptive Routing for Arbitrary Topologies
  • Evaluation by Network Simulation
  • Conclusion

slide-3
SLIDE 3

Subject: Inter-switch Networks for HPC Systems

  • Network topologies are determined based on the required performance and scalability.
  • Fat-tree, Torus, and Dragonfly [1] are widely used for HPC systems.

[Figure: Fat-tree, Torus, and Dragonfly [1] topologies]

[1] J. Kim, W. J. Dally, S. Scott and D. Abts: "Technology-Driven, Highly-Scalable Dragonfly Topology", ISCA'08.

slide-4
SLIDE 4

Inter-Switch Irregular Topology

Low-latency Irregular Topologies [2,3] for HPC systems

  • Reduction of # of hops with randomized links

[Figure: irregular topologies compared with regular (non-random) topologies (1,024 switches)]

[2] M. Koibuchi et al.: "A Case for Random Shortcut Topologies for HPC Interconnects", ISCA'12.
[3] H. Yang et al.: "Dodec: Random-Link, Low-Radix On-Chip Networks", MICRO'14.

slide-6
SLIDE 6

Challenge: Deadlock-free Routing

  • Routing methods for irregular topologies have to support deadlock-freedom while
      – reducing the # of hops to achieve low latency.
      – making alternative paths available to avoid congestion.
  • Conventional topology-independent routing methods for irregular topologies
      – LASH-TOR
      – Duato’s protocol
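
One common way to check the deadlock-freedom requirement named above is to build the channel dependency graph and test it for cycles; an acyclic graph is a sufficient condition for deadlock-freedom. The following is a minimal sketch that is not taken from the slides: `has_cycle` and the ring example are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def has_cycle(dependencies):
    """Return True if the directed channel dependency graph has a cycle.

    `dependencies` is an iterable of (c1, c2) pairs meaning that a packet
    holding channel c1 may request channel c2 next.
    """
    graph = defaultdict(list)
    nodes = set()
    for c1, c2 in dependencies:
        graph[c1].append(c2)
        nodes.update((c1, c2))

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    for start in nodes:
        if color[start] != WHITE:
            continue
        # Iterative depth-first search; reaching a GRAY node is a back edge.
        stack = [(start, iter(graph[start]))]
        color[start] = GRAY
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# Four channels around a unidirectional ring depend on each other cyclically.
ring = [("c0", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c0")]
# Dropping one dependency breaks the cycle.
chain = ring[:-1]
```

Here `has_cycle(ring)` reports a cycle while `has_cycle(chain)` does not, which is the effect a deadlock-free routing method has to achieve by restricting which channel a packet may request next.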

slide-7
SLIDE 7

LASH-TOR [4]

  • Layered virtual networks generated with multiple Virtual Channels (VCs)
  • Permitting transitions to achieve minimal routing
  • ○: Minimal paths, ×: Alternative paths

[Figure: physical NW and virtual NWs (VC1, VC2), with flows and a transition channel]

[4] T. Skeie, O. Lysne, J. Flich, P. Lopez, A. Robles and J. Duato: "LASH-TOR: A Generic Transition-Oriented Routing Algorithm", ICPADS'04.

slide-8
SLIDE 8

Duato’s Protocol [5]

  • Layered virtual networks generated with multiple Virtual Channels (VCs), as in LASH-TOR
  • Minimal routing on one virtual network, and non-minimal, deadlock-free routing on another virtual network
  • △: Minimal paths, ○: Alternative paths
  • Non-minimal routing under high load

[5] F. Silla and J. Duato: "Improving the Efficiency of Adaptive Routing in Networks with Irregular Topology", HiPC'97.

slide-9
SLIDE 9

Comparison of Topology-independent Routing Methods

                      LASH-TOR   Duato’s
Minimal Paths            ○          △
Alternative Paths        ×          ○

  • Challenge: Designing routing methods that achieve both minimal paths and alternative paths for irregular networks

slide-11
SLIDE 11

Turn Model

  • Routing theorem for Mesh and Torus
  • Prohibits a subset of turns to avoid loops
  • Example: West-first routing
      – West channels must be used before any of the {North, East, South} channels.
  • ○: Minimal paths, ○: Alternative paths
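
The West-first rule can be sketched as a small routing function. This is an illustrative, minimal-path variant for a 2D mesh; the coordinate convention (x grows eastward, y grows northward) and the name `west_first_directions` are assumptions of this sketch, not part of the slides.

```python
def west_first_directions(cur, dst):
    """Productive output directions allowed by West-first routing.

    Nodes are (x, y) tuples on a 2D mesh; x grows eastward and y grows
    northward (a convention assumed for this sketch).
    """
    (cx, cy), (dx, dy) = cur, dst
    if dx < cx:
        # All westward hops must be taken first, so West is the only choice.
        return {"W"}
    allowed = set()
    if dx > cx:
        allowed.add("E")
    if dy > cy:
        allowed.add("N")
    if dy < cy:
        allowed.add("S")
    return allowed
```

For a destination to the north-east, the function returns both `"E"` and `"N"`, which is where the alternative paths come from; for a destination to the west it returns only `"W"`.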

slide-12
SLIDE 12

EbDa [6] - Generalized Theorems of the Turn Model

  • The turns available in West-first routing are illustrated by arrows in the left figure.
  • Directions that can be used arbitrarily and repeatedly are grouped into a partition in EbDa.
  • A transition between partitions is illustrated in the right figure.

[Figure: left, available turns among N, E, W, S; right, Partition 1, Partition 2, and the transition between them]

[6] M. Ebrahimi et al.: "EbDa: A New Theory on Design and Verification of Deadlock-free Interconnection Networks", ISCA'17.

slide-15
SLIDE 15

Deadlock-free Routing in EbDa

  • An intuitive proof for deadlock-freedom
  • An example of a routed path in the bottom-right figure
  • West channels are available before the transition.
  • The uni-directional transition avoids loops among partitions.
  • After the transition, {North, East, South} channels are available.
  • Packets cannot cause loops because they have to move monotonically in the eastern direction (no westward hop is possible after the transition).

[Figure: Partition 1, Partition 2, and the transition between them, with an example routed path from the source (src.)]
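
The partition ordering described above can be expressed as a simple check on a path. The sketch below assumes a path is given as a list of hop directions and that partitions are an ordered list of direction sets; `is_legal_path` and `WEST_FIRST` are hypothetical names used only for illustration.

```python
def is_legal_path(directions, partitions):
    """Check that a sequence of hop directions respects the partition order.

    `partitions` is an ordered list of sets of directions. A packet may use
    the directions of its current partition freely and may move on to a
    later partition, but it may never return to an earlier one.
    """
    current = 0
    for d in directions:
        # Find the first partition at or after `current` that contains d.
        for idx in range(current, len(partitions)):
            if d in partitions[idx]:
                current = idx
                break
        else:
            return False
    return True

# West-first routing: Partition 1 = {W}, Partition 2 = {N, E, S}.
WEST_FIRST = [{"W"}, {"N", "E", "S"}]
```

With these partitions, `["W", "W", "N", "E", "E"]` is accepted, while `["N", "W"]` is rejected because it would need a transition from Partition 2 back to Partition 1.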

slide-17
SLIDE 17

Proposal: Extension of the EbDa Theorems for Arbitrary Networks (≒ Irregular NWs)

  • Grouping channels based on their monotonic directions, including diagonal ones
  • An example in the bottom figures
      – Partition 1: North channels
      – Partition 2: South channels

[Figure: 4×4 random topology, Partition 1, and Partition 2]
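
The grouping of channels by monotonic direction could be sketched as follows, assuming every switch has a distinct grid coordinate. The function name `partition_by_direction`, the `axis` parameter, and the tie-breaking rule for links that do not move along the chosen axis are assumptions of this sketch; the slides do not specify how such links are handled.

```python
def partition_by_direction(coords, links, axis=1):
    """Split directed channels into two partitions along one axis.

    coords: {switch: (x, y)} with distinct coordinates per switch.
    links: iterable of undirected (u, v) pairs.
    axis=1 groups channels into North (increasing y) and South;
    axis=0 gives East and West. Ties on the chosen axis are broken by
    the other coordinate, which is an assumption of this sketch.
    """
    def key(s):
        return (coords[s][axis], coords[s][1 - axis])

    increasing, decreasing = [], []
    for u, v in links:
        # Each undirected link contributes one channel per direction.
        for a, b in ((u, v), (v, u)):
            (increasing if key(b) > key(a) else decreasing).append((a, b))
    return increasing, decreasing

# A tiny 2x2 example with two diagonal links and one horizontal link.
coords = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
links = [("a", "d"), ("b", "c"), ("a", "b")]
north, south = partition_by_direction(coords, links)
```

Because every channel in a partition strictly increases (or strictly decreases) the same ordering key, a packet that stays inside one partition can never revisit a switch.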
slide-18
SLIDE 18

Design of Routing based on the Proposed Theory

  • An example of routed paths (the right figure)
  • The channels in Partition 1 are available before those in Partition 2.
  • Packets can avoid loops because they have to move monotonically in each partition.
  • As in the turn model, congestion can be avoided by alternative paths.

[Figure: example routed paths from src to dst]
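
Routing under this rule can be sketched as a breadth-first search over (switch, partition index) states, where a hop is allowed only if its channel belongs to the current or a later partition. This is an illustrative sketch under that reading of the slide; `legal_hops` and the three-switch example are hypothetical.

```python
from collections import deque

def legal_hops(src, dst, partitions):
    """Fewest hops from src to dst using partitions in non-decreasing order.

    partitions: ordered list of lists of directed channels (u, v).
    Returns None when no legal path exists.
    """
    # Outgoing channels per switch, tagged with their partition index.
    out = {}
    for idx, channels in enumerate(partitions):
        for u, v in channels:
            out.setdefault(u, []).append((v, idx))

    start = (src, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, part = queue.popleft()
        if node == dst:
            return dist[(node, part)]
        for nxt, idx in out.get(node, []):
            # A packet may stay in its partition or move to a later one.
            if idx >= part and (nxt, idx) not in dist:
                dist[(nxt, idx)] = dist[(node, part)] + 1
                queue.append((nxt, idx))
    return None

# Switch "b" lies north of both "a" and "c".
north = [("a", "b"), ("c", "b")]
south = [("b", "a"), ("b", "c")]
```

In this example, going from `"a"` to `"c"` takes two hops when North channels come before South channels, and is impossible in the opposite order, which illustrates why the order of partitions matters.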

slide-19
SLIDE 19

Other Partitions Derived from the Different Monotonic Directions

  • Partitions can be generated for arbitrary monotonic directions.
  • An example in the bottom figures
      – Partition 1: West channels
      – Partition 2: East channels

[Figure: 4×4 random topology, Partition 1, and Partition 2]

slide-23
SLIDE 23

An Implementation of Deadlock-free Routing Based on the Proposed Theory

  • Virtual networks are generated with multiple Virtual Channels (VCs), as in LASH-TOR and Duato’s protocol.
  • Partitions are generated in each virtual network.
  • The partitions are ordered to reduce the average path hop count.

[Figure: Virtual NW 1 and Virtual NW 2 (# of VC = 2), with Partitions 1 to 4]
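
The slides do not say how the partition order is chosen. As a stand-in, the sketch below tries every order exhaustively and keeps the one with the fewest unreachable switch pairs and then the lowest average hop count; `average_hops` and `best_order` are hypothetical names, and exhaustive search is only practical for a handful of partitions.

```python
from collections import deque
from itertools import permutations

def average_hops(nodes, ordered_partitions):
    """Return (mean hops over reachable ordered pairs, unreachable pairs)
    when partitions must be used in the given order."""
    out = {}
    for idx, channels in enumerate(ordered_partitions):
        for u, v in channels:
            out.setdefault(u, []).append((v, idx))

    total, unreachable = 0, 0
    for src in nodes:
        # Breadth-first search over (switch, partition index) states.
        dist = {(src, 0): 0}
        best = {src: 0}
        queue = deque([(src, 0)])
        while queue:
            node, part = queue.popleft()
            for nxt, idx in out.get(node, []):
                if idx >= part and (nxt, idx) not in dist:
                    dist[(nxt, idx)] = dist[(node, part)] + 1
                    # First discovery of a switch is at its minimum distance.
                    best.setdefault(nxt, dist[(nxt, idx)])
                    queue.append((nxt, idx))
        total += sum(best.values())
        unreachable += len(nodes) - len(best)
    reachable = len(nodes) * (len(nodes) - 1) - unreachable
    mean = total / reachable if reachable else float("inf")
    return mean, unreachable

def best_order(nodes, partitions):
    """Pick the permutation of partition indices with the fewest
    unreachable pairs, then the lowest average hop count."""
    return min(
        permutations(range(len(partitions))),
        key=lambda order: average_hops(
            nodes, [partitions[i] for i in order])[::-1],
    )

nodes = ["a", "b", "c"]
north = [("a", "b"), ("c", "b")]   # "b" lies north of "a" and "c"
south = [("b", "a"), ("b", "c")]
order = best_order(nodes, [north, south])
```

For this three-switch example the search keeps North before South, since the opposite order leaves `"a"` and `"c"` unable to reach each other.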

slide-25
SLIDE 25

Network Simulation Environment

  • Booksim simulator [7]
  • Evaluating
      – LASH-TOR
      – Duato’s protocol
      – up*/down* routing for non-minimal deadlock-free paths
      – HiRy-based implementation
      – # of dimensions = 2, 3, 4
  • Applying 4 traffic patterns
      – Uniform, Transpose, Reverse, Shuffle

[7] N. Jiang et al.: "A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator", ISPASS'13.

Topology and simulation parameters

NW topology             Random regular topology
# of nodes (SWs)        256
Degree (# of ports)     13 (required for LASH-TOR)
Simulation period       100,000 cycles
Packet size             1 flit
# of VCs                2
Buffer size / VC        8 flits
# of pipeline stages    4
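
The four traffic patterns are not defined on the slide. The sketch below uses the usual bit-permutation definitions for a power-of-two node count and assumes that "Reverse" means bit reversal; `make_traffic` is a hypothetical helper, not the simulator's API.

```python
import random

def make_traffic(num_nodes=256):
    """Destination functions for four synthetic patterns on 2^b nodes."""
    bits = num_nodes.bit_length() - 1
    # Transpose needs an even number of address bits.
    assert 1 << bits == num_nodes and bits % 2 == 0
    half, mask = bits // 2, num_nodes - 1

    def uniform(src, rng=random):
        # Any node, chosen uniformly at random (may equal src).
        return rng.randrange(num_nodes)

    def transpose(src):
        # Swap the upper and lower halves of the address bits.
        return ((src << half) | (src >> half)) & mask

    def reverse(src):
        # Mirror the address bits (bit 0 <-> bit b-1, and so on).
        return int(format(src, f"0{bits}b")[::-1], 2)

    def shuffle(src):
        # Rotate the address bits left by one position.
        return ((src << 1) | (src >> (bits - 1))) & mask

    return {"uniform": uniform, "transpose": transpose,
            "reverse": reverse, "shuffle": shuffle}

traffic = make_traffic(256)
```

With 256 nodes the addresses are 8 bits wide, so for example `traffic["transpose"](0x12)` swaps the two 4-bit halves of the address.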

slide-26
SLIDE 26

NW Simulation Results (256 nodes)

  • Improving the throughput with alternative paths by up to 138% compared with LASH-TOR
  • Reducing the latency with minimal paths by up to 2.9% compared with Duato’s protocol

[Figure: simulation results for (uniform), (shuffle), (transpose), and (reverse) traffic]

slide-27
SLIDE 27

Conclusions

  • HiRy, a theory to design deadlock-free routing with low latency and high throughput for irregular networks
      – Extension of the EbDa theorems, a generalization of the turn model
  • An implementation of the routing method based on HiRy
      – Improving the throughput by up to 138% compared with LASH-TOR
      – Reducing the latency by up to 2.9% compared with Duato’s protocol