SLIDE 1

Low-overhead Routing Algorithm for 3D Network-on-Chip

Akram Ben Ahmed, Abderazek Ben Abdallah
The University of Aizu, School of Computer Science and Engineering, Adaptive Systems Laboratory, Aizu-Wakamatsu, Japan
Email: d8141104@u-aizu.ac.jp

ICNC-12 Okinawa, Japan Dec 05-07 2012

The University of Aizu Adaptive systems lab 1

SLIDE 2

Outline

  • Background
  • Contribution
  • Low Overhead Look Ahead XYZ routing
  • 3D-ONoC Architecture
  • Evaluation
  • Conclusion
  • Future Work

SLIDE 4

Background: Bus-based system Vs. NoC

[Figure: bus-based system in which Core1, Core2 and Core3 share one bus with Memory 1, Memory 2 and I/O; while one core sends data the others must wait, which limits parallelism (parallelism problem)]

SLIDE 5

Background: Bus-based system Vs. NoC

[Figure: OASIS-NoC [*]]

[*] A. Ben Abdallah, M. Sowa, "Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization," JASSST2006, Dec. 4-9, 2006.

SLIDE 7

Background: 2D-NoC limitations

[Figure: 2D mesh NoCs of size 2x2, 3x3, 4x4 and 8x8; nodes are labeled by their (x, y) coordinates, from 00 to 77 in the 8x8 case]

SLIDE 8

Background: 2D-NoC limitations

[Figure: 8x8 2D mesh (nodes 00 to 77); the highlighted path from node 00 runs along nodes 10, 20, ..., 70 to node 77]

  • The number of hops between nodes increases linearly with the mesh dimension: in a k x k mesh the longest minimal path is 2(k-1) hops, i.e. 14 hops from 00 to 77 in the 8x8 mesh
    – This increasing distance affects latency, throughput and power consumption
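As a quick check of the hop-count claim, the sketch below computes the diameter and the average minimal hop count of a k x k mesh. It assumes minimal routing (hop count equals Manhattan distance) and averages over all ordered pairs of distinct nodes; the function names are illustrative only.

```python
from itertools import product


def mesh_hops(src, dst):
    """Minimal hop count between two mesh nodes (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(src, dst))


def mesh_stats(k):
    """Diameter and average hop count over all ordered pairs of distinct nodes in a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    dists = [mesh_hops(s, d) for s in nodes for d in nodes if s != d]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    for k in (2, 3, 4, 8):
        diameter, average = mesh_stats(k)
        print(f"{k}x{k} mesh: diameter = {diameter} hops, average = {average:.2f} hops")
```

Both values grow linearly with k: the diameter is 2(k-1) and, under these assumptions, the average works out to 2k/3 hops.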

SLIDE 10

Background: 2D-NoC limitations

[Figure: 8x8 2D NoC (nodes 00 to 77) illustrating the torus topology]

  • Torus topology [Daly1986]: connects the network extremities to reduce the inter-node distance
    – Increasing complexity
    – Different wire lengths
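To illustrate the distance reduction, here is a minimal sketch assuming minimal routing that may use the wrap-around links in either direction of each dimension; the helper names are hypothetical.

```python
from itertools import product


def ring_distance(a, b, k):
    """Hops between positions a and b on a ring of k nodes (wrap-around allowed)."""
    d = abs(a - b)
    return min(d, k - d)


def torus_hops(src, dst, k):
    """Minimal hop count in a k x k torus: each dimension behaves as a ring."""
    return sum(ring_distance(a, b, k) for a, b in zip(src, dst))


def torus_diameter(k):
    """Largest minimal hop count between any two nodes of a k x k torus."""
    nodes = list(product(range(k), repeat=2))
    return max(torus_hops(s, d, k) for s in nodes for d in nodes)


if __name__ == "__main__":
    print("00 -> 77 in an 8x8 torus:", torus_hops((0, 0), (7, 7), 8), "hops")
    print("8x8 torus diameter:", torus_diameter(8), "hops")
```

In the 8x8 case the 00-to-77 distance drops from 14 hops in the mesh to 2 hops, and the diameter from 14 to 8; the price is the long wrap-around wires, which is where the complexity and wire-length drawbacks come from.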

SLIDE 12

Background: 2D-NoC limitations

  • Short Path Link (SPL) [Kim2007][**]: establishes direct connections between selected (source, destination) pairs that have the longest distance and a high communication frequency
    – Long wires
    – Clock skew problems

[Figure: 8x8 2D NoC (nodes 00 to 77) with an additional link between nodes 00 and 77]

[**] A. Ben Ahmed, K. Mori, A. Ben Abdallah, "ONoC-SPL Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications," iCAST-2012, Aug. 2012.
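The effect of a single extra link can be sketched with a breadth-first search over the mesh graph. This assumes packets always follow a shortest path that may use the extra link; actual SPL routing may differ, and the single 00-77 link only mirrors the figure for illustration.

```python
from collections import deque
from itertools import product


def mesh_links(k):
    """Undirected links of a k x k mesh; nodes are (x, y) tuples."""
    links = set()
    for x, y in product(range(k), repeat=2):
        if x + 1 < k:
            links.add(((x, y), (x + 1, y)))
        if y + 1 < k:
            links.add(((x, y), (x, y + 1)))
    return links


def hop_distance(links, src, dst):
    """Breadth-first search for the minimal hop count between src and dst."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # dst not reachable


if __name__ == "__main__":
    links = mesh_links(8)
    print("00 -> 77 without SPL:", hop_distance(links, (0, 0), (7, 7)))
    links_spl = links | {((0, 0), (7, 7))}  # hypothetical short path link 00-77
    print("00 -> 77 with SPL:", hop_distance(links_spl, (0, 0), (7, 7)))
    print("01 -> 77 with SPL:", hop_distance(links_spl, (0, 1), (7, 7)))
```

The shortcut also helps neighbors of its endpoints (01 to 77 drops from 13 hops to 2), but the physical link spans the whole die, hence the long-wire and clock-skew issues.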

SLIDE 14

Background: 3D-NoC solution

[Figure: the 64 nodes of the 8x8 2D mesh (00 to 77) rearranged as a 3D NoC of four stacked 4x4 layers]

  • Decreasing the number of hops between nodes in a scalable way [***]

[***] A. Ben Ahmed, A. Ben Abdallah, K. Kuroda, "Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multicore SoC," IEEE Proc. of BWCCA-2010, Nov. 2010.
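For the same 64 nodes, the sketch below compares an 8x8 2D mesh with a 4x4x4 3D mesh (four stacked 4x4 layers). It assumes minimal routing and counts a vertical hop the same as a horizontal one, which simplifies the real inter-layer links.

```python
from itertools import product


def hop_stats(shape):
    """Diameter and average minimal hop count of a mesh with the given per-dimension sizes."""
    nodes = list(product(*(range(n) for n in shape)))
    total, pairs, diameter = 0, 0, 0
    for s in nodes:
        for d in nodes:
            if s == d:
                continue
            hops = sum(abs(a - b) for a, b in zip(s, d))  # Manhattan distance
            total += hops
            pairs += 1
            diameter = max(diameter, hops)
    return diameter, total / pairs


if __name__ == "__main__":
    for shape in ((8, 8), (4, 4, 4)):
        diameter, average = hop_stats(shape)
        label = "x".join(map(str, shape))
        print(f"{label} mesh (64 nodes): diameter = {diameter}, average = {average:.2f} hops")
```

Under these assumptions the diameter falls from 14 to 9 hops and the average from about 5.33 to about 3.81 hops.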

SLIDE 15

Background: 3D-NoC solution

[Figure: 3x3x3 mesh topology, a 3D-NoC configuration example. Routers are labeled with their router address (000 to 222) and connected along the X, Y and Z axes by intra-layer links within each layer and inter-layer links between layers]

SLIDE 20

3D-NoC routing algorithm: Related Works

[Figure: 3x3x3 3D-NoC (routers 000 to 222)]

  • XYZ, Dimension Order Routing [Kamali2005]
  • RPM, Randomly Partially Minimal routing [Lin2008]
  • RTM, Run-Time Thermal Management [Chao2010]
  • Limitations:
    – High communication latency
    – High-level simulation does not provide an accurate evaluation

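Dimension-order XYZ routing, the baseline used throughout, can be sketched as follows. The coordinate convention (East/North/Up increase x/y/z) and the port names are assumptions chosen to match the LA-XYZ flowcharts; in this baseline each router computes the output port for itself when the flit arrives.

```python
def xyz_next_port(cur, dst):
    """Dimension-order (XYZ) routing: correct X first, then Y, then Z."""
    (x, y, z), (xd, yd, zd) = cur, dst
    if x != xd:
        return "EAST" if xd > x else "WEST"
    if y != yd:
        return "NORTH" if yd > y else "SOUTH"
    if z != zd:
        return "UP" if zd > z else "DOWN"
    return "LOCAL"


# Assumed mapping from output port to coordinate change.
STEP = {"EAST": (1, 0, 0), "WEST": (-1, 0, 0),
        "NORTH": (0, 1, 0), "SOUTH": (0, -1, 0),
        "UP": (0, 0, 1), "DOWN": (0, 0, -1)}


def xyz_path(src, dst):
    """Routers visited by a packet; the route is recomputed in every router."""
    path, cur = [src], src
    port = xyz_next_port(cur, dst)
    while port != "LOCAL":
        cur = tuple(c + s for c, s in zip(cur, STEP[port]))
        path.append(cur)
        port = xyz_next_port(cur, dst)
    return path


if __name__ == "__main__":
    print(xyz_path((0, 0, 0), (2, 1, 2)))
```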
SLIDE 22

Contribution

  • Proposal of an efficient, low-overhead routing algorithm named Look Ahead XYZ (LA-XYZ)
  • Architecture and design of a 3D Network-on-Chip named 3D-OASIS-NoC (3D-ONoC)
  • Complexity and performance evaluation
  • Comparison with well-known 3D-NoC routing algorithms

SLIDE 24

Look Ahead XYZ Routing Algorithm: Phase 1: Define next address

Starting from next-port, the output port carried by the incoming flit, the router computes the address of the next node (next-xadr, next-yadr, next-zadr) from its own address (xadr, yadr, zadr):
  • next-xadr = xadr + 1 if next-port == East, xadr - 1 if next-port == West, otherwise xadr
  • next-yadr = yadr + 1 if next-port == North, yadr - 1 if next-port == South, otherwise yadr
  • next-zadr = zadr + 1 if next-port == Up, zadr - 1 if next-port == Down, otherwise zadr
  • next-xadr, next-yadr, next-zadr and next-port are then passed to Phase 2 (Define New-next-port)

SLIDE 25

Look Ahead XYZ Routing Algorithm: Phase 2: Define New-next-port

Comparing the next node's address with the destination address (xdst, ydst, zdst), the output port to be used at the next node is chosen in XYZ order:
  • If next-xadr != xdst: New-next-port = East if next-xadr < xdst, otherwise West
  • Else if next-yadr != ydst: New-next-port = North if next-yadr < ydst, otherwise South
  • Else if next-zadr != zdst: New-next-port = Up if next-zadr < zdst, otherwise Down
  • Else: New-next-port = Local (the next node is the destination)
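The two phases can be sketched in Python as below. This is a behavioral sketch, not the authors' Verilog design: the coordinate convention, the port-to-direction mapping, and the assumption that the first port is computed with plain XYZ at injection are illustrative choices. Each router forwards the flit on the port it already carries and computes, one hop in advance, the port the next router will use.

```python
# Assumed mapping: East/North/Up increase x/y/z, West/South/Down decrease them.
STEP = {"EAST": (1, 0, 0), "WEST": (-1, 0, 0),
        "NORTH": (0, 1, 0), "SOUTH": (0, -1, 0),
        "UP": (0, 0, 1), "DOWN": (0, 0, -1), "LOCAL": (0, 0, 0)}


def next_address(adr, next_port):
    """Phase 1: address of the neighbor reached through next_port."""
    return tuple(a + s for a, s in zip(adr, STEP[next_port]))


def new_next_port(next_adr, dst):
    """Phase 2: XYZ decision for the next router, computed one hop in advance."""
    (nx, ny, nz), (xd, yd, zd) = next_adr, dst
    if nx != xd:
        return "EAST" if nx < xd else "WEST"
    if ny != yd:
        return "NORTH" if ny < yd else "SOUTH"
    if nz != zd:
        return "UP" if nz < zd else "DOWN"
    return "LOCAL"


def la_xyz_trace(src, dst):
    """Follow a flit hop by hop; each router uses the port carried in the flit."""
    adr = src
    port = new_next_port(src, dst)  # assumption: first port computed at injection
    path = [adr]
    while port != "LOCAL":
        nxt = next_address(adr, port)       # Phase 1 in the current router
        new_port = new_next_port(nxt, dst)  # Phase 2: port for the next router
        adr, port = nxt, new_port           # flit moves; header carries New-next-port
        path.append(adr)
    return path


if __name__ == "__main__":
    print(la_xyz_trace((0, 0, 0), (2, 1, 1)))
```

The resulting route is the same as plain XYZ; what changes is when the decision is made, which is what allows routing calculation to share a pipeline stage with switch arbitration.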

SLIDE 26

Look Ahead XYZ Routing Algorithm: Example

[Figure: example illustrating LA-XYZ in a router made of an input port with a FIFO buffer, the LA-XYZ unit (with its control logic), a switch arbiter and a crossbar]

1- A flit arrives from the previous node carrying the destination address (Dest.), the output port to be used in this router (Port), the payload and a tail field.
2- The Port field is used directly for switch arbitration and crossbar traversal, while the LA-XYZ unit uses Dest. and Port to compute the New port for the next node.
3- The LA-XYZ control logic writes New into the port field, and the flit (Dest., New, Data, tail) is sent to the next node.

SLIDE 33

3D-ONoC Architecture

[Figure: 3D-ONoC router architecture with the three pipeline stages BW, RC/SA and CT]

SLIDE 37

3D-ONoC Architecture: Router pipeline stages

LA-XYZ router pipeline:

Cycle    1       2       3       4       5
Flit 1   BW      RC/SA   CT
Flit 2           BW      RC/SA   CT
Flit 3                   BW      RC/SA   CT

1- Buffer Writing (BW): the incoming flit is stored in the FIFO buffer
2- Routing Calculation (RC) and Switch Arbitration (SA), performed in the same stage:
   a- RC: the New-next-port for the next node is calculated
   b- SA: the appropriate output-port channel is allocated to the input port
3- Crossbar Traversal (CT): the flit is transferred to the next node
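The cycle table can be reproduced with a short script, assuming an ideal, stall-free flow in which one flit enters the router per cycle; stalls from flow control are ignored. Passing a longer stage tuple shows how each additional stage adds one cycle to the time a flit spends in the router.

```python
STAGES = ("BW", "RC/SA", "CT")


def pipeline_schedule(n_flits, stages=STAGES):
    """Ideal schedule: flit f enters BW in cycle f and advances one stage per cycle."""
    return {(f, f + s): name
            for f in range(1, n_flits + 1)
            for s, name in enumerate(stages)}


def total_cycles(n_flits, n_stages=len(STAGES)):
    """Cycles until the last flit finishes (pipeline fill, then one flit per cycle)."""
    return n_flits + n_stages - 1


def render(n_flits, stages=STAGES):
    """Text version of the cycle/flit table."""
    sched = pipeline_schedule(n_flits, stages)
    cycles = total_cycles(n_flits, len(stages))
    lines = ["Cycle   " + "".join(f"{c:<7}" for c in range(1, cycles + 1))]
    for f in range(1, n_flits + 1):
        cells = "".join(f"{sched.get((f, c), ''):<7}" for c in range(1, cycles + 1))
        lines.append(f"Flit {f}  " + cells.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(3))
```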

SLIDE 38

3D-ONoC Architecture

SLIDE 39

3D-ONoC Architecture: Input port

SLIDE 40

3D-ONoC Architecture

SLIDE 41

3D-ONoC Architecture: Switch Allocator

SLIDE 42

3D-ONoC Architecture: Switch Allocator (Stall-Go)

SLIDE 47

3D-ONoC Architecture

SLIDE 49

Evaluation: Evaluation methodology

  • We evaluate:
    – Hardware complexity: area (ALUTs/HCells), power (static/dynamic) and speed
    – System performance: stall count, latency per flit and throughput
  • Benchmarks:
    – JPEG encoder
    – Matrix multiplication (3x3 and 6x6)
    – Transpose traffic pattern
  • We use:
    – Verilog HDL
    – Quartus II ver. 10.0
    – ModelSim ver. 6.5
  • We compare with:
    – Dimension Order Routing (XYZ)
    – Randomly Partially Minimal (RPM)
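The slides do not spell out how latency per flit and throughput are computed. A common formulation, sketched here purely as an assumption (the record format is hypothetical): latency is the average number of cycles between injection and ejection of each flit, throughput is delivered flits per cycle over the measurement window, and a percentage improvement is measured relative to the baseline.

```python
from dataclasses import dataclass


@dataclass
class FlitRecord:
    """Hypothetical per-flit log entry from an RTL simulation."""
    injected_cycle: int
    ejected_cycle: int


def average_latency(records):
    """Mean injection-to-ejection delay, in cycles per flit."""
    return sum(r.ejected_cycle - r.injected_cycle for r in records) / len(records)


def throughput(records):
    """Network-wide delivered flits per cycle, from first injection to last ejection."""
    start = min(r.injected_cycle for r in records)
    end = max(r.ejected_cycle for r in records)
    return len(records) / (end - start)


def reduction(baseline, proposed):
    """Relative improvement in percent, e.g. LA-XYZ latency versus XYZ latency."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    log = [FlitRecord(0, 12), FlitRecord(1, 14), FlitRecord(2, 13)]  # made-up values
    print(average_latency(log), throughput(log), reduction(40.0, 30.0))
```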

SLIDE 50

Evaluation: Evaluation parameters

SLIDE 51

Evaluation: Hardware complexity

The Balanced optimization setting appears to be the most appropriate for the evaluation, as it provides a good trade-off between area, power and speed.

SLIDE 52

Evaluation: Hardware complexity comparison

[Figure: area, power and speed of LA-XYZ compared with XYZ and RPM. Annotations: +5.17% Vs. XYZ; 1% Vs. RPM; 5.98% Vs. XYZ; +4.93% Vs. RPM; slight power consumption difference]

SLIDE 53

Evaluation: Dynamic power

Slight improvement Vs. XYZ, 18% improvement Vs. RPM

[Figure: dynamic power (mW) of RPM, XYZ and LA-XYZ for the Transpose, 3x3 Matrix and 6x6 Matrix benchmarks]

SLIDE 54

Evaluation: Performance (Stall count)

Stall-count reduction achieved by LA-XYZ:
  • 33% Vs. XYZ, 45% Vs. RPM
  • 59% Vs. XYZ, 52% Vs. RPM
  • 46% Vs. XYZ, 48% Vs. RPM

SLIDE 55

Evaluation: Performance (Latency)

Latency reduction achieved by LA-XYZ:
  • 30% Vs. XYZ, 35% Vs. RPM
  • 33% Vs. XYZ, 41% Vs. RPM
  • 29% Vs. XYZ, 36% Vs. RPM
  • 25% Vs. XYZ, 28% Vs. RPM

SLIDE 56

Evaluation: Performance (Throughput)

Throughput improvement achieved by LA-XYZ:
  • 28.6% Vs. XYZ, 38.7% Vs. RPM
  • 33% Vs. XYZ, 45% Vs. RPM
  • 30% Vs. XYZ, 37% Vs. RPM
  • 23% Vs. XYZ, 34% Vs. RPM

SLIDE 57

Evaluation: Performance (JPEG)

  • 19.2% Vs. XYZ, 36.3% Vs. RPM
  • +19.3% Vs. XYZ, +36.8% Vs. RPM

SLIDE 59

Conclusion

  • Proposal of an efficient, low-overhead routing algorithm named Look Ahead XYZ (LA-XYZ)
  • Architecture and design of a 3D Network-on-Chip named 3D-OASIS-NoC (3D-ONoC)
  • Complexity and performance evaluation

SLIDE 60

Conclusion

  • Compared with Dimension Order Routing (XYZ), LA-XYZ exhibits only a 5.71% area overhead and a 6% higher clock frequency, with almost the same power consumption. Compared with Randomly Partially Minimal (RPM) routing, LA-XYZ is 4.9% faster and consumes 18.8% less power.
  • Compared with XYZ and RPM respectively, LA-XYZ reduces the stall count by 46% and 48%, decreases the latency by 25% and 35%, and improves the system throughput by 23% and 37%.

SLIDE 61

Future Work

  • LA-XYZ is a static routing algorithm:
    – It does not support fault tolerance
    – It does not consider network congestion
    – Goal: optimize LA-XYZ into a minimal, fault-tolerant and congestion-aware routing algorithm
  • Investigate the performance of LA-XYZ in terms of thermal power

SLIDE 62

References

  • [Daly1986] W. J. Dally and C. L. Seitz, "The Torus Routing Chip," Technical Report 5208:TR:86, Computer Science Dept., California Inst. of Technology, pp. 1-19, 1986.
  • [Kamali2005] M. Kamali, L. Petre, K. Sere and M. Daneshtalab, "Refinement-Based Modeling of 3D NoCs."
  • [Lin2008] R. S. Ramanujam and B. Lin, "Near-optimal oblivious routing on three-dimensional mesh networks."
  • [Chao2010] C. Chao, Kai-Y., "Traffic- and Thermal-Aware Run-Time Thermal Management Scheme for 3D NoC Systems."

SLIDE 63

Thank you for your attention