ONoC-SPL: Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications



SLIDE 1

ONoC-SPL: Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications

Akram Ben Ahmed, Kenichi Mori, Abderazek Ben Abdallah

The University of Aizu, School of Computer Science and Engineering, Adaptive Systems Laboratory, Aizu-Wakamatsu, Japan

Email: d8141104@u-aizu.ac.jp

iCAST 2012, Seoul, Korea, July 21-24, 2012

The University of Aizu Adaptive systems lab 1

SLIDE 2

Outline

  • Background
  • ONoC-SPL architecture
    – OASIS2-NoC overview
    – SPL Insertion Algorithm
  • Evaluation
  • Conclusion


SLIDE 4

Background: Bus-based system Vs. NoC

Bus-based system

[Figure: Core1, Core2 and Core3 attached to a shared bus with Memory 1, Memory 2 and I/O; figure labels: Data, Wait]

  • Parallelism problem
  • High latency

SLIDE 5

Background: Bus-based system Vs. NoC

NoC-based system [Carloni2009, Ben2006]

[Figure labels: Processing Element, Network Interface, Router, Input buffer, Unidirectional link]


SLIDE 7

Background: NoC Challenges

  • Routing [Sullivan1977, Seo2005]

Path selection has an impact on system performance

SLIDE 8

Background: NoC Challenges

  • Routing [Sullivan1977, Seo2005]
  • Flow control [Agarwal2009, Pullini2005]

Efficient flow control is crucial

SLIDE 9

Background: NoC Challenges

  • Routing [Sullivan1977, Seo2005]
  • Flow control [Agarwal2009, Pullini2005]
  • Topology
    • Mesh [Zhang2011]
      • Uniform connection
      • Large hop count

Long inter-node distances affect latency, throughput and power
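
To make the hop-count concern concrete, the following is a minimal sketch (not taken from the slides) that treats the minimal hop count in a 2D mesh as the Manhattan distance between nodes, assuming minimal routing. The names `mesh_hops` and `average_mesh_hops` are illustrative only.

```python
from itertools import product


def mesh_hops(src, dst):
    """Minimal hop count between two (x, y) nodes of a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def average_mesh_hops(size):
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(size), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(mesh_hops(s, d) for s, d in pairs) / len(pairs)


# Corner-to-corner distance (the network diameter) of a 4x4 mesh.
diameter_4x4 = mesh_hops((0, 0), (3, 3))
average_4x4 = average_mesh_hops(4)
```

For a 4x4 mesh the corner-to-corner distance is 6 hops, and it grows linearly with the side of the mesh.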

SLIDE 10

Background: NoC Challenges

  • Routing [Sullivan1977, Seo2005]
  • Flow control [Agarwal2009, Pullini2005]
  • Topology
    • Mesh [Zhang2011]
    • Torus [Dally1986]
      • Connects the network extremities to reduce the inter-node distance
      • Increasing complexity
      • Different wire lengths
      • Clock skew
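
As a rough illustration of how the wraparound links reduce the inter-node distance, the sketch below computes the minimal hop count on a torus. It assumes a square torus with wraparound in both dimensions; the function names are hypothetical and not from the slides.

```python
from itertools import product


def torus_hops(src, dst, size):
    """Minimal hop count on a size x size torus (wraparound in x and y)."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    # In each dimension, go the direct way or around the wraparound link.
    return min(dx, size - dx) + min(dy, size - dy)


def torus_diameter(size):
    """Largest minimal hop count between any two nodes."""
    nodes = list(product(range(size), repeat=2))
    return max(torus_hops(s, d, size) for s in nodes for d in nodes)


corner_to_corner = torus_hops((0, 0), (3, 3), 4)
diameter = torus_diameter(4)
```

On a 4x4 torus the corner-to-corner distance drops to 2 hops and the diameter to 4, at the cost of the longer wraparound wires listed above.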

SLIDE 11

Background: NoC Challenges

  • Routing [Sullivan1977, Seo2005]
  • Flow control [Agarwal2009, Pullini2005]
  • Topology
    • Mesh [Zhang2011]
    • Torus [Dally1986]
    • Customized [Bolotin2004]
      • Designed especially for a specific application
      • Long design time
      • Difficult to implement

SLIDE 12

Background: OASIS2-NoC

OASIS2-NoC 4x4 network system [*]

  • 4x4 Mesh topology
  • Wormhole-like switching
  • Stall-and-Go flow control
  • 20-bit flit

[*] K. Mori, A. Esch, A. Ben Abdallah, K. Kuroda, "Advanced Design Issue for OASIS Network-on-Chip Architecture", IEEE International Conference on BWCCA, pp. 74-79, 2010.

SLIDE 13

Background: Motivation

  • In OASIS2-NoC, PEs are connected uniformly, so the network suffers from large hop counts between (source, destination) pairs
    – This significantly degrades the overall performance, especially for data-intensive applications
  • Using synthetic traffic in high-level simulation does not reveal the real system performance
    – Not enough to evaluate the effects and trade-offs of the NoC router's parameters (flow control, buffer size and routing)
    – No accurate hardware and performance evaluation

SLIDE 14

Background: Contributions

  • Proposal of an optimized version of OASIS2-NoC, named ONoC-SPL, customized with a Short-Pass-Link (SPL)
    – To reduce the communication latency for long-range, high-frequency communication
  • Prototyping ONoC-SPL on FPGA with synthetic and real applications
    – To accurately evaluate power consumption, area utilization and performance

SLIDE 15

Outline

  • Background
  • ONoC-SPL architecture
    – OASIS2-NoC architecture
    – SPL Insertion Algorithm
  • Evaluation
  • Conclusion


SLIDE 17

OASIS2-NoC: Router architecture

[Figure: router block diagram with the stages BW, RC/SA and CT]

SLIDE 18

OASIS2-NoC: Router architecture

Input module: incoming data enters these modules

  • Input buffer (BW)
  • Look-Ahead-XY routing (RC)
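
The idea behind look-ahead XY routing can be sketched as follows: each router already knows its own output port when a flit arrives and computes the port for the next router in parallel with arbitration. This is an illustrative model only. The port names, the coordinate orientation (x grows east, y grows north) and the function names are assumptions, not the OASIS2-NoC implementation.

```python
# Offsets for each output port; orientation is an assumption of this sketch.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1),
        "SOUTH": (0, -1), "LOCAL": (0, 0)}


def xy_port(cur, dst):
    """Plain XY routing: correct the x offset first, then the y offset."""
    if dst[0] > cur[0]:
        return "EAST"
    if dst[0] < cur[0]:
        return "WEST"
    if dst[1] > cur[1]:
        return "NORTH"
    if dst[1] < cur[1]:
        return "SOUTH"
    return "LOCAL"


def look_ahead_xy(cur, dst):
    """Port that the next router on the XY path will use for this flit."""
    dx, dy = STEP[xy_port(cur, dst)]
    return xy_port((cur[0] + dx, cur[1] + dy), dst)


def route(src, dst):
    """Walk a flit from src to dst, carrying the precomputed port with it."""
    path, cur = [src], src
    port = xy_port(src, dst)  # computed once at injection
    while port != "LOCAL":
        next_port = look_ahead_xy(cur, dst)  # computed one hop ahead
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
        path.append(cur)
        port = next_port
    return path


example_path = route((0, 0), (2, 1))
```

Because the port is carried with the flit, the routing computation is no longer on the critical path of the downstream router.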

SLIDE 19

OASIS2-NoC: Router architecture

Arbiter and flow control

  • Arbiter: handles the arbitration between the requests of the different input ports (SA)
  • Stall/Go: includes the flow control module
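
A behavioral sketch of Stall/Go flow control is given below, under stated assumptions: the receiver asserts a stall signal once its buffer occupancy reaches a threshold, and the sender samples that signal at the start of each cycle. The buffer depth, threshold and drain rate are hypothetical values chosen for illustration, not OASIS2-NoC parameters.

```python
from collections import deque


class StallGoBuffer:
    """Input FIFO that raises 'stall' when it is nearly full."""

    def __init__(self, depth, stall_threshold):
        self.depth = depth
        self.stall_threshold = stall_threshold
        self.fifo = deque()

    @property
    def stall(self):
        return len(self.fifo) >= self.stall_threshold

    def push(self, flit):
        if len(self.fifo) >= self.depth:
            raise OverflowError("flit sent while buffer was full")
        self.fifo.append(flit)


def simulate(flits, depth=4, stall_threshold=3, drain_every=2):
    """Send flits through the buffer; the receiver drains one flit every
    'drain_every' cycles. Returns (delivered flits, peak occupancy)."""
    buf = StallGoBuffer(depth, stall_threshold)
    pending, delivered, peak, cycle = deque(flits), [], 0, 0
    while pending or buf.fifo:
        stalled = buf.stall  # sender samples stall/go at cycle start
        if cycle % drain_every == 0 and buf.fifo:
            delivered.append(buf.fifo.popleft())
        if pending and not stalled:
            buf.push(pending.popleft())
        peak = max(peak, len(buf.fifo))
        cycle += 1
    return delivered, peak


delivered, peak = simulate(list(range(10)))
```

In this sketch the sender is faster than the receiver, yet the occupancy never exceeds the buffer depth because the sender pauses whenever stall is asserted.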

SLIDE 20

OASIS2-NoC: Router architecture

Crossbar: handles the transfer of flits to their appropriate channels depending on the information received from the arbiter (CT)

SLIDE 21

OASIS2-NoC: Arbitration & flow control

(a) Arbitration mechanism: matrix arbiter. When the priority of i is higher than that of j, P(i, j) becomes 1 and P(j, i) becomes 0.

(b) Flow control mechanism: Stall/Go is the method used to avoid buffer overflow.
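
A matrix arbiter of this kind can be modeled in a few lines. In the sketch below, `p[i][j] == 1` means requester i currently has priority over requester j, as in the P(i, j) notation of the slide. The update rule, where the granted requester drops to the lowest priority, is the usual least-recently-served policy and is an assumption here, since the slide does not show it. The class name and initial priority order are also illustrative.

```python
class MatrixArbiter:
    """Matrix arbiter: p[i][j] == 1 means i has priority over j."""

    def __init__(self, n):
        self.n = n
        # Assumed initial order: lower index has higher priority.
        self.p = [[1 if i < j else 0 for j in range(n)] for i in range(n)]

    def grant(self, requests):
        """Return the index of the granted requester, or None."""
        for i in range(self.n):
            if not requests[i]:
                continue
            # i wins if no other active requester has priority over it.
            beaten = any(requests[j] and self.p[j][i]
                         for j in range(self.n) if j != i)
            if not beaten:
                # Assumed update: the winner becomes lowest priority.
                for j in range(self.n):
                    if j != i:
                        self.p[i][j] = 0
                        self.p[j][i] = 1
                return i
        return None


arbiter = MatrixArbiter(3)
grants = [arbiter.grant([True, True, True]) for _ in range(4)]
```

With all three requesters active, the grants rotate among them, so no input port is starved.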

SLIDE 22

Outline

  • Background
  • ONoC-SPL architecture
    – OASIS2-NoC architecture
    – Short-Pass-Link (SPL) Customization
  • Evaluation
  • Conclusion

SLIDE 23

Short-Pass-Link (SPL) Customization

  • ONoC-SPL employs a mesh topology with Short-Pass-Links (SPL)
    – To reduce the latency caused by the high number of hops
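
The effect of an SPL on hop count can be illustrated with a breadth-first search over a mesh that has one extra directed link. This is an idealized sketch: it finds the shortest path in the graph and does not model how the ONoC-SPL router actually decides when to take the SPL. Treating each SPL as a one-way link, and all names used here, are assumptions.

```python
from collections import deque


def neighbors(node, size, spl_links):
    """Mesh neighbors of a node plus any SPL that starts at it."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)
    for start, end in spl_links:  # SPLs modeled as directed links
        if start == node:
            yield end


def hops(src, dst, size=4, spl_links=()):
    """Shortest hop count from src to dst (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbors(node, size, spl_links):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


without_spl = hops((3, 0), (0, 3))
with_spl = hops((3, 0), (0, 3), spl_links=[((3, 0), (0, 3))])
```

In a 4x4 mesh, the corner-to-corner communication drops from 6 hops to 1 once a dedicated SPL connects the two nodes.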

SLIDE 24

SPL insertion process: Algorithm

  • Decision on the number of SPLs
  • Selection of the communications for insertion
  • Simulation and insertion

SLIDE 25

SPL insertion process: Example

Dimension reversal with SPL (2 SPL inserted)

Communication      Frequency   Distance
(3,0) -> (0,3)     0.125       6
(0,3) -> (3,0)     0.125       6

Selected communications:

  • (3,0) -> (0,3)
  • (0,3) -> (3,0)

Hotspot with SPL (2 SPL inserted)

Communication      Frequency   Distance
(0,3) -> (1,0)     0.294       4
(3,3) -> (1,1)     0.235       4
(2,0) -> (2,3)     0.235       (not listed)

Selected communications:

  • (0,3) -> (1,0)
  • (3,3) -> (1,1)
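
The selection step in the example above can be sketched as a ranking of communications. The criterion used here, highest communication frequency first with the longer Manhattan distance breaking ties, is an assumption that happens to reproduce the selections shown on the slide; the slides do not spell out the exact rule. The function names are hypothetical.

```python
def manhattan(src, dst):
    """Hop distance between two (x, y) nodes in a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def select_spl(traffic, num_spl):
    """Pick the communications that receive an SPL.

    traffic maps (src, dst) pairs to their communication frequency.
    Assumed ranking: frequency first, then distance, both descending.
    """
    ranked = sorted(
        traffic.items(),
        key=lambda item: (item[1], manhattan(*item[0])),
        reverse=True,
    )
    return [pair for pair, _ in ranked[:num_spl]]


# Frequencies taken from the example slide.
hotspot = {
    ((0, 3), (1, 0)): 0.294,
    ((3, 3), (1, 1)): 0.235,
    ((2, 0), (2, 3)): 0.235,
}
dimension_reversal = {
    ((3, 0), (0, 3)): 0.125,
    ((0, 3), (3, 0)): 0.125,
}

hotspot_spl = select_spl(hotspot, 2)
reversal_spl = select_spl(dimension_reversal, 2)
```

In the slide's flow, the chosen links would then be simulated before they are inserted; that step is not modeled here.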

SLIDE 26

Outline

  • Background
  • ONoC-SPL architecture
    – OASIS2-NoC overview
    – SPL Insertion Algorithm
  • Evaluation
  • Conclusion

SLIDE 27

Evaluation: Evaluation methodology

  • Design Tools
    – Language: Verilog-HDL
    – Software: Quartus II 11.0
    – Simulation tool: ModelSim-Altera 6.6
    – Device: Stratix III FPGA board
  • Target applications
    – Dimension Reversal
    – Hotspot
    – JPEG encoder

[Figure: evaluation flow. Stages: Behavior Model, RTL code, Synthesis, FPGA. Labels: Dimen., Hotspot, JPEG; partitioning; Network size info.; NoC parameter; Verilog-HDL; Hardware compile; Quartus II; Stratix III. Outputs: Execution time, Hardware complexity]

RGB bitstream:

24'b001101100101001101101110;
24'b001101110101010001101111;
24'b010001110110010001111111;
24'b010110100111011110010010;
24'b011001011000000010011011;
24'b011010001000001110011110;
24'b011001000111101110010101;
24'b010101100110110010000101;
24'b001110010101011001110001;
24'b010000000101110101111000;
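
The 24-bit words above are Verilog sized binary literals. As a small aid for reading them, the sketch below splits one word into three 8-bit channels. The R, G, B byte order (most significant byte first) is an assumption based only on the "RGB bitstream" label, and `decode_rgb` is a hypothetical helper.

```python
def decode_rgb(literal):
    """Split a Verilog literal such as 24'b0011...; into (R, G, B).

    Assumes 8 bits per channel with R in the most significant byte.
    """
    bits = literal.split("'b")[1].rstrip(";")
    value = int(bits, 2)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


first_pixel = decode_rgb("24'b001101100101001101101110;")
second_pixel = decode_rgb("24'b001101110101010001101111;")
```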

SLIDE 28

Evaluation: Simulation Configuration

SLIDE 29

Evaluation: Hardware complexity

  • Extra area less than 5%
  • 6.5% speed reduction
  • Slight 1% power overhead

SLIDE 30

Evaluation: Performance (Execution time)

[Figure: execution time of OASIS, ONoC-SPL1, ONoC-SPL2 and ONoC-SPL3 for Dimension Reversal (μs), Hotspot (μs) and JPEG (x10^1 ms). Data labels: +7.3, 29.7, 16.9, 16.1, +11.3, 31.0, 43.7]

ONoC-SPL execution time decreased by 30.1% on average

SLIDE 31

Evaluation: Performance (Throughput)

[Figure: throughput (flits/cycle). Data labels: +0.01, +49.6, +24.8, +24.8, +11.3, +22.6, 0.0]

ONoC-SPL throughput improved by 32.3% on average

SLIDE 32

Outline

  • Background
  • ONoC-SPL architecture
    – OASIS2-NoC overview
    – SPL Insertion Algorithm
  • Evaluation
  • Conclusion

SLIDE 33

Conclusion

  • Proposal of an optimized version of a 2D-NoC, named ONoC-SPL
  • An SPL insertion algorithm is proposed to reduce the latency of high-frequency communication
  • Prototyping on FPGA for accurate performance and hardware complexity evaluation using synthetic traffic and a real workload

SLIDE 34

Conclusion

  • The execution time decreased by 30.1% and the throughput improved by 32.3% on average when comparing the proposed system with previous systems
  • This performance gain was obtained with less than 5% extra hardware and a slight 0.49% power consumption overhead on average

SLIDE 35

Current system limitations

  • Investigate fault tolerance
  • Look-Ahead-XY routing
  • Exploit the benefits of the customization for a higher-performance architecture
  • 3D-Network-on-Chip

SLIDE 36

References

[Carloni2009] L. P. Carloni, P. Pande, and Y. Xie, "Networks-on-chip in emerging interconnect paradigms: Advantages and challenges", in Proceedings of the 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 93-102, May 2009.

[Ben2006] A. Ben Abdallah, M. Sowa, "Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization", Proceedings of the TJASSST2006 Symposium on Science, Dec. 2006.

[Sullivan1977] H. Sullivan, T. R. Bashkow, "A Large Scale, Homogeneous, Fully Distributed Parallel Machine", in Annual Symposium on Computer Architecture, ACM Press, pp. 105-117, March 1977.

[Seo2005] D. Seo, A. Ali, W.-T. Lim, N. Rafique, M. Thottethodi, "Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks", in International Symposium on Computer Architecture, pp. 432-443, June 2005.

[Agarwal2009] A. Agarwal, C. Iskander, R. Shankar, "Survey of Network on Chip (NoC) architectures and contributions", Journal of Engineering, Computing and Architecture 3 (1), 2009.

[Pullini2005] A. Pullini, F. Angiolini, D. Bertozzi, and L. Benini, "Fault tolerance overhead in network-on-chip flow control schemes", in Proceedings of 18th Annu. Symp. Integr. Circuits and Syst. Des. (SBCCI), pp. 224-229, 2005.

[Zhang2011] Y. Zhang, N. Wu, F. Ge, "Novel Test Structures for 2D-Mesh NoC with Evaluation on the Coverage-driven & VMM-based Testbench", Proceedings of the World Congress on Engineering and Computer Science, pp. 797-801, Oct. 2011.

[Dally1986] W. J. Dally and C. L. Seitz, "The Torus Routing Chip", Technical Report 5208:TR:86, Computer Science Dept., California Inst. of Technology, pp. 1-19, 1986.

[Bolotin2004] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "QNoC: QoS architecture and design process for network on chip", Journal of Systems Architecture, Vol. 50-2-3, pp. 105-128, Feb. 2004.

SLIDE 37

Thank you for your attention