


SLIDE 1

Advanced Design Issues for OASIS Network-on-Chip Architecture

Kenichi Mori, Adam Esch, Abderazek Ben Abdallah, Kenichi Kuroda

The University of Aizu, Japan

Fifth International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA 2010), Nov. 4, 2010

SLIDE 2

Contents

  • Background
  • Original OASIS NoC
      – Architecture
      – Drawback
  • Our contribution
  • Proposed ONoC mechanism
      – Stall-go flow control method
      – ONoC (Optimized NoC) architecture
  • Simulation results
  • Summary

SLIDE 3

Background

  • A Network-on-Chip (NoC) can solve the problems of bus-based systems
  • It is a scalable architectural platform with huge potential to handle growing complexity
  • Processing elements are connected via a packet-switched communication network

[Figure: bus-based system versus Network-on-Chip system, each with processing elements P1 to P6. P: processing element, S: switch]

SLIDE 4

OASIS NoC: Network

  • The original OASIS* NoC has a 4x4 mesh network
  • Each router has one processing element

[Figure: OASIS whole network]

* A. Ben Abdallah, M. Sowa, "Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization," JASSST2006, Dec. 4-9, 2006.

SLIDE 5

OASIS NoC: Routing

  • The routing algorithm is static XY routing
  • The switching method is wormhole switching

[Figure: OASIS whole network, showing the routing information, a source node, a destination node, and the flit structure]
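Static XY routing on a mesh can be illustrated with a minimal Python sketch. It is not taken from the OASIS implementation: the function names `xy_next_port` and `xy_route`, the coordinate tuples, and the orientation (x grows toward East, y grows toward North) are all assumptions made for illustration. Only the port names (Local, North, East, South, West) come from the router figures.

```python
# Hypothetical sketch of static XY routing on a 2D mesh.
# Assumption: x grows toward East and y grows toward North.

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_next_port(cur, dst):
    """Return the output port for one hop of static XY routing."""
    cx, cy = cur
    dx, dy = dst
    if dx != cx:                       # correct the X coordinate first
        return "E" if dx > cx else "W"
    if dy != cy:                       # then correct the Y coordinate
        return "N" if dy > cy else "S"
    return "L"                         # arrived: deliver to the local PE


def xy_route(src, dst):
    """List the ports taken from src to dst, ending with the Local port."""
    ports, cur = [], src
    while True:
        port = xy_next_port(cur, dst)
        ports.append(port)
        if port == "L":
            return ports
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])


if __name__ == "__main__":
    # Route across a 4x4 mesh from node (0, 0) to node (2, 3).
    print(xy_route((0, 0), (2, 3)))
```

Because the X offset is always resolved before the Y offset, every source and destination pair uses exactly one fixed path, which is what makes the routing static.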

SLIDE 6

OASIS NoC: Router design

[Figure: OASIS router block diagram. Blocks: Local, South, North, West, and East Input_port; sw_alloc; crossbar. Labeled signals: data_in_L/S/N/W/E[379:0], data_out_L/S/N/W/E[379:0], port_req[24:0], cntrl[24:0], drop_in, drop_out, tail, and data_out_L/S/N/W/E[0]. Bus-width annotations: 5, 16, 76, 80]

One router has three pipeline stages:

  • First stage: buffering and routing
  • Second stage: scheduling and flow control
  • Third stage: sending each flit to the appropriate next port

SLIDE 7

OASIS NoC drawback

  • The original OASIS NoC has an overhead problem
      – A large number of flits are dropped when communication is congested

[Figure: two router/PE pairs under congestion. The receiving buffer is full, flits are dropped, and the node has to send them again, which causes a large overhead]

SLIDE 8

Our contribution

  • The Optimized NoC (ONoC) overcomes the OASIS overhead problem
  • To avoid dropped flits, an efficient stall-go (ESG) algorithm is proposed


SLIDE 10

Efficient stall-go (ESG) algorithm

[Figure: Mealy machine for the ESG algorithm, with states Sent, Stop, and Go; inputs Nearly_full and Data_sent; output Out]

SLIDE 11

ONoC: Architecture

[Figure: two ESG blocks and two schedulers. Labeled signals: data_in and data_out (20 bits), nearly full, data_sent, stop, block, and grant (1 bit each)]

  • ESG is implemented between the input port and the scheduler
  • ESG receives the nearly full and data sent signals
  • If the receiver FIFO is about to become full, stall-go control stops the sender from sending flits
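The effect of stopping the sender before the receiver FIFO overflows can be sketched with a small cycle-based Python model. It is a simplified illustration, not the ESG state machine from the slides. The function `simulate`, its parameters, the drain rate, and the threshold "nearly full means one free slot left" are all assumptions. The model counts only wasted transmissions; it does not model the time cost of resending.

```python
from collections import deque


def simulate(policy, n_flits=50, depth=4, drain_every=3):
    """Push n_flits through a receiver FIFO and return (cycles, dropped).

    policy "drop":  the sender transmits every cycle; a flit that meets a
                    full FIFO is lost and has to be sent again.
    policy "stall": the sender stops while the FIFO reports nearly full.
    Assumptions: depth >= 2, and the receiver removes one flit every
    `drain_every` cycles.
    """
    fifo = deque()
    accepted = delivered = dropped = cycle = 0
    while delivered < n_flits:
        cycle += 1
        # Receiver side: consume one buffered flit every few cycles.
        if fifo and cycle % drain_every == 0:
            fifo.popleft()
            delivered += 1
        if accepted == n_flits:
            continue                       # nothing left to send
        nearly_full = len(fifo) >= depth - 1
        if policy == "stall" and nearly_full:
            continue                       # stop: hold the flit at the sender
        if len(fifo) < depth:
            fifo.append(accepted)          # flit accepted by the FIFO
            accepted += 1
        else:
            dropped += 1                   # FIFO full: flit must be resent
    return cycle, dropped


if __name__ == "__main__":
    for policy in ("drop", "stall"):
        cycles, dropped = simulate(policy)
        print(policy, "cycles:", cycles, "dropped:", dropped)
```

In this model the stall policy never drops a flit, because the sender only transmits while at least two slots are free, so the FIFO cannot overflow.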

SLIDE 12

ONoC: Router design

[Figure: ONoC router block diagram. Blocks: Local, South, North, West, and East Input_port; sw_alloc; crossbar. Labeled signals: data_in_L/S/N/W/E[19:0], data_out_L/S/N/W/E[19:0], data_in[99:0], xaddr[2:0], yaddr[2:0], port_req[24:0], sw_req[4:0], cntrl[24:0], stop[4:0], data_sent[4:0], tail_sent[4:0], Nearly_full, and the bit slices data_out_L/S/N/W/E[0] and data_out_L/S/N/W/E[5:1]. The figure marks where ESG is implemented]

SLIDE 13

Efficient stall-go achievement

[Figure: two router/PE pairs under congestion. When the receiving buffer is full, the sender just stops sending]

Flits are sent without overhead


SLIDE 15

Simulation parameters

ONoC parameter configuration

  Parameter            Value
  Network size         3x3 mesh
  Buffer depth         4, 8, 16, and 32
  Flit size            20 bits (header: 12 bits, payload: 8 bits)
  Forwarding           Wormhole switching
  Scheduling           Round-robin
  Flow control         Stall-go
  Routing              Static X-Y routing
  Target application   JPEG codec
  Target device        Altera Stratix III
  Input data size      120,015 bytes (ratio 200x200)
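The 20-bit flit with a 12-bit header and an 8-bit payload can be illustrated with a short Python sketch. The slides give only the field widths. Placing the header in the low 12 bits and the payload in the high 8 bits is an assumption, as are the helper names `pack_flit` and `unpack_flit`. The internal layout of the header is not modeled.

```python
HEADER_BITS = 12
PAYLOAD_BITS = 8
FLIT_BITS = HEADER_BITS + PAYLOAD_BITS      # 20 bits in total

HEADER_MASK = (1 << HEADER_BITS) - 1
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_flit(header, payload):
    """Combine header and payload into one 20-bit flit.

    Assumption: header occupies bits [11:0], payload occupies bits [19:12].
    """
    if not 0 <= header <= HEADER_MASK:
        raise ValueError("header does not fit in 12 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 8 bits")
    return (payload << HEADER_BITS) | header


def unpack_flit(flit):
    """Split a 20-bit flit back into (header, payload)."""
    if not 0 <= flit < (1 << FLIT_BITS):
        raise ValueError("flit does not fit in 20 bits")
    return flit & HEADER_MASK, (flit >> HEADER_BITS) & PAYLOAD_MASK


if __name__ == "__main__":
    flit = pack_flit(header=0xABC, payload=0x5A)
    print(hex(flit), unpack_flit(flit))
```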

SLIDE 16

ONoC communication time analysis

[Figure: communication time in cycles (axis from 50,000 to 250,000) against buffer depth (4, 8, 16, 32), comparing OASIS cycles and ONoC cycles]

ONoC total communication time is less than that of OASIS at small buffer depths

SLIDE 17

ONoC complexity analysis

  Buffer size   Architecture   Area (ALUTs)    Power (mW)   Speed (MHz)
  4             ONoC            5,485 (5%)     649.17       185.87
  4             OASIS           5,282 (5%)     649.03       207.90
  8             ONoC            8,269 (7%)     660.02       186.60
  8             OASIS           7,890 (7%)     659.31       195.05
  16            ONoC           10,538 (9%)     682.80       161.26
  16            OASIS          10,279 (9%)     681.63       177.43
  32            ONoC           17,416 (15%)    716.87       153.96
  32            OASIS          16,569 (15%)    716.02       172.38

ONoC requires 4.38% extra hardware
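The relative area and speed differences at each buffer size can be recomputed from the table with a short Python sketch. The slide does not state how the 4.38% figure was aggregated, so the sketch only reports per-size values and makes no attempt to reproduce that number. The names `RESULTS`, `percent_change`, `area_overhead`, and `speed_change` are illustrative.

```python
# buffer size -> architecture -> (area in ALUTs, speed in MHz), from the table
RESULTS = {
    4:  {"ONoC": (5485, 185.87),  "OASIS": (5282, 207.90)},
    8:  {"ONoC": (8269, 186.60),  "OASIS": (7890, 195.05)},
    16: {"ONoC": (10538, 161.26), "OASIS": (10279, 177.43)},
    32: {"ONoC": (17416, 153.96), "OASIS": (16569, 172.38)},
}


def percent_change(new, base):
    """Relative difference of `new` with respect to `base`, in percent."""
    return 100.0 * (new - base) / base


def area_overhead(size):
    """Extra ALUTs used by ONoC relative to OASIS at one buffer size."""
    row = RESULTS[size]
    return percent_change(row["ONoC"][0], row["OASIS"][0])


def speed_change(size):
    """Clock speed of ONoC relative to OASIS (negative means slower)."""
    row = RESULTS[size]
    return percent_change(row["ONoC"][1], row["OASIS"][1])


if __name__ == "__main__":
    for size in RESULTS:
        print(f"buffer {size:2d}: area {area_overhead(size):+.2f}%, "
              f"speed {speed_change(size):+.2f}%")
```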

SLIDE 18

Summary

  • This research presents an optimization technique and the architecture of an Optimized NoC (ONoC)
  • ONoC achieves 14.18% less communication time than OASIS, while its area is only 4.38% larger

Ongoing work

  • Buffer borrowing algorithm
  • Short-cut bus

SLIDE 19

Thank you for listening