SLIDE 1

2014/5/13 Advanced Processor Technology Group The School of Computer Science

The Simulation of the Dynamic Link Allocation Router (DyLAR)

Wei Song

SLIDE 2

Overview

  • A brief review of the Dynamic Link Allocation flow control method
  • The new simulation platform
  • Some simple performance analyses
  • An alternative method of the task request procedure
  • Future schedule

SLIDE 3

Serial is better than Parallel

Figure: four dual-rail channels (Dual-rail 0 to Dual-rail 3) built from blocks labelled C, with data inputs DI00 to DI31, data outputs DO00 to DO31, and acknowledge signals ACKI and ACKO.

SLIDE 4

Bandwidth efficiency is less than 50%

Figure: message sequence between Master and Slave over time. Message labels: Request to reserve a path, OK ACK, Data transmissions (repeated), Data transmissions (end), False ACK.

SLIDE 5

The high Loss Rate

Simulation results of a 6x6 NoC.

Plot 1: Average Frame Latency (ns) against Frame Injection Rate (kfps).
Plot 2: Loss Rate against Frame Injection Rate (kfps); series: Flit Level Loss Rate, Frame Level Retry Rate.

SLIDE 6

Some hypotheses of DyLAR

  • Asynchronous circuits prefer serial rather than parallel channels
  • Connection-oriented communication has a bandwidth efficiency of less than 50%
  • The high retry rate of connection-oriented communication can be reduced by adding virtual channels
  • The input buffer could be smaller than the flit size when using serial channels

SLIDE 7

The DyLAR Router

Figure: block diagram of the DyLAR Router, containing three Input Buffers, a Data Switch, a Request Switch, an Arbiter, three Tran Control blocks, and three Output Buffers. Each side of the router carries Sub-link(i,j) and Credit(i,j) signals for i, j = 0 to 2.

SLIDE 8

Flit Formats

Figure: flit formats. Labels recovered from the diagram: flit type, flit header, X and Y (8 bit each), data, field widths of 4 bit and 128 bit, and REQ NUM.

SLIDE 9

The Flow Control Procedures

Figure: two routers, each drawn with three Tran Control blocks and an Arbiter; the flow control steps are numbered 1 to 4.

SLIDE 10

Overview

  • A brief review of the Dynamic Link Allocation flow control method
  • The new simulation platform
  • Some simple performance analyses
  • An alternative method of the task request procedure
  • Future schedule

SLIDE 11

Basic information

– Mesh topology
– Only XY frames are sent
– Reconfigurable parameters
– Latency is set according to the 1-of-4 CHAIN link
– SystemC 2.2.0
– GNU g++
– Makefile
– Batch simulation and automatic result analysis (accepted traffic, latency, loss rate)
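
The slide only says that the platform sends "XY frames" on a mesh. One plausible reading, and it is an assumption rather than something the deck states, is that frames are addressed by (X, Y) coordinates and routed in dimension order, X first and then Y. The sketch below illustrates that reading; the function names `xy_next_hop` and `xy_route` are hypothetical and are not part of the simulation platform.

```python
def xy_next_hop(cur, dst):
    """Return the next (x, y) node on a dimension-order (X first, then Y) route.

    Assumption: this is a generic XY routing rule, not DyLAR's documented one.
    """
    cx, cy = cur
    dx, dy = dst
    if cx != dx:
        # Correct the X coordinate first.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # X already matches, so move along Y.
        return (cx, cy + (1 if dy > cy else -1))
    return cur  # Already at the destination.


def xy_route(src, dst):
    """Return the full list of nodes visited from src to dst, inclusive."""
    path = [src]
    while path[-1] != dst:
        path.append(xy_next_hop(path[-1], dst))
    return path
```

For example, `xy_route((0, 0), (2, 1))` walks along X to column 2 before taking the single Y step.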

SLIDE 12

Configurable parameters

– Dimension (>1)
– Injected traffic (kfps) (>0)
– Channel number (>0)
– Request number (>0)
– Random seed (0: random seed; other values: fixed seed)
– Random delay
– Simulation time
– VCD file (generates waveform and debug logs)
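
The constraints listed above can be collected into a small configuration record with a validity check. This is only a sketch: the class name `SimConfig`, the field names, and every default value are hypothetical, and the actual platform is written in SystemC rather than Python. Only the four range constraints (>1 and >0) come from the slide.

```python
from dataclasses import dataclass


@dataclass
class SimConfig:
    # All defaults are illustrative assumptions, not values from the slides.
    dimension: int = 4                    # mesh is dimension x dimension, must be > 1
    injected_traffic_kfps: float = 20.0   # must be > 0
    channel_number: int = 1               # must be > 0
    request_number: int = 1               # must be > 0
    random_seed: int = 0                  # 0 is read here as "pick a random seed"
    random_delay: bool = True
    simulation_time_ns: float = 1e6       # unit is an assumption
    vcd_file: bool = False                # waveform and debug logs

    def validate(self):
        """Return a list of human-readable constraint violations."""
        errors = []
        if self.dimension <= 1:
            errors.append("dimension must be > 1")
        if self.injected_traffic_kfps <= 0:
            errors.append("injected traffic must be > 0")
        if self.channel_number <= 0:
            errors.append("channel number must be > 0")
        if self.request_number <= 0:
            errors.append("request number must be > 0")
        return errors
```

Returning a list of violations instead of raising on the first one suits batch runs, where a whole parameter file can be checked before any simulation starts.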

SLIDE 13

Current Problems

  • The router design
– Multiple request lines sharing one channel will generate deadlocks (still being debugged and modified)
  • The simulation model
– Slow (can exceed 20 min in 4x4 cases)
– Memory consuming (can exceed 2 GB in some 4x4 cases)

Simulation environment: AMD 2.4 GHz, 64-bit, 4 GB memory

SLIDE 14

Deadlock Avoidance 1


SLIDE 15

Deadlock Avoidance 2

SLIDE 16

Deadlock Recovery 1

Figure: DyLAR Router block diagram (three Tran Control blocks, Arbiter, three Output Buffers, Data Switch, Request Switch, three Input Buffers).

SLIDE 17

Deadlock Recovery 2

Figure: DyLAR Router block diagram (three Tran Control blocks, Arbiter, three Output Buffers, Data Switch, Request Switch, three Input Buffers).

SLIDE 18

Overview

  • A brief review of the Dynamic Link Allocation flow control method
  • The new simulation platform
  • Some simple performance analyses
  • An alternative method of the task request procedure
  • Future schedule

SLIDE 19

Simulation parameters

  • Dimension 4x4
  • Channel 1~3
  • Request line 1~8
  • Frame injection rate 20~500 kfps
  • Random delay and random uniform traffic pattern
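
A batch run over these ranges amounts to enumerating a parameter grid. The sketch below shows one way to build it. The request-line values (1, 2, 4, 6, 8) are the ones that appear in the result plots, while the 20 kfps step and the names `injection_rates` and `sweep` are assumptions made for illustration.

```python
from itertools import product

CHANNELS = (1, 2, 3)
REQUEST_LINES = (1, 2, 4, 6, 8)  # values seen in the plots; the slide gives the range 1~8


def injection_rates(start=20, stop=500, step=20):
    """Frame injection rates in kfps; the step size is a hypothetical choice."""
    return list(range(start, stop + 1, step))


def sweep(step=20):
    """Return one settings dict per simulation run on the 4x4 mesh."""
    return [
        {"dimension": 4, "channels": c, "request_lines": r, "rate_kfps": rate}
        for c, r, rate in product(CHANNELS, REQUEST_LINES, injection_rates(step=step))
    ]
```

With the assumed step this yields 3 x 5 x 25 = 375 runs, which gives a sense of why the simulation speed noted under Current Problems matters.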

SLIDE 20

1 channel with multiple requests

Plot 1: Accepted Traffic (MByte/s) against Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req.
Plot 2: Average Latency (ns) against Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req.

SLIDE 21

1 channel with multiple requests

Plot: Retry rate against Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req.

SLIDE 22

1 request with multiple channels

Plot 1: Accepted Traffic (MByte/s) against Injection Rate (kfps); series: 1C1R, 2C1R, 3C1R.
Plot 2: Average Latency (ns) against Injection Rate (kfps); series: 1C1R, 2C1R, 3C1R.

SLIDE 23

1 request with multiple channels

Plot: Average Latency (ns) against Injection Rate (kfps); series: 1C1R, 2C1R, 3C1R.

SLIDE 24

2 channels with multi-requests

Plot 1: Accepted Traffic (MByte/s) against Injection Rate (kfps); series: C1R1, C2R1, C2R2, C2R4, C2R6, C2R8.
Plot 2: Average Latency (ns) against Injection Rate (kfps); series: C1R1, C2R1, C2R2, C2R4, C2R6, C2R8.

SLIDE 25

3 channels with multi-requests

Plot 1: Accepted Traffic (MByte/s) against Injection Rate (kfps); series: 3C1R, 3C2R, 3C4R, 3C6R, 3C8R.
Plot 2: Average Latency (ns) against Injection Rate (kfps); series: 3C1R, 3C2R, 3C4R, 3C6R, 3C8R.

SLIDE 26

Throughput

Unit: MByte/s

             1 request   2 requests   4 requests   6 requests   8 requests
1 channel    186         266          300          300          300
2 channels   265         512          710          >710         >710
3 channels   300         650          >1000        >1000        >1000
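
To compare configurations, the throughput figures can be expressed relative to the 1 channel, 1 request case. The sketch below reads the fifteen values as three rows (channels) by five columns (request lines) and copies only the exact ones; entries reported as lower bounds (>710, >1000) are left out because no ratio can be computed from them. The names `THROUGHPUT` and `gain` are hypothetical.

```python
# (channels, request lines) -> throughput in MByte/s, exact entries only.
THROUGHPUT = {
    (1, 1): 186, (1, 2): 266, (1, 4): 300, (1, 6): 300, (1, 8): 300,
    (2, 1): 265, (2, 2): 512, (2, 4): 710,
    (3, 1): 300, (3, 2): 650,
}


def gain(channels, requests, base=(1, 1)):
    """Throughput of a configuration divided by that of the baseline."""
    return THROUGHPUT[(channels, requests)] / THROUGHPUT[base]
```

For instance, `gain(2, 2)` is 512 / 186, roughly 2.75, and `gain(1, 4)` is 300 / 186, roughly 1.61. With a single channel the throughput stays at 300 MByte/s from 4 request lines upward.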

SLIDE 27

Overview

  • A brief review of the Dynamic Link Allocation flow control method
  • The new simulation platform
  • Some simple performance analyses
  • An alternative method of the task request procedure
  • Future schedule

SLIDE 28

The Original Task Request Procedure

Figure: one node labelled M and three nodes labelled S, with flits labelled TRF(3), TRF(2), TRF(1), TRF(0), VRF/TAF, VRF/TAF, TAF, and TAF.

TRF: task request flit
VRF: volunteer request flit
TAF: task acknowledge flit
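
The figure labels run TRF(3), TRF(2), TRF(1), TRF(0), which suggests a hop-limited request whose counter drops by one at each node. The deck does not say so explicitly, so the sketch below is an assumption: it treats the number in parentheses as a life_time value decremented per hop. The function name `forward_trf` is hypothetical.

```python
def forward_trf(life_time):
    """Return the TRF labels seen hop by hop until the counter reaches zero.

    Assumption: the TRF is forwarded once per hop with life_time reduced by
    one each time, and TRF(0) is the last flit that is sent.
    """
    labels = []
    while life_time >= 0:
        labels.append(f"TRF({life_time})")
        life_time -= 1
    return labels
```

Calling `forward_trf(3)` reproduces the four labels drawn in the figure.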

SLIDE 29

The alternative method

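Figure: two panels, each with one node labelled M and three nodes labelled S. The first panel shows flits labelled TRF(3), TRF(3), TRF(2), and TRF(1); the second shows VRF/TAF, VRF/TAF, and TAF.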

SLIDE 30

Comparison of the two methods

  • The original TRF
– Needs counters to calculate life_time
– Remembers state for every TRF
– Special communication with NA
– Waits for the whole flit
– One request line per TRF
  • The alternative
– Moves counters to NA
– States are recorded by NA, so one state machine is enough
– Sends the flit directly to NA
– Sends directly after the flit_type field
– Two request lines per TRF

SLIDE 31

Overview

  • A brief review of the Dynamic Link Allocation flow control method
  • The new simulation platform
  • Some simple performance analyses
  • An alternative method of the task request procedure
  • Future schedule

SLIDE 32

Schedule

  • The simulation model is still being debugged
  • Build the hardware model according to the SystemC model
  • Try to speed up the simulation model and reduce the memory required

SLIDE 33

Thank you!

Questions?