2014/5/13 Advanced Processor Technology Group The School of Computer Science
The Simulation of the Dynamic Link Allocation Router (DyLAR)
Wei Song
Overview
- A brief review of the Dynamic Link Allocation flow control method
- The new simulation platform
- Some simple performance analyses
- An alternative method of the task request procedure
- Future schedule
Serial is better than Parallel
[Figure: a parallel channel built from four dual-rail links (Dual-rail 0 to Dual-rail 3) with data inputs DI00 to DI31, data outputs DO00 to DO31, acknowledge signals ACKI and ACKO, and elements marked C]
Bandwidth efficiency is less than 50%
[Figure: master/slave timing diagram. Labels: Request to reserve a path, OK ACK, Data transmissions, Data transmissions (end), False ACK]
The high Loss Rate
Simulation results of a 6x6 NoC.
[Figure: Average Frame Latency (ns) vs. Frame Injection Rate (kfps)]
[Figure: Loss Rate vs. Frame Injection Rate (kfps); series: Flit Level Loss Rate, Frame Level Retry rate]
Some hypotheses of DyLAR
- Asynchronous circuits prefer serial channels to parallel channels
- Connection-oriented communication has a bandwidth efficiency of less than 50%
- The high retry rate of connection-oriented communication can be reduced by adding virtual channels
- The input buffer can be smaller than the flit size when serial channels are used
The DyLAR Router
[Figure: block diagram of the DyLAR router. Blocks: Input Buffers, Request Switch, Data Switch, Arbiter, Output Buffers, Tran Control units. Port signals on both sides: Sub-link(i,j) and Credit(i,j) for i, j = 0 to 2]
Flit Formats
[Figure: flit formats. Flit header labels: flit type, X, Y (8 bit, 8 bit). Data flit labels: flit type, data (4 bit, 128 bit). Further label: REQ NUM]
The Flow Control Procedures
[Figure: the flow control procedure, drawn as numbered steps 1 to 4 between the Tran Control units and the Arbiter]
Basic information
– Mesh topology
– Only send XY frames
– Parameter reconfigurable
– Latency is set according to 1-of-4 CHAIN link
– SystemC 2.2.0
– GNU g++
– Makefile
– Batch simulation and automatic result analysis (accepted traffic, latency, loss rate)
Configurable parameters
– Dimension (>1)
– Injected traffic (kfps) (>0)
– Channel number (>0)
– Request number (>0)
– Random seed (0 selects a random seed; any other value is used as the seed)
– Random delay
– Simulation time
– VCD file (generate waveform and debug logs)
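The range constraints on these parameters could be checked before a run starts. The following is a minimal sketch, not the platform's actual code: the `SimConfig` struct, its field names and default values, the time unit, and the `validate` helper are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the configurable parameters; names and defaults are assumptions.
struct SimConfig {
    int dimension = 4;             // mesh is dimension x dimension, must be > 1
    double injected_kfps = 100.0;  // injected traffic in kfps, must be > 0
    int channels = 1;              // channel number, must be > 0
    int requests = 1;              // request number, must be > 0
    std::uint32_t seed = 0;        // 0 selects a random seed; other values are used as the seed
    bool random_delay = true;      // enable random delay
    double sim_time_ns = 1e6;      // simulation time (the unit is an assumption)
    bool dump_vcd = false;         // generate VCD waveform and debug logs
};

// Returns a list of human-readable problems; an empty list means the config is acceptable.
std::vector<std::string> validate(const SimConfig& c) {
    std::vector<std::string> errors;
    if (c.dimension <= 1) errors.push_back("dimension must be > 1");
    if (!(c.injected_kfps > 0.0)) errors.push_back("injected traffic must be > 0");
    if (c.channels <= 0) errors.push_back("channel number must be > 0");
    if (c.requests <= 0) errors.push_back("request number must be > 0");
    return errors;
}
```

Collecting all violations in one pass, rather than stopping at the first, suits batch runs where a whole parameter file is checked at once.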
Current Problems
- The router design
– Multiple request lines sharing one channel will generate deadlocks
- (still being debugged and modified)
- The simulation model
– Slow (can exceed 20 min for 4x4 cases)
– Memory consuming (can exceed 2 GB for some 4x4 cases)
Simulation environment: AMD 2.4 GHz 64-bit, 4 GB memory
Deadlock Avoidance 1
Deadlock Avoidance 2
Deadlock Recovery 1
[Figure: DyLAR router block diagram (Input Buffers, Request Switch, Data Switch, Arbiter, Output Buffers, Tran Control units)]
Deadlock Recovery 2
[Figure: DyLAR router block diagram (Input Buffers, Request Switch, Data Switch, Arbiter, Output Buffers, Tran Control units)]
Simulation parameters
- Dimension: 4x4
- Channels: 1 to 3
- Request lines: 1 to 8
- Frame injection rate: 20 to 500 kfps
- Random delay and uniform random traffic pattern
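A batch run over these ranges could be enumerated up front. The sketch below is illustrative only: the `SweepPoint` type and `build_sweep` function are hypothetical names, the 20 kfps rate step is an assumption, and the request-line values 1, 2, 4, 6, 8 are taken from the labels used in the results rather than from a confirmed configuration.

```cpp
#include <vector>

// Hypothetical description of one simulation run in the sweep.
struct SweepPoint {
    int channels;   // 1 to 3
    int requests;   // number of request lines, 1 to 8
    int rate_kfps;  // frame injection rate in kfps, 20 to 500
};

// Enumerates every (channels, requests, rate) combination.
// rate_step_kfps is an assumed step size; a non-positive step yields an empty sweep.
std::vector<SweepPoint> build_sweep(int rate_step_kfps = 20) {
    std::vector<SweepPoint> points;
    if (rate_step_kfps <= 0) return points;
    const int request_values[] = {1, 2, 4, 6, 8};
    for (int ch = 1; ch <= 3; ++ch) {
        for (int req : request_values) {
            for (int rate = 20; rate <= 500; rate += rate_step_kfps) {
                points.push_back({ch, req, rate});
            }
        }
    }
    return points;
}
```

With the assumed step of 20 kfps this gives 25 rates per configuration and 375 runs in total, which is worth knowing in advance given the reported run time of individual 4x4 simulations.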
1 channel with multiple requests
[Figure: Accepted Traffic (MByte/s) vs. Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req]
[Figure: Average Latency (ns) vs. Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req]
1 channel with multiple requests
[Figure: Retry rate vs. Injection Rate (kfps); series: 1 req, 2 req, 4 req, 6 req]
1 request with multiple channels
[Figure: Accepted Traffic (MByte/s) vs. Injection Rate (kfps); series: 1C1R, 2C1R, 3C1R]
[Figure: Average Latency (ns) vs. Injection Rate (kfps); series: 1C1R, 2C1R, 3C1R]
1 request with multiple channels
[Figure: Average Latency (ns) vs. Injection Rate (kfps), shown on a reduced latency scale (up to 6000 ns); series: 1C1R, 2C1R, 3C1R]
2 channels with multi-requests
[Figure: Accepted Traffic (MByte/s) vs. Injection Rate (kfps); series: C1R1, C2R1, C2R2, C2R4, C2R6, C2R8]
[Figure: Average Latency (ns) vs. Injection Rate (kfps); series: C1R1, C2R1, C2R2, C2R4, C2R6, C2R8]
3 channels with multi-requests
[Figure: Accepted Traffic (MByte/s) vs. Injection Rate (kfps); series: 3C1R, 3C2R, 3C4R, 3C6R, 3C8R]
[Figure: Average Latency (ns) vs. Injection Rate (kfps); series: 3C1R, 3C2R, 3C4R, 3C6R, 3C8R]
Throughput
Unit: MByte/s
             1 request   2 requests   4 requests   6 requests   8 requests
1 channel    186         266          300          300          300
2 channels   265         512          710          >710         >710
3 channels   300         650          >1000        >1000        >1000
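The throughput figures can also be read as scaling factors relative to the 1-channel, 1-request case (186 MByte/s). The sketch below assumes that the rows are channel counts and the columns are request counts; the `Entry` struct and `scaling` function are illustrative names. Values reported as lower bounds (`>710`, `>1000`) are stored as their bound and flagged, so any ratio computed from them is itself only a lower bound.

```cpp
// One throughput figure from the slide.
struct Entry {
    double mbyte_per_s;  // throughput in MByte/s
    bool lower_bound;    // true when the slide reports ">value"
};

// Rows: 1, 2, 3 channels. Columns: 1, 2, 4, 6, 8 requests (assumed layout).
const Entry kThroughput[3][5] = {
    {{186, false}, {266, false}, {300, false}, {300, false}, {300, false}},
    {{265, false}, {512, false}, {710, false}, {710, true}, {710, true}},
    {{300, false}, {650, false}, {1000, true}, {1000, true}, {1000, true}},
};

// Ratio of one entry to the 1-channel, 1-request baseline.
double scaling(int channel_idx, int request_idx) {
    return kThroughput[channel_idx][request_idx].mbyte_per_s /
           kThroughput[0][0].mbyte_per_s;
}
```

For example, `scaling(1, 1)` compares 2 channels with 2 requests (512 MByte/s) against the baseline and comes out at roughly 2.75.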
The Original Task Request Procedure
[Figure: nodes M, S, S, S; flit labels: TRF(3), TRF(2), TRF(1), TRF(0), VRF/TAF, VRF/TAF, TAF, TAF]
TRF: task request flit
VRF: volunteer request flit
TAF: task acknowledge flit
The alternative method
[Figure: nodes M, S, S, S; flit labels: TRF(3), TRF(3), TRF(2), TRF(1)]
[Figure: nodes M, S, S, S; flit labels: VRF/TAF, VRF/TAF, TAF]
Comparison of the two methods
- The original TRF
– Needs counters to calculate life_time
– Remembers state for every TRF
– Special communication with NA
– Waits for the whole flit
– One request line per TRF
- The alternative
– Counters move to NA
– States are recorded by NA, so only 1 state machine is needed
– Flits are sent directly to NA
– Sending starts directly after the flit_type field
– Two request lines per TRF
Schedule
- The simulation model is still being debugged
- Build the hardware model according to the SystemC model
- Try to speed up the simulation model and reduce the memory required