Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture



slide-1
SLIDE 1

Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture

ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS’07), Dec 3-4, 2007

Avinash Kodi†, Ashwini Sarathy* and Ahmed Louri*

†Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
*Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85719

E-mail: kodi@ohio.edu, sarathya@ece.arizona.edu, louri@ece.arizona.edu

Sponsored: National Science Foundation (NSF) grant ECCS-0725765 (at the High Performance Computing Architectures and Technologies Lab, University of Arizona, Tucson)

slide-2
SLIDE 2

Talk Outline

  • Motivation & Introduction
  • iDEAL – Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
    – Link and Router Architecture
  • Performance Evaluation
    – Power & Area estimation for the Links & Routers
    – Simulation results for Throughput, Latency & Overall network power
  • Conclusions

slide-3
SLIDE 3

Motivation

  • Increasing wire delay with decreasing feature size
  • Scalable, modular interconnect – Network-on-Chip (NoC)

[Figure: System-on-Chip (SoC) paradigm: processor cores, SRAM/Flash and memory controllers, USB / Ethernet controllers, UART / GPIO]

[Figure: 4 x 4 mesh NoC with nodes (0,0) through (3,3); each node pairs a NoC router with a processing element (processors, DSPs, peripheral controllers, memory subsystems); routers are connected by channels or links]

slide-4
SLIDE 4

Motivation

Recent NSF-sponsored workshop on On-Chip Interconnection Networks (OCINs) [1]:

  • “The most important technology constraint for on-chip networks is power consumption”.
  • Power consumption of OCINs implemented with current techniques exceeds expected needs by a factor of 10.

[Figure: Generic NoC Router: input buffers, Route Computation (RC), Virtual Channel (VC), Switch Allocator (SA) and crossbar switch, with ports +x, -x, +y, -y and the Processing Element (PE)]

Power Break-up in the NoC Router: Buffers 46%, Crossbar 35%, Clock Buffer 16%, Arbiter 3%

[1] J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler and L. S. Peh, “Research Challenges for On-Chip Interconnection Networks”, IEEE Micro, vol. 27, no. 5, pp. 96-108, September-October 2007.

slide-5
SLIDE 5

iDEAL – Inter-router Dual-function Energy and Area-efficient Links for NoC architectures

iDEAL Methodology (circuit and architectural techniques)

  • Reduce the number of router buffers
  • To prevent performance degradation, use adaptive channel buffers to store data along the links when required
  • Dynamic buffer allocation within the router buffers

[Figure: Generic NoC architecture next to the iDEAL architecture. Both routers show input buffers, Route Computation (RC), Virtual Channel (VC), Switch Allocator (SA), crossbar switch, ports +x, -x, +y, -y and the Processing Element (PE); the iDEAL version adds adaptive channel buffers along the link and has a reduced router buffer size]

slide-6
SLIDE 6

Conventional Links

[Figure: link from the Output Port of Router A to the Input Port of Router B]

slide-7
SLIDE 7

iDEAL – Channel Buffer Design (1/2)

[Figure: link from the Output Port of Router A to the Input Port of Router B, with control blocks along the link driven by the congestion signal]

slide-8
SLIDE 8

iDEAL – Channel Buffer Design (2/2)

  • No congestion: the channel buffer functions as a conventional repeater. Control block is turned ‘OFF’.
  • During congestion: the repeater is tri-stated and holds the sampled value. Control block is turned ‘ON’.

slide-9
SLIDE 9

iDEAL – Control Block

  • Power efficient
  • Stable at varying frequencies

[Figure: control block on the link between the router ports (labelled O/P Port Router A and I/P Port Router A), generating CLK1 and CLK2 from the congestion signal and CLK]

slide-10
SLIDE 10

iDEAL: Dual-function Link

[Figure: dual-function link over cycles 1 to 3, showing Data-In flits 1, 2, 3, the congestion signal, the congestion release, and Data-Out flits 1, 2, 3]
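The dual-function behaviour can be sketched in a few lines of Python. This is only a behavioural abstraction under stated assumptions, not the circuit from the talk: the link is modelled as a bounded FIFO that is transparent when the congestion signal is low and holds flits when it is high. The class name `DualFunctionLink`, the method `step`, and the choice of three channel buffers are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class DualFunctionLink:
    """Behavioural toy model: repeater when uncongested, storage when congested."""

    def __init__(self, num_channel_buffers: int) -> None:
        self.capacity = num_channel_buffers  # flits the link can hold
        self.held: Deque[int] = deque()      # flits currently stored on the link

    def step(self, data_in: Optional[int], congested: bool) -> Optional[int]:
        """Advance one cycle; return the flit delivered downstream, if any."""
        if data_in is not None:
            if len(self.held) >= self.capacity:
                # ASSUMPTION: flow control stops the sender before this happens.
                raise OverflowError("all channel buffers are occupied")
            self.held.append(data_in)
        if congested or not self.held:
            return None  # stages hold their values, nothing is delivered
        return self.held.popleft()  # oldest flit leaves first


def demo() -> List[Optional[int]]:
    link = DualFunctionLink(num_channel_buffers=3)
    trace: List[Optional[int]] = []
    # Cycles 1 to 3: flits 1, 2, 3 enter while congestion is asserted.
    for flit in (1, 2, 3):
        trace.append(link.step(flit, congested=True))
    # Congestion release: held flits drain in arrival order.
    for _ in range(3):
        trace.append(link.step(None, congested=False))
    return trace
```

In this sketch `demo()` returns `[None, None, None, 1, 2, 3]`: nothing is delivered while the congestion signal is high, and the three flits leave in order once it is released. With no congestion and an empty link, a flit passed to `step` is returned in the same call, which stands in for the plain repeater mode.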

slide-11
SLIDE 11

Link - Power & Area Estimation

  • Psegment(repeater): dynamic, leakage, short-circuit
  • Psegment(chl-buffer): leakage, control block
  • Pcontrol-blk: inverters, clock, switched capacitance

[Figure: link from the Output Port of Router A to the Input Port of Router B with control blocks; the control block generates CLK1 and CLK2 from CLK and the congestion signal]

slide-12
SLIDE 12

iDEAL – Router Buffer Design

  • Static buffer allocation
  • Fixed number of buffers per VC
  • HoL blocking

[Figure: Input Port P with v virtual channels (vc 1 to vc v), each holding Flit 1 to Flit r, between a DEMUX and a MUX; VC State Table with fields VCID, RP, WP, OP, OVC, CR, Status; Congestion Control (C*) and Credit Return]

RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
Status = status of the VC (idle, waiting, RC, VA, SA, ST)

slide-13
SLIDE 13

iDEAL – Router Buffer Design

  • Dynamic buffer allocation
  • Approximately (z + c)/v buffers per VC (z = router buffers, c = channel buffers, v = # of VCs)

[Figure: Input Port P with a shared pool of buffer slots (Flit 1 to Flit z) between a DEMUX and a MUX; Unified VC State Table with one row per VC (1 … v) and fields RP, WP, OP, OVC, CR, Status and flit-tracking entries F0, F1, …, F(z+c)/v; Buffer Slot Availability table (slots 1 to z, free Y/N); Write Pointer, Read Pointer, Input Flit Tracking, Output Flit Tracking, Congestion Control and Credit Return]

RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
Status = status of the VC (idle, waiting, RC, VA, SA, ST)
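As a worked example of the (z + c)/v estimate, the short Python sketch below evaluates it for the vnV-rnR-cnC configuration labels used in the evaluation slides of this deck. It assumes that z, the router buffers per input port, equals nV x nR; the slides do not state this explicitly, and the helper names are hypothetical.

```python
from typing import Tuple


def parse_config(label: str) -> Tuple[int, int, int]:
    """Split a label such as 'v4-r2-c8' into (nV, nR, nC)."""
    v_part, r_part, c_part = label.split("-")
    return int(v_part[1:]), int(r_part[1:]), int(c_part[1:])


def buffers_per_vc(z: int, c: int, v: int) -> float:
    """Approximate number of buffers available to each VC: (z + c) / v."""
    if v <= 0:
        raise ValueError("v must be positive")
    return (z + c) / v


def buffers_per_vc_for(label: str) -> float:
    """Apply the estimate to a vnV-rnR-cnC label."""
    n_v, n_r, n_c = parse_config(label)
    z = n_v * n_r  # ASSUMPTION: router buffers per input port = nV * nR
    return buffers_per_vc(z, n_c, n_v)


if __name__ == "__main__":
    for label in ("v4-r4-c0", "v4-r3-c4", "v4-r2-c8", "v3-r4-c4", "v5-r3-c1"):
        print(label, round(buffers_per_vc_for(label), 2))
```

Under that assumption, v4-r4-c0, v4-r3-c4 and v4-r2-c8 all come out at 4 buffers per VC, because z + c stays at 16 while router buffers are traded for channel buffers.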

slide-14
SLIDE 14

iDEAL – Router Buffer Design

  • Example illustrating Dynamic buffer allocation in iDEAL

[Figure: an incoming flit (VCID = 1) being written into an input port with 3 VCs and 7 buffer slots. The Unified VC State Table lists, for VC 1 to 3, the fields RP, WP, OP, OVC, CR, Status and the flit-tracking entries F0 to F4; the Buffer Slot Availability table marks each of slots 1 to 7 as free (Y) or occupied (N). Write Pointer, Read Pointer, Input Flit Tracking, Output Flit Tracking and Congestion Control are also shown]

RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
Status = status of the VC (idle, waiting, RC, VA, SA, ST)
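The idea of dynamic allocation can be illustrated with a minimal Python sketch: a shared pool of buffer slots, a slot-availability table, and per-VC tracking of which slots hold that VC's flits. It is a simplified illustration, not the router design from the talk. Credits, pointers as separate fields, output ports and status are left out, and the names `DynamicInputPort`, `write_flit` and `read_flit` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class DynamicInputPort:
    """Shared buffer pool in which any free slot can serve any VC."""

    def __init__(self, num_slots: int, num_vcs: int) -> None:
        self.slot_free: List[bool] = [True] * num_slots        # slot availability
        self.slots: List[Optional[str]] = [None] * num_slots   # stored flits
        # Per-VC flit tracking: slot indices in arrival order.
        self.vc_slots: Dict[int, Deque[int]] = {
            vc: deque() for vc in range(1, num_vcs + 1)
        }

    def write_flit(self, vcid: int, flit: str) -> int:
        """Store a flit in the first free slot; return its 1-based slot number."""
        for index, free in enumerate(self.slot_free):
            if free:
                self.slot_free[index] = False
                self.slots[index] = flit
                self.vc_slots[vcid].append(index)
                return index + 1
        raise OverflowError("no free buffer slot")

    def read_flit(self, vcid: int) -> str:
        """Remove and return the oldest flit of a VC, freeing its slot."""
        index = self.vc_slots[vcid].popleft()
        flit = self.slots[index]
        assert flit is not None
        self.slots[index] = None
        self.slot_free[index] = True
        return flit


if __name__ == "__main__":
    port = DynamicInputPort(num_slots=7, num_vcs=3)
    print(port.write_flit(1, "a"))  # VC 1 takes slot 1
    print(port.write_flit(2, "b"))  # VC 2 takes slot 2
    print(port.read_flit(2))        # slot 2 becomes free again
    print(port.write_flit(1, "c"))  # VC 1 reuses slot 2
```

Because slots are not tied to a VC, a busy VC can occupy slots that an idle VC would have left unused under static allocation.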

slide-15
SLIDE 15

Router - Power & Area Estimation

  • Buffer Power (Pwrite + Pread)
  • Crossbar Power (Switch + Arbiter)
  • Power reduces on decreasing the buffer size

[Figure: router with input buffers, Route Computation (RC), Virtual Channel (VC), Switch Allocator (SA), crossbar switch and Processing Element (PE); buffer detail showing the 6T SRAM cell, wordlines, bitlines and sense amp]

slide-16
SLIDE 16

Performance Evaluation

  • Evaluated on a cycle-accurate on-chip network simulator
  • Simulated 8 x 8 Mesh and 8 x 8 Folded Torus topologies
  • Synthetic benchmarks were evaluated: uniform and non-uniform workloads (Butterfly, Complement, Perfect Shuffle, Matrix Transpose, Bit Reversal)
  • Parameters evaluated include throughput, latency and overall network power
  • Considered 5 different configurations, written vnV-rnR-cnC (nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers)
    – Baseline: 440 (v4-r4-c0)
    – Others: 434, 428, 344, 531
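The slides name the non-uniform traffic patterns without defining them. The Python sketch below uses the common textbook bit-permutation definitions, which are an assumption here and may differ in detail from what the simulator implemented. Node addresses are taken to be 6 bits wide, matching the 64 nodes of an 8 x 8 network; all function names are hypothetical.

```python
def bit_complement(src: int, bits: int) -> int:
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reversal(src: int, bits: int) -> int:
    """Destination has the source address bits in reverse order."""
    dst = 0
    for i in range(bits):
        if (src >> i) & 1:
            dst |= 1 << (bits - 1 - i)
    return dst


def perfect_shuffle(src: int, bits: int) -> int:
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def matrix_transpose(src: int, bits: int) -> int:
    """Destination swaps the upper and lower halves of the address (even bits)."""
    half = bits // 2
    mask = (1 << bits) - 1
    return ((src << half) | (src >> half)) & mask


def butterfly(src: int, bits: int) -> int:
    """Destination swaps the most and least significant address bits."""
    msb = (src >> (bits - 1)) & 1
    lsb = src & 1
    if msb != lsb:
        src ^= (1 << (bits - 1)) | 1  # flip both bits to exchange them
    return src


if __name__ == "__main__":
    BITS = 6  # 64 nodes in an 8 x 8 network
    for pattern in (bit_complement, bit_reversal, perfect_shuffle,
                    matrix_transpose, butterfly):
        print(pattern.__name__, format(pattern(0b000101, BITS), "06b"))
```

Each function maps a source node to a single fixed destination, which is what makes these patterns stress specific parts of the network, in contrast to uniform traffic where destinations are drawn at random.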

slide-17
SLIDE 17

Power Estimation - Summary

nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers

vnV-rnR-cnC | Buffer Power (mW) | % Change | Mesh Link + Control Power (mW) | % Change | Folded Torus Link + Control Power (mW) | % Change
v4-r4-c0 | 2.020 | (baseline) | 2.032 + 0 | (baseline) | 4.068 + 0 | (baseline)
v4-r3-c4 | 1.646 | -18.51 | 2.164 + 0.0122 | +7.0 | 4.195 + 0.0122 | +3.4
v4-r2-c8 | 1.272 | -37.02 | 2.296 + 0.0205 | +13.9 | 4.327 + 0.0205 | +6.8
v3-r3-c7 | 1.365 | -32.41 | 2.263 + 0.0184 | +12.2 | 4.294 + 0.0184 | +6.0
v5-r2-c6 | 1.459 | -27.76 | 2.230 + 0.0164 | +10.5 | 4.261 + 0.0164 | +5.1
v3-r4-c4 | 1.646 | -18.51 | 2.164 + 0.0122 | +7.0 | 4.195 + 0.0122 | +3.4
v5-r3-c1 | 1.926 | -4.65 | 2.065 + 0.0059 | +1.8 | 4.096 + 0.0059 | +0.8
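The "% Change" figures for buffer power can be cross-checked against the baseline with a one-line formula. The Python sketch below recomputes them from the tabulated buffer powers; the function and variable names are hypothetical, and the recomputed values agree with the slide only to within a few hundredths of a percent, presumably because the tabulated powers are rounded.

```python
def percent_change(value: float, baseline: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Buffer power in mW, taken from the summary table; v4-r4-c0 is the baseline.
BASELINE_BUFFER_MW = 2.020
BUFFER_POWER_MW = {
    "v4-r3-c4": 1.646,
    "v4-r2-c8": 1.272,
    "v3-r3-c7": 1.365,
    "v5-r2-c6": 1.459,
    "v3-r4-c4": 1.646,
    "v5-r3-c1": 1.926,
}

if __name__ == "__main__":
    for config, power in BUFFER_POWER_MW.items():
        print(config, round(percent_change(power, BASELINE_BUFFER_MW), 2))
```

The same function applies to the link columns if the link and control terms are summed first, for example `percent_change(2.296 + 0.0205, 2.032)` for the v4-r2-c8 mesh link.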

slide-18
SLIDE 18

Buffer Power – 8x8 Mesh and Folded Torus

  • Uniformly distributed traffic
    ⇒ Nearly 40% power savings for 50% buffer size reduction (428), using Dynamic buffer allocation
    (428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)

[Figure: Buffer Power (8x8 Mesh), UN, Dynamic: power in watts for configurations v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1]

[Figure: Buffer Power (8x8 Folded Torus), UN, Dynamic: power in watts for the same configurations]

slide-19
SLIDE 19

Throughput – 8x8 Mesh and Folded Torus

  • Uniformly distributed traffic
    ⇒ Only about 5% drop in throughput for the 428 case (Dynamic buffer allocation)
    (428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)

[Figure: Throughput (8x8 Mesh), UN, Dynamic: throughput in GBps versus offered load (as a fraction of network capacity) for v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1]

[Figure: Throughput (8x8 Folded Torus), UN, Dynamic: throughput in GBps versus offered load (as a fraction of network capacity) for the same configurations]

slide-20
SLIDE 20

Overall Network Power – 8x8 Mesh and Folded Torus

  • Total power consumed for a network load of 0.5
    ⇒ Nearly 20% savings for the 428 case, using Dynamic buffer allocation
    (428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)

[Figure: Total Power (8x8 Mesh), UN, Dynamic: power in watts per configuration (v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1), broken down into Congestion, Link, Crossbar and Buffer Power]

[Figure: Total Power (8x8 Folded Torus), UN, Dynamic: same breakdown for the same configurations]

slide-21
SLIDE 21

Buffer Power – 8x8 Mesh – all Traffic Patterns

  • Reduction in power for all configurations, under all traffic patterns, compared to the baseline (440)
  • For example, under Complement traffic the 428 configuration achieves 45% savings under Static allocation and 37.5% savings under Dynamic allocation

[Figure: Buffer Power (8x8 Mesh) at an offered load = 0.5: power in watts per traffic pattern (S- and D-prefixed UN, CO, TO, PS, BR, MT, NE, BU) for v4-r4-c0, v4-r3-c4, v4-r2-c8]

slide-22
SLIDE 22

Throughput – 8x8 Mesh – all Traffic Patterns

  • No significant decrease in throughput under any traffic pattern, using Dynamic allocation

[Figure: Throughput (8x8 Mesh) at an offered load = 0.5: throughput in GBps per traffic pattern (S- and D-prefixed UN, CO, TO, PS, BR, MT, NE, BU) for v4-r4-c0, v4-r3-c4, v4-r2-c8]

slide-23
SLIDE 23

Conclusion

  • The iDEAL architecture provides a low-power, area-efficient solution for NoCs by reducing power consumption through circuit-level and architecture-level techniques.
  • Simulation results show that halving the buffer size yields 40-52% savings in power, with a significant reduction in router area. There is only a marginal 1-5% drop in performance under dynamic buffer allocation.
  • Future work will involve (a) simulation using real-application traces and (b) exploring architectural improvements such as aggressive speculation in the credit loop.

slide-24
SLIDE 24

Backup Slides

slide-25
SLIDE 25

Area Estimation – Summary with values from Synopsys Design Compiler

nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers

vnV-rnR-cnC | Buffer Area (μm2) | Link Repeater Area (μm2) | Total Buffer + Link Area (μm2) | % Change
v4-r4-c0 | 81,407 | 32 | 81,439 | (baseline)
v4-r3-c4 | 63,991 | 52 | 64,011 | -21.40
v4-r2-c8 | 48,066 | 80 | 48,146 | -40.88
v3-r3-c7 | 50,373 | 73 | 50,446 | -38.05
v5-r2-c6 | 53,712 | 66 | 53,778 | -33.96
v3-r4-c4 | 63,250 | 52 | 63,302 | -22.27
v5-r3-c1 | 73,797 | 38 | 73,803 | -9.37

slide-26
SLIDE 26

Latency – 8x8 Mesh and Folded Torus

  • Uniformly distributed traffic
    ⇒ All configurations except 531 saturate at a network load of about 0.3 for the Mesh and about 0.4 for the Folded Torus

[Figure: Average Latency (8x8 Mesh), UN, Dynamic: average latency in microseconds versus offered load (as a fraction of network capacity) for v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1]

[Figure: Average Latency (8x8 Folded Torus), UN, Dynamic: average latency in microseconds versus offered load (as a fraction of network capacity) for the same configurations]

slide-27
SLIDE 27

Comparison with FC-CB and DAMQ

  • FC-CB shows similar performance as the dynamically allocated 440 case
  • 434 and 428 achieve nearly 4% increase in saturation throughput compared to FC-CB
  • 428 achieves nearly 12.5% improvement in saturation throughput compared to DAMQ

[Figure: Comparison of Saturation Throughput (8x8 Mesh), Uniform Traffic: throughput in GBps for v4-r3-c4, v4-r2-c8, FC-CB, DAMQ]

[Figure: Comparison of Average Latency (8x8 Mesh), Uniform Traffic: average latency in microseconds versus offered traffic (as a fraction of network capacity) for v4-r3-c4, v4-r2-c8, FC-CB, DAMQ]

slide-28
SLIDE 28

Power calculations using Synopsys Power Compiler

  • 428 case shows nearly 40% reduction in buffer power alone
  • Nearly 30% decrease in overall network power for the 428 case

[Figure: Total Power (8x8 Mesh), Uniform Traffic: power in watts for v4-r4-c0, v5-r3-c1, v3-r4-c4, v4-r3-c4, v4-r2-c8, broken down into Control, Link, Switch, Arbiter and Buffer]

[Figure: Buffer Power (8x8 Mesh), Uniform Traffic: power in watts for the same configurations, broken down into Leakage Power and Dynamic Power]

slide-29
SLIDE 29

Data flow Control Simulated with Synopsys VCS

[Figure: waveforms over 0 to 40 ns: 500 MHz clock signal, Data_in, congestion input, congestion at stages 1 to 4, and Data_out1 to Data_out4 from stages 1 to 4]

slide-30
SLIDE 30

Router - Power Estimation

Component, power / area calculation, and explanation:

  • $C_{buf} = \left(\tfrac{1}{2}\, W \, L \, C_{ox}\right) + \left(W \, L_{ov} \, C_{ox}\right)$
    $C_{buf}$ = additional capacitance due to the three-state repeater along the links; $W$, $L$ = width and length of the minimum-sized inverter; $C_{ox}$ = oxide capacitance; $L_{ov}$ = gate-drain/source overlap length

  • $P'_{dynamic} = a \left[ k \left( C_o + C_p + C_{buf} \right) + \ell \, C_w \right] V_{DD}^{2} \, freq$
    $a$ = activity factor; $k$ = repeater sizing; $\ell$ = repeater spacing; $C_o$ = diffusion capacitance; $C_p$ = gate capacitance; $C_w$ = wire capacitance; $V_{DD}$ = supply voltage; $freq$ = operating frequency

  • $P'_{leakage} = 2 \left[ \tfrac{1}{2}\, V_{DD} \left( I_{off} \left( W_n + W_p \right) k \right) \right]$
    $I_{off}$ = subthreshold leakage current; $W_n$ ($W_p$) = width of the NMOS (PMOS) in the repeater

  • $P'_{short\text{-}ckt} = a \, t_{rise} \, W_n \, k \, V_{DD} \, I_{sc} \, freq$
    $t_{rise}$ = rise time of the short-circuit current $I_{sc}$
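The four expressions above translate directly into code. The Python sketch below is a plain transcription of the formulas with hypothetical function names; it assumes consistent units are supplied by the caller, and the numbers in the usage section are made-up placeholders, not values from the talk.

```python
def buffer_capacitance(w: float, l: float, c_ox: float, l_ov: float) -> float:
    """C_buf = (1/2 * W * L * C_ox) + (W * L_ov * C_ox)."""
    return 0.5 * w * l * c_ox + w * l_ov * c_ox


def dynamic_power(a: float, k: float, c_o: float, c_p: float, c_buf: float,
                  spacing: float, c_w: float, v_dd: float, freq: float) -> float:
    """P_dynamic = a * [k(C_o + C_p + C_buf) + l*C_w] * V_DD^2 * freq."""
    return a * (k * (c_o + c_p + c_buf) + spacing * c_w) * v_dd ** 2 * freq


def leakage_power(v_dd: float, i_off: float, w_n: float, w_p: float,
                  k: float) -> float:
    """P_leakage = 2 * [1/2 * V_DD * (I_off * (W_n + W_p) * k)]."""
    return 2.0 * (0.5 * v_dd * (i_off * (w_n + w_p) * k))


def short_circuit_power(a: float, t_rise: float, w_n: float, k: float,
                        v_dd: float, i_sc: float, freq: float) -> float:
    """P_short-ckt = a * t_rise * W_n * k * V_DD * I_sc * freq."""
    return a * t_rise * w_n * k * v_dd * i_sc * freq


if __name__ == "__main__":
    # PLACEHOLDER inputs chosen only to exercise the functions.
    c_buf = buffer_capacitance(w=1.0, l=1.0, c_ox=1.0, l_ov=0.1)
    total = (
        dynamic_power(a=0.5, k=10.0, c_o=1.0, c_p=1.0, c_buf=c_buf,
                      spacing=2.0, c_w=1.0, v_dd=1.0, freq=1.0)
        + leakage_power(v_dd=1.0, i_off=0.01, w_n=1.0, w_p=2.0, k=10.0)
        + short_circuit_power(a=0.5, t_rise=0.1, w_n=1.0, k=10.0,
                              v_dd=1.0, i_sc=0.01, freq=1.0)
    )
    print(total)
```

Two properties are easy to read off: the dynamic term grows with the square of the supply voltage, and the extra capacitance of the three-state repeater enters only through the k-scaled term of the dynamic power.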

slide-31
SLIDE 31

iDEAL – Control Block

  • Self-checking double-sampling technique for the control block
  • Slightly more power (0.02 μW vs. 0.06 μW) and area, but more reliable

[Figure: control block between the Output Port of Router A and the Input Port of Router B, double-sampling the congestion input: clock, delay buffer, two D flip-flops, XOR gates producing the error signal, and MUXes]

slide-32
SLIDE 32

Aggressive Speculation

  • Aggressive speculation by increasing the number of credits available to 8
  • Additional credits are accounted for by the channel buffers
    ⇒ Saturation throughput improves by 10% for the 428 case

[Figure: Saturation Throughput (8x8 Folded Torus), Uniform Traffic: throughput in GBps for v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v3-r3-c7]

[Figure: Average Latency (8x8 Folded Torus), Uniform Traffic: average latency in microseconds versus offered traffic (as a fraction of network capacity) for the same configurations]

[Figure: Total Power (8x8 Folded Torus), Uniform Traffic: power in watts for v4-r4-c0, v3-r4-c4, v4-r3-c4, v3-r3-c7, v4-r2-c8, broken down into Control, Link, Switch, Arbiter and Buffer]

slide-33
SLIDE 33

Power Estimation – Summary with values from Synopsys Power Compiler

nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers

vnV-rnR-cnC | Buffer Power (mW) | Mesh Link + Control Power (mW) | Total Power (Buffer + Link) (mW) | % Change
v4-r4-c0 | 19.54 | 2.45 | 21.99 | (baseline)
v4-r3-c4 | 14.51 | 2.91 | 17.42 | -20.78
v4-r2-c8 | 11.57 | 3.57 | 15.14 | -31.15
v3-r3-c7 | 12.56 | 3.50 | 16.06 | -26.96
v5-r2-c6 | 14.41 | 3.31 | 17.72 | -19.41
v3-r4-c4 | 15.09 | 2.91 | 18.00 | -18.14
v5-r3-c1 | 19.29 | 2.81 | 22.10 | +0.50