Evaluation of Improved Scalability

Jan 08, 2024

SLIDE 32

Evaluation of Improved Scalability

Comparison points…
Baseline bufferless: doesn’t scale
Buffered: area/power expensive
Contribution: keep area and power benefits of bufferless, while achieving comparable performance

Application-aware throttling
Overall reduction in congestion
Power consumption reduced through increase in net efficiency

Many other results in paper, e.g., fairness, starvation, latency…

[Figure: Throughput (IPC/Node) vs. Number of Cores (16 to 4096) for Baseline Bufferless, Buffered, and Throttling Bufferless; % Reduction in Power Consumption vs. Number of Cores (16 to 4096)]
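The two quantities plotted on this slide, throughput in IPC per node and percent reduction in power consumption, could be computed along the following lines. This is a minimal sketch: it assumes IPC/Node means the average instructions per cycle across all cores, and the function names and inputs are hypothetical, not taken from the talk.

```python
def throughput_per_node(instructions_per_core, cycles):
    """Average IPC per node: total instructions retired, divided by
    (elapsed cycles * number of cores).

    Assumption: every core runs for the same number of cycles.
    """
    if cycles <= 0 or not instructions_per_core:
        raise ValueError("need at least one core and a positive cycle count")
    return sum(instructions_per_core) / (cycles * len(instructions_per_core))


def percent_reduction(baseline, improved):
    """Percent reduction of `improved` relative to `baseline`
    (e.g. power with throttling vs. power without)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline
```

With made-up inputs, `throughput_per_node([500, 1000, 1500, 1000], 1000)` averages four cores over 1000 cycles, and `percent_reduction(200.0, 170.0)` compares two hypothetical power readings.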

SLIDE 33

Summary of Study, Results, and Conclusions

Highlighted a traditional networking problem in a new context
Unique design requires novel solution
We showed that congestion limits efficiency and scalability, and that the self-throttling nature of cores prevents collapse
Study showed congestion control would require app-awareness

Our application-aware congestion controller provided:
A more efficient network layer (reduced latency)
Improvements in system throughput (up to 27%)
Effective scaling of the CMP (shown for up to 4096 cores)
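One way an application-aware controller could decide whom to throttle is sketched below. This is an illustration only, not the controller from the paper: the intensity metric (flits injected per instruction), the "throttle only above-average apps" rule, the `max_rate` cap, and all names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    instructions: int    # instructions retired last epoch (hypothetical counter)
    flits_injected: int  # flits injected into the network last epoch


def network_intensity(app):
    """Flits injected per instruction; higher means heavier network use."""
    return app.flits_injected / max(app.instructions, 1)


def choose_throttle_rates(apps, congested, max_rate=0.5):
    """Map each app name to the fraction of its injections to block.

    Assumed policy: when the network is congested, throttle only apps whose
    intensity is above the mean, scaled by how close they are to the most
    intensive app. Everything else is left alone.
    """
    if not congested or not apps:
        return {a.name: 0.0 for a in apps}
    intensity = {a.name: network_intensity(a) for a in apps}
    mean = sum(intensity.values()) / len(intensity)
    peak = max(intensity.values())
    rates = {}
    for name, x in intensity.items():
        if peak == 0 or x <= mean:
            rates[name] = 0.0  # light network user: do not throttle
        else:
            rates[name] = max_rate * x / peak
    return rates
```

For example, with a compute-bound app and a streaming app, `choose_throttle_rates([App("compute", 10000, 100), App("stream", 10000, 5000)], congested=True)` throttles only the streaming app. The point of the sketch is that the decision depends on per-application behavior rather than on network state alone.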

SLIDE 34

Discussion

Congestion is just one of many similarities; others are discussed in the paper, e.g.,
Traffic Engineering: multi-threaded workloads w/ “hotspots”
Data Centers: similar topology, dynamic routing & computation
Coding: “XOR’s In-The-Air” adapted to the on-chip network:

i.e., instead of deflecting 1 of 2 packets, XOR the