Evaluation of Improved Scalability - PowerPoint PPT Presentation



SLIDE 32
Evaluation of Improved Scalability

  • Comparison points…
  • Baseline bufferless: doesn’t scale
  • Buffered: area/power expensive
  • Contribution: keep area and power benefits of bufferless, while achieving comparable performance
  • Application-aware throttling
  • Overall reduction in congestion
  • Power consumption reduced through increase in net efficiency
  • Many other results in paper, e.g., fairness, starvation, latency…

[Figure: Throughput (IPC/Node) vs. Number of Cores (16 to 4096) for Baseline Bufferless, Buffered, and Throttling Bufferless; % Reduction in Power Consumption vs. Number of Cores (16 to 4096)]

SLIDE 33

Summary of Study, Results, and Conclusions

  • Highlighted a traditional networking problem in a new context
  • Unique design requires a novel solution
  • We showed that congestion limits efficiency and scalability, and that the self-throttling nature of cores prevents collapse
  • Study showed that congestion control would require app-awareness
  • Our application-aware congestion controller provided:
      • A more efficient network layer (reduced latency)
      • Improvements in system throughput (up to 27%)
      • Effective scaling of the CMP (shown for up to 4096 cores)
SLIDE 34

Discussion

  • Congestion is just one of many similarities; more are discussed in the paper, e.g.,
      • Traffic Engineering: multi-threaded workloads w/ “hotspots”
      • Data Centers: similar topology, dynamic routing & computation
      • Coding: “XORs in the Air” adapted to the on-chip network, i.e., instead of deflecting 1 of 2 packets, XOR the packets and forward the combination over the optimal hop
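
The XOR idea in the last bullet can be sketched as follows. This is only an illustration under stated assumptions, not the mechanism from the paper: it assumes two equal-length payloads contending for the same output port, and a downstream node that already holds one of the two originals (how it obtains that copy on chip is not described in these slides). The helper name `xor_packets` and the sample payloads are hypothetical.

```python
def xor_packets(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length payloads byte by byte (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two packets contend for the same (optimal) output port at a router.
packet_a = b"\x12\x34\x56\x78"
packet_b = b"\xab\xcd\xef\x01"

# Instead of deflecting one of them, forward a single coded packet.
coded = xor_packets(packet_a, packet_b)

# A node that already holds one original can recover the other,
# because (a XOR b) XOR a == b.
recovered_b = xor_packets(coded, packet_a)
recovered_a = xor_packets(coded, packet_b)
```

The coded packet occupies one link slot instead of two, which is what makes the approach attractive when two packets want the same hop; the cost is that decoding depends on the receiver having the other original available.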
