Thermal-Effective Clustered Microarchitectures



SLIDE 1

Thermal-Effective Clustered Microarchitectures

P. Chaparro, J. González and A. González
Intel Labs - UPC

SLIDE 2

Motivation

  • Removing heat is expensive
  • Design point is set for worst-case temperatures
  • Expensive thermal solution guarantees peak performance
  • Usually temperatures are lower
  • A localized hotspot may…
    • trigger global emergency mechanisms, although this could be avoided by focusing only on that hotspot
    • not be detected, because sensors cover wider areas
  • Clustered architectures give new opportunities for temperature reduction
    • Peak temperature reduced by 33%
    • Average temperature reduced by 12%

SLIDE 3

Overview

  • Introduction
  • Processor Architecture
  • Simulation Infrastructure
  • Thermal Analysis of Clustered Architectures
  • Cluster Hopping
  • Conclusions

SLIDE 4

Introduction

  • Clustering opens new opportunities for temperature reduction
  • Distribution of resources
  • Activity distribution
  • Hopping schemes
  • Layout flexibility
  • Trade off unit location vs. wire delay
  • Resource grouping into clusters
  • Voltage and clock domains
  • Leakage control
  • Vdd gating

SLIDE 5

Processor Architecture

  • Large frontend
  • 32Kuop trace cache
  • dispatch 8 uops/cycle
  • 2MB L2 cache
  • Highly OOO
  • 80-entry issue queue
  • 384-entry MOB
  • 4 int + 3 fp + 4 ld/st
  • 544+544 physical regs
  • 64KB, 2-way L1

[Figure: processor block diagram. Recoverable label: Memory Bus]

SLIDE 6

Processor Architecture

[Figure: processor floorplan. Block labels: ROB, UL2, RAT, TC, DECO, BP, ITLB, FPS, IS, MS/MOB, FPRF, IRF, FPFU, IFU, DL1, DTLB]

SLIDE 7

Processor Architecture

[Figure: interconnect diagram. Labels: Point to Point Link, Memory Bus, Disambiguation Bus, ...]

SLIDE 8

Processor Architecture

Bicluster: each cluster has half the resources of the original monolithic backend

[Figure: bicluster floorplan. Block labels: ROB, UL2, RAT, TC, DECO, BP, ITLB, FPS, CS, IS, MS/MOB, FPRF, IRF, FPFU, IFU, DL1, DTLB. Clusters: Cluster 0, Cluster 1]

SLIDE 9

Processor Architecture

Quadcluster: each cluster has a quarter of the resources of the original monolithic backend

[Figure: quadcluster floorplan. Block labels: ROB, UL2, RAT, TC, DECO, BP, ITLB, FPS, CS, IS, MS/MOB, FPRF, IRF, FPFU, IFU, DL1, DTLB. Clusters: Cluster 0, Cluster 1, Cluster 2, Cluster 3]

SLIDE 10

Simulation Infrastructure

  • Dynamically computes the temperature of selected functional blocks (emulates thermal sensors)
  • Integrated in a microarchitectural simulator

[Figure: simulator components. Labels: Performance model, Dynamic power model, Temperature model, Leakage model]

SLIDE 11

Simulation Infrastructure

[Figure: thermal stack modeled with R-C pairs. Layers: Die, Heat spreader, Heat sink, Ambient. Node labels: T1, T2, R1-2, C1-2, P2]

Thermal-electrical analogy:

  Thermal                  Electrical
  Temperature (K)          Voltage (V)
  Power (W)                Current (A)
  Resistance (K / W)       Resistance (V / A = Ω)
  Capacity (J / K)         Capacity (A·s / V = F)

  Time constant τ = R · C (s)
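The analogy above can be illustrated with a single lumped R-C node, where C · dT/dt = P − (T − T_ambient) / R. The sketch below is not the authors' simulator: the function names, the forward Euler integration, and every numeric value are assumptions chosen only to show how τ = R · C governs the warm-up.

```python
# Minimal sketch of one thermal R-C node (illustrative, not the authors' model).
# Governing equation: C * dT/dt = P - (T - T_ambient) / R

def time_constant(r_k_per_w, c_j_per_k):
    """Thermal time constant tau = R * C, in seconds."""
    return r_k_per_w * c_j_per_k


def steady_state(power_w, r_k_per_w, t_ambient_k):
    """Temperature reached once dT/dt = 0 (analogous to V = I * R)."""
    return t_ambient_k + power_w * r_k_per_w


def simulate_rc_node(power_w, r_k_per_w, c_j_per_k,
                     t_ambient_k, t_initial_k, dt_s, steps):
    """Integrate the node with forward Euler and return the temperature trace."""
    temps = [t_initial_k]
    t = t_initial_k
    for _ in range(steps):
        # Heat leaving through the resistance, like a current I = V / R.
        heat_out_w = (t - t_ambient_k) / r_k_per_w
        t += dt_s * (power_w - heat_out_w) / c_j_per_k
        temps.append(t)
    return temps


# Hypothetical values: R = 0.5 K/W, C = 4 J/K, P = 40 W, ambient = 318 K.
R, C, P, T_AMB = 0.5, 4.0, 40.0, 318.0
tau = time_constant(R, C)
trace = simulate_rc_node(P, R, C, T_AMB, T_AMB, dt_s=tau / 1000, steps=10000)

if __name__ == "__main__":
    print("tau (s):", tau)
    print("after 1 tau (K):", round(trace[1000], 2))
    print("after 10 tau (K):", round(trace[-1], 2))
    print("steady state (K):", steady_state(P, R, T_AMB))
```

After one time constant the node has covered about 63% of the distance to its steady-state temperature. A full model like the one in the figure would chain several such nodes (die, heat spreader, heat sink) down to ambient.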

SLIDE 12

Thermal Analysis of Clustered Architectures

  • Temperature metrics
    • AbsMax: maximum sensed temperature
    • Average: average temperature of the chip area over time
    • AverageMax: average over time of the maximum sensed temperature
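The three metrics can be sketched over a trace of sensor readings. This is one possible reading of the definitions above, not the authors' code: it assumes equally spaced samples, treats "average temperature of the chip area" as an area-weighted mean, and uses hypothetical block names, areas, and temperatures.

```python
# Sketch of the three temperature metrics over a sampled trace.
# Each sample maps a block name to its sensed temperature (K).

def abs_max(trace):
    """AbsMax: highest temperature sensed by any block at any time."""
    return max(max(sample.values()) for sample in trace)


def average(trace, areas):
    """Average: area-weighted chip temperature, averaged over time.

    Area weighting is an assumption made for this sketch.
    """
    total_area = sum(areas.values())
    per_step = [
        sum(sample[block] * areas[block] for block in areas) / total_area
        for sample in trace
    ]
    return sum(per_step) / len(per_step)


def average_max(trace):
    """AverageMax: time average of the hottest sensed temperature per sample."""
    return sum(max(sample.values()) for sample in trace) / len(trace)


# Hypothetical data: two blocks, three samples.
areas = {"IRF": 1.0, "DL1": 3.0}
trace = [
    {"IRF": 350.0, "DL1": 330.0},
    {"IRF": 360.0, "DL1": 340.0},
    {"IRF": 355.0, "DL1": 345.0},
]

if __name__ == "__main__":
    print("AbsMax:", abs_max(trace))
    print("Average:", average(trace, areas))
    print("AverageMax:", average_max(trace))
```

AbsMax captures the single worst hotspot, while AverageMax shows how hot the hottest block stays over the whole run.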

SLIDE 13

Thermal Analysis of Clustered Architectures

[Figure: bar chart, "Average temperature reduction for 16 SPEC". Y axis: Reduction, with ticks from -40% to 40%. Series: 2 Clusters, 4 Clusters. Metrics: AbsMax, Average, AverageMax, plus IPC degradation. Groups: Backends, UL2, Frontend, Processor]

SLIDE 14

Cluster Hopping

  • Based on activity migration [Heo, ISLPED 03]
  • Vdd-gate a subset of clusters
  • Rotate clusters to spread activity over time
  • Gated clusters cannot provide any register value
    • Before gating, a cluster must be emptied
    • Cache/DTLB contents are lost
  • Proactive and/or reactive behavior
    • Proactive: on a per-interval basis
    • Reactive: on thermal events
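The proactive and reactive behaviors can be sketched as two small scheduling functions. This is an illustration under stated assumptions, not the authors' mechanism: the round-robin rotation order, the swap policy, the temperature threshold, and all function names are hypothetical, and the cost of emptying a cluster and losing its cache/DTLB contents is not modeled.

```python
# Sketch of cluster hopping policies (illustrative assumptions throughout).

def proactive_schedule(num_clusters, num_active, intervals):
    """Proactive: rotate the active set round-robin, one step per interval."""
    schedule = []
    for i in range(intervals):
        start = i % num_clusters
        active = [(start + k) % num_clusters for k in range(num_active)]
        schedule.append(sorted(active))
    return schedule


def reactive_hop(active, temps, threshold):
    """Reactive: on a thermal event, gate the hottest active cluster
    and wake the coolest gated one. Otherwise keep the active set."""
    hottest = max(active, key=lambda c: temps[c])
    if temps[hottest] < threshold:
        return sorted(active)          # no thermal event
    gated = [c for c in range(len(temps)) if c not in active]
    if not gated:
        return sorted(active)          # nothing to swap in
    coolest = min(gated, key=lambda c: temps[c])
    return sorted([c for c in active if c != hottest] + [coolest])


if __name__ == "__main__":
    # Four clusters with three active at a time (hypothetical configuration).
    for interval, active in enumerate(proactive_schedule(4, 3, 4)):
        print("interval", interval, "active clusters", active)
    # Cluster 1 exceeds the hypothetical 370 K threshold and is swapped out.
    print(reactive_hop([0, 1, 2], [350.0, 372.0, 355.0, 340.0], 370.0))
```

With four clusters and three active, the rotation gates each cluster once every four intervals, so every cluster gets a regular cooling period.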

SLIDE 15

Cluster Hopping

[Figure: hopping schemes. Labels: HOP-3, HOP-2]

SLIDE 16

Cluster Hopping

[Figure: bar chart of reduction under cluster hopping. Y axis: Reduction, 0% to 50%. Series: Hop-3, Hop-2. Metrics: AbsMax, Average, AverageMax, plus slowdown (IPC degradation). Groups: Backends, UL2, Frontend, Processor]

SLIDE 17

Conclusions

  • The analyzed bi-cluster architecture increases temperature: clustering must be applied smartly
  • The analyzed quad-cluster architecture is effective at reducing temperature:
    • Reduces processor peak temperature by 33%
    • Reduces average temperature by 12%
    • IPC penalty of 14%
    • Other benefits of clustering were ignored for this study
  • Improving the quad-cluster architecture with a hopping scheme (HOP-3):
    • Peak temperature is reduced by 37%
    • Average temperature of the processor is reduced by 14%
    • Extra penalty of 3%