Automatic Management of TurboMode: David Lo, Christos Kozyrakis



SLIDE 1

Automatic Management of TurboMode

David Lo, Christos Kozyrakis
Stanford University

http://mast.stanford.edu

SLIDE 2

Executive Summary

! TurboMode overclocks cores to exhaust thermal budget

! An important performance feature of multi-core x86 servers

! Challenge: TurboMode does not always benefit workloads

! Naively turning TurboMode on often leads to high energy waste

! Solution: predictive model to manage TurboMode (on/off)

  ! Using machine learning on performance counter data
  ! Eliminates negative cases, boosts ED and ED2 by 47% and 68%

HPCA-20 February 19, 2014

SLIDE 3

What is TurboMode (TM)?

! Dynamic overclocking of cores to exhaust thermal budget

  ! Matches actual power consumption to max design TDP
  ! Big performance gains: up to 60% frequency boost
  ! Found on all modern x86 multi-cores

! TurboMode control

  ! Black-box HW control decides when and how much to overclock
  ! SW has limited control: can only turn TurboMode on/off

SLIDE 4

Characterizing TurboMode

! Evaluate the effects of TM across the board

  ! Efficiency metrics: EDP, ED2P, throughput/W, throughput/$, …
  ! Many hardware platforms: Intel/AMD, server/notebook
  ! Many workloads: SpecCPU, SpecPower, websearch, …

! Characterization

  ! Run each workload with TurboMode on and with TurboMode off
  ! Compare the impact on all efficiency metrics

SLIDE 5

Efficiency Metrics

! Guidelines

  ! We all care about performance and energy consumption
  ! Capture both latency and throughput workloads

! Metric recap

  ! ED: latency & energy
  ! ED2: latency & energy, weighted more heavily toward latency (think servers)
  ! Throughput/W: throughput & energy
  ! Throughput/$: throughput & cost efficiency (think datacenter TCO)
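The metric recap above can be made concrete with a small sketch. It assumes the usual definitions (ED as energy times delay, ED2 as energy times delay squared, lower is better) and treats throughput as queries per second. The function names, the `cost_usd` parameter, and all numbers are hypothetical and are not taken from the slides.

```python
def efficiency_metrics(energy_j, delay_s, queries=None, cost_usd=None):
    """Compute the efficiency metrics named on the slide for one run.

    ED and ED2 are lower-is-better; QPS/W and QPS/$ are higher-is-better.
    cost_usd is a hypothetical stand-in for a TCO-style cost figure.
    """
    metrics = {
        "ED": energy_j * delay_s,
        "ED2": energy_j * delay_s ** 2,
    }
    if queries is not None:
        qps = queries / delay_s
        avg_power_w = energy_j / delay_s
        metrics["QPS/W"] = qps / avg_power_w
        if cost_usd is not None:
            metrics["QPS/$"] = qps / cost_usd
    return metrics


def improvement(off_value, on_value, lower_is_better):
    """Percent improvement of TurboMode on relative to TurboMode off."""
    if lower_is_better:
        return (off_value - on_value) / off_value * 100.0
    return (on_value - off_value) / off_value * 100.0


# Hypothetical run: TurboMode cuts delay from 10 s to 9 s
# but raises energy from 1000 J to 1200 J.
tm_off = efficiency_metrics(energy_j=1000.0, delay_s=10.0)
tm_on = efficiency_metrics(energy_j=1200.0, delay_s=9.0)
ed_gain = improvement(tm_off["ED"], tm_on["ED"], lower_is_better=True)
ed2_gain = improvement(tm_off["ED2"], tm_on["ED2"], lower_is_better=True)
```

With these made-up numbers ED gets 8% worse while ED2 improves by 2.8%, which illustrates why the best TurboMode setting depends on the metric of interest.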

SLIDE 6

Evaluation Hardware

! Intel Sandy Bridge server [SBServer]: 19% max boost
! Intel Sandy Bridge mobile [SBMobile]: 44% max boost
! AMD Interlagos [ILServer]: 59% max boost
! Intel Ivy Bridge server [IBServer]: 12% max boost
! Intel Haswell server [Hserver]: 13% max boost

SLIDE 7

Evaluation Workloads

! Representative of multiple domains
! CPU, memory, and IO workloads
! Single-threaded SpecCPU benchmarks
! Multi-programmed SpecCPU mixes
! Multi-threaded PARSEC
! Enterprise SPECpower_ssj2008
! Websearch


>100 configs

SLIDE 8

Observation: No Optimal On/Off Setting

[Figure: bar charts of % improvement over TurboMode off, by workload (Mix 1, Mix 2, Websearch), for four metrics (ED, ED², QPS/W, QPS/$). Panels: Sandy Bridge Mobile, Interlagos Server, Sandy Bridge Server. One bar is labeled 127%.]
SLIDE 9

Observation: TM leads to High Variance on Efficiency

~50% of mixes benefit from TM
~50% of mixes suffer due to TM

[Figure: Sandy Bridge Server ED². Y-axis: ED² improvement; x-axis: App Mix (1 to 82).]
SLIDE 10

Characterization Analysis

! TurboMode mostly benefits CPU bound workloads

  ! Boost in performance and efficiency from higher frequency
  ! SpecCPU mixes of CPU-intensive workloads, SpecPower, websearch, …

! TurboMode ineffective when memory/IO bound

  ! Interference on memory/IO really aggravates this
  ! Small/no performance gain, high energy waste with higher frequency
  ! SpecCPU mixes of memory-intensive workloads, canneal, streamcluster, …

! Applications have multiple phases

  ! CPU bound vs. memory/IO bound
  ! SpecCPU mixes

SLIDE 11

TurboMode Control

! Naïve TM control

  ! Always off: miss boost on CPU bound applications
  ! Always on: suffer inefficiency on interference-bound applications

! Need dynamic TM control

  ! Understands the applications running and the metric of interest
  ! Predicts the optimal setting (on/off), adjusts dynamically to phases
  ! No a priori knowledge of applications, no new hardware needed

SLIDE 12

Predictive Model for TurboMode

! Idea: use runtime info to dynamically predict TM benefits
! Focus primarily on detecting memory interference

  ! Build predictive model based on performance counters
  ! Use performance counters & model to predict interference severity
  ! If too severe, turn off TurboMode

SLIDE 13

Autoturbo: Predictive Control for TurboMode

[Diagram: applications run on Core 1 through Core N. Autoturbo samples perf counters per core and feeds them to a classifier built from training data. The classifier outputs app properties per core, and a TurboMode heuristic combines these with the metric of interest to enable/disable TurboMode.]
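The flow in the diagram (sample counters per core, classify, apply a heuristic, toggle TurboMode) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not spell out the heuristic, so the "fraction of memory-bound cores" rule, the `mem_stall_frac` counter name, the thresholds, and the `sample`/`set_turbo` callbacks are all hypothetical.

```python
from typing import Callable, Dict, List

Counters = Dict[str, float]  # one core's sampled counter values


def is_memory_bound(counters: Counters, threshold: float = 0.5) -> bool:
    """Stand-in classifier (assumption): flag a core whose hypothetical
    memory-stall fraction exceeds a threshold. A trained model would
    replace this function."""
    return counters["mem_stall_frac"] > threshold


def decide_turbo(per_core: List[Counters],
                 classify: Callable[[Counters], bool],
                 max_bound_fraction: float = 0.5) -> bool:
    """Assumed heuristic: keep TurboMode on unless more than
    max_bound_fraction of the sampled cores look memory bound."""
    if not per_core:
        return True
    bound = sum(1 for c in per_core if classify(c))
    return bound / len(per_core) <= max_bound_fraction


def control_loop(sample: Callable[[], List[Counters]],
                 classify: Callable[[Counters], bool],
                 set_turbo: Callable[[bool], None],
                 steps: int) -> None:
    """Sample, classify, and decide once per step; call set_turbo only
    when the desired TurboMode state changes."""
    state = None
    for _ in range(steps):
        want = decide_turbo(sample(), classify)
        if want != state:
            set_turbo(want)
            state = want
```

Passing the sampler and the actuator in as callbacks keeps the decision logic testable without hardware access; on a real machine they would wrap whatever platform-specific counter and TurboMode interfaces are available.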

SLIDE 14

Training the Predictive Model

[Figure: training pipeline. Raw training data comes from four kinds of runs: single SpecCPU with TurboMode on, single SpecCPU with TurboMode off, single SpecCPU + stream with TurboMode on, and single SpecCPU + stream with TurboMode off. The data goes through feature selection and then model selection among Naïve Bayes, Logistic Regression, Nearest Neighbors, and Decision Tree. Values shown in the figure: 85%, 81%, 73%, 75%.]

SLIDE 15

Model Validation

! Model accuracy: ~90% on cross-validation
! Best counters: those that indicate memory-bound workload

  ! SBServer/SBMobile: % cycles with outstanding memory requests, …
  ! ILServer: L2 MPKI, # requests to memory/instruction, …

! CPU/thermal intensity counters don’t correlate strongly!

! E.g., floating-point intensity counters
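As an illustration of what a cross-validated accuracy figure measures, here is a minimal pure-Python sketch of k-fold cross-validation around a 1-nearest-neighbor classifier. Nearest Neighbors is only one of the candidate models listed on the training slide, and nothing here reproduces the authors' pipeline: the two features (a memory-stall fraction and L2 MPKI), the labels, and the data points are synthetic and hypothetical.

```python
import math
import random


def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x.

    train is a list of (features, label) pairs with numeric feature tuples.
    """
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def k_fold_accuracy(data, k=5, seed=0):
    """Shuffle the data, split it into k folds, and report the fraction of
    held-out points that are classified correctly."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(nearest_neighbor_predict(train, x) == y
                       for x, y in held_out)
    return correct / len(data)


# Synthetic, well-separated samples: (mem_stall_frac, l2_mpki) -> TM setting.
memory_bound = [((0.70 + 0.01 * i, 20.0 + i), "off") for i in range(10)]
cpu_bound = [((0.10 + 0.01 * i, 1.0 + 0.1 * i), "on") for i in range(10)]
accuracy = k_fold_accuracy(memory_bound + cpu_bound, k=5)
```

Real counter data would be noisier than these two clean clusters, and the features would normally be normalized before computing distances.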

SLIDE 16

Autoturbo Evaluation

! Used autoturbo in conjunction with workloads

! Evaluation workloads are apps other than single-thread SpecCPU

! Measure efficiency metrics
! Compare against

  ! Baseline: TurboMode is always off
  ! Naïve TM: TurboMode is always on
  ! Static oracle: TurboMode is on only if it leads to a benefit for the overall run
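The three comparison points can be expressed relative to the always-off baseline. The sketch below assumes each workload has one measured improvement figure for always-on TurboMode, and that the static oracle simply keeps whichever of on/off is better for the whole run. The function names, workload names, and numbers are hypothetical.

```python
def compare_policies(naive_gain_pct):
    """Map each workload to its gain (in %) under the three reference policies.

    naive_gain_pct: workload name -> % improvement of always-on TurboMode
    over always-off, for one chosen efficiency metric.
    """
    rows = {}
    for name, gain in naive_gain_pct.items():
        rows[name] = {
            "baseline": 0.0,                  # TurboMode always off
            "naive": gain,                    # TurboMode always on
            "static_oracle": max(0.0, gain),  # on only if the whole run benefits
        }
    return rows


def mean_gain(rows, policy):
    """Average gain of one policy across all workloads."""
    return sum(r[policy] for r in rows.values()) / len(rows)


# Hypothetical measurements for two workload mixes.
rows = compare_policies({"mix_a": 12.0, "mix_b": -7.5})
```

Because the static oracle picks one setting per run, a dynamic controller that switches settings between phases can in principle exceed it.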

SLIDE 17

Autoturbo results

[Figure: two panels, each comparing Naïve, Auto, and Static Oracle across app mixes. Sandy Bridge Server ED²: y-axis ED² improvement, x-axis App Mix (1 to 82). Sandy Bridge Mobile QPS/$: y-axis QPS/$ improvement, x-axis App Mix (1 to 35). Annotations: "Gains over always using TurboMode" and "Gains over never using TurboMode".]
SLIDE 18

Autoturbo Analysis

! Autoturbo gets best of both worlds

! Reduces cases where TM causes efficiency degradation ! Keeps cases where TM leads to benefits

! autoturbo often disables TM even though it is beneficial

! Cause: the interference predictor assumes worst case interference

! autoturbo beats the static oracle

! Cause: autoturbo can take advantage of dynamism during the run

SLIDE 19

Conclusions

! TurboMode is useful but must be managed dynamically
! This work: dynamic TurboMode control

  ! Predictive model for memory interference
  ! Dynamic control with no hand-tuning needed
  ! Eliminates efficiency drops, maintains efficiency gains of TurboMode

! Future work

! Apply similar approach to manage advanced power settings

SLIDE 20

autoturbo dealing with a phase change

[Figure: autoturbo dynamic adjustment on Sandy Bridge Mobile. Y-axis: Frequency (GHz), 2.40 to 3.00; x-axis: Time (s), 215 to 295. Annotation: memory interference occurs mid-workload.]