µDPM: Dynamic Power Management for the Microsecond Era
Chih-Hsun Chou
cchou001@cs.ucr.edu
Laxmi N. Bhuyan
bhuyan@cs.ucr.edu
Daniel Wong
danwong@ucr.edu
HPCA 2019
Killer Microsecond
Masked by microarchitectural techniques Masked by OS-level techniques
[1] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. Attack of the killer microseconds. Commun. ACM 60, 4 (March 2017), 48-54.
Request Response ~milliseconds
Traditional Monolithic Services
Request Response
Emerging Microservices
microseconds
Source: Adrian Cockcroft, “Monitoring Microservices and Containers: A Challenge”
[Figure: Tail latency (% of SLA) vs. load (% of peak load). The gap between the tail latency curve and the SLA is the latency slack.]
› Rubik adjusts f per request
› SleepScale [ISCA’14] finds optimal frequency & C-state depth for 60s epochs
[Figure: frequency f over time t for Epoch 0 and Epoch 1.]
’17])
[Figure, animated build: timeline showing the sleep time, the arrivals of R1, R2, and R3, the Wake point, and the Target Tail Latency of each request.]
[Figure: Power (W) vs. average service time (µs, 10 to 1000). Series: Baseline, Rubik, SleepScale, DynSleep. Labels: DVFS, Deep Sleep, DVFS+Sleep, Baseline. Annotation: DPM ineffective.]
[Figure: longer service time vs. shorter service time, both at 50% utilization.]
DVFS (500µs) Sleep (500µs) Sleep (200µs) DVFS (200µs)
[Figure: Normalized energy for Baseline, Rubik, SleepScale, DynSleep, and Optimal, broken down into busy, idle, C-state transition, and VFS transition.]
Tail Service Time = 78µs, C6 Residency Time = 300µs, Tail Latency Target = 800µs (SPECjbb timing)
[Figure: timeline from R1 arrival, marking two regions of wasted energy and the latency gap that DVFS is limited in closing.]
› Solution: Aggressive Deep Sleep
› Solution: Request Delaying
› Solution: Coordinate DVFS
[Figure: Power (W) vs. average service time (µs, 10 to 1000). Series: Baseline, Rubik, SleepScale, DynSleep, µDPM. Labels: Deep Sleep, DVFS+Sleep, DVFS, Baseline.]
[Figure: timeline from R1 arrival with Tail Service Time = 78µs, Residency Time = 300µs, and Tail Latency Target = 800µs, annotated with the three solutions: Aggressive Deep Sleep, Request Delaying, Coordinate DVFS.]
Workload     Tail Service Time (95th percentile)   Target Tail Latency (95th percentile)
Memcached    33µs                                  150µs
SPECjbb      78µs                                  800µs
Xapian       250µs                                 1100µs
Masstree     1200µs                                2100µs
Opportunity: for every workload, the tail service time is below the target tail latency.
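The opportunity in the figures above can be put into numbers as per-workload latency slack. This is a minimal sketch, assuming slack is simply the target tail latency minus the tail service time. The dictionary and function names are illustrative and do not come from the talk.

```python
# 95th-percentile figures from the slide, in microseconds.
# Maps workload name -> (tail service time, target tail latency).
WORKLOADS_US = {
    "Memcached": (33, 150),
    "SPECjbb": (78, 800),
    "Xapian": (250, 1100),
    "Masstree": (1200, 2100),
}


def latency_slack_us(tail_service_us, target_us):
    """Slack left for sleeping or slowing down (assumed: target minus service)."""
    return target_us - tail_service_us


slack = {name: latency_slack_us(s, t) for name, (s, t) in WORKLOADS_US.items()}

for name, value in slack.items():
    print(f"{name}: {value} µs of slack")
```

Under this assumption SPECjbb has 722µs of slack, more than twice the 300µs residency time quoted earlier in the talk.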
[Figure: timeline from R0 arrival showing the request and the C-states C0, C3, and C6. Residency Time = 300µs.]
› Statistical performance model [2]
› Online periodic resampling (100ms)
› $L = W + T_{wake} + T_{dvfs} + S_i / f$
[Figure: timeline from R0 arrival across the C-states C0, C3, and C6, with segments W, Twake + Tdvfs, and Si/f.]
[2] Harshad Kasture, Davide B. Bartolini, Nathan Beckmann, and Daniel Sanchez. Rubik: Fast analytical power management for latency-critical systems. MICRO 2015.
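The latency model L = W + Twake + Tdvfs + Si / f can be sketched in a few lines. This is an illustration only, under stated assumptions: f is treated as a fraction of the nominal frequency, Si as the service time at nominal frequency, and the function names are hypothetical. The second function rearranges the same formula to find how long a request could wait before service starts.

```python
def request_latency_us(wait_us, t_wake_us, t_dvfs_us, service_us, freq_ratio):
    """L = W + Twake + Tdvfs + Si / f, with f as a fraction of nominal frequency."""
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    return wait_us + t_wake_us + t_dvfs_us + service_us / freq_ratio


def max_wait_us(target_us, t_wake_us, t_dvfs_us, service_us, freq_ratio):
    """Largest W that keeps L at or below the target (0 if no slack is left)."""
    if freq_ratio <= 0:
        raise ValueError("freq_ratio must be positive")
    budget = target_us - t_wake_us - t_dvfs_us - service_us / freq_ratio
    return max(0.0, budget)


# SPECjbb-like numbers from the talk: 800 µs target, 78 µs tail service time,
# 89 µs sleep transition, 10 µs DVFS transition, running at nominal frequency.
print(max_wait_us(800, 89, 10, 78, 1.0))  # 623.0
```

Using the tail service time in place of Si is a conservative choice made for this example. The talk's model draws Si from a statistical distribution.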
› If the inter-arrival time between two consecutive requests is shorter than the tail service time, the later request is critical
[Figure: timeline with R0 in service. The arrival of R1 is marked Critical and leads to a QoS violation.]
R1 critical if
$t_{R1} - t_{R0} \le S^{f}_{tail}$
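The criticality condition above is a single comparison. A minimal sketch, assuming Sftail denotes the tail service time at the current frequency f (that reading, and the function name, are assumptions of this example):

```python
def is_critical(t_prev_arrival_us, t_arrival_us, tail_service_us_at_f):
    """True when the inter-arrival gap is no longer than the tail service time at f."""
    inter_arrival_us = t_arrival_us - t_prev_arrival_us
    return inter_arrival_us <= tail_service_us_at_f


# R1 arrives 50 µs after R0 while the tail service time is 78 µs: critical.
print(is_critical(0, 50, 78))   # True
# R1 arrives 100 µs after R0: not critical.
print(is_critical(0, 100, 78))  # False
```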
[Figure: timeline with R0 followed by the critical request R1, first with a QoS violation and then with the violation removed.]
Increase frequency on wakeup. Can sleep longer due to higher frequency.
$f' = S_{tail} / (T^{TargetCompletion}_{R_i} - T^{Completion}_{R_{i-1}})$
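The frequency formula above divides the tail service time by the window between the previous request's completion and the current request's target completion. The sketch below assumes frequencies are fractions of nominal and Stail is measured at nominal frequency. The clamp to `f_max` and the handling of an empty window are additions made for this example and are not stated in the talk.

```python
def wakeup_frequency(tail_service_us, target_completion_us, prev_completion_us,
                     f_max=1.0):
    """f' = Stail / (target completion of Ri - completion of Ri-1), capped at f_max."""
    window_us = target_completion_us - prev_completion_us
    if window_us <= 0:
        # No time left before the deadline: run as fast as allowed (assumption).
        return f_max
    return min(f_max, tail_service_us / window_us)


# A 78 µs tail service time with a 100 µs window needs 78% of nominal frequency.
print(wakeup_frequency(78, 900, 800))
```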
See paper for details!
› In-House Simulator (similar to BigHouse)
› Empirical Power Model
› 10µs DVFS transition, 89µs sleep transition time (doubled to account for cache flushing)
› Add 25µs to first request service time after idle period for cold miss penalty
› Baseline: Linux menu idle governor and intel_pstate driver
› Workloads
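The timing parameters in the setup above could be encoded as follows. This is a hypothetical sketch and not the authors' simulator. The constant and function names are made up for illustration, and only the cold-miss rule is modeled.

```python
# Timing parameters quoted in the evaluation setup, in microseconds.
DVFS_TRANSITION_US = 10
SLEEP_TRANSITION_US = 89
COLD_MISS_PENALTY_US = 25


def effective_service_us(service_us, first_after_idle):
    """Add the cold-miss penalty to the first request served after an idle period."""
    penalty_us = COLD_MISS_PENALTY_US if first_after_idle else 0
    return service_us + penalty_us


print(effective_service_us(78, first_after_idle=True))   # 103
print(effective_service_us(78, first_after_idle=False))  # 78
```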
[Figure: Energy saving (%) vs. load for Memcached, SPECjbb, Masstree, and Xapian, comparing Rubik, DynSleep, SleepScale, µDPM, and Optimal.]
µDPM typically within 2-3% of Optimal
µDPM saves ~2X vs others
[Figure: Normalized energy for Baseline, Rubik, SleepScale, DynSleep, Optimal, and µDPM, broken down into busy, idle, C-state transition, and VFS transition.]
[Figure: Energy saving (%) vs. latency constraint (µs) for Memcached, SPECjbb, Masstree, and Xapian.]
[Figure: Energy saving (%) vs. sleep transition time (µs, 100 to 300) for µDPM and µDPM w/ criticality-awareness.]
[Figure: Energy saving (%) vs. VFS transition time (µs, 20 to 100) for Rubik, µDPM, and µDPM w/ criticality-awareness.]