KIT – The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe (TH)
System Architecture Group
Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Andreas Merkel, Jan Stoess, Frank Bellosa
[Figure: stream memory benchmark. Normalized runtime per instance for 1, 2, and 4 instances placed on core0 to core3; cores not running stream are idle.]
Memory-bound tasks: low frequency/voltage
Compute-bound tasks: high frequency/voltage
Often shared frequency/voltage domains
Resource contention
Shared frequency/voltage domains
OS task scheduling
VM scheduling
Frequency selection
Reduction of resource contention
Increase in energy efficiency by 10 to 20%
[Figure: hmmer and libquantum. Normalized runtime per instance for 1 instance, 2 instances with separate caches, 2 instances with shared caches, and 4 instances.]
[Figure: SPEC CPU 2006 benchmarks (gromacs, namd, povray, hmmer, gamess, h264ref, sjeng, gobmk, dealII, tonto, zeusmp, bzip2, astar, xalancbmk, leslie3d, bwaves, sphinx3, omnetpp, mcf, soplex, GemsFDTD, milc, libquantum, lbm). Normalized runtime per instance for 1 instance, 2 instances with separate caches, 2 instances with shared caches, and 4 instances.]
Comparison of 1.6 GHz to 2.4 GHz
4 instances of each benchmark
Reducing the frequency pays off for memory-intensive tasks
[Figure: time, energy, and EDP for each SPEC CPU 2006 benchmark, gromacs through lbm.]
[Figure: time, energy, and EDP for each SPEC CPU 2006 benchmark (gromacs, namd, povray, hmmer, gamess, calculix, h264ref, sjeng, perlbench, gobmk, cactusADM, dealII, tonto, zeusmp, wrf, bzip2, astar, xalancbmk, leslie3d, bwaves, sphinx3, omnetpp, mcf, soplex, GemsFDTD, gcc, milc, libquantum, lbm).]
Requires knowledge of task characteristics
Requires coordination of task selection across cores
Task characterization
Execution of tasks in a defined order (runqueue sorting)
Used for mitigating thermal effects
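Runqueue sorting can be illustrated with a small sketch. It assumes two cores that share a resource, equal-length runqueues, equal timeslices, and a per-task memory-intensity score; the function names and the scores are hypothetical and not taken from the talk. One queue is sorted ascending and the other descending, so tasks at the same queue position (which run at the same time) tend to be one memory-bound and one compute-bound task.

```python
def sorted_runqueues(tasks_core_a, tasks_core_b):
    """Sort two runqueues in opposite order of memory intensity.

    Each task is a (name, memory_intensity) tuple. The intensity scale is an
    assumption made for this sketch.
    """
    queue_a = sorted(tasks_core_a, key=lambda t: t[1])                # ascending
    queue_b = sorted(tasks_core_b, key=lambda t: t[1], reverse=True)  # descending
    return queue_a, queue_b


def co_scheduled_pairs(queue_a, queue_b):
    """Tasks at the same queue position run during the same timeslice."""
    return [(a[0], b[0]) for a, b in zip(queue_a, queue_b)]


# Made-up intensity scores, for illustration only.
core_a = [("soplex", 0.8), ("hmmer", 0.1)]
core_b = [("gamess", 0.05), ("libquantum", 0.9)]

queue_a, queue_b = sorted_runqueues(core_a, core_b)
pairs = co_scheduled_pairs(queue_a, queue_b)
print(pairs)
```

With these made-up scores, hmmer runs alongside libquantum and soplex alongside gamess, so the two memory-intensive tasks never occupy the same timeslice.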
[Figure: VM a, VM b, VM c, VM d]
Too many memory-bound tasks/VMs are present
Sorted scheduling has to co-schedule memory-bound tasks
[Figure: relative runtime and EDP for gamess, gromacs, hmmer, namd, lbm, libquantum, mcf, soplex.]
Workload: 8 SPEC benchmarks, each in a separate VM
Worst case: 4 memory-bound benchmarks on one physical machine
[Figure: relative runtime and EDP for gamess, sjeng, hmmer, namd, lbm, libquantum, mcf, soplex, and the average.]
Resource contention
Shared voltage domains
Resource-conscious load balancing
VM scheduling and migration
Frequency scaling as fallback
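One way to picture resource-conscious load balancing is a greedy placement that spreads memory-intensive tasks across runqueues while keeping queue lengths even. The sketch below is an illustration under assumptions, not the balancing algorithm from the talk: the function name, the intensity scores, and the greedy strategy are all hypothetical.

```python
def balance_by_memory_intensity(tasks, num_cores):
    """Distribute (name, memory_intensity) tasks over num_cores runqueues.

    Greedy sketch: take tasks from most to least memory-intensive and put each
    on the runqueue with the lowest accumulated intensity that still has room.
    """
    queues = [[] for _ in range(num_cores)]
    load = [0.0] * num_cores
    max_len = -(-len(tasks) // num_cores)  # ceiling division keeps lengths even

    for name, intensity in sorted(tasks, key=lambda t: t[1], reverse=True):
        candidates = [c for c in range(num_cores) if len(queues[c]) < max_len]
        target = min(candidates, key=lambda c: load[c])
        queues[target].append(name)
        load[target] += intensity
    return queues


# Made-up scores: two memory-bound and two compute-bound tasks.
tasks = [("a", 0.9), ("b", 0.8), ("c", 0.1), ("d", 0.05)]
queues = balance_by_memory_intensity(tasks, num_cores=2)
print(queues)
```

Each runqueue ends up with one memory-bound and one compute-bound task, which leaves room for a sorting policy to run them at different times.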
[Figure: processor and memory interconnect]
(e.g., functional unit, cache, memory interconnect, ...)
Andreas Merkel and Frank Bellosa. Task Activity Vectors: A New Metric for Temperature-Aware Scheduling. Third ACM SIGOPS EuroSys Conference, 2008.
Inferred on-line from performance monitoring counters
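A minimal sketch of on-line inference from counters, assuming the scheduler can read per-timeslice deltas of a memory-bus transaction counter and a cycle counter. The counter choice, the class name, and the smoothing factor are assumptions for illustration; the talk does not specify them here.

```python
class MemoryIntensityEstimate:
    """Smoothed memory intensity of one task, updated once per timeslice."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.value = None   # no sample seen yet

    def update(self, bus_transactions, cycles):
        """Fold one timeslice's counter deltas into the running estimate."""
        if cycles <= 0:
            return self.value  # task did not run; keep the old estimate
        sample = bus_transactions / cycles
        if self.value is None:
            self.value = sample
        else:
            # Exponential moving average damps short phases of the task.
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


estimate = MemoryIntensityEstimate(alpha=0.5)
first = estimate.update(bus_transactions=100, cycles=1000)
second = estimate.update(bus_transactions=300, cycles=1000)
print(first, second)
```

The moving average is a design choice of this sketch: it lets the estimate follow phase changes without reacting to every single timeslice.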
Disregarding of interference
Disregarding of task-specific optimal frequency
[Figure: EDP factor versus memory intensity]
On-chip thread-level parallelism
simultaneous multithreading (SMT)
chip multiprocessors (CMP)
shared resources
shared power management
Schedulers designed for traditional SMP systems
Independent scheduling decisions for each processor
combination of tasks running at a time is arbitrary
is this optimal for SMT/CMP?
what about resource contention?
what about power management features like frequency scaling?
Assumption: a set of unrelated, single-threaded
no communication
Frequency selection
SMP: independently for each processor
SMT: affects all logical threads of a processor
CMP: per-core selection possible at the price of hardware complexity, but often only per-chip
Some tasks run more efficiently at a certain frequency
memory-bound tasks: lower frequencies
compute-bound tasks: higher frequencies
Classical SMP
physically different chips
interference via memory bus (shared bus, cache coherency)
SMT
multiple logical threads on one chip
heavy contention for almost all resources
CMP
multiple processors on one chip
interference via memory access logic, memory bus
sometimes shared caches
Intel Core2 Quad
resource contention
L2 cache shared between 2 cores
memory access infrastructure shared by all 4 cores
frequency selection
frequency shared by two cores
voltage scaling only for entire chip
Microbenchmarks
SPEC CPU 2006 benchmarks
Lower frequency is beneficial if all cores execute
But: Overhead in terms of time and energy if all cores
Do the benefits outweigh the overhead?
4x hmmer (compute-intensive)
4x soplex (memory-intensive)
Design scheduling policy that is optimal for the new
Use the resource CPU as efficiently as possible in
energy
time
Sometimes conflicting goals
compromise: EDP = energy * delay
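The EDP compromise can be made concrete with a short calculation. Since energy is power multiplied by time, the EDP of one configuration relative to a baseline is the power ratio times the square of the runtime ratio. The numbers below are made up for illustration and are not measurements from the talk.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product; lower is better."""
    return energy_joules * delay_seconds


def relative_edp(power_ratio, time_ratio):
    """EDP relative to a baseline configuration.

    energy = power * time, so EDP = power * time**2 and the ratios combine
    as power_ratio * time_ratio**2.
    """
    return power_ratio * time_ratio ** 2


# Hypothetical effect of lowering the frequency: power drops to 80% in both
# cases, but the compute-bound task slows down far more than the memory-bound one.
compute_bound = relative_edp(power_ratio=0.8, time_ratio=1.5)
memory_bound = relative_edp(power_ratio=0.8, time_ratio=1.05)
print(compute_bound, memory_bound)
```

Because delay enters squared, a power saving only improves EDP when the slowdown stays small, which is why the metric penalizes frequency reductions for compute-bound tasks.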
Run tasks in combinations that cause no interference
Run each task at its optimal frequency
combination matters if frequency selection affects multiple CPUs
=> we need to be able to determine what tasks run
Task migrations
Coordination of scheduling decisions (sort of gang
Run memory-intensive tasks parallel to compute-
Only lower the frequency if nothing but memory-
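A sketch of such a frequency rule for one shared frequency domain, assuming each running task carries a memory-intensity score and that a fixed threshold separates memory-bound from compute-bound tasks. The 1.6 GHz and 2.4 GHz settings are the two compared earlier in the talk; the threshold, the function name, and the behavior for an empty domain are hypothetical.

```python
HIGH_GHZ = 2.4
LOW_GHZ = 1.6


def select_domain_frequency(running_intensities, threshold=0.5):
    """Pick the frequency for cores that share one frequency domain.

    running_intensities holds the memory-intensity score of every task
    currently running in the domain. The threshold is an assumed cut-off.
    """
    all_memory_bound = bool(running_intensities) and all(
        intensity >= threshold for intensity in running_intensities
    )
    # A single compute-bound task keeps the whole domain at high frequency.
    return LOW_GHZ if all_memory_bound else HIGH_GHZ


print(select_domain_frequency([0.9, 0.8]))  # only memory-bound tasks
print(select_domain_frequency([0.9, 0.1]))  # mixed combination
```

In this sketch an empty domain stays at the high frequency; that is an arbitrary choice, since idle cores are outside what the rule addresses.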
[Figure: time, power, and EDP at 2.4 GHz, at 1.6 GHz, and with the heuristic.]
Improved runtime and EDP by avoiding contention
Reduction of EDP by reduction of runtime
Frequency scaling only beneficial if scheduling cannot
Reduction of EDP by reduction of power