DVFS PERFORMANCE PREDICTION FOR MANAGED MULTITHREADED APPLICATIONS
Shoaib Akram, Jennifer B. Sartor, Lieven Eeckhout Ghent University, Belgium Shoaib.Akram@elis.UGent.be
DVFS Performance Prediction
Sample at all DVFS states ☹ vs. estimate performance ☺
[Figure: performance vs. frequency, with a memory-bound region where many applications fall]
Managed Multithreaded Applications
Background
[Figure: CPU and DRAM activity over time at the base and target frequencies; t_base marks the execution time at the base frequency]
– Scaling (S)
– Non-Scaling (NS)
– Measuring critical path through loads
– Ignoring store operations
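The scaling/non-scaling split is commonly turned into a linear prediction: the scaling part S shrinks in proportion to the frequency ratio, while the non-scaling part NS (time spent waiting on memory) stays the same. The Python sketch below illustrates that model under this assumption. `predict_time` is a hypothetical name, and NS is taken as already measured, for example by a CRIT-style critical-path estimate over loads.

```python
def predict_time(t_base, ns, f_base, f_target):
    """Predict execution time at f_target from a run at f_base.

    t_base: measured execution time at the base frequency.
    ns: non-scaling part of t_base (assumed not to change with frequency).
    The scaling part S = t_base - ns is assumed to shrink by f_base / f_target.
    """
    if not 0 <= ns <= t_base:
        raise ValueError("ns must lie between 0 and t_base")
    s = t_base - ns
    return s * (f_base / f_target) + ns


# Hypothetical segment: 10 time units at 1 GHz, 4 of them non-scaling.
# Doubling the frequency halves only the 6 scaling units: 3 + 4 = 7.
print(predict_time(10.0, 4.0, 1.0, 2.0))
```

A purely compute-bound segment (NS = 0) halves exactly at 2X, whereas any non-scaling time keeps the prediction above half.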
State of the Art
…performance impact of DVFS for realistic memory…
High error for multithreaded Java!
Multithreaded CRIT (M+CRIT)
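Use CRIT to identify each thread's non-scaling component
[Figure: threads T0 and T1 over time at the base frequency (t_base) and at a 2X target frequency (t_target), with segment lengths 1, 0.5, and 1 marked; the critical thread determines t_target]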
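One reading of M+CRIT, assumed here from the figure rather than taken from the authors' implementation: apply the single-thread model to each thread using that thread's own CRIT-measured S and NS, then take the slowest (critical) thread as the application's predicted time. `predict_m_crit` and its input layout are hypothetical.

```python
def predict_m_crit(threads, f_base, f_target):
    """threads: mapping thread name -> (S, NS) measured at f_base.

    Each thread is predicted independently with the linear model;
    the thread with the longest predicted time is the critical one.
    Returns (critical thread name, predicted time at f_target).
    """
    ratio = f_base / f_target
    per_thread = {name: s * ratio + ns for name, (s, ns) in threads.items()}
    critical = max(per_thread, key=per_thread.get)
    return critical, per_thread[critical]


# Hypothetical threads at 2X: T0 -> 0.5 + 1.0 = 1.5, T1 -> 1.0 + 0.0 = 1.0.
print(predict_m_crit({"T0": (1.0, 1.0), "T1": (2.0, 0.0)}, 1.0, 2.0))
```

Because each thread is predicted in isolation, time a thread spends spinning on, or blocked by, another thread is counted as its own scaling or non-scaling work, which is one way such a predictor can go wrong for multithreaded applications.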
Sources of Inaccuracy in M+CRIT
[Figure: application threads (app0, app1) and collection threads (gc0, gc1) over time, with busy-wait and store-burst periods marked]
Scaling or non-scaling?
[Build: the sources of inaccuracy are labeled BURST (store bursts) and DEP (inter-thread dependences)]
Our Contribution
A New DVFS Performance Predictor
Example: Inter-thread Dependences
T0:
  while (cond0) { … }
  Acquire(lock)
  crit_sec()
  …
  Release(lock)
  ...
T1:
  while (cond1) { … }
  Acquire(lock)
  crit_sec()
  …
  Release(lock)
  ...
[Figure: numbered dependences (1 to 4) between T0 and T1, including a wait → wake dependence (4)]
Identifying Synchronization Epochs
[Figure: T0 and T1 over time at the base frequency, running loop, crit_sec(), and wait; the wait() and wake() events divide execution into Epoch #1, Epoch #2, and Epoch #3]
[Build: at the base frequency, each of the five per-thread segments lasts 10 units, and the three epochs take 30 units in total]
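The decomposition step can be sketched as follows, assuming that every synchronization event in any thread (such as wait() or wake()) closes the current epoch for all threads, and that each thread's running intervals are known. The function names and the example timeline (T1 calling wait() at 10, T0 calling wake() at 20) are hypothetical readings of the figure, not the authors' code.

```python
def identify_epochs(sync_events, end_time):
    """Split [0, end_time) into epochs at every synchronization event.

    sync_events: (timestamp, thread, kind) tuples such as (10, "T1", "wait"),
    gathered from all threads. Returns a list of (start, end) intervals.
    """
    cuts = sorted({t for t, _thread, _kind in sync_events if 0 < t < end_time})
    bounds = [0] + cuts + [end_time]
    return list(zip(bounds[:-1], bounds[1:]))


def per_epoch_activity(epochs, active):
    """Decompose each thread's activity into per-epoch busy time.

    active: mapping thread -> list of (start, end) intervals when it runs.
    Returns one dict per epoch mapping thread -> busy time in that epoch;
    threads that are idle (e.g. blocked in wait()) are omitted.
    """
    table = []
    for lo, hi in epochs:
        row = {}
        for thread, intervals in active.items():
            busy = sum(max(0, min(hi, end) - max(lo, start)) for start, end in intervals)
            if busy > 0:
                row[thread] = busy
        table.append(row)
    return table


# Hypothetical 30-unit timeline: T1 waits at 10, T0 wakes it at 20.
events = [(10, "T1", "wait"), (20, "T0", "wake")]
epochs = identify_epochs(events, 30)
activity = per_epoch_activity(epochs, {"T0": [(0, 30)], "T1": [(0, 10), (20, 30)]})
print(epochs)
print(activity)
```

The result has three epochs and five 10-unit segments, with T1 absent from the middle epoch because it is blocked.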
Reconstruction at Target Frequency
[Figure: at a 2X target frequency, CRIT predicts each segment separately, and the five 10-unit segments become 5, 7, 5, 5, and 5 units across Epochs #1 to #3]
Longest-running thread in each epoch: + zero book-keeping
= 17 units
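A minimal sketch of the "longest-running thread in each epoch" option, assuming each epoch starts only after the previous one ends and therefore lasts as long as its longest reconstructed segment. The per-epoch grouping of the 5, 7, 5, 5, 5 segments is a hypothetical reading of the figure, and the function name is illustrative.

```python
def predict_longest_per_epoch(epochs):
    """Predict total time at the target frequency, epoch by epoch.

    epochs: list of dicts mapping thread -> predicted time of that thread's
    segment in the epoch (e.g. from a CRIT-style per-segment estimate).
    No state is carried between epochs: each epoch simply contributes
    the time of its longest segment.
    """
    return sum(max(segments.values()) for segments in epochs if segments)


# Hypothetical reading of the example at 2X: 7 + 5 + 5.
example = [{"T0": 5, "T1": 7}, {"T0": 5}, {"T0": 5, "T1": 5}]
print(predict_longest_per_epoch(example))
```

Nothing needs to be remembered across epoch boundaries, but an epoch's longest segment is charged in full even when other threads could have moved ahead independently.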
Critical thread across epochs: + accurate
= 15 units (vs. 30 units at the base frequency)
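A simplified sketch of tracking the critical thread across epochs, assuming each thread keeps its own clock and the only cross-thread constraint is a wake(): the woken thread cannot resume before the waker reaches the wake() call. The `wakes` mapping, the function name, and the epoch grouping are hypothetical; DEP's actual book-keeping may record more kinds of events.

```python
def predict_critical_thread(epochs, wakes):
    """Predict total time by carrying per-thread clocks across epochs.

    epochs: list of dicts mapping thread -> predicted segment time at target.
    wakes: mapping epoch index -> (waker, wakee); at the end of that epoch
    the wakee's clock is advanced to at least the waker's clock.
    """
    clock = {}
    for i, segments in enumerate(epochs):
        # Threads absent from an epoch (e.g. blocked in wait()) do not advance.
        for thread, t in segments.items():
            clock[thread] = clock.get(thread, 0) + t
        if i in wakes:
            waker, wakee = wakes[i]
            clock[wakee] = max(clock.get(wakee, 0), clock[waker])
    return max(clock.values())


# Hypothetical reading of the example at 2X: T0 wakes T1 at the end of Epoch #2.
example = [{"T0": 5, "T1": 7}, {"T0": 5}, {"T0": 5, "T1": 5}]
print(predict_critical_thread(example, {1: ("T0", "T1")}))
```

In this example T1 finishes its first segment at 7 and idles until T0's wake() at 10, after which both threads finish at 15, half of the 30 units at the base frequency.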
DEP: Summary
Decompose sync activity into epochs → Reconstruct epochs @ target frequency → Aggregate into predicted total time
Store Bursts
– Zero initialization
– Copying collectors
– Track how long the store queue is full
– Add to the non-scaling component
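A minimal sketch of adding store bursts to the linear model, assuming one extra performance counter reports how long the store queue was full and that this time does not overlap with the load-based non-scaling time. The function and parameter names are hypothetical.

```python
def predict_with_store_bursts(t_base, ns_loads, sq_full_time, f_base, f_target):
    """Linear DVFS prediction with store-queue-full time treated as non-scaling.

    ns_loads: non-scaling time from the load critical path (CRIT-style).
    sq_full_time: time the store queue was full (assumed to come from an
    extra performance counter and not to overlap with ns_loads).
    """
    ns = ns_loads + sq_full_time
    if ns > t_base:
        raise ValueError("non-scaling time exceeds total time")
    s = t_base - ns
    return s * (f_base / f_target) + ns


# Hypothetical: 10 units at 1 GHz, 2 units of load stalls, 3 units with a full store queue.
print(predict_with_store_bursts(10.0, 2.0, 3.0, 1.0, 2.0))  # with bursts
print(predict_with_store_bursts(10.0, 2.0, 0.0, 1.0, 2.0))  # ignoring stores
```

With these made-up numbers, ignoring the store bursts would predict 6.0 instead of 7.5 units, overestimating the speedup from the higher frequency.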
Methodology
Version 6.0
Accuracy
[Figure: % average absolute error of M+CRIT, M+CRIT+BURST, and DEP+BURST when predicting from a baseline frequency of 1.0 GHz to target frequencies of 2.0, 3.0, and 4.0 GHz (y-axis 10 to 30%); labeled values: 27%, 13%, and 6%]
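For reference, a percent average absolute error over a set of benchmarks can be computed as below. This sketch assumes the error is measured relative to the measured execution time of each benchmark and then averaged; the exact definition used for the figure may differ, and the function name is hypothetical.

```python
def average_absolute_error(predicted, measured):
    """Percent average absolute error of predicted vs. measured execution times."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equally long, non-empty lists")
    if any(m <= 0 for m in measured):
        raise ValueError("measured times must be positive")
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical: one benchmark over-predicted by 10%, one under-predicted by 10%.
print(average_absolute_error([11.0, 18.0], [10.0, 20.0]))
```

Taking absolute values keeps over- and under-predictions from cancelling, so this example reports 10% rather than 0%.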
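Energy Manager
[Figure: execution divided into 5 ms quanta, starting at 4 GHz; at quantum boundaries the manager switches to New Freq1, then New Freq2, subject to tolerable_performance_degradation]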
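A hypothetical sketch of the per-quantum decision. It assumes the manager measures total and non-scaling time over each 5 ms quantum, converts the scaling part into a frequency-independent cycle count, and picks the lowest candidate frequency whose predicted time stays within tolerable_performance_degradation of the prediction at 4 GHz. Apart from tolerable_performance_degradation, which appears on the slide, the function and parameter names are assumptions.

```python
def choose_frequency(t_quantum, ns, f_current, f_max, candidates,
                     tolerable_performance_degradation):
    """Return the lowest frequency in `candidates` (GHz) whose predicted time
    stays within the tolerable degradation relative to running at f_max.

    t_quantum: time measured over the last quantum at f_current (ms).
    ns: non-scaling part of t_quantum (assumed unchanged across frequencies).
    tolerable_performance_degradation: e.g. 0.10 for a 10% slowdown.
    """
    # Scaling time multiplied by frequency gives a cycle count that does
    # not depend on the clock (units here: ms * GHz).
    cycles = (t_quantum - ns) * f_current
    t_at_max = cycles / f_max + ns
    limit = t_at_max * (1 + tolerable_performance_degradation)
    for f in sorted(candidates):
        if cycles / f + ns <= limit:
            return f
    return f_max


steps = [1.0, 2.0, 3.0, 4.0]
print(choose_frequency(5.0, 4.0, 4.0, 4.0, steps, 0.10))  # memory-intensive quantum
print(choose_frequency(5.0, 0.5, 4.0, 4.0, steps, 0.10))  # compute-intensive quantum
```

With these made-up numbers, a memory-intensive quantum (4.0 of 5.0 ms non-scaling) can drop to 3.0 GHz within a 10% budget, while a compute-intensive one (0.5 ms non-scaling) stays at 4.0 GHz.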
Energy Savings
[Figure: energy reduction vs. % performance degradation (5 to 25%) for memory-intensive and compute-intensive applications]
Conclusions
– Application and service threads
– Synchronization → inter-thread dependencies
– Store bursts
– Less than 10% estimation error for seven Java benchmarks
– One extra performance counter
– Minor book-keeping across epochs
– 20% average energy reduction for a 10% slowdown (memory-intensive Java applications)