
DVFS PERFORMANCE PREDICTION FOR MANAGED MULTITHREADED APPLICATIONS

Shoaib Akram, Jennifer B. Sartor, Lieven Eeckhout. Ghent University, Belgium. Shoaib.Akram@elis.UGent.be


  1. DVFS PERFORMANCE PREDICTION FOR MANAGED MULTITHREADED APPLICATIONS
     Shoaib Akram, Jennifer B. Sartor, Lieven Eeckhout
     Ghent University, Belgium
     Shoaib.Akram@elis.UGent.be

  2. DVFS Performance Prediction
     [Figure: performance plotted against frequency, annotated "memory bound" and "many applications here".]
     • Sample at all DVFS states ☹
     • Estimate performance ☺

  3. Managed Multithreaded Applications

  4. Background
     [Figure: CPU and DRAM activity over time at the base frequency and at the target frequency.]
     • t_base is the sum of
       – Scaling (S)
       – Non-Scaling (NS)
     • Not simple
     • OOO+MLP (out-of-order execution and memory-level parallelism)
     • r = Base/Target
     • S → S * r
     • NS → No change
     • t_target = (S * r) + NS
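The scaling model on this slide can be sketched in a few lines of Python. The function name and the sample numbers are illustrative assumptions, not values from the slides.

```python
def predict_target_time(scaling, non_scaling, base_freq, target_freq):
    """Predict execution time at target_freq from a run at base_freq.

    scaling (S) changes with the clock; non_scaling (NS) does not.
    """
    r = base_freq / target_freq        # r = Base/Target
    return scaling * r + non_scaling   # t_target = (S * r) + NS


# Hypothetical run: 6 time units scale, 4 do not, 1.0 GHz -> 2.0 GHz.
t_base = 6.0 + 4.0
t_target = predict_target_time(6.0, 4.0, base_freq=1.0, target_freq=2.0)
print(t_base, t_target)  # 10.0 7.0
```

Doubling the frequency halves only the scaling part, so the predicted time drops from 10 to 7 units rather than to 5.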

  5. State of the Art
     • CRIT estimates non-scaling by
       – Measuring critical path through loads
       – Ignoring store operations
     R. Miftakhutdinov, E. Ebrahimi, and Y. N. Patt. Predicting performance impact of DVFS for realistic memory systems. MICRO, 2012.

  6. Multithreaded CRIT (M+CRIT)
     [Figure: threads T0 and T1 on a time axis at the base frequency (0 to 1, t_base) and at a 2X target frequency (0, 0.5, 1, t_target), with the critical thread marked.]
     • Use CRIT to identify each thread's non-scaling
     • High error for multithreaded Java!
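One plausible reading of M+CRIT is sketched below: apply the single-thread model to every thread and report the slowest (critical) thread as the program time. That aggregation rule, the function name, and the per-thread numbers are assumptions made for illustration.

```python
def m_crit_predict(threads, base_freq, target_freq):
    """threads maps a name to (scaling, non_scaling) measured at base_freq."""
    r = base_freq / target_freq
    per_thread = {name: s * r + ns for name, (s, ns) in threads.items()}
    critical = max(per_thread, key=per_thread.get)  # slowest thread
    return critical, per_thread[critical]


# Hypothetical per-thread components, 1.0 GHz -> 2.0 GHz.
threads = {"T0": (1.0, 0.0), "T1": (0.4, 0.2)}
print(m_crit_predict(threads, 1.0, 2.0))  # ('T0', 0.5)
```

Each thread is treated in isolation here, so time spent waiting on another thread is never re-evaluated at the target frequency.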

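  7. Sources of Inaccuracy in M+CRIT
     [Figure: timeline of application threads app0 and app1 and GC threads gc0 and gc1 across three phases (Application, Collection, Application), with a busy wait and a store burst marked.]
     Scaling or non-scaling?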

  8. Sources of Inaccuracy in M+CRIT
     [Figure: the same timeline of app0, app1, gc0, and gc1 across Application, Collection, Application, now with regions labeled DEP and the store burst labeled BURST.]
     Scaling or non-scaling?

  9. Our Contribution
     [Figure: the same timeline of app0, app1, gc0, and gc1 with DEP and BURST labels.]
     DEP+BURST: A New DVFS Performance Predictor
     Scaling or non-scaling?

  10. Our Contribution
      DEP+BURST: A New DVFS Performance Predictor

  11. Example: Inter-thread Dependences
      [Figure: threads T0 and T1 each run a loop (while (cond0) { … } and while (cond1) { … }, markers 1 and 2), followed by Acquire(lock); crit_sec(); Release(lock) (markers 3 and 4). One thread waits for the lock and is woken when the other releases it.]
      • Intercept synchronization activity
      • Reconstruct execution at target frequency

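  12. Identifying Synchronization Epochs
      [Figure: timelines of T0 and T1 at the base frequency, divided along the time axis into Epoch #1, Epoch #2, and Epoch #3; labels in order: loop, loop, wait(), crit_sec(), wait, wake(), crit_sec(). A second panel is reserved for the target frequency.]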

  13. Identifying Synchronization Epochs
      [Figure: timelines of T0 and T1 at the base frequency, divided along the time axis into Epoch #1, Epoch #2, and Epoch #3.]

  14. Identifying Synchronization Epochs
      [Figure: per-epoch durations for T0 and T1 at the base frequency.]
      Epoch #1: 10, 10
      Epoch #2: 10
      Epoch #3: 10, 10
      Time = 30 units

  15. Reconstruction at Target Frequency
      [Figure: per-epoch durations for T0 and T1; CRIT supplies the durations at the 2X target frequency.]
      Epoch   Base Frequency   Target Frequency (2X)
      #1      10, 10           7, 5
      #2      10               5
      #3      10, 10           5, 5

  16. Reconstruction at Target Frequency
      Epoch   Base Frequency   Target Frequency (2X)
      #1      10, 10           7, 5
      #2      10               3, 5
      #3      10, 10           5, 5
      Longest running in an epoch = 17 units
      + Zero book-keeping
      - Not accurate

  17. Reconstruction at Target Frequency
      Epoch   Base Frequency   Target Frequency (2X)
      #1      10, 10           7, 5
      #2      10               3, 5
      #3      10, 10           5, 5
      Total   30 units         15 units
      Critical thread across epochs = 15 units
      + Accurate
      - Book-keeping
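The two aggregation strategies on slides 16 and 17 can be compared with a short Python sketch. It is a simplified reading of the figure, not the authors' implementation: it assumes that T1 alone runs in Epoch #2, that the "3" in the figure is the time T0 spends blocked rather than work, and that a thread absent from an epoch resumes once the active threads of that epoch finish. All names are illustrative.

```python
def longest_per_epoch(epochs):
    """Sum the longest-running thread of each epoch (no book-keeping)."""
    return sum(max(epoch.values()) for epoch in epochs)


def critical_across_epochs(epochs):
    """Carry a per-thread clock across epochs (needs book-keeping)."""
    threads = {t for epoch in epochs for t in epoch}
    clock = dict.fromkeys(threads, 0)
    for epoch in epochs:
        for thread, duration in epoch.items():
            clock[thread] += duration
        # Assumed rule: blocked threads wake when the active ones finish.
        release = max(clock[t] for t in epoch)
        for thread in threads - set(epoch):
            clock[thread] = max(clock[thread], release)
    return max(clock.values())


# Per-epoch durations read from the slides (time units).
base = [{"T0": 10, "T1": 10}, {"T1": 10}, {"T0": 10, "T1": 10}]
target = [{"T0": 7, "T1": 5}, {"T1": 5}, {"T0": 5, "T1": 5}]

print(critical_across_epochs(base))    # 30
print(longest_per_epoch(target))       # 17
print(critical_across_epochs(target))  # 15
```

Under these assumptions the per-epoch maximum charges Epoch #1 at 7 units even though T1 is ready to enter Epoch #2 after 5, which is where the 2-unit overestimate comes from.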

  18. DEP: Summary
      [Flow diagram] Sync Activity → Decompose → Sync Epochs, Perf Counters → Reconstruct → Epochs @ Tgt. → Aggregate → Predicted Total Time

  19. Our Contribution
      DEP+BURST: A New DVFS Performance Predictor

  20. Our Contribution
      DEP+BURST: A New DVFS Performance Predictor

  21. Store Bursts
      • Reasons
        – Zero initialization
        – Copying collectors
      • Modeling Steps
        – Track how long the store queue is full
        – Add to the non-scaling component
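The two modeling steps can be sketched as follows. This assumes the store-queue-full time is available as a counter in the same time units as the CRIT estimate and that the two do not overlap; the function and parameter names are hypothetical.

```python
def predict_with_burst(t_base, crit_non_scaling, store_queue_full,
                       base_freq, target_freq):
    """Add store-queue-full time to the non-scaling component, then rescale."""
    # Guard (an assumption): non-scaling time cannot exceed the measured time.
    non_scaling = min(crit_non_scaling + store_queue_full, t_base)
    scaling = t_base - non_scaling
    r = base_freq / target_freq
    return scaling * r + non_scaling


# Hypothetical interval: 10 units at 1.0 GHz, predicted for 2.0 GHz.
print(predict_with_burst(10.0, 2.0, 0.0, 1.0, 2.0))  # 6.0 (stores ignored)
print(predict_with_burst(10.0, 2.0, 2.0, 1.0, 2.0))  # 7.0 (burst counted)
```

Ignoring the burst makes the interval look more frequency-sensitive than it is, so the prediction at a higher frequency comes out too optimistic.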

  22. Methodology
      • Jikes RVM 3.1.2
      • Production collector (Immix)
      • # GC threads = 2
      • 2x min. heap
      Version 6.0
      • 4 cores, 1.0 GHz → 4.0 GHz
      • 3-level cache hierarchy
      • LLC fixed to 1.5 GHz
      • DVFS settings for 22 nm Haswell
      • Seven multithreaded benchmarks
      • Four application threads

  23. Accuracy
      [Figure: bar chart of % average absolute error (axis 0 to 30) for M+CRIT, M+CRIT+BURST, and DEP+BURST at target frequencies of 2.0, 3.0, and 4.0 GHz; labeled values: 27%, 13%, 6%.]
      Baseline Frequency = 1.0 GHz

  24. Energy Manager
      [Figure: timeline divided into quanta, starting at 4 GHz and moving to New Freq1, then New Freq2.]
      • Parameter: tolerable_performance_degradation
      • Quantum: 5 ms
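A manager of this kind could be sketched as below. The slide gives only the parameter name, the quantum length, and the starting frequency, so the selection policy (pick the lowest frequency whose predicted slowdown relative to the highest frequency stays within the tolerance), the 1 GHz frequency steps, and all numbers are assumptions.

```python
def choose_frequency(scaling, non_scaling, base_freq, freqs, tolerable):
    """Pick the lowest frequency whose predicted slowdown is tolerable.

    scaling, non_scaling: components measured over the last quantum at base_freq.
    tolerable: allowed slowdown relative to the highest frequency, e.g. 0.10.
    """
    def predict(freq):
        return scaling * base_freq / freq + non_scaling

    limit = predict(max(freqs)) * (1.0 + tolerable)
    return min(f for f in freqs if predict(f) <= limit)


FREQS_GHZ = [1.0, 2.0, 3.0, 4.0]  # hypothetical DVFS states

# Hypothetical 5 ms quanta measured at 4 GHz (times in ms).
print(choose_frequency(1.0, 4.0, 4.0, FREQS_GHZ, 0.10))  # 3.0, memory-bound
print(choose_frequency(4.5, 0.5, 4.0, FREQS_GHZ, 0.10))  # 4.0, compute-bound
```

The choice would be repeated every quantum, using the counters gathered during the quantum that just ended.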

  25. Energy Savings
      [Figure: bar chart of Performance Degradation and Energy Reduction (%, axis 0 to 25) for Memory Intensive and Compute Intensive benchmarks.]

  26. Conclusions
      • DEP+BURST: First predictor that accounts for
        – Application and service threads
        – Synchronization → inter-thread dependencies
        – Store bursts
      • High accuracy
        – Less than 10% estimation error for seven Java benchmarks
      • Negligible hardware cost
        – One extra performance counter
        – Minor book-keeping across epochs
      • Demonstrated energy savings
        – 20% avg. for a 10% slowdown (memory-intensive Java applications)

  27. DVFS PERFORMANCE PREDICTION FOR MANAGED MULTITHREADED APPLICATIONS
      Thank You!
      Shoaib.Akram@elis.UGent.be
