
Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design



  1. Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design
     Stijn Eyerman and Lieven Eeckhout, Ghent University, Belgium
     ISCA, Saint-Malo, France, June 23, 2010

  2. Amdahl’s Law
     Speedup by parallelizing fraction $f$ across $n$ processors:
     $$S = \frac{1}{(1 - f) + \frac{f}{n}}$$
     Parallel performance is bounded by the sequential part:
     $$\lim_{n \to \infty} S = \frac{1}{1 - f}$$
     S. Eyerman & L. Eeckhout -- ISCA 2010 -- June 23, 2010
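The two formulas on this slide can be checked numerically. The following is a minimal sketch; the function names `amdahl_speedup` and `amdahl_limit` and the sample value f = 0.95 are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup when a fraction f of the work is spread over n processors."""
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as n goes to infinity (requires f < 1)."""
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    f = 0.95  # illustrative parallel fraction
    for n in (1, 4, 16, 1024):
        print(n, round(amdahl_speedup(f, n), 2))
    print("limit:", round(amdahl_limit(f), 2))
```

Even with many processors the speedup stays below the bound set by the sequential part, which is the point the slide makes.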

  3. Amdahl’s software model
     • Sequential fraction: $f_{seq}$
     • Parallel fraction: $f_{par} = 1 - f_{seq}$
     Can we model critical sections in Amdahl’s Law?

  4. Extending Amdahl’s software model
     • Sequential part: $f_{seq}$
     • Parallel part inside critical sections: $f_{par,cs}$
     • Parallel part outside critical sections: $f_{par,ncs}$
     $$f_{seq} + f_{par,cs} + f_{par,ncs} = 1$$
     $P_{ctn}$ = probability for two critical sections to contend

  5. Extending Amdahl’s software model
     Assumptions:
     • Each thread executes an equal share of the critical sections
     • Critical sections are entered at random times
     • Critical sections contend randomly

  6. How to compute parallel speedup in the presence of critical sections?
     • Case #1: Low contention. All threads execute equally long, so total exec time ≅ avg per-thread exec time
     • Case #2: High contention. Total exec time ≅ avg exec time of the slowest thread

  7. Case #1
     Each thread executes a fraction $\frac{f_{par,cs}}{n}$ of the critical sections
     If no contention: exec time $= \frac{f_{par,cs}}{n}$
     If contention with $j$ threads: exec time $= (j + 1) \frac{f_{par,cs}}{n}$
     Avg time spent in critical section:
     $$\sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j + 1) \frac{f_{par,cs}}{n}$$

  8. $$\sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j + 1) \frac{f_{par,cs}}{n}$$
     $$= \sum_{i=0}^{n-1} \Pr[i \text{ of } n - 1 \text{ other threads in critical sections}] \cdot \sum_{j=0}^{i} \Pr[j \text{ of } i \text{ critical sections contend}] \cdot (j + 1) \frac{f_{par,cs}}{n}$$
     $$= \sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{j} (1 - P_{ctn})^{i-j} \cdot (j + 1) \frac{f_{par,cs}}{n}$$
     with $P_{cs} = \frac{f_{par,cs}}{f_{par,cs} + f_{par,ncs}}$

  9. Avg time spent in critical section
     $$= \sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{j} (1 - P_{ctn})^{i-j} \cdot (j + 1) \frac{f_{par,cs}}{n}$$
     $$= f_{par,cs} \cdot \left( P_{cs} P_{ctn} + \frac{1 - P_{cs} P_{ctn}}{n} \right)$$
     • $P_{cs} P_{ctn}$: sequential part
     • $\frac{1 - P_{cs} P_{ctn}}{n}$: parallel part
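The step from the double binomial sum to the closed form can be verified numerically. This is a minimal sketch, assuming the sum and closed form as reconstructed from slides 8 and 9; the function names and the test parameters are hypothetical, and it assumes $f_{par,cs} + f_{par,ncs} > 0$.

```python
from math import comb


def avg_cs_time_sum(f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """Average critical-section time per thread, via the double binomial sum."""
    p_cs = f_cs / (f_cs + f_ncs)  # probability of being in a critical section
    total = 0.0
    for i in range(n):  # i of the n - 1 other threads are in critical sections
        pr_i = comb(n - 1, i) * p_cs**i * (1 - p_cs) ** (n - 1 - i)
        inner = 0.0
        for j in range(i + 1):  # j of those i critical sections contend
            pr_j = comb(i, j) * p_ctn**j * (1 - p_ctn) ** (i - j)
            inner += pr_j * (j + 1)
        total += pr_i * inner
    return total * f_cs / n


def avg_cs_time_closed(f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """Same quantity, via the closed form f_cs * (x + (1 - x) / n), x = P_cs * P_ctn."""
    x = (f_cs / (f_cs + f_ncs)) * p_ctn
    return f_cs * (x + (1 - x) / n)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, avg_cs_time_sum(0.5, 0.5, 0.5, n), avg_cs_time_closed(0.5, 0.5, 0.5, n))
```

The agreement follows from the binomial means: the inner sum evaluates to $i P_{ctn} + 1$ and the outer sum replaces $i$ by $(n - 1) P_{cs}$.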

  10. Back to Amdahl’s Law
      $$S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{cs} P_{ctn} + \frac{f_{par,cs} \cdot (1 - P_{cs} P_{ctn}) + f_{par,ncs}}{n}}$$
      Impact of critical sections can be modeled as a sequential plus a parallel part
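A short sketch of the case #1 speedup, split into the sequential and parallel parts the slide describes. The function name `speedup_case1` and the sample workload are assumptions made for illustration.

```python
def speedup_case1(f_seq: float, f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """Case #1 (low contention) speedup; expects f_seq + f_cs + f_ncs == 1."""
    p_cs = f_cs / (f_cs + f_ncs)
    # Contending critical sections behave like extra sequential work.
    sequential_part = f_seq + f_cs * p_cs * p_ctn
    # Everything else still scales with the number of threads.
    parallel_part = f_cs * (1.0 - p_cs * p_ctn) + f_ncs
    return 1.0 / (sequential_part + parallel_part / n)


if __name__ == "__main__":
    # Hypothetical workload: 10% sequential, 30% in critical sections, 60% outside.
    for p_ctn in (0.0, 0.1, 0.5):
        print(p_ctn, round(speedup_case1(0.1, 0.3, 0.6, p_ctn, 16), 2))
```

With `p_ctn = 0` the expression reduces to the plain Amdahl formula with parallel fraction $1 - f_{seq}$.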

  11. Case #2
      Exec time is determined by the chain of contending critical sections
      Approximate total exec time as the avg exec time of the slowest thread

  12. Avg exec time of slowest thread
      Length of chain of contending critical sections $= f_{par,cs} P_{ctn}$
      Minimum execution time:
      $$f_{seq} + f_{par,cs} P_{ctn}$$
      Maximum execution time:
      $$f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{n}$$
      Average execution time:
      $$f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{2 \cdot n}$$

  13. Putting it together & validation
      Q: Total exec time for parallel workload?
      A: max(case #1, case #2)
      Avg error of 3% compared to synthetic simulation
      [Figure: normalized exec time vs. number of threads (0 to 10) for the case #1 formula, the case #2 formula, and synthetic simulation; $f_{par,cs} = 0.5$, $f_{par,ncs} = 0.5$, $P_{ctn} = 0.5$]
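The combined model can be sketched as the maximum of the two case formulas. The code below is an illustration only: the function names are hypothetical, and it assumes $f_{seq} = 0$ for the figure's parameters because $f_{par,cs}$ and $f_{par,ncs}$ already sum to 1. It does not reproduce the synthetic simulation.

```python
def time_case1(f_seq: float, f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """Low contention: average per-thread execution time."""
    x = (f_cs / (f_cs + f_ncs)) * p_ctn
    return f_seq + f_cs * x + (f_cs * (1.0 - x) + f_ncs) / n


def time_case2(f_seq: float, f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """High contention: average execution time of the slowest thread."""
    return f_seq + f_cs * p_ctn + (f_cs * (1.0 - p_ctn) + f_ncs) / (2 * n)


def total_time(f_seq: float, f_cs: float, f_ncs: float, p_ctn: float, n: int) -> float:
    """Normalized execution time: the larger of the two estimates."""
    return max(time_case1(f_seq, f_cs, f_ncs, p_ctn, n),
               time_case2(f_seq, f_cs, f_ncs, p_ctn, n))


if __name__ == "__main__":
    for n in range(1, 11):
        t1 = time_case1(0.0, 0.5, 0.5, 0.5, n)
        t2 = time_case2(0.0, 0.5, 0.5, 0.5, n)
        print(n, round(t1, 4), round(t2, 4), "case #1" if t1 >= t2 else "case #2")
```

For these parameters the two formulas cross at n = 4: case #1 is the larger estimate below that, case #2 above it.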

  14. Theoretical result:
      $$\lim_{n \to \infty} S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{ctn}}$$
      Parallel performance is fundamentally limited by critical sections
      [Figure: $S$ (axis 0 to 10000) as a function of $f_{par,cs}$ and $P_{ctn}$ (axis ticks between 0.01 and 0.1), for $f_{seq} = 0$]
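One way to see where this limit comes from is to take $n \to \infty$ in both case formulas and keep the larger execution time. The derivation below is a sketch based on the formulas from slides 10 and 12, not a step shown in the slides; the numeric example uses illustrative values.

```latex
\begin{align*}
\lim_{n \to \infty} T_{\text{case 1}} &= f_{seq} + f_{par,cs} \cdot P_{cs} P_{ctn} \\
\lim_{n \to \infty} T_{\text{case 2}} &= f_{seq} + f_{par,cs} \cdot P_{ctn} \\
P_{cs} \le 1 \;\Rightarrow\;
\lim_{n \to \infty} \max(T_{\text{case 1}}, T_{\text{case 2}})
  &= f_{seq} + f_{par,cs} \cdot P_{ctn} \\
\lim_{n \to \infty} S &= \frac{1}{f_{seq} + f_{par,cs} \cdot P_{ctn}} \\
\text{Example: } f_{seq} = 0,\ f_{par,cs} = 0.01,\ P_{ctn} = 0.01
  &\;\Rightarrow\; \lim_{n \to \infty} S = \frac{1}{10^{-4}} = 10^{4}
\end{align*}
```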

  15. What are the implications for multicore design?

  16. Amdahl’s Law suggests wimpy small cores in asymmetric multicore
      $$S = \frac{1}{\frac{1 - f}{p} + \frac{f}{n + p}}$$
      • $n$: linear speedup w/ increasing no. of small cores
      • $p$: sublinear speedup in single-thread performance (Pollack’s law)
      [M. Hill and M. Marty, IEEE Computer, 2008]
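The trade-off on this slide can be explored with a small sketch. It reads $p$ as the big core's performance relative to one small core and $n$ as the number of small cores. The helper `pollack_perf`, the 256-BCE budget split (one big core of `r` BCEs plus `256 - r` single-BCE small cores), and f = 0.99 are assumptions borrowed from the Hill and Marty setup for illustration.

```python
from math import sqrt


def asymmetric_speedup(f: float, n: int, p: float) -> float:
    """Sequential part runs on the big core; parallel part uses all cores."""
    return 1.0 / ((1.0 - f) / p + f / (n + p))


def pollack_perf(r: float) -> float:
    """Assumed Pollack's law: performance grows as the square root of core area."""
    return sqrt(r)


if __name__ == "__main__":
    budget = 256  # base core equivalents (BCEs), illustrative
    f = 0.99      # illustrative parallel fraction
    for r in (1, 4, 16, 64):
        n_small = budget - r  # remaining BCEs become single-BCE small cores
        print(r, n_small, round(asymmetric_speedup(f, n_small, pollack_perf(r)), 1))
```

Because $n$ enters the denominator linearly while $p$ only grows with the square root of the area spent, this model favors many small cores for the parallel part.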

  17. Critical sections have big impact on asymmetric multicore performance
      $$\lim_{n \to \infty} S = \frac{1}{\frac{f_{seq}}{p} + f_{par,cs} \cdot P_{ctn}}$$
      • $\frac{f_{seq}}{p}$: sequential part is executed on big core
      • $f_{par,cs} \cdot P_{ctn}$: sequential part due to critical sections is executed on small cores
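A minimal sketch of this limit shows why a faster big core alone stops helping: the critical-section term is not divided by $p$. The function name and the sample fractions are hypothetical.

```python
def asymmetric_cs_limit(f_seq: float, f_cs: float, p_ctn: float, p: float) -> float:
    """Speedup limit for n -> infinity; only the f_seq term benefits from the big core."""
    return 1.0 / (f_seq / p + f_cs * p_ctn)


if __name__ == "__main__":
    f_seq, f_cs, p_ctn = 0.01, 0.2, 0.1  # illustrative workload
    ceiling = 1.0 / (f_cs * p_ctn)       # bound set by contending critical sections
    for p in (1, 2, 4, 8, 16):
        print(p, round(asymmetric_cs_limit(f_seq, f_cs, p_ctn, p), 2), "ceiling:", round(ceiling, 2))
```

However large $p$ becomes, the limit stays below $1 / (f_{par,cs} \cdot P_{ctn})$, since the contending critical sections run on the small cores.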

  18. Implication: small cores in asymmetric multicore should not be wimpy but middle-of-the-road
      Intuition: small cores should be sufficiently large to execute critical sections quickly
      [Figure: 256 BCEs (base core equivalents), after Hill & Marty]

  19. Asymmetric vs symmetric multicores

  20. Accelerating Critical Sections (ACS) by Suleman et al. [ASPLOS’09]
      • Execute critical sections on big core
      • Naive ACS
        – Accelerate all critical sections
      • Perfect ACS
        – Accelerate contending critical sections only
      • Selective ACS
        – Predict whether critical sections will contend
        – Mitigate false serialization

  21. Evaluating ACS

  22. Conclusions
      • Model impact of critical sections in Amdahl’s Law
      • Theoretical result
        – Parallel performance is fundamentally limited by critical sections
      • Implications for multicore design
        – Small cores in asymmetric multicore should not be wimpy but middle-of-the-road
        – Symmetric multicores may yield better performance than asymmetric multicores (w/ wimpy small cores)
        – Accelerating critical sections is a promising idea: ACS, DVFS, SMT, scalable cores
      • Long live microarchitecture!

  23. Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design
      Stijn Eyerman and Lieven Eeckhout, Ghent University, Belgium
      Thank you!
