ACMP: An Architecture to Handle Amdahl’s Law M. Aater Suleman - PowerPoint PPT Presentation

  1. ACMP: An Architecture to Handle Amdahl’s Law
     M. Aater Suleman
     Advisor: Yale Patt
     HPS Research Group

  2. Acknowledgements
     • Eric Sprangle, Intel
     • Anwar Rohillah, Intel
     • Anwar Ghuloum, Intel
     • Doug Carmean, Intel

  3. Background
     • Single-thread performance is power constrained
     • To leverage CMPs, a single application must be parallelized
     • Many kernels cannot be parallelized completely
     • Applications likely include both serial and parallel portions
     • Amdahl’s law is more applicable now than ever
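For reference, the law the last bullet invokes can be written in its standard form as follows. The symbols f (fraction of the work that can be parallelized) and N (number of identical cores) are introduced here for illustration and do not appear on the slide.

```latex
\[
\text{Speedup}(f, N) = \frac{1}{(1 - f) + \dfrac{f}{N}},
\qquad
\lim_{N \to \infty} \text{Speedup}(f, N) = \frac{1}{1 - f}
\]
```

The limit shows why the serial portion matters: with f = 0.9, no number of cores gives more than a 10x speedup.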

  4. Serial Bottlenecks
     • Inherently serial kernels
         For I = 1 to N
             A[I] = (A[I-1] + A[I])/2
     • Parallelization requires effort
     [Figure: degree of parallelism (y-axis, 0 to 1) vs. programmer effort (x-axis); labeled regions: data-parallel loops, loops with early termination, irregular code]
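The loop on this slide is serial because each iteration reads the value the previous iteration just wrote. A minimal Python sketch of that dependence follows; the function names and the sample data are hypothetical, chosen only for illustration.

```python
def smooth_serial(a):
    """Kernel from the slide: A[i] = (A[i-1] + A[i]) / 2, in index order."""
    out = list(a)
    for i in range(1, len(out)):
        # out[i - 1] was updated by the previous iteration, so iteration i
        # cannot start until iteration i - 1 has finished.
        out[i] = (out[i - 1] + out[i]) / 2
    return out


def smooth_independent(a):
    """NOT the same kernel: every element reads only the original input,
    so the iterations are independent, but the result differs."""
    return [a[0]] + [(a[i - 1] + a[i]) / 2 for i in range(1, len(a))]


data = [8.0, 0.0, 0.0, 0.0]
print(smooth_serial(data))       # [8.0, 4.0, 2.0, 1.0]
print(smooth_independent(data))  # [8.0, 4.0, 0.0, 0.0]
```

The two results diverge from the third element on, which is why the iterations of the original loop cannot simply be handed to different cores.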

  5. CMP Architectures
     • Tile small cores, e.g. Sun Niagara, Intel Larrabee
       – High throughput on the parallel part
       – Low serial thread performance
       – Highest performance for completely parallelized applications
     • Tile large cores, e.g. Intel Core 2 Duo, AMD Barcelona, and IBM POWER5
       – High serial thread performance
       – Lower throughput than Niagara

  6. ACMP
     • Run serial thread on the large core to extract ILP
     • Run parallel threads on small cores

  9. Performance vs. Parallelism
     [Figure: speedup vs. 1 P6-type core (y-axis, 0 to 18) against degree of parallelism (x-axis, 0 to 1); series: ACMP, Niagara, P6-Tile]

  10. Performance vs. Parallelism (same figure)
     • At low parallelism, ACMP and P6-Tile outperform Niagara

  11. Performance vs. Parallelism (same figure)
     • At high parallelism, Niagara outperforms ACMP

  12. Performance vs. Parallelism (same figure)
     • At medium parallelism, ACMP wins

  13. Performance vs. Parallelism (same figure)
     • The cut-off point moves to the right in the future
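The shape of these curves can be approximated with an Amdahl-style model. The sketch below is not the authors' simulator. It uses the core counts from the Experimental Methodology slide (16 small, 4 large, or 1 large plus 12 small) and two labeled assumptions: a small core delivers half the performance of a large core (`SMALL_PERF`, a made-up value), and the ACMP large core sits idle during the parallel portion.

```python
def speedup(f, serial_perf, parallel_perf):
    """Speedup relative to one large (P6-type) core.

    f             -- fraction of the work that is parallel (0.0 to 1.0)
    serial_perf   -- performance of the core running the serial part
                     (a large core is 1.0)
    parallel_perf -- aggregate throughput available to the parallel part
    """
    return 1.0 / ((1.0 - f) / serial_perf + f / parallel_perf)


# ASSUMPTION: relative performance of one small core; not given in the slides.
SMALL_PERF = 0.5


def niagara(f):
    # 16 small cores; the serial part also runs on a small core.
    return speedup(f, SMALL_PERF, 16 * SMALL_PERF)


def p6_tile(f):
    # 4 large cores.
    return speedup(f, 1.0, 4 * 1.0)


def acmp(f):
    # 1 large core for the serial part, 12 small cores for the parallel part.
    # ASSUMPTION: the large core does not help during the parallel part.
    return speedup(f, 1.0, 12 * SMALL_PERF)


if __name__ == "__main__":
    for f in (0.2, 0.9, 0.99):
        print(f"f={f}: ACMP={acmp(f):.2f} "
              f"Niagara={niagara(f):.2f} P6-Tile={p6_tile(f):.2f}")
```

Under these assumptions the model reproduces the qualitative ordering on the slides: Niagara trails at low parallelism, ACMP leads at medium parallelism, and Niagara leads as f approaches 1. The absolute numbers depend on the assumed `SMALL_PERF` and should not be read against the original chart.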

  14. Experimental Methodology
     • Large core: out-of-order (similar to P6)
     • Small core: 2-wide, in-order
     • Configuration:
       – Niagara: 16 small cores
       – P6-Tile: 4 large cores
       – ACMP: 1 large core, 12 small cores
     • Single ISA, shared memory, private L1 and L2 caches, bi-directional ring interconnect
     • Simulated existing multi-threaded applications without modification
     • ACMP thread scheduling
       – Master thread → large core
       – All additional threads → small cores
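The thread scheduling rule on this slide can be sketched as a simple placement function. The function name, the core labels, and the round-robin fallback when threads outnumber small cores are all assumptions made for illustration; the slide only states that the master thread goes to the large core and all other threads go to small cores.

```python
def assign_threads(num_threads, num_small_cores=12):
    """Return a {thread_id: core_label} placement following the slide's rule.

    Thread 0 is treated as the master thread and placed on the large core;
    every other thread is placed on a small core.
    """
    placement = {}
    for tid in range(num_threads):
        if tid == 0:
            placement[tid] = "large0"
        else:
            # ASSUMPTION: wrap around (round-robin) if there are more
            # worker threads than small cores.
            placement[tid] = f"small{(tid - 1) % num_small_cores}"
    return placement


print(assign_threads(3))  # {0: 'large0', 1: 'small0', 2: 'small1'}
```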

  15. Performance Results
     [Figure: speedup vs. Niagara (y-axis, 0 to 1.4) for P6-Tile and ACMP; benchmarks grouped on the x-axis into low-, medium-, and high-parallelism sets: is_nasp ep_nasp art_omp mg_nasp fmm_splash cholesky page convert h.264 ed mcf fft_splash cg_nasp]

  16. Summary
     • ACMP trades peak parallel performance for serial performance
     • Improves performance for a wide range of applications
     • Performance is less dependent on the length of the serial portion
     • Improves programmer efficiency
       – Programmers can parallelize only the easier-to-parallelize kernels

  17. Future Work
     • Enhanced ACMP scheduling
       – Accelerate execution of finer-grain serial portions (critical sections) using the large core
       – Requires compiler support and minimal hardware
     • Improved threading decision based on run-time feedback

  18. Thank you
