Measuring the performance improvements as you parallelize and optimize your software


  1. Measuring the performance improvements as you parallelize and optimize your software

  2. Multiplication of Transposed Sparse Matrix by Vector: managing race conditions with...
     All runs are compiled with -O2 -march=native -mtune=native -ftree-vectorize; the OpenMP runs add -fopenmp. Each variant lists two timings.
     - Without OpenMP: 0.25 s; 0.27 s
     - OpenMP, pragma atomic: 0.359 s (0.7x); 0.3470 s (0.78x)
     - OpenMP, privatization of arrays: 0.1172 s (2.13x); 0.1177 s (2.29x)

  3. - Evaluate the performance of your serial, parallel and optimized code
     - Tuning and optimization

  4. - Avoiding ‘too much parallel’
     [Figure: ideal and real speedup versus number of threads/processes/cores]

  5. - Serial and parallel performance
     - Take regular parallel performance measurements as you progress
     - Understand your performance limits
     - Use Speedup and Efficiency measures
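One way to take regular measurements is a small timing helper. The sketch below is only an illustration: `best_time` and `workload` are hypothetical names, not part of the slides, and taking the best of several repeats is an assumed convention for reducing timing noise.

```python
import time


def best_time(func, repeats=5):
    """Return the shortest wall-clock time in seconds over `repeats` calls of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def workload():
    # Hypothetical stand-in for the code being measured.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(f"best of 5 runs: {best_time(workload):.4f} s")
```

Recording the same measurement after every change makes regressions visible early.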

  6. - Measure the relative performance between serial and parallel code.
     - Speedup is the improvement in execution speed of a task run on the same architecture but with different resources.
     Speedup, S, for problem size N on P processes/threads/cores:
     $$S(N, P) = \frac{T(N, 1)}{T(N, P)}$$
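As a minimal sketch of the speedup definition, the function below divides the serial time by the parallel time. The function name is hypothetical; the timings are the ones reported on slide 2, and pairing each baseline with each OpenMP timing is an assumption inferred from the reported speedup factors.

```python
def speedup(t_serial, t_parallel):
    """S(N, P) = T(N, 1) / T(N, P), both measured for the same problem size N."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


# Timings from slide 2 (pairing of baseline and OpenMP runs is assumed).
print(round(speedup(0.25, 0.1172), 2))  # privatization of arrays: 2.13
print(round(speedup(0.25, 0.359), 2))   # pragma atomic: 0.7, i.e. a slowdown
```

A value below 1 means the parallel version is slower than the serial one.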

  7. - Measure the efficiency of the parallel code.
     - 100% efficiency means that doubling the resources halves the runtime (i.e. the same total resources are used).
     Parallel efficiency, E, for problem size N on P processes/threads/cores:
     $$E(N, P) = \frac{S(N, P)}{P} = \frac{T(N, 1)}{P \, T(N, P)}$$
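The efficiency formula can be sketched the same way. The function name and the timings below are hypothetical, chosen only to illustrate the arithmetic.

```python
def efficiency(t_serial, t_parallel, p):
    """E(N, P) = T(N, 1) / (P * T(N, P)), where p is the number of processes/threads/cores."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / (p * t_parallel)


# Hypothetical timings: 8.0 s serial, 2.5 s on 4 cores.
print(efficiency(8.0, 2.5, 4))   # 0.8, i.e. 80% efficiency
# Doubling the resources and halving the runtime gives 100% efficiency.
print(efficiency(10.0, 5.0, 2))  # 1.0
```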

  8. - We can never parallelize every single part of the code (e.g. initialising and distributing the data).
     - A fraction of the runtime, α, is completely serial; it limits the parallel runtime even with 100% efficiency of the parallel fraction on P processors/threads/cores.
     Runtime T for problem size N on P processes, known as ‘Amdahl’s Law’:
     $$T(N, P) = \alpha \, T(N, 1) + \frac{(1 - \alpha) \, T(N, 1)}{P}$$
     - Limited by the serial fraction, α.
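To see how the serial fraction caps the achievable speedup, Amdahl's Law can be evaluated directly. This is a sketch with hypothetical function names; the speedup form follows from dividing T(N,1) by the Amdahl runtime.

```python
def amdahl_runtime(t_serial, alpha, p):
    """Predicted runtime: alpha * T(N,1) + (1 - alpha) * T(N,1) / P."""
    return alpha * t_serial + (1.0 - alpha) * t_serial / p


def amdahl_speedup(alpha, p):
    """Predicted speedup T(N,1) / T(N,P) = 1 / (alpha + (1 - alpha) / P)."""
    return 1.0 / (alpha + (1.0 - alpha) / p)


# With half of the runtime serial, extra processors help very little.
for p in (1, 2, 4, 8):
    print(p, round(amdahl_speedup(0.5, p), 2))
```

As P grows, the predicted speedup approaches 1/α, so a 10% serial fraction can never give more than a 10x speedup under this model.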

  9. [Figure: Amdahl's Law (Gene Amdahl, 1967). Speedup versus number of processors (1, 2, 4, 8) for serial fractions α = 5%, 10%, 25% and 50%, where α is the sequential portion and (1 - α) the parallel portion. With α = 50%, the speedup is 1x, 1.33x, 1.6x and 1.8x on 1, 2, 4 and 8 processors. Source: Wikipedia]

  10. - Use the spreadsheet assigned to your team to record timings.
      - Use this regularly, particularly once you start trying multiple parallelization methods and tuning your implementations.
      - Try to gain an understanding of your serial fraction, α.
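One way to estimate the serial fraction from recorded timings is to solve Amdahl's Law for α, which gives α = (T(N,P)/T(N,1) - 1/P) / (1 - 1/P). The sketch below assumes the measured runtimes follow Amdahl's model; in practice the estimate also absorbs parallel overheads such as synchronisation. The function name and timings are hypothetical.

```python
def serial_fraction(t_serial, t_parallel, p):
    """Estimate alpha by inverting T(N,P) = alpha*T(N,1) + (1 - alpha)*T(N,1)/P."""
    if p < 2:
        raise ValueError("need at least 2 processes/threads/cores to estimate alpha")
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    inverse_speedup = t_parallel / t_serial
    return (inverse_speedup - 1.0 / p) / (1.0 - 1.0 / p)


# Hypothetical timings: 10 s serial and 4 s on 4 cores.
print(round(serial_fraction(10.0, 4.0, 4), 3))  # 0.2
```

If the estimate grows as P increases, that suggests overheads beyond the purely serial part of the code.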
