
CPU design effects that can degrade performance of your programs

Jakub Beránek (jakub.beranek@vsb.cz)

whoami: PhD student @ VSB-TUO, Ostrava, Czech Republic; research assistant @ IT4Innovations (HPC center); HPC, distributed


  1-9. Simple branch predictor - unsorted array: if (data[i] < 6) { ... } [Animation: each array element is compared with 6 in turn. The prediction starts as "Not taken" and keeps switching between "Not taken" and "Taken" as the outcomes change. The final frame reports hits, misses and the hit rate; the element values and counts were lost in extraction.]

  10-27. Simple branch predictor - sorted array: if (data[i] < 6) { ... } [Animation: each array element is compared with 6 in turn. The prediction starts as "Not taken", stays "Taken" for one long run and then stays "Not taken" for the rest of the array. The final frame reports hits, misses and the hit rate; the element values and counts were lost in extraction.]
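
The two animations can be reproduced with a small simulation. This is only a sketch: it assumes the "simple" predictor is a 1-bit scheme that starts at "Not taken" and then predicts whatever the branch did last time, which the slides suggest but do not spell out. The names `PredictorStats` and `simulate_last_outcome_predictor` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of running the simulated predictor over one array.
struct PredictorStats {
    std::size_t hits = 0;
    std::size_t misses = 0;
};

// Simulates a 1-bit "last outcome" predictor for the branch `value < threshold`.
// Assumption: the predictor starts in the "not taken" state and, after every
// branch, remembers the actual outcome as its next prediction.
PredictorStats simulate_last_outcome_predictor(const std::vector<int>& data, int threshold) {
    PredictorStats stats;
    bool predict_taken = false;  // initial prediction: not taken
    for (int value : data) {
        const bool taken = value < threshold;  // actual branch outcome
        if (taken == predict_taken) {
            ++stats.hits;
        } else {
            ++stats.misses;
        }
        predict_taken = taken;  // remember the last outcome
    }
    assert(stats.hits + stats.misses == data.size());
    return stats;
}
```

With the made-up input {7, 2, 9, 1, 8, 3, 6, 4} and threshold 6, this model scores 1 hit and 7 misses. The same values sorted give 6 hits and 2 misses, because the outcome changes only once in the middle of the array.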

  28. How can the compiler help? With float, there are two branches per iteration

  29. How can the compiler help? With int, one branch is removed (using cmov)
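
The loop behind these two slides is not included in the transcript, so the following is a hypothetical stand-in that sums the elements below a threshold. It contrasts a form with an explicit `if` against a select form that a compiler can often lower to a conditional move. Whether `cmov` is actually emitted depends on the compiler, its flags and the target, so checking the generated assembly is advisable. All function names here are assumptions.

```cpp
#include <cassert>
#include <vector>

// Branchy form: the compiler may keep a conditional jump for the `if`.
long sum_below_branchy(const std::vector<int>& data, int threshold) {
    long sum = 0;
    for (int value : data) {
        if (value < threshold) {
            sum += value;
        }
    }
    return sum;
}

// Select form: one of two candidate values is chosen per element, which
// compilers can often turn into a conditional move instead of a jump.
long sum_below_select(const std::vector<int>& data, int threshold) {
    long sum = 0;
    for (int value : data) {
        sum += (value < threshold) ? value : 0;
    }
    return sum;
}

// Sanity check: both forms must agree on any input.
void check_same_result(const std::vector<int>& data, int threshold) {
    assert(sum_below_branchy(data, threshold) == sum_below_select(data, threshold));
}
```

Both functions compute the same sum; only the shape offered to the optimizer differs.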

  30-31. How to measure? branch-misses: how many times was a branch mispredicted?
      $ perf stat -e branch-misses ./example0a
      with sort    -> 383 902
      without sort -> 101 652 009

  32-35. How to help the branch predictor?
      • More predictable data
      • Profile-guided optimization
      • Remove (unpredictable) branches
      • Compiler hints (use with caution):
          if (__builtin_expect(will_it_blend(), 0)) {
              // this branch is not likely to be taken
          }
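
As a sketch of the compiler-hint bullet, the function below marks a rare validation failure as unlikely. `__builtin_expect` is a GCC/Clang builtin; the `UNLIKELY` macro and the `sum_valid` function are hypothetical names, and the premise that negative values are rare is an assumption about the data. The hint mainly influences how the compiler lays out the code, so a wrong hint can make things slower.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper macro wrapping the GCC/Clang builtin from the slide.
#define UNLIKELY(condition) __builtin_expect(!!(condition), 0)

// Sums the non-negative values and counts the negative ones in `rejected`.
// Assumption: negative values are rare, so that branch is hinted as unlikely.
long sum_valid(const std::vector<int>& values, std::size_t& rejected) {
    long sum = 0;
    rejected = 0;
    for (int value : values) {
        if (UNLIKELY(value < 0)) {
            ++rejected;  // cold path, expected to run rarely
            continue;
        }
        sum += value;
    }
    assert(rejected <= values.size());
    return sum;
}
```

Measuring with `perf stat -e branch-misses` before and after adding such a hint is the safest way to tell whether it helps.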

  36-39. Branch target prediction
      • Target of a jump is not known at compile time:
          • Function pointer
          • Function return address
          • Virtual method
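
The function-pointer case can be sketched as follows. Every call in the loop goes through a pointer, so the CPU has to predict where the call will land, not just whether a branch is taken. The `Handler` alias and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer to a function that updates a running sum.
using Handler = void (*)(std::size_t*);

void add_one(std::size_t* sum) { *sum += 1; }
void add_two(std::size_t* sum) { *sum += 2; }

// Calls every handler in order and returns the accumulated sum.
std::size_t run_handlers(const std::vector<Handler>& handlers) {
    std::size_t sum = 0;
    for (Handler handler : handlers) {
        assert(handler != nullptr);
        handler(&sum);  // indirect call: the target is known only at run time
    }
    return sum;
}
```

If the vector mixes `add_one` and `add_two` at random, the call target changes unpredictably from one iteration to the next; grouping identical handlers together makes the target sequence easy to predict.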

  40. Code (backup)
      struct A {
          virtual ~A() = default;  // needed: instances are deleted through std::unique_ptr<A>
          virtual void handle(size_t* data) const = 0;
      };
      struct B: public A {
          void handle(size_t* data) const final { *data += 1; }
      };
      struct C: public A {
          void handle(size_t* data) const final { *data += 2; }
      };

      std::vector<std::unique_ptr<A>> data = /* 4K random B/C instances */;
      // std::sort(data.begin(), data.end(), /* sort by instance type */);

      size_t sum = 0;
      for (auto& x : data) {
          x->handle(&sum);
      }
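
A compilable version of this experiment might look like the sketch below. The slide elides how the instances are created and what "sort by instance type" means, so both are assumptions here: `make_instances` picks B or C with a seeded random generator, and `sort_by_type` orders by `std::type_index` of the dynamic type. The helper names are hypothetical, and the virtual destructor is included because the objects are owned through `std::unique_ptr<A>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <random>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct A {
    virtual ~A() = default;
    virtual void handle(std::size_t* data) const = 0;
};
struct B : public A {
    void handle(std::size_t* data) const final { *data += 1; }
};
struct C : public A {
    void handle(std::size_t* data) const final { *data += 2; }
};

// Builds `count` instances, choosing B or C at random (fixed seed for repeatability).
std::vector<std::unique_ptr<A>> make_instances(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution pick_b(0.5);
    std::vector<std::unique_ptr<A>> data;
    data.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (pick_b(rng)) {
            data.push_back(std::make_unique<B>());
        } else {
            data.push_back(std::make_unique<C>());
        }
    }
    return data;
}

// Groups objects of the same dynamic type together (one possible sort key).
void sort_by_type(std::vector<std::unique_ptr<A>>& data) {
    std::sort(data.begin(), data.end(),
              [](const std::unique_ptr<A>& lhs, const std::unique_ptr<A>& rhs) {
                  const A& left = *lhs;
                  const A& right = *rhs;
                  return std::type_index(typeid(left)) < std::type_index(typeid(right));
              });
}

// The loop from the slide: one virtual call per element.
std::size_t handle_all(const std::vector<std::unique_ptr<A>>& data) {
    std::size_t sum = 0;
    for (const auto& x : data) {
        assert(x != nullptr);
        x->handle(&sum);
    }
    return sum;
}
```

Sorting does not change the result of `handle_all`, only the order in which the two `handle` targets are called, which is what the branch target predictor sees.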

  41. Result (backup)

  42. perf (backup)
      $ perf stat -e branch-misses ./example0b
      with sort    -> 337 274
      without sort -> 84 183 161

  43. Code (backup)
      // Addresses of N integers, each `offset` bytes apart
      std::vector<int*> data = ...;
      for (auto ptr: data) {
          *ptr += 1;
      }
      // Offsets: 4, 64, 4000, 4096, 4128
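
The slide leaves out how the pointer vector is built. One way to do it, offered only as a sketch, is to carve the addresses out of a single buffer. The names `make_strided_pointers` and `increment_all` are hypothetical, and the code assumes `offset` is a multiple of `sizeof(int)`, which holds for every offset listed on the slide when `int` is 4 bytes wide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns pointers to `count` ints inside `buffer`, each `offset` bytes apart.
// Assumption: `offset` is a multiple of sizeof(int) and the buffer is large enough.
std::vector<int*> make_strided_pointers(std::vector<int>& buffer, std::size_t count,
                                        std::size_t offset) {
    assert(offset % sizeof(int) == 0);
    const std::size_t stride = offset / sizeof(int);  // stride in elements
    assert(count == 0 || (count - 1) * stride < buffer.size());
    std::vector<int*> pointers;
    pointers.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        pointers.push_back(buffer.data() + i * stride);
    }
    return pointers;
}

// The loop from the slide: touch every address once.
void increment_all(const std::vector<int*>& pointers) {
    for (int* ptr : pointers) {
        *ptr += 1;
    }
}
```

Timing `increment_all` for the different offsets, with the buffer sized to fit the largest one, would be one way to reproduce the comparison the slide sets up.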

  44. Result (backup)

  45. Cache memory
