
Requirement Models for Co-Design Calotoiu Alexandru Dagstuhl - PowerPoint PPT Presentation



  1. Requirement Models for Co-Design Calotoiu Alexandru Dagstuhl Seminar| 23.10.2017 23.10.17 | Department of Computer Science | Laboratory for Parallel Programming | Alexandru Calotoiu | 1

  2. Automatic empirical modeling
     Workflow: instrumentation of the program (main() { foo() bar() compute() }) yields performance measurements ($M_i$, $M_j$), which are the input to the model generator.
     Output: human-readable performance models of all functions (e.g., $t = c_1 \cdot \log(p) + c_2$)
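As an illustration of what fitting the example model $t = c_1 \cdot \log(p) + c_2$ to measurements could look like, here is a minimal least-squares sketch. It is not the tool's actual fitting procedure; the function name `fit_log_model` and the measurement values are hypothetical, and the logarithm is assumed to be base 2.

```python
import math

def fit_log_model(processes, times):
    """Least-squares fit of t = c1 * log2(p) + c2; returns (c1, c2)."""
    xs = [math.log2(p) for p in processes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_t = sum(times) / count
    # Closed-form simple linear regression on x = log2(p).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    c1 = sxt / sxx
    c2 = mean_t - c1 * mean_x
    return c1, c2

# Hypothetical, noise-free measurements generated from t = 2*log2(p) + 3.
processes = [2, 4, 8, 16, 32]
times = [2 * math.log2(p) + 3 for p in processes]
c1, c2 = fit_log_model(processes, times)
print(f"t = {c1:.2f} * log2(p) + {c2:.2f}")
```

With real, noisy measurements the recovered coefficients would only approximate the underlying behavior, and several candidate model shapes would have to be compared.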

  3. Complexity building blocks
     Algorithm: communication; computation
     LU: $t(p) \sim c$; $t(p) \sim c$
     FFT: $t(p) \sim \log_2(p)$; $t(p) \sim c$
     Naïve N-body: $t(p) \sim p$; $t(p) \sim p$
     Samplesort: $t(p) \sim p^2 \log_2^2(p)$; $t(p) \sim p^2$
     …

  4. Performance model normal form
     $f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p)$, with $n \in \mathbb{N}$, $i_k \in I$, $j_k \in J$, and $I, J \subset \mathbb{Q}$

  5. Creating search spaces
     $f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p)$ with $n = 1$, $I = \{0, 1, 2\}$, $J = \{0, 1\}$
     Candidates: $c_1$; $c_1 \cdot \log(p)$; $c_1 \cdot p$; $c_1 \cdot p \cdot \log(p)$; $c_1 \cdot p^2$; $c_1 \cdot p^2 \cdot \log(p)$

  6. Creating search spaces
     $f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p)$ with $n = 2$, $I = \{0, 1, 2\}$, $J = \{0, 1\}$
     Candidates:
     $c_1 + c_2 \cdot \log(p)$
     $c_1 + c_2 \cdot p$
     $c_1 + c_2 \cdot p \cdot \log(p)$
     $c_1 + c_2 \cdot p^2$
     $c_1 + c_2 \cdot p^2 \cdot \log(p)$
     $c_1 \cdot \log(p) + c_2 \cdot p$
     $c_1 \cdot \log(p) + c_2 \cdot p \cdot \log(p)$
     $c_1 \cdot \log(p) + c_2 \cdot p^2$
     $c_1 \cdot \log(p) + c_2 \cdot p^2 \cdot \log(p)$
     $c_1 \cdot p + c_2 \cdot p \cdot \log(p)$
     $c_1 \cdot p + c_2 \cdot p^2$
     $c_1 \cdot p + c_2 \cdot p^2 \cdot \log(p)$
     $c_1 \cdot p \cdot \log(p) + c_2 \cdot p^2$
     $c_1 \cdot p \cdot \log(p) + c_2 \cdot p^2 \cdot \log(p)$
     $c_1 \cdot p^2 + c_2 \cdot p^2 \cdot \log(p)$
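The search spaces on slides 5 and 6 can be enumerated mechanically: every term is a pair of exponents (i, j) drawn from I and J, and a hypothesis with n terms is a choice of n distinct pairs. The sketch below is only an illustration of that counting argument; the helper names `search_space`, `term_label`, and `hypothesis_label` are hypothetical.

```python
from itertools import combinations, product

def term_label(i, j):
    """Render one term p^i * log(p)^j without its coefficient."""
    parts = []
    if i:
        parts.append("p" if i == 1 else f"p^{i}")
    if j:
        parts.append("log(p)" if j == 1 else f"log^{j}(p)")
    return " * ".join(parts) if parts else "1"

def search_space(I, J, n):
    """All hypotheses made of exactly n distinct (i, j) exponent pairs."""
    terms = sorted(product(I, J))
    return list(combinations(terms, n))

def hypothesis_label(hyp):
    """Render a hypothesis such as 'c1 + c2 * log(p)'."""
    return " + ".join(
        f"c{k}" if term == (0, 0) else f"c{k} * {term_label(*term)}"
        for k, term in enumerate(hyp, start=1)
    )

space1 = search_space({0, 1, 2}, {0, 1}, 1)  # slide 5: n = 1
space2 = search_space({0, 1, 2}, {0, 1}, 2)  # slide 6: n = 2
print(len(space1), len(space2))
for hyp in space2:
    print(hypothesis_label(hyp))
```

With three exponents in I and two in J there are six possible terms, which gives six hypotheses for n = 1 and fifteen unordered pairs for n = 2, matching the candidates listed on the slides.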

  7. Case study – HOMME
     Core of the Community Atmospheric Model (CAM)
     • Spectral element dynamical core on a cubed sphere grid
     Kernel [3 of 194]: model [s], $t = f(p)$; predictive error [%] at $p_t = 130\text{k}$
     Box_rearrange->MPI_Reduce: $3.63 \cdot 10^{-6} \, p \cdot \sqrt{p} + 7.21 \cdot 10^{-13} \, p^3$; 30.34
     Vlaplace_sphere_wk: $24.44 + 2.26 \cdot 10^{-7} \, p^2$; 4.28
     Compute_and_apply_rhs: $49.09$; 0.83
     $p_i \le 43\text{k}$

  8. Case study – HOMME
     [Plot: time (s) on a logarithmic axis from 0.01 to $10^8$ versus number of processes from $2^{10}$ to $2^{22}$ for MPI_Reduce, vlaplace_sphere_wk, and compute_and_apply_rhs, with training and prediction ranges marked.]

  9. Multi-parameter performance modeling
     Diagram labels: process count; problem size; hardware configuration; algorithm configuration; execution time; floating point operations; bytes sent and received

  10. Multi-parameter performance modeling
      Diagram labels: process count (p); problem size (n); hardware configuration; algorithm configuration; execution time (t); floating point operations; bytes sent and received
      Model: $t = f(p) \cdot g(n)$ OR $t = f(p) + g(n)$ OR …

  11. Extended performance model normal form
      $f(x_1, \ldots, x_m) = \sum_{k=1}^{n} c_k \prod_{l=1}^{m} x_l^{i_{kl}} \cdot \log_2^{j_{kl}}(x_l)$, with $n \in \mathbb{N}$, $m \in \mathbb{N}$, $i_{kl} \in I$, $j_{kl} \in J$, and $I, J \subset \mathbb{Q}$

  12. Extended performance model normal form
      $f(x_1, \ldots, x_m) = \sum_{k=1}^{n} c_k \prod_{l=1}^{m} x_l^{i_{kl}} \cdot \log_2^{j_{kl}}(x_l)$, with $n \in \mathbb{N}$, $m \in \mathbb{N}$, $i_{kl} \in I$, $j_{kl} \in J$, and $I, J \subset \mathbb{Q}$
      Possible parameter interactions
      • Constant: $c_1$
      • Single parameter: $c_1 + c_2 \cdot x_1$
      • Additive: $c_1 + c_2 \cdot x_1 + c_3 \cdot x_2 + c_4 \cdot x_3$
      • Multiplicative: $c_1 + c_2 \cdot x_1 \cdot x_2 \cdot x_3$
      • Several options: $c_1 + c_2 \cdot x_1 \cdot x_3 + c_3 \cdot x_1^2 \cdot x_2 \cdot \log_2(x_2)$
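To make the extended normal form concrete, the following sketch evaluates a multi-parameter model given as a list of terms, each with a coefficient and one (i, j) exponent pair per parameter. The function name `evaluate_model`, the coefficient values, and the parameter values are hypothetical; the example term list encodes the "several options" shape, read here as c1 + c2·x1·x3 + c3·x1²·x2·log2(x2), which is an assumption about the flattened slide text.

```python
import math

def evaluate_model(terms, x):
    """Evaluate sum_k c_k * prod_l x_l**i_kl * log2(x_l)**j_kl.

    terms: list of (c_k, [(i_k1, j_k1), ..., (i_km, j_km)])
    x:     list of m parameter values, assumed to be >= 1
    """
    total = 0.0
    for coefficient, exponents in terms:
        value = coefficient
        for x_l, (i, j) in zip(x, exponents):
            value *= x_l ** i * math.log2(x_l) ** j
        total += value
    return total

# Hypothetical coefficients c1 = 1, c2 = 2, c3 = 3 for three parameters.
several_options = [
    (1.0, [(0, 0), (0, 0), (0, 0)]),  # c1
    (2.0, [(1, 0), (0, 0), (1, 0)]),  # c2 * x1 * x3
    (3.0, [(2, 0), (1, 1), (0, 0)]),  # c3 * x1^2 * x2 * log2(x2)
]
result = evaluate_model(several_options, [2.0, 4.0, 8.0])
print(result)  # 1 + 2*2*8 + 3*4*4*2 = 129.0
```

The single-parameter normal form is the special case m = 1, where every term carries exactly one exponent pair.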

  13. Requirements engineering
      Applications: Sweep3d, Lulesh, OpenFoam, Milc, Clover Leaf, HOMME, BLAST, Re-learn, Kripke

  14. Requirements engineering – a per-process view
      Memory capacity; memory bandwidth; computational performance; network bandwidth; network

  15. Requirements engineering – a per-process view
      Memory capacity; memory bandwidth; computational performance; network bandwidth; network

  16. Requirements engineering – a per-process view
      Memory capacity; memory bandwidth; computational performance; network bandwidth; network

  17. Application requirements
      Models represent per-process effects
      p – Number of processes
      n – Problem size per process
      Lulesh (requirement; metric; model):
      Computation; #FLOPs; $10^5 \cdot n \cdot \log(n) \cdot p^{0.25} \cdot \log(p)$
      Communication; #Bytes sent & received; $10^3 \cdot n \cdot p^{0.25} \cdot \log(p)$
      Memory access; #Loads & stores; $10^5 \cdot n \cdot \log(n) \cdot \log(p)$
      Memory footprint; #Bytes used; $10^5 \cdot n \cdot \log(n)$

  18. Co-design using performance models
      Lulesh: Which is the best investment?

  19. Co-design using performance models

  20. Co-design using performance models
      Double the memory

  21. Co-design using performance models
      Double the processors

  22. Co-design using performance models
      Double the racks

  23. Co-design using performance models
      Double the racks
      I. # Processes: $p' = 2 \cdot p$; memory per process: $m' = m$

  24. Co-design using performance models
      Double the racks
      I. # Processes: $p' = 2 \cdot p$; memory per process: $m' = m$
      II. Memory requirement: $m' = m = 10^5 \cdot n' \cdot \log(n')$; problem size per process: $n' = n$

  25. Co-design using performance models
      Double the racks
      I. # Processes: $p' = 2 \cdot p$; memory per process: $m' = m$
      II. Memory requirement: $m' = m = 10^5 \cdot n' \cdot \log(n')$; problem size per process: $n' = n$
      III. Overall problem size: $n' \cdot p' = 2 \cdot n \cdot p$

  26. Co-design using performance models
      Double the racks
      IV. # FLOPS: $10^5 \cdot n \cdot \log(n) \cdot (2p)^{0.25} \cdot \log(2p)$; ratio new to old: $2^{0.25} \cdot (1 + 1/\log(p))$
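The ratio in step IV follows by dividing the new FLOP requirement by the old one. The derivation below is a reconstruction under the assumption that the logarithm is base 2, which is what makes $\log(2p) = 1 + \log(p)$ hold; the value $p = 1024$ in the last line is only an illustrative choice.

```latex
\begin{align*}
\frac{10^5 \cdot n \cdot \log_2(n) \cdot (2p)^{0.25} \cdot \log_2(2p)}
     {10^5 \cdot n \cdot \log_2(n) \cdot p^{0.25} \cdot \log_2(p)}
  &= 2^{0.25} \cdot \frac{\log_2(2p)}{\log_2(p)} \\
  &= 2^{0.25} \cdot \frac{1 + \log_2(p)}{\log_2(p)} \\
  &= 2^{0.25} \cdot \left(1 + \frac{1}{\log_2(p)}\right) \\
\text{for } p = 1024:\quad
  2^{0.25} \cdot \left(1 + \tfrac{1}{10}\right) &\approx 1.31
\end{align*}
```

The factors depending on $n$ cancel because the problem size per process is unchanged ($n' = n$).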

  27. Co-design using performance models
      Double the racks
      IV. # FLOPS: $10^5 \cdot n \cdot \log(n) \cdot (2p)^{0.25} \cdot \log(2p)$; ratio new to old: $2^{0.25} \cdot (1 + 1/\log(p))$
      V. #Bytes sent & received: $10^3 \cdot n \cdot (2p)^{0.25} \cdot \log(2p)$; ratio new to old: $2^{0.25} \cdot (1 + 1/\log(p))$

  28. Visual representation of requirements
      [Chart: axes for communication, computation, problem size, and memory access, comparing three upgrades: $p' = 2 \cdot p$, $m' = m$; $p' = p$, $m' = 2 \cdot m$; $p' = 2 \cdot p$, $m' = m / 2$.]
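The comparison behind this chart can be approximated numerically from the Lulesh models on slide 17. The sketch below is an illustration, not the author's tooling: it assumes base-2 logarithms, assumes each upgrade fills the available memory per process, and inverts the memory-footprint model by bisection to obtain the new problem size per process. The function names, the baseline values n = 1000 and p = 1024, and the mapping of the three (p', m') pairs to the upgrade names from slides 20 to 22 are assumptions.

```python
import math

# Lulesh per-process requirement models from slide 17 (log assumed base 2).
def footprint(n):
    return 1e5 * n * math.log2(n)

def flops(n, p):
    return 1e5 * n * math.log2(n) * p ** 0.25 * math.log2(p)

def comm_bytes(n, p):
    return 1e3 * n * p ** 0.25 * math.log2(p)

def loads_stores(n, p):
    return 1e5 * n * math.log2(n) * math.log2(p)

def solve_n(m, lo=2.0, hi=1e12):
    """Find n with footprint(n) = m by bisection (footprint grows with n)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if footprint(mid) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upgrade_ratios(n, p, p_factor, m_factor):
    """Ratios new/old when processes scale by p_factor and memory by m_factor."""
    n_new = solve_n(footprint(n) * m_factor)
    p_new = p * p_factor
    return {
        "problem size": (n_new * p_new) / (n * p),
        "computation": flops(n_new, p_new) / flops(n, p),
        "communication": comm_bytes(n_new, p_new) / comm_bytes(n, p),
        "memory access": loads_stores(n_new, p_new) / loads_stores(n, p),
    }

# Assumed mapping of upgrade names to (p'/p, m'/m).
scenarios = {
    "double the racks": (2, 1.0),
    "double the memory": (1, 2.0),
    "double the processors": (2, 0.5),
}
for name, (p_factor, m_factor) in scenarios.items():
    ratios = upgrade_ratios(1000.0, 1024, p_factor, m_factor)
    print(name, {key: round(value, 3) for key, value in ratios.items()})
```

For the rack-doubling case the numeric ratios reproduce the closed forms derived on the slides, since the problem size per process stays the same.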

  29. Co-design using performance models
      Double the racks
      IV. # FLOPS: $10^5 \cdot n \cdot \log(n) \cdot (2p)^{0.25} \cdot \log(2p)$; ratio new to old: $2^{0.25} \cdot (1 + 1/\log(p))$
      V. #Bytes sent & received: $10^3 \cdot n \cdot (2p)^{0.25} \cdot \log(2p)$; ratio new to old: $2^{0.25} \cdot (1 + 1/\log(p))$
      VI. #Loads & stores: $10^5 \cdot n \cdot \log(n) \cdot \log(2p)$; ratio new to old: $1 + 1/\log(p)$
