Finding Performance-Optimal Configurations for High-Performance Computing


  1. Finding Performance-Optimal Configurations for High-Performance Computing. Alexander Grebhahn, Norbert Siegmund, Sven Apel (University of Passau). FOSD Meeting 2014, Dagstuhl.

  2. High-Performance Computing and ExaStencils.

  6. High-Performance Computing and ExaStencils: how can we identify performance-optimal components and parameters for a specific hardware platform?

  7. SPL Conqueror [Siegmund et al., 2012]. Workflow: partial feature selection, then prediction, then the optimal configuration, e.g., {CUDA, Local Memory, Padding = 0, Pixels per Thread = 3}. Objective function: max(performance). Advantages: detection of feature interactions; transparency (the influences of individual features and of feature interactions are explicitly modeled and quantified).
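The prediction step of slide 7 can be sketched as follows. This is a minimal illustration, not SPL Conqueror's actual implementation: the influence values and the interaction entry are hypothetical, and only the additive model structure (base time plus feature influences plus interaction influences) follows the slides. Runtime is minimized, which corresponds to maximizing performance.

```python
# Sketch of an SPL Conqueror-style performance model (illustrative values).
base_time = 800.0  # runtime of the base configuration, in seconds

# Hypothetical influences of individual features (seconds).
feature_influence = {"CUDA": -300.0, "OpenCL": 0.0, "LocalMemory": -50.0}

# Hypothetical influences of feature interactions.
interaction_influence = {frozenset({"CUDA", "LocalMemory"}): -20.0}

def predict(config):
    """Predicted runtime = base + feature influences + interaction influences."""
    t = base_time + sum(feature_influence[f] for f in config)
    for pair, infl in interaction_influence.items():
        if pair <= config:  # interaction applies if all its features are selected
            t += infl
    return t

# Enumerate all valid configurations: exactly one API, optional local memory.
configs = [frozenset({api} | ({"LocalMemory"} if lm else set()))
           for api in ("CUDA", "OpenCL") for lm in (False, True)]

# Minimal predicted runtime = performance-optimal configuration.
best = min(configs, key=predict)
```

With these illustrative numbers, the predicted optimum is {CUDA, Local Memory}, mirroring the configuration shown on the slide.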

  9. Influence of Individual Features. Example (HIPAcc; API = CUDA or OpenCL; optional Local Memory): the base configuration runs in 800 s, the configuration with CUDA in 500 s; the performance difference of -300 s is interpreted as the contribution of the feature in question. Heuristic: the feature-wise (FW) heuristic quantifies the influence of individual features on performance.
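The feature-wise heuristic of slide 9 reduces to a runtime difference; a minimal sketch, using the numbers from the slide (800 s without the feature, 500 s with CUDA):

```python
# Feature-wise (FW) heuristic sketch: the influence of a feature is the
# runtime difference between a configuration with the feature enabled
# and the same configuration without it.
def feature_influence(time_with, time_without):
    return time_with - time_without

# Example from the slide: 800 s base run, 500 s with CUDA.
influence_cuda = feature_influence(500, 800)  # -300 s
```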

  11. Interactions Between Features. Example: from a base runtime of 800 s and individual influences of -300 s and -400 s, the model predicts 100 s, but the measured runtime is 350 s; the difference is attributed to a feature interaction. Heuristics: the pair-wise (PW) heuristic captures interactions between two features; the higher-order (HO) heuristic captures interactions between three or more features; the hot-spot (HS) heuristic captures interactions of "hot-spot" features.
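The interaction detection on slide 11 can be sketched as a deviation from the additive prediction. The numbers are adapted from the slide (base 800 s, influences -300 s and -400 s, measured 350 s); how SPL Conqueror attributes the deviation in detail is not specified here.

```python
# Pair-wise (PW) heuristic sketch: if the measured runtime of a combined
# configuration deviates from the sum of the individual feature influences,
# the difference is attributed to an interaction between those features.
def interaction(measured, base, *influences):
    predicted = base + sum(influences)  # additive prediction without interactions
    return measured - predicted

# Slide example: predicted 800 - 300 - 400 = 100 s, measured 350 s.
delta = interaction(350, 800, -300, -400)  # 250 s attributed to the interaction
```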

  14. Numerical Parameters (Non-Boolean Features). The existing heuristics work for boolean features only. Discretization: a numeric parameter X with domain [0, 1, ..., n] is replaced by boolean features X0, X1, ..., Xn. Disadvantages: an increasing number of features, and the loss of the connection between parameter values.
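The discretization described on slide 14 can be sketched in a few lines; the helper name `discretize` is hypothetical, introduced only for illustration:

```python
# Discretization sketch: a numeric parameter X with domain [0..n] becomes
# n+1 mutually exclusive boolean features X0..Xn, making the boolean
# heuristics applicable -- at the cost of more features and a lost
# ordering between the parameter values.
def discretize(name, values):
    return [f"{name}{v}" for v in values]

features = discretize("X", range(4))  # ['X0', 'X1', 'X2', 'X3']
```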

  17. Influence of Parameters. Example (HIPAcc): numeric parameters Padding [0..6] and Pixels per Thread [1..7], alongside the boolean features API (CUDA, OpenCL) and Local Memory. Approach: determine the influence of parameter values on performance; learn a function for each pair of parameter and feature; sample the parameters independently. Heuristic: the function-learning (FL) heuristic.
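The function learning on slide 17 can be sketched with a simple least-squares fit. This is an assumption-laden stand-in: the slides do not say which function class is learned, and the sampled (Pixels per Thread, runtime) pairs below are illustrative.

```python
# Function-learning (FL) heuristic sketch: for each pair of numeric
# parameter and feature, learn a function mapping parameter values to
# runtime from independently sampled measurements. A least-squares
# linear fit stands in for the learned function here.
def fit_linear(samples):
    xs, ys = zip(*samples)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in samples)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept  # runtime(x) ~= slope * x + intercept

# Illustrative samples: (Pixels per Thread, runtime in seconds).
slope, intercept = fit_linear([(1, 900), (3, 700), (5, 500), (7, 300)])
```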

  18. First Results [Grebhahn et al., 2014]. Research questions: What is the prediction accuracy of the different heuristics? Can we predict the performance-optimal configuration? Customizable programs: the Highly Scalable Multi-Grid Solver (HSMGS) and a Multi-Grid Solver using DUNE (DUNE MGS).

  19. HSMGS configuration space: pre-smoothing [0..6] and post-smoothing [0..6] with the constraint pre-smoothing + post-smoothing > 0; number of cores {64, 256, 1024, 4096}; coarse-grid solver {IP_CG, RED_AMG, IP_AMG}; smoother {Jac, GS, GSAC, RBGS, RBGSAC, BS}.
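The size of the configuration space on slide 19 can be checked by enumeration: 7 x 7 smoothing combinations minus the one violating pre + post > 0, times 4 core counts, 3 coarse-grid solvers, and 6 smoothers, which matches the 3 456 measurements of the brute-force baseline in the results table.

```python
# Enumerate the HSMGS configuration space described on the slide.
from itertools import product

pre = post = range(7)                              # pre-/post-smoothing [0..6]
cores = [64, 256, 1024, 4096]
coarse = ["IP_CG", "RED_AMG", "IP_AMG"]
smoother = ["Jac", "GS", "GSAC", "RBGS", "RBGSAC", "BS"]

configs = [c for c in product(pre, post, cores, coarse, smoother)
           if c[0] + c[1] > 0]                     # constraint from the slide
print(len(configs))  # 3456
```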

  26. HSMGS Results.

  Heu. | #M (in %)    | ē ± s       | x̃    | δ [%] | rank
  BF   | 3 456 (100)  | 0           | 0    | 0     | 1
  FW   | 26 (0.8)     | 23.4 ± 18.7 | 19.0 | 3.8   | 40
  PW   | 274 (7.9)    | 4.8 ± 8.6   | 1.8  | 31.4  | 77
  HO   | 1 331 (38.5) | 60.7 ± 67.2 | 41.5 | 270.0 | 312
  HS   | 2 902 (84.0) | 8.0 ± 33.9  | 0    | 270.0 | 55
  FL   | 112 (3.2)    | 2.5 ± 3.1   | 1.8  | 0     | 1

  Legend: BF: brute force, FW: feature-wise, PW: pair-wise, HO: higher-order, HS: hot-spot, FL: function learning.

  27. HSMGS Feature Interactions (Pair-Wise). [Figure: pair-wise interaction matrix over the smoothers (GS, JAC, GSAC, GSRB, GSRBAC), the coarse-grid solvers (RED_AMG, IP_AMG, IP_CG), the numbers of cores (64, 256, 1024, 4096), and the pre-/post-smoothing values (0 to 6).]

  29. HIPAcc, DUNE MGS Results (build-up slide; only the brute-force row for HIPAcc is filled at this point).

  Heu.        | #M (in %)    | ē ± s | x̃ | δ [%] | rank
  BF (HIPAcc) | 13 485 (100) | 0     | 0 | 0     | 1
