Finding Performance-Optimal Configurations for High-Performance Computing


  1. Finding Performance-Optimal Configurations for High-Performance Computing. Alexander Grebhahn, Norbert Siegmund, Sven Apel (University of Passau). FOSD Meeting 2014, Dagstuhl.

  2. High-Performance Computing and ExaStencils.

  6. High-Performance Computing and ExaStencils: how can we identify performance-optimal components and parameters for a specific hardware platform?

  7. SPL Conqueror [Siegmund et al., 2012]. Workflow: partial feature selection, then prediction, then the optimal configuration, e.g., {CUDA, Local Memory, Padding = 0, Pixels per Thread = 3}. Objective function: max(performance). Advantages: detection of feature interactions; transparency (the influences of individual features and of feature interactions are explicitly modeled and quantified).
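The prediction step of slide 7 can be sketched as follows. This is a minimal illustration, not SPL Conqueror's actual implementation: the influence values and the interaction entry are hypothetical, and only the additive model structure (base time plus feature influences plus interaction influences) follows the slides. Runtime is minimized, which corresponds to maximizing performance.

```python
# Sketch of an SPL Conqueror-style performance model (illustrative values).
base_time = 800.0  # runtime of the base configuration, in seconds

# Hypothetical influences of individual features (seconds).
feature_influence = {"CUDA": -300.0, "OpenCL": 0.0, "LocalMemory": -50.0}

# Hypothetical influences of feature interactions.
interaction_influence = {frozenset({"CUDA", "LocalMemory"}): -20.0}

def predict(config):
    """Predicted runtime = base + feature influences + interaction influences."""
    t = base_time + sum(feature_influence[f] for f in config)
    for pair, infl in interaction_influence.items():
        if pair <= config:  # interaction applies if all its features are selected
            t += infl
    return t

# Enumerate all valid configurations: exactly one API, optional local memory.
configs = [frozenset({api} | ({"LocalMemory"} if lm else set()))
           for api in ("CUDA", "OpenCL") for lm in (False, True)]

# Minimal predicted runtime = performance-optimal configuration.
best = min(configs, key=predict)
```

With these illustrative numbers, the predicted optimum is {CUDA, Local Memory}, mirroring the configuration shown on the slide.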

  9. Influence of Individual Features. Example (HIPAcc; API = CUDA or OpenCL; optional Local Memory): the base configuration runs in 800 s, the configuration with CUDA in 500 s; the performance difference of -300 s is interpreted as the contribution of the feature in question. Heuristic: the feature-wise (FW) heuristic quantifies the influence of individual features on performance.
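The feature-wise heuristic of slide 9 reduces to a runtime difference; a minimal sketch, using the numbers from the slide (800 s without the feature, 500 s with CUDA):

```python
# Feature-wise (FW) heuristic sketch: the influence of a feature is the
# runtime difference between a configuration with the feature enabled
# and the same configuration without it.
def feature_influence(time_with, time_without):
    return time_with - time_without

# Example from the slide: 800 s base run, 500 s with CUDA.
influence_cuda = feature_influence(500, 800)  # -300 s
```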

  11. Interactions Between Features. Example: from a base runtime of 800 s and individual influences of -300 s and -400 s, the model predicts 100 s, but the measured runtime is 350 s; the difference is attributed to a feature interaction. Heuristics: the pair-wise (PW) heuristic captures interactions between two features; the higher-order (HO) heuristic captures interactions between three or more features; the hot-spot (HS) heuristic captures interactions of "hot-spot" features.
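The interaction detection on slide 11 can be sketched as a deviation from the additive prediction. The numbers are adapted from the slide (base 800 s, influences -300 s and -400 s, measured 350 s); how SPL Conqueror attributes the deviation in detail is not specified here.

```python
# Pair-wise (PW) heuristic sketch: if the measured runtime of a combined
# configuration deviates from the sum of the individual feature influences,
# the difference is attributed to an interaction between those features.
def interaction(measured, base, *influences):
    predicted = base + sum(influences)  # additive prediction without interactions
    return measured - predicted

# Slide example: predicted 800 - 300 - 400 = 100 s, measured 350 s.
delta = interaction(350, 800, -300, -400)  # 250 s attributed to the interaction
```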

  14. Numerical Parameters (Non-Boolean Features). The existing heuristics work for boolean features only. Discretization: a numeric parameter X with domain [0, 1, ..., n] is replaced by boolean features X0, X1, ..., Xn. Disadvantages: an increasing number of features, and the loss of the connection between parameter values.
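The discretization described on slide 14 can be sketched in a few lines; the helper name `discretize` is hypothetical, introduced only for illustration:

```python
# Discretization sketch: a numeric parameter X with domain [0..n] becomes
# n+1 mutually exclusive boolean features X0..Xn, making the boolean
# heuristics applicable -- at the cost of more features and a lost
# ordering between the parameter values.
def discretize(name, values):
    return [f"{name}{v}" for v in values]

features = discretize("X", range(4))  # ['X0', 'X1', 'X2', 'X3']
```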

  17. Influence of Parameters. Example (HIPAcc): numeric parameters Padding [0..6] and Pixels per Thread [1..7], alongside the boolean features API (CUDA, OpenCL) and Local Memory. Approach: determine the influence of parameter values on performance; learn a function for each pair of parameter and feature; sample the parameters independently. Heuristic: the function-learning (FL) heuristic.
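The function learning on slide 17 can be sketched with a simple least-squares fit. This is an assumption-laden stand-in: the slides do not say which function class is learned, and the sampled (Pixels per Thread, runtime) pairs below are illustrative.

```python
# Function-learning (FL) heuristic sketch: for each pair of numeric
# parameter and feature, learn a function mapping parameter values to
# runtime from independently sampled measurements. A least-squares
# linear fit stands in for the learned function here.
def fit_linear(samples):
    xs, ys = zip(*samples)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in samples)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept  # runtime(x) ~= slope * x + intercept

# Illustrative samples: (Pixels per Thread, runtime in seconds).
slope, intercept = fit_linear([(1, 900), (3, 700), (5, 500), (7, 300)])
```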

  18. First Results [Grebhahn et al., 2014]. Research questions: What is the prediction accuracy of the different heuristics? Can we predict the performance-optimal configuration? Customizable programs: the Highly Scalable Multi-Grid Solver (HSMGS) and a Multi-Grid Solver using DUNE (DUNE MGS).

  19. HSMGS configuration space: pre-smoothing [0..6] and post-smoothing [0..6] with the constraint pre-smoothing + post-smoothing > 0; number of cores {64, 256, 1024, 4096}; coarse-grid solver {IP_CG, RED_AMG, IP_AMG}; smoother {Jac, GS, GSAC, RBGS, RBGSAC, BS}.
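The size of the configuration space on slide 19 can be checked by enumeration: 7 x 7 smoothing combinations minus the one violating pre + post > 0, times 4 core counts, 3 coarse-grid solvers, and 6 smoothers, which matches the 3 456 measurements of the brute-force baseline in the results table.

```python
# Enumerate the HSMGS configuration space described on the slide.
from itertools import product

pre = post = range(7)                              # pre-/post-smoothing [0..6]
cores = [64, 256, 1024, 4096]
coarse = ["IP_CG", "RED_AMG", "IP_AMG"]
smoother = ["Jac", "GS", "GSAC", "RBGS", "RBGSAC", "BS"]

configs = [c for c in product(pre, post, cores, coarse, smoother)
           if c[0] + c[1] > 0]                     # constraint from the slide
print(len(configs))  # 3456
```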

  26. HSMGS Results.

  Heu. | #M (in %)    | ē ± s       | x̃    | δ [%] | rank
  BF   | 3 456 (100)  | 0           | 0    | 0     | 1
  FW   | 26 (0.8)     | 23.4 ± 18.7 | 19.0 | 3.8   | 40
  PW   | 274 (7.9)    | 4.8 ± 8.6   | 1.8  | 31.4  | 77
  HO   | 1 331 (38.5) | 60.7 ± 67.2 | 41.5 | 270.0 | 312
  HS   | 2 902 (84.0) | 8.0 ± 33.9  | 0    | 270.0 | 55
  FL   | 112 (3.2)    | 2.5 ± 3.1   | 1.8  | 0     | 1

  Legend: BF: brute force, FW: feature-wise, PW: pair-wise, HO: higher-order, HS: hot-spot, FL: function learning.

  27. HSMGS Feature Interactions (Pair-Wise). [Figure: pair-wise interaction matrix over the smoothers (GS, JAC, GSAC, GSRB, GSRBAC), the coarse-grid solvers (RED_AMG, IP_AMG, IP_CG), the numbers of cores (64, 256, 1024, 4096), and the pre-/post-smoothing values (0 to 6).]

  29. HIPAcc, DUNE MGS Results (build-up slide; only the brute-force row for HIPAcc is filled at this point).

  Heu.        | #M (in %)    | ē ± s | x̃ | δ [%] | rank
  BF (HIPAcc) | 13 485 (100) | 0     | 0 | 0     | 1
