
A DSL for Performance Orchestration (Thiago Teixeira, David Padua, William Gropp)



  1. A DSL for Performance Orchestration
     Thiago Teixeira, David Padua, William Gropp
     Department of Computer Science, University of Illinois at Urbana-Champaign
     Scalable Tools Workshop, July 2018
     DOE/NNSA/ASC/PSAAPII: XPACC, The Center for Exascale Simulation of Plasma-coupled Combustion

  2. XPACC
     ‣ The Center for Exascale Simulation of Plasma-Coupled Combustion
     ‣ Developing a framework to leverage parallelism on exascale systems
     ‣ Comprises Aerospace, Chemistry, CS, ECE, Mechanical Eng
     ‣ Some of the tools being developed:
       ✦ Moya: Just-in-time recompilations
       ✦ Tangram: Compiler programming system for performance portability
       ✦ AMPI: Model for coarse-grained overdecomposition for load balancing
       ✦ PickPocket: Data relocation for efficient computation
     [Figure: XPACC organization chart of research areas, codes/tools, and personnel. Legend: [Faculty], (Staff), (Pending Staff), {XPACC Student}]

  3. Performance Optimization
     ‣ Applications often target multiple complex systems → Large optimization space
     ‣ Compilers deliver unsatisfactory performance (-O3 is not enough): No Optimization → No Performance
     ‣ Hard to maintain and manage optimizations as the code evolves and new features are added
     ‣ And, as optimizations are added, it becomes hard to maintain the code

  4. Performance Optimization on HPC
     ‣ Scientists make decisions based on maximizing scientific output, not the application's performance
     ‣ They also want to control the performance to the level needed, even sacrificing abstraction and ease of programming
     ‣ A new technology that can coexist with older ones has a greater chance of success (e.g. MPI)
     ‣ No complete buy-in at the beginning
       ✦ A barrier for new frameworks is that you can't integrate them incrementally
     ‣ Risk-mitigation strategy is to let competing technologies coexist, but this is not always possible
     Source: Understanding the High-Performance Computing Community: A Software Engineer's Perspective. Victor Basili et al.

  5. Performance Optimization
     ‣ How to handle all the required optimizations together for many different scenarios?
     ‣ How to keep the code maintainable?
     ‣ How to find the best sequence of optimizations?

  6. Goal
     ‣ No complete buy-in
     ‣ Incremental adoption
     ‣ Coexistence with other tools
     ‣ Automatically finding the best sequence of optimizations and applying them without disrupting the original code is important to improve performance and keep the code maintainable

  7. ICE
     ‣ Illinois Coding Environment
     ‣ Golden copy approach: baseline version without architecture- or compiler-specific optimizations (no buy-in)
     ‣ Search combined with the application developer's expertise
     ‣ Build-time, compile-time and runtime optimizations
     ‣ Non-prescriptive, gradual adoption, separation of concerns
     ‣ Reuse of other optimization tools already implemented
       ✦ Interfaces to simplify plug-in
     [Figure: the Golden Copy and the Optimization File feed the ICE search and optimization tools (Search; Alternative Transformations; Moya, Tangram, OpenMP, PIPs)]

  8. ICE
     ‣ Source code is annotated to define code regions
     ‣ Optimization file notation orchestrates the use of the optimization tools on the code regions defined
     ‣ Interface provides operations on the source code to invoke optimizations through:
       ✦ Adding pragmas
       ✦ Adding labels
       ✦ Replacing code regions
     ‣ These operations are used by the interface to plug in optimization tools
     ‣ Most tools are source-to-source
       ✦ Tools must understand the output of previous tools
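The interface operations listed on this slide (adding pragmas, replacing code regions) can be pictured as plain edits on a copy of the source lines. The following is only a minimal sketch under that assumption; `add_pragma` and `replace_region` are hypothetical helper names, not part of ICE, and the golden copy is left untouched because both functions return new lists.

```python
from typing import List


def add_pragma(lines: List[str], loop_line: int, pragma: str) -> List[str]:
    """Return a copy of `lines` with `pragma` inserted just before the
    1-indexed `loop_line`, reusing that line's indentation."""
    target = lines[loop_line - 1]
    indent = target[: len(target) - len(target.lstrip())]
    return lines[: loop_line - 1] + [indent + pragma] + lines[loop_line - 1:]


def replace_region(lines: List[str], first: int, last: int,
                   replacement: List[str]) -> List[str]:
    """Return a copy of `lines` where the 1-indexed inclusive range
    first..last is swapped for `replacement` (e.g. a hand-tuned variant)."""
    return lines[: first - 1] + list(replacement) + lines[last:]


# Hypothetical golden copy of an annotated C loop.
golden = [
    "#pragma @ICE loop=l1",
    "for (int i = 0; i < n; i++) {",
    "  a[i] += b[i];",
    "}",
]

# Variant 1: parallelize the loop by adding an OpenMP pragma.
omp_variant = add_pragma(golden, 2, "#pragma omp parallel for")

# Variant 2: swap the loop for a (hypothetical) hand-optimized call.
alt_variant = replace_region(golden, 2, 4, ["vector_add(a, b, n);"])
```

Because each operation produces a new variant instead of editing in place, the unoptimized baseline stays available for the next tool or the next search step.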

  9. FrontEnd
     Inputs: Source Code and Opt Language, each with its own Parser
     • Parses the original code (Fortran/C/C++)
     • Extracts code regions
     BackEnd
     Select a variant → Code Gen (RoseLoops, Pips, OpenMP / OpenACC, Clay, Moya) → Evaluate → Best Variant
     • Goes through the optimization space
     • Machine learning methods to select variants
     • Empirically evaluates variants
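The back-end loop on this slide (select a variant, generate it, evaluate it, keep the best) can be sketched as a small empirical comparison. This is an illustration only, not ICE's implementation: `select_best` and `time_once` are hypothetical names, each variant is modeled as a Python callable, and the median of a few repeated measurements is an assumed scoring rule.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def time_once(run: Callable[[], None]) -> float:
    """Wall-clock time of a single execution of `run`, in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def select_best(variants: Dict[str, Callable],
                measure: Callable[[Callable], float] = time_once,
                repeats: int = 3) -> Tuple[str, Dict[str, float]]:
    """Measure every variant `repeats` times and return the name with the
    lowest median cost, together with all median scores."""
    scores = {
        name: statistics.median(measure(run) for _ in range(repeats))
        for name, run in variants.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Stand-in variants whose "cost" is fixed, so the example is deterministic.
fake_variants = {
    "baseline": lambda: 3.0,
    "unroll4": lambda: 1.5,
    "tile8": lambda: 2.0,
}
best, scores = select_best(fake_variants, measure=lambda run: run())
```

In practice the `measure` callable would build and run the generated variant; it is injectable here so the selection logic can be exercised without real timings.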

  10. Optimization Tools
     ‣ Pips (MINES ParisTech)
       ✦ Code optimization tool based on the polyhedral framework
     ‣ Moya (Tarun Prabhu/UIUC)
       ✦ Runtime optimizations
     ‣ Clay (Joël Poudroux, Oleksandr Zinenko)
       ✦ Loop transformations using the polyhedral framework
     ‣ OpenMP
       ✦ Parallelization of code regions using pragmas
     ‣ RoseLoop
       ✦ Loop transformations based on the Rose compiler infrastructure
     ‣ Altdesc
       ✦ Replacement of code regions (e.g. with hand-optimized ones)

  11. Search Methods
     ‣ Optimization space: optimizations, parameters and software stack (compiler version, flags, libraries)
     ‣ It cannot be exhaustively traversed (gcc parameters have 10^806 configurations)
     ‣ Complex space that requires different search techniques
     ‣ OpenTuner (Jason Ansel et al.)
       ✦ A meta technique is used to control the use of the other techniques (e.g., round robin, random, AUC bandit)
       ✦ Multi-armed bandit problem: deciding which levers to pull, in which order, and how many times, on a slot machine with many arms with an unknown payout probability
     ‣ Spearmint (Jasper Snoek et al.)
       ✦ Bayesian Optimization of Machine Learning Algorithms (NIPS'12)
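To make the multi-armed bandit idea concrete, here is a minimal sketch of a meta technique that decides which search technique to try next. It uses the textbook UCB1 rule, which is a swapped-in stand-in and not OpenTuner's AUC bandit; the technique names and the fixed payouts are hypothetical.

```python
import math
from typing import Dict, Iterable


class UCB1:
    """Textbook UCB1 bandit: prefer arms with a high mean reward, plus an
    exploration bonus that shrinks as an arm is pulled more often."""

    def __init__(self, arms: Iterable[str]) -> None:
        self.counts: Dict[str, int] = {arm: 0 for arm in arms}
        self.totals: Dict[str, float] = {arm: 0.0 for arm in self.counts}

    def select(self) -> str:
        # Pull every arm once before trusting any estimate.
        for arm, pulls in self.counts.items():
            if pulls == 0:
                return arm
        total_pulls = sum(self.counts.values())

        def score(arm: str) -> float:
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_pulls) / self.counts[arm])
            return mean + bonus

        return max(self.counts, key=score)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical search techniques with fixed payouts, e.g. the fraction of
# proposals from each technique that improved on the best known variant.
payout = {"random": 0.1, "hill_climb": 0.9, "genetic": 0.4}
bandit = UCB1(payout)
for _ in range(200):
    technique = bandit.select()
    bandit.update(technique, payout[technique])
```

After the loop, `bandit.counts` shows most of the budget spent on the technique with the highest payout, while the weaker ones still receive occasional pulls.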

  12. Optimization Language
     ‣ Domain specific (easier and more straightforward)
     ‣ Expose the optimization space
       ✦ What sequences of optimizations to evaluate?
       ✦ What are the best parameters?
     ‣ Control the use of the optimization tools
     ‣ Record the steps to efficient code
     ‣ It can be shared with others or go along with the deployment and installation

  13. ICE
     ‣ Annotations in Fortran
       • Block
           ! @ICE block=b1
           …
           ! @ICE endblock
       • Loop
           ! @ICE loop=l1
           DO i = 1, n
             …
           END DO
     ‣ Annotations in C/C++
       • Block
           #pragma @ICE block=<id>
           …
           #pragma @ICE endblock
       • Loop
           #pragma @ICE loop=<id>
           for(…) {
             …
           }
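As an illustration of how a front end might locate the regions these annotations define, the sketch below scans C/C++ text for the `#pragma @ICE` forms shown on the slide. It is not ICE's parser: `find_ice_regions` is a hypothetical helper, and it assumes braces never appear inside strings or comments and that every annotated loop has a braced body.

```python
import re
from typing import Dict, List, Tuple

# Matches the C/C++ annotation forms from the slide.
ICE_PRAGMA = re.compile(
    r"^\s*#pragma\s+@ICE\s+(block|loop|endblock)(?:=(\w+))?\s*$")


def _loop_end(lines: List[str], start: int) -> int:
    """Return the 1-indexed line where the braces opened at or after the
    0-indexed line `start` are balanced again."""
    depth, seen_open = 0, False
    for idx in range(start, len(lines)):
        opened = lines[idx].count("{")
        depth += opened - lines[idx].count("}")
        seen_open = seen_open or opened > 0
        if seen_open and depth == 0:
            return idx + 1
    raise ValueError("unterminated loop body")


def find_ice_regions(source: str) -> Dict[str, Tuple[str, int, int]]:
    """Map region id -> (kind, first_line, last_line), 1-indexed; the
    first line of each region is its annotation line."""
    lines = source.splitlines()
    regions: Dict[str, Tuple[str, int, int]] = {}
    open_blocks: List[Tuple[str, int]] = []  # stack of (id, start line)
    for idx, line in enumerate(lines):
        match = ICE_PRAGMA.match(line)
        if not match:
            continue
        kind, region_id = match.group(1), match.group(2)
        if kind == "block":
            open_blocks.append((region_id, idx + 1))
        elif kind == "endblock":
            block_id, start = open_blocks.pop()
            regions[block_id] = ("block", start, idx + 1)
        else:  # loop: the region runs to the loop's closing brace
            regions[region_id] = ("loop", idx + 1, _loop_end(lines, idx + 1))
    return regions


SAMPLE = """\
#pragma @ICE block=b1
init(a, n);
#pragma @ICE loop=l1
for (int i = 0; i < n; i++) {
  a[i] = 2 * a[i];
}
#pragma @ICE endblock
"""
regions = find_ice_regions(SAMPLE)
```

The returned line ranges are what later steps would hand to an optimization tool, so the region ids in the source and in the optimization file refer to the same text.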

  14. ICE
     ‣ Optimization file (extended .yaml)
       • Direct
       • Search
     ‣ Structure:
         <preamble commands>
         <block/loop id>: <ref>
         <commands>[*+?]
         ...
       • <commands>+ : 1 or more in the combinations
       • <commands>* : 0 or more in the combinations
       • <commands>? : 0 or 1
     ‣ Example:
         ---
         compilers: [gcc, icc]
         # Build command before compilation
         prebuildcmd:
         # Compilation command before tests
         buildcmd: make clean all
         # Command call for each test
         runcmd: time ./run.sh
         ---
         search: on # or off
         memoryBound: &id01
           - unroll:
               loop: 3
               factor: 4
           - tile:
               loop: 8
               factor: 1
         example1: *id01
           - runtime:
         example2:
           - altdesc: ./opt2/*.opt
         sc2:
           - altdesc: ./opt2/*.opt
         ...
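The `+`, `*` and `?` suffixes suggest that a list of commands expands into several candidate combinations for the search. The sketch below shows one plausible reading, stated here as an assumption and not taken from the slides: `*` yields every subset of the commands (including none), `+` every non-empty subset, and `?` either nothing or a single command. The function name `expand` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def expand(commands: Sequence[str], quantifier: str = "") -> List[Tuple[str, ...]]:
    """Enumerate candidate command combinations under an ASSUMED meaning
    of the quantifiers; the original command order is preserved."""
    count = len(commands)
    if quantifier == "":
        return [tuple(commands)]        # direct: apply exactly as written
    if quantifier == "?":
        sizes = range(0, 2)             # nothing, or exactly one command
    elif quantifier == "+":
        sizes = range(1, count + 1)     # at least one command
    elif quantifier == "*":
        sizes = range(0, count + 1)     # any subset, including none
    else:
        raise ValueError(f"unknown quantifier: {quantifier!r}")
    return [combo for size in sizes for combo in combinations(commands, size)]


commands = ["unroll", "tile", "altdesc"]
star = expand(commands, "*")
plus = expand(commands, "+")
optional = expand(commands, "?")
direct = expand(commands)
```

Even for three commands the `*` form already yields eight candidates before any parameter values are considered, which is why the search techniques are used instead of exhaustive enumeration.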
