

SLIDE 1

XPACC

DOE/NNSA/ASC/PSAAPII: The Center for Exascale Simulation of Plasma-coupled Combustion

A DSL for Performance Orchestration

Thiago Teixeira, David Padua, William Gropp
Department of Computer Science, University of Illinois at Urbana-Champaign
Scalable Tools Workshop, July 2018

SLIDE 2

XPACC

  • The Center for Exascale Simulation of Plasma-Coupled Combustion
  • Developing a framework to leverage parallelism on exascale systems
  • Comprises Aerospace, Chemistry, CS, ECE, and Mechanical Engineering
  • Some of the tools being developed:

✦ Moya: Just-in-time recompilation
✦ Tangram: Compiler programming system for performance portability
✦ AMPI: Model for coarse-grained overdecomposition for load balancing
✦ PickPocket: Data relocation for efficient computation

[Figure: XPACC organization chart — the PlasComCM/2 application at the center, surrounded by CS, numerics, and physics research threads: annotations/ICE [Gropp/Padua], JIT compilation with Moya [Gropp], analysis with VectorSeeker [Padua], heterogeneous systems with Tangram [Hwu], overdecomposition with AMPI [Kale], overset meshes with Overkit, LANL-BoxMG, Leap, UQ/validation, sensitivity/adjoints, turbulence, plasma/laser breakdown, kinetics, and flame dynamics, annotated with the contributing faculty, staff, and students.]

SLIDE 3

Performance Optimization

  • Applications often target multiple complex systems → large optimization space
  • Compilers deliver unsatisfactory performance (-O3 is not enough)

No Optimization → No Performance

  • Hard to maintain and manage optimizations as the code evolves and new features are added
  • And, as optimizations are added, it becomes hard to maintain the code


SLIDE 4

Performance Optimization on HPC

  • Scientists make decisions based on maximizing scientific output, not the application's performance
  • They also want to control performance to the level needed, even sacrificing abstraction and ease of programming
  • A new technology that can coexist with older ones has a greater chance of success (e.g., MPI)
  • No complete buy-in at the beginning

✦ A barrier for new frameworks is that you can't integrate them incrementally

  • A risk-mitigation strategy is to let competing technologies coexist, but this is not always possible

Source: Understanding the High-Performance Computing Community: A Software Engineer’s Perspective. Victor Basili et al

SLIDE 5

Performance Optimization

  • How to handle all the required optimizations together for many different scenarios?
  • How to keep the code maintainable?
  • How to find the best sequence of optimizations?

SLIDE 6

Goal

  • No complete buy-in
  • Incremental adoption
  • Coexistence with other tools
  • Automatically finding the best sequence of optimizations and applying them without disrupting the original code is important to improve performance and keep the code maintainable

SLIDE 7

ICE

  • Illinois Coding Environment
  • Golden copy approach: a baseline version without architecture- or compiler-specific optimizations (no buy-in required)
  • Search combined with the application developer's expertise
  • Build-time, compile-time, and runtime optimizations
  • Non-prescriptive, gradual adoption, separation of concerns
  • Reuse of optimization tools already implemented

✦ Interfaces to simplify plugging in search and optimization tools

[Figure: ICE architecture — the optimization file, search engine, and golden copy drive plug-in tools such as Tangram, Moya, PIPS, and OpenMP, plus alternative transformations.]

SLIDE 8

ICE

  • Source code is annotated to define code regions
  • The optimization-file notation orchestrates the use of the optimization tools on the defined code regions
  • The interface provides operations on the source code to invoke optimizations by:

✦ Adding pragmas
✦ Adding labels
✦ Replacing code regions

  • These operations are used by the interface to plug in optimization tools
  • Most tools are source-to-source

✦ Tools must understand the output of previous tools
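To illustrate the annotation mechanism, here is a minimal Python sketch (hypothetical, not ICE's actual implementation) that locates the loop nest following a `#pragma @ICE loop=<id>` annotation in C source and inserts a tool-specific pragma in front of it:

```python
import re

def extract_region(source, region_id):
    """Return (start, end) line indices of the loop nest that follows
    a '#pragma @ICE loop=<region_id>' annotation."""
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if re.search(r'#pragma\s+@ICE\s+loop=%s\b' % re.escape(region_id), line):
            start = i + 1          # region begins on the next line
            depth = 0
            for j in range(start, len(lines)):
                depth += lines[j].count('{') - lines[j].count('}')
                if depth == 0 and j > start:
                    return start, j  # braces balanced: region ends here
    raise KeyError('no region named %r' % region_id)

def add_pragma(source, region_id, pragma):
    """Insert a tool pragma between the @ICE annotation and its loop
    (one of the interface operations described above)."""
    start, _ = extract_region(source, region_id)
    lines = source.splitlines()
    lines.insert(start, pragma)
    return '\n'.join(lines)
```

A tool plug-in built on such operations could, for example, prepend `#pragma omp parallel for` to a region before handing the file to the next source-to-source tool.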

SLIDE 9

[Figure: ICE workflow — a parser front end reads the optimization-language file and the source code (Fortran/C/C++); the back end generates code variants, selects one, evaluates it, and keeps the best.]

  • Parses the original code
  • Extracts the code regions
  • Traverses the optimization space
  • Uses machine-learning methods to select variants
  • Empirically evaluates variants and keeps the best one

Plug-in tools: RoseLoops, PIPS, OpenMP/OpenACC, Clay, Moya

SLIDE 10

Optimization Tools

  • PIPS (MINES ParisTech)

✦ Code optimization tool based on the polyhedral framework

  • Moya (Tarun Prabhu/UIUC)

✦ Runtime optimizations

  • Clay (Joël Poudroux, Oleksandr Zinenko)

✦ Loop transformations using the polyhedral framework

  • OpenMP

✦ Parallelization of code regions using pragmas

  • RoseLoop

✦ Loop transformations based on the Rose compiler infrastructure

  • Altdesc

✦ Replacement of code regions (e.g., with hand-optimized ones)

SLIDE 11

Search Methods

  • Optimization space: optimizations, their parameters, and the software stack (compiler version, flags, libraries)
  • It cannot be exhaustively traversed (gcc's flags alone admit about 10^806 configurations)
  • Complex space that requires different search techniques
  • OpenTuner (Jason Ansel et al.)

✦ A meta technique controls the use of the other techniques (e.g., round robin, random, AUC bandit)
✦ Multi-armed bandit problem: deciding which levers to pull, in which order, and how many times, on a slot machine with many arms of unknown payout probability

  • Spearmint (Jasper Snoek et al.)

✦ Bayesian Optimization of Machine Learning Algorithms (NIPS'12)
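The bandit strategy can be pictured with a small epsilon-greedy sketch in Python (OpenTuner's actual AUC bandit is more elaborate; the technique names and rewards below are made up for illustration):

```python
import random

def bandit_search(techniques, trials, epsilon=0.2, seed=0):
    """Pick a search technique per trial: usually the best so far
    (exploit), sometimes a random one (explore). Each technique is a
    zero-argument callable returning the measured reward (e.g., the
    speedup of the variant it proposed)."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in techniques}
    counts = {name: 0 for name in techniques}
    for _ in range(trials):
        if rng.random() < epsilon or not any(counts.values()):
            name = rng.choice(list(techniques))   # explore
        else:                                     # exploit best average
            name = max(totals, key=lambda n: totals[n] / max(counts[n], 1))
        totals[name] += techniques[name]()
        counts[name] += 1
    return counts
```

Over many trials the selector concentrates its budget on whichever technique has been paying off, while still occasionally sampling the others.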

SLIDE 12

Optimization Language

  • Domain-specific (easier and more straightforward)
  • Exposes the optimization space

✦ What sequences of optimizations to evaluate?
✦ What are the best parameters?

  • Controls the use of the optimization tools
  • Records the steps to efficient code
  • It can be shared with others or shipped along with the deployment and installation

SLIDE 13

ICE

  • Annotations in Fortran

Block:
    ! @ICE block=b1
    …
    ! @ICE endblock

Loop:
    ! @ICE loop=l1
    DO i = 1, n
    …
    END DO

  • Annotations in C/C++

Block:
    #pragma @ICE block=<id>
    …
    #pragma @ICE endblock

Loop:
    #pragma @ICE loop=<id>
    for(…) {
    …
    }

SLIDE 14

ICE

  • Optimization file (extended .yaml)
  • Direct
  • Search
  • Layout: <preamble commands>, then per region:

    <block/loop id>: <ref>
        <commands>[*+?] ...

Example:

    compilers: [gcc, icc]
    # Build command before compilation
    prebuildcmd:
    # Compilation command before tests
    buildcmd: make clean all
    # Command call for each test
    runcmd: time ./run.sh
    search: on   # or off
    memoryBound: &id01
      - unroll:
          loop: 3
          factor: 4
      - tile:
          loop: 8
          factor: 1
    example1: *id01
      - runtime:
    example2:
      - altdesc: ./opt2/*.opt
    sc2:
      - altdesc: ./opt2/*.opt
    ...

  • <commands>+ : 1 or more in the combinations
  • <commands>* : 0 or more in the combinations
  • <commands>? : 0 or 1
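The +, *, and ? quantifiers define a space of command sequences per region. A hedged sketch of how that space could be enumerated (illustrative only; bounding repetition with a `max_repeat` parameter is an assumption, since ICE's actual bound is not stated here):

```python
from itertools import product

def expand(commands, max_repeat=2):
    """Enumerate command sequences for a region.
    Each command is (name, quantifier): '?' -> 0 or 1 use,
    '+' -> 1..max_repeat uses, '*' -> 0..max_repeat uses,
    ''  -> exactly one use."""
    choices = []
    for name, q in commands:
        if q == '?':
            counts = (0, 1)
        elif q == '+':
            counts = tuple(range(1, max_repeat + 1))
        elif q == '*':
            counts = tuple(range(0, max_repeat + 1))
        else:
            counts = (1,)
        # one alternative per repetition count for this command
        choices.append([(name,) * c for c in counts])
    # Cartesian product over commands, concatenated into flat sequences
    return [sum(combo, ()) for combo in product(*choices)]
```

Each resulting tuple is one candidate sequence of optimizations for the search to build and measure.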

SLIDE 15

ICE

  • Loop Interchange

    L1:
      - interchange:
          order: 2,1,0

  • Loop indices in the nest start from 0
  • Order accepts * to generate all combinations
  • Order accepts | to select specific combinations
  • Example: 2,1,0|1,0,2

Before:
    ! @ICE loop=L1
    do i = 1,n
      do j = 1,m
        a(i) = a(i) + b(i,j) * c(j)
      end do
    end do

After:
    ! @ICE loop=L1
    do j = 1,m
      do i = 1,n
        a(i) = a(i) + b(i,j) * c(j)
      end do
    end do
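A small Python sketch of how such `order` values could be expanded into candidate permutations (illustrative, not ICE's actual parser):

```python
from itertools import permutations

def expand_order(spec, depth):
    """Expand an interchange 'order' specification.
    '*'           -> every permutation of a depth-deep nest
    'a,b,c|d,e,f' -> only the listed alternatives"""
    if spec.strip() == '*':
        return list(permutations(range(depth)))
    return [tuple(int(x) for x in alt.split(','))
            for alt in spec.split('|')]
```

The search then instantiates one interchanged variant of the nest per permutation and measures each.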

SLIDE 16

ICE

  • Loop Unroll

    L2:
      - unroll:
          loop: 2
          factor: 3
    ...

  • Loop indices in the nest start from 1
  • Factor accepts .. to generate a range
  • Example: 2..10

Before:
    ! @ICE loop=L2
    do i = 1,n
      do j = 1,m
        a(i) = a(i) + b(i,j) * c(j)
      end do
    end do

After:
    ! @ICE loop=L2
    do i = 1,n
      do j = 1,m,3
        a(i) = a(i) + b(i,j) * c(j)
        a(i) = a(i) + b(i,j+1) * c(j+1)
        a(i) = a(i) + b(i,j+2) * c(j+2)
      end do
    end do
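The unrolled variant above assumes the trip count m is divisible by the factor; in general a remainder loop is also needed. A quick Python check of the equivalence (illustrative only, mirroring the Fortran loops):

```python
def matvec_plain(a, b, c, n, m):
    """Reference version: a(i) += b(i,j) * c(j) over the full nest."""
    for i in range(n):
        for j in range(m):
            a[i] += b[i][j] * c[j]

def matvec_unrolled(a, b, c, n, m, factor=3):
    """Inner loop unrolled by 'factor', with a remainder loop for
    trip counts not divisible by the factor."""
    for i in range(n):
        j = 0
        while j + factor <= m:        # unrolled main loop
            for k in range(factor):
                a[i] += b[i][j + k] * c[j + k]
            j += factor
        while j < m:                  # remainder loop
            a[i] += b[i][j] * c[j]
            j += 1
```

Both versions accumulate identical results; only the loop structure (and hence the instruction-level parallelism exposed to the compiler) differs.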

SLIDE 17

Cache Option

  • The best variant found for each code region is saved in a cache
  • Consistency: a hash of the golden copy's code region that it was based on is saved along with the best variant
  • This lets users take advantage of the efficient generated code while avoiding installation of the whole tool chain
  • The cache can be shipped with the application and invoked according to the machine/system used
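The consistency check can be pictured with a short Python sketch (the hashing scheme and cache layout here are assumptions for illustration, not ICE's actual format):

```python
import hashlib

def region_hash(region_source):
    """Hash a golden-copy code region (ignoring trailing whitespace)
    so the cache can detect when the region has changed."""
    canonical = '\n'.join(line.rstrip() for line in region_source.splitlines())
    return hashlib.sha256(canonical.encode()).hexdigest()

def lookup(cache, region_id, region_source):
    """Return the cached best variant only if the golden copy is
    unchanged; otherwise the region must be re-tuned."""
    entry = cache.get(region_id)
    if entry and entry["hash"] == region_hash(region_source):
        return entry["variant"]
    return None
```

Any edit to the golden copy invalidates the cached variant for that region, so stale optimized code is never used silently.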

SLIDE 18

Evaluation

  • Matrix Multiplication

    #pragma @ICE loop=matmul
    for (i = 0; i < matSize; i++)
      for (j = 0; j < matSize; j++) {
        for (k = 0; k < matSize; k++) {
          matC[i][j] += matA[i][k] * matB[k][j];
        }
      }

Optimization file:

    —
    # Build command before compilation
    prebuildcmd:
    # Compilation command before tests
    buildcmd: make realclean; make
    # Command call for each test
    runcmd: ./mmc
    matmul:
      - Pips.tiling+:
          loop: 1
          factor: [2..512, 2..512, 2..512]
      - Pips.tiling*:
          loop: 4
          factor: [8, 16, 8]
      - OpenMP.OMPFor*:
          loop: 1
    …

SLIDE 19

Evaluation

  • Matrix Multiplication

    #pragma @ICE loop=matmul
    for (i = 0; i < matSize; i++)
      for (j = 0; j < matSize; j++) {
        for (k = 0; k < matSize; k++) {
          matC[i][j] += matA[i][k] * matB[k][j];
        }
      }

Optimization file:

    —
    # Build command before compilation
    prebuildcmd:
    # Compilation command before tests
    buildcmd: make realclean; make
    # Command call for each test
    runcmd: ./mmc
    matmul:
      - Pips.tiling+:
          loop: 1
          factor: [2..512, 2..512, 2..512]
      - Pips.tiling+:
          loop: 4
          factor: [8, 16, 8]
      - OpenMP.OMPFor+:
          loop: 1
    …

Resulting variant (annotated source + optimization file):

    #pragma omp parallel for schedule(static,1) \
        private(i_t, k_t, j_t, i_t_t, k_t_t, j_t_t, i, k, j)
    for (i_t = 0; i_t <= 127; i_t += 1)
      for (k_t = 0; k_t <= 127; k_t += 1)
        for (j_t = 0; j_t <= 3; j_t += 1)
          for (i_t_t = 4 * i_t; i_t_t <= ((4 * i_t) + 3); i_t_t += 1)
            for (k_t_t = 2 * k_t; k_t_t <= ((2 * k_t) + 1); k_t_t += 1)
              for (j_t_t = 32 * j_t; j_t_t <= ((32 * j_t) + 31); j_t_t += 1)
                for (i = 4 * i_t_t; i <= ((4 * i_t_t) + 3); i += 1)
                  for (k = 8 * k_t_t; k <= ((8 * k_t_t) + 7); k += 1)
                    for (j = 16 * j_t_t; j <= ((16 * j_t_t) + 15); j += 1)
                      matC[i][j] += matA[i][k] * matB[k][j];

SLIDE 20

Matrix Multiplication

[Figure: execution time (ms) and speedup vs. number of CPU cores (1–10) for Pluto and ICE; 2048^2 elements, icc 17.0.1, Intel E5-2660 v3, Pluto pet branch.]

  • Two levels of tiling + OpenMP
  • Original version: 78,825 ms
  • 98x speedup (1 core)
  • 694x speedup (10 cores)
  • Avg 2.2x speedup over Pluto
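Speedup here is simply baseline time divided by optimized time, so the per-variant execution times implied by the reported speedups can be recovered directly:

```python
def implied_time_ms(baseline_ms, speedup):
    """Optimized execution time implied by a reported speedup."""
    return baseline_ms / speedup

# Reported: 78,825 ms original; 98x on 1 core, 694x on 10 cores.
one_core = implied_time_ms(78825, 98)    # roughly 804 ms
ten_core = implied_time_ms(78825, 694)   # roughly 114 ms
```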
SLIDE 21

Kripke

  • 3D Sn deterministic particle transport code
  • Proxy app for the LLNL transport code ARDRA
  • 6 data layouts of the angular fluxes (Psi) and 6 hand-tuned versions

*Developed by Adam J. Kunen, Peter N. Brown, Teresa S. Bailey, and Peter G. Maginot. Source: https://codesign.llnl.gov/kripke.php

Data layouts: DGZ, DZG, ZDG, ZGD, GDZ, GZD (+ ICE)

SLIDE 22

Experiments

  • LLNL Kripke + ICE

✦ Optimizations chosen according to the selected data layout
✦ Loop transformations + inlined data-layout accesses
✦ 10–30% slower than the hand-optimized versions, but a 6–8x speedup over the baseline
✦ Other optimizations are being tested to close the gap to hand-optimized performance

SLIDE 23

Conclusions

  • To harness all the computing power available in current and future architectures, it is necessary to apply architecture-specific optimizations to the source code
  • ICE:

✦ Separation of concerns (opt file)
✦ Coexistence with other tools
✦ Gradual adoption
✦ Empirical search + developer knowledge

  • Golden copy: the developer can focus on the problem
  • Simple and easy for programmers to use
  • Hard to get the tools to work, though!

SLIDE 24

This material is based in part upon work supported by the Department of Energy, National Nuclear Security Administration, under Award Number DE-NA0002374.