SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks

Tharindu Rusira (1), Anand Venkat (2), Raj Barik (3), Mary Hall (1)
(1) University of Utah, (2) Intel Labs, (3) Uber Technologies Inc.
LCPC’19, 22 Oct 2019

Overview
◮ Convolution
◮ System Overview
◮ Model Guided Optimization: Code Variants, Memory Cost, Space Pruning Heuristics, Unrolling
◮ Evaluation: Performance, Empirical Stability
◮ Conclusion
◮ References
◮ Appendices: SWIRL experiments, SuRF

2D-Convolution

[Figure: 2D convolution. Source: https://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html]
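For reference, the loop nest that the later transformations operate on can be sketched as a direct convolution. This is a minimal, unoptimized Python sketch that assumes stride 1 and no padding; the filter extents R and S are our own names, not taken from the slides, and, as in most CNN frameworks, the filter is not flipped.

```python
def conv2d_direct(inp, wts):
    """Direct (naive) multi-channel 2D convolution, stride 1, no padding.

    inp: nested list indexed [c][h][w]    (C x H x W)
    wts: nested list indexed [k][c][r][s] (K x C x R x S)
    returns a nested list indexed [k][p][q] (K x P x Q),
    with P = H - R + 1 and Q = W - S + 1.
    The filter is not flipped (strictly, a cross-correlation).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(wts), len(wts[0][0]), len(wts[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):              # output feature maps
        for p in range(P):          # output rows
            for q in range(Q):      # output columns
                acc = 0.0
                for c in range(C):  # input feature maps
                    for r in range(R):      # filter rows
                        for s in range(S):  # filter columns
                            acc += inp[c][p + r][q + s] * wts[k][c][r][s]
                out[k][p][q] = acc
    return out
```

With a single 3 × 3 input channel [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and a single 2 × 2 filter [[1, 0], [0, 1]], each output point adds a pixel to its lower-right neighbour, giving [[6, 8], [12, 14]].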


SWIRL

SWIRL [4] provides compiler optimizations for Latte.py [2] through transformation recipes.

[2] https://github.com/IntelLabs/Latte.py
[4] Venkat, A. et al. SWIRL: High-performance many-core CPU code generation for deep neural networks. IJHPCA, 33(6).

Transformation Recipes

$I \in \mathbb{R}^{N \times C_B \times H \times W \times C}$, $W \in \mathbb{R}^{K_B \times C_B \times P \times Q \times K \times C}$, $O \in \mathbb{R}^{N \times K_B \times P \times Q \times K}$
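One reading of the blocked layouts above, which is our assumption rather than something the slide spells out: $C_B$ counts channel blocks and the trailing $C$ is the width of one block, so the innermost dimension holds a contiguous group of channels at a single spatial position. The hypothetical helpers below compute flat offsets for a plain N × C × H × W layout and for such a blocked layout, and repack one into the other.

```python
def nchw_offset(n, ch, h, w, C, H, W):
    """Flat offset of (n, ch, h, w) in a plain N x C x H x W layout."""
    return ((n * C + ch) * H + h) * W + w


def blocked_offset(n, ch, h, w, C_B, H, W, cb):
    """Flat offset of logical element (n, ch, h, w) in an assumed
    N x C_B x H x W x cb layout: channel ch lives in block ch // cb,
    at lane ch % cb of that block."""
    blk, lane = divmod(ch, cb)
    return (((n * C_B + blk) * H + h) * W + w) * cb + lane


def to_blocked(data, N, C, H, W, cb):
    """Repack a flat NCHW buffer into the blocked layout."""
    if C % cb:
        raise ValueError("channel count must be a multiple of the block width")
    C_B = C // cb
    out = [0.0] * (N * C * H * W)
    for n in range(N):
        for ch in range(C):
            for h in range(H):
                for w in range(W):
                    out[blocked_offset(n, ch, h, w, C_B, H, W, cb)] = \
                        data[nchw_offset(n, ch, h, w, C, H, W)]
    return out
```

Because the innermost index runs over cb consecutive channels, a single vector load of width cb reads one whole channel block at one (h, w) position.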

SWIRL’s output

SWIRL++

Manual exploration of optimizing transformations does not guarantee high performance. SWIRL++ is a step towards automating it.

Code Variants

Given a loop order $L$, apply varying $T_L$, $P_L$ and $V_L$ to generate convolution variants subject to the constraints $H_L$. Return the best $k$ variants, i.e., those that minimize the cost.
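A minimal sketch of this selection step, assuming $T_L$, $P_L$ and $V_L$ stand for the tile sizes, the parallelized loops and the vectorized loop (our interpretation), with a caller-supplied predicate standing in for $H_L$ and a caller-supplied cost function standing in for the model. The function name and the dictionary fields are hypothetical; this is not SWIRL++'s implementation.

```python
import heapq
import itertools


def best_k_variants(loop_order, tile_choices, par_choices, vec_choices,
                    satisfies, cost, k):
    """Enumerate (T_L, P_L, V_L) combinations for a fixed loop order L,
    keep those accepted by `satisfies` (a stand-in for H_L), and return
    the k variants with the lowest `cost`, cheapest first."""
    variants = (
        {"order": loop_order, "tiles": t, "parallel": p, "vector": v}
        for t, p, v in itertools.product(tile_choices, par_choices, vec_choices)
    )
    legal = (var for var in variants if satisfies(var))
    # nsmallest compares only the keys, never the dicts themselves.
    return heapq.nsmallest(k, legal, key=cost)
```

heapq.nsmallest avoids sorting the full variant list when only k entries are needed, which matters when the enumerated space is large and k is small.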

Memory cost of convolution

Space Pruning Heuristics

$H_L$ defines a set of heuristics to restrict the search space:
◮ Data layouts and candidate loop orders are selected a priori.
◮ The two outermost loops are parallelized with omp parallel for collapse(2).
◮ Feature map dimensions are tiled by SIMDWIDTH, and the inner loop is vectorized.
◮ The K, C, P and Q dimensions are candidates for tiling.
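The heuristics can be read as fixing some choices and searching only the rest. The sketch below assumes SIMDWIDTH = 16 (fp32 lanes in a 512-bit register), treats the feature map dimensions as the K and C channel dimensions, and draws tile factors from divisors; the divisor rule and the function names are our assumptions, not SWIRL++'s exact enumeration.

```python
import itertools

SIMDWIDTH = 16  # assumed: fp32 lanes in one 512-bit vector register


def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pruned_space(K, C, P, Q, simd=SIMDWIDTH):
    """Yield candidate tile configurations under H_L-style heuristics:
    K and C are first split into blocks of `simd` feature maps (the inner
    block is the vectorized loop); tile factors for the K and C block
    counts and for P and Q are drawn from their divisors; the two
    outermost loops are always parallelized, so that is not searched."""
    if K % simd or C % simd:
        raise ValueError("K and C must be multiples of the SIMD width")
    choices = {
        "K": divisors(K // simd),  # tile factors over K blocks
        "C": divisors(C // simd),  # tile factors over C blocks
        "P": divisors(P),
        "Q": divisors(Q),
    }
    names = list(choices)
    for combo in itertools.product(*(choices[n] for n in names)):
        # Parallelization and vectorization are fixed, not searched.
        yield dict(zip(names, combo),
                   parallel="omp parallel for collapse(2)",
                   vector_width=simd)
```

For K = 64, C = 32 and P = Q = 4 this yields 54 configurations, compared with 378 if K and C were tiled directly over all of their divisors.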

Loop Unrolling
◮ Unroll candidate loops after tile factors are determined.
◮ Unroll factors are derived to fully utilize the register file.
◮ If P and Q are candidates for unrolling, the corresponding tile factors $p$ and $q$ are chosen such that $p \times q \ge \mathrm{REGS}$, to fully hide FMA latency.
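The unrolling rule can be sketched as a small search over (p, q). Here REGS, the number of independent accumulators needed to hide FMA latency, is an input. The upper cap reg_budget models "fully utilize the register file" while leaving some of the 32 vector registers for inputs and weights; its default of 28 is a hypothetical value of our own, and each of the p × q output points is assumed to keep one accumulator register.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pick_unroll(P, Q, regs_needed, reg_budget=28):
    """Choose tile/unroll factors (p, q) for the P and Q loops.

    p * q >= regs_needed gives enough independent FMA chains to hide
    FMA latency; p * q <= reg_budget (hypothetical cap) keeps the
    accumulators inside the register file. Among legal pairs, prefer
    the largest p * q, then the most square pair, then the larger q.
    Returns None if no pair fits."""
    legal = [
        (p, q)
        for p in divisors(P)
        for q in divisors(Q)
        if regs_needed <= p * q <= reg_budget
    ]
    if not legal:
        return None
    return max(legal, key=lambda pq: (pq[0] * pq[1], -abs(pq[0] - pq[1]), pq[1]))
```

pick_unroll(56, 56, 8) returns (4, 7): the largest product within the budget is 28, and among the pairs reaching it, (4, 7) and (7, 4) are the most square, with the tie going to the larger q.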

Performance Results

Platform: dual-socket Intel Xeon Platinum 8280 (Cascade Lake), 2 × 28 cores at 2.7 GHz (max 4.0 GHz), 192 GB memory, 32 KB L1, 1 MB L2 and 38.5 MB L3 cache, and 32 512-bit vector registers. Compiler: icc 18.0.1.
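As a rough yardstick for the results, the nominal single-precision peak of this platform can be computed from the listed specification. The sketch assumes two AVX-512 FMA units per core and uses the 2.7 GHz base clock; sustained AVX-512 frequencies are usually lower, so the figure is a nominal upper bound, not a measured number.

```python
def peak_gflops(sockets, cores_per_socket, ghz, fma_units, vector_bits,
                elem_bits=32):
    """Nominal peak GFLOP/s: cores x clock x flops per cycle per core."""
    lanes = vector_bits // elem_bits         # 16 fp32 lanes per 512-bit vector
    flops_per_cycle = fma_units * lanes * 2  # one FMA counts as 2 flops
    return sockets * cores_per_socket * ghz * flops_per_cycle
```

peak_gflops(2, 28, 2.7, 2, 512) gives about 9676.8 GFLOP/s (56 cores × 2.7 GHz × 64 flops per cycle).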

Empirical Stability

An autotuner based on Search using Random Forest (SuRF) [1, 3] is integrated to traverse the same search space.

[Figure panels: (a)-(f) VGG, (g)-(i) OverFeat, (j)-(l) Inception]

[1] Balaprakash, P. et al. Autotuning in High-Performance Computing Applications. Proceedings of the IEEE, 106(11), 2018.
[3] Nelson, T. et al. Generating efficient tensor contractions for GPUs. ICPP’15.

SWIRL++ Top-1 vs. Top-k

Relative speedups of the best among the Top-$k$ variants with respect to the Top-1 variant.
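The Top-k metric can be computed from measured runtimes alone. This sketch assumes the runtimes are listed in the order in which the model ranked the variants, so the first entry is the Top-1 choice; the function name is hypothetical.

```python
def topk_speedup(times):
    """times: measured runtimes of the model-ranked variants, in rank
    order (times[0] is the model's Top-1 choice). Returns, for each k,
    the speedup of the fastest among the first k variants over Top-1."""
    speedups = []
    best = float("inf")
    for t in times:
        best = min(best, t)          # fastest runtime seen in the top k
        speedups.append(times[0] / best)
    return speedups
```

For runtimes [2.0, 2.5, 1.0, 0.5] it returns [1.0, 1.0, 2.0, 4.0]; any value above 1 means the model's first choice was not the fastest of the k variants it ranked highest.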
