SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks

  1. SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks. Tharindu Rusira (1), Anand Venkat (2), Raj Barik (3), Mary Hall (1). Affiliations: (1) University of Utah, (2) Intel Labs, (3) Uber Technologies Inc. T. Rusira et al., LCPC’19, 22 Oct 2019.

  2. Overview: Convolution; System Overview; Model Guided Optimization; Code Variants; Memory Cost; Space Pruning Heuristics; Unrolling; Evaluation; Performance; Empirical Stability; Conclusion; References; Appendices; SWIRL experiments; SuRF.

  3. 2D-Convolution. Source: https://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html

  4. 2D-Convolution

  5. 2D-Convolution

  6. 2D-Convolution

  7. SWIRL [4]: Provides compiler optimizations for Latte.py [2] through transformation recipes. [2] https://github.com/IntelLabs/Latte.py [4] Venkat, A. et al. SWIRL: High-performance many-core CPU code generation for deep neural networks. IJHPCA, 33(6).

  8. Transformation Recipes: $I \in \mathbb{R}^{N C_B H W C}$, $W \in \mathbb{R}^{K_B C_B P Q K C}$, $O \in \mathbb{R}^{N K_B P Q K}$
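
The subscripted dimensions in the tensor shapes on the Transformation Recipes slide suggest a blocked data layout. The sketch below shows how such a layout maps a logical (n, c, h, w) coordinate to a flat offset. It assumes that C_B is the number of channel blocks and that the trailing dimension is the position inside a block; the slide does not state this. The function name `blocked_offset` and the `block` parameter are hypothetical.

```python
def blocked_offset(n, c, h, w, C, H, W, block):
    """Flat offset of logical element (n, c, h, w) in an
    N x (C // block) x H x W x block layout (assumed interpretation)."""
    if C % block != 0:
        raise ValueError("C must be a multiple of the block size")
    num_blocks = C // block          # plays the role of C_B
    cb, ci = divmod(c, block)        # channel block, position inside the block
    return (((n * num_blocks + cb) * H + h) * W + w) * block + ci


if __name__ == "__main__":
    # Channels inside one block are contiguous; the next block starts
    # only after a full H x W plane of the previous block.
    print(blocked_offset(0, 1, 0, 0, C=32, H=2, W=2, block=16))
    print(blocked_offset(0, 16, 0, 0, C=32, H=2, W=2, block=16))
```

With this ordering, the innermost dimension has unit stride and a length equal to the block size, which is what allows the inner loop to be vectorized when the block size matches the SIMD width.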

  9. SWIRL’s output

  10. SWIRL++: Manual exploration of optimizing transformations does not always guarantee high performance. SWIRL++ is a step towards automation.

  11. Code Variants: Given a loop order $L$, apply varying $T_L$, $P_L$, $V_L$ to generate convolution variants with constraints $H_L$. Return the best $k$ variants that minimize the cost.
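
The selection step on the Code Variants slide (generate variants, score them with a cost model, keep the best k) can be sketched as follows. This is only an illustration: variants are reduced to tile-factor assignments, and `toy_cost` with its `capacity` parameter is a made-up stand-in, not the memory cost model from the talk.

```python
import heapq
from itertools import product


def divisors(n):
    """All positive divisors of n, used here as candidate tile factors."""
    return [d for d in range(1, n + 1) if n % d == 0]


def generate_variants(dims):
    """Yield one tile-factor assignment per variant (dimension name -> factor)."""
    names = sorted(dims)
    for combo in product(*(divisors(dims[name]) for name in names)):
        yield dict(zip(names, combo))


def toy_cost(tiles, dims, capacity=8):
    """Hypothetical cost: number of tiles, plus a penalty when one tile
    holds more elements than the assumed capacity."""
    footprint = 1
    num_tiles = 1
    for name, size in dims.items():
        footprint *= tiles[name]
        num_tiles *= size // tiles[name]
    return num_tiles + max(0, footprint - capacity)


def best_k(dims, k, cost=toy_cost):
    """Return the k variants with the lowest modeled cost."""
    return heapq.nsmallest(k, generate_variants(dims), key=lambda t: cost(t, dims))


if __name__ == "__main__":
    dims = {"K": 4, "C": 4}
    for variant in best_k(dims, k=3):
        print(variant, toy_cost(variant, dims))
```

Returning k variants rather than one leaves room to run the short list empirically and pick the fastest, which is the comparison the Top-1 vs. Top-k slide reports.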

  12. Memory cost of Convolution

  13. Memory cost of Convolution

  14. Memory cost of Convolution

  15. Memory cost of Convolution

  16. Memory cost of Convolution

  17. Memory cost of Convolution

  18. Memory cost of Convolution

  19. Space Pruning Heuristics: $H_L$ defines a set of heuristics to restrict the search space.
      - Data layouts and candidate loop orders are selected a priori.
      - The two outermost loops are parallelized with omp parallel for collapse(2).
      - Feature map dimensions are tiled by SIMDWIDTH and the inner loop is vectorized.
      - K, C, P, Q dimensions are candidates for tiling.

  20. Loop Unrolling:
      - Unroll candidate loops after tile factors are determined.
      - Unroll factors are derived to fully utilize the register file.
      - If P, Q are candidates for unrolling, the corresponding tile factors p, q are determined such that p × q ≥ REGS to fully hide FMA latency.
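
One way to read the p × q ≥ REGS condition on the Loop Unrolling slide is as a small search over tile-factor pairs. The sketch below makes two assumptions that the slide does not state: p and q must divide P and Q exactly, and among the pairs that satisfy the condition the one with the smallest product is preferred. The function name `unroll_factors` and the default `regs=32` are illustrative.

```python
def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def unroll_factors(P, Q, regs=32):
    """Return (p, q) with p | P, q | Q and p * q >= regs, choosing the
    smallest such product (assumed tie-break: larger q first).
    Returns None when no pair reaches regs."""
    candidates = [
        (p, q)
        for p in divisors(P)
        for q in divisors(Q)
        if p * q >= regs
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda pq: (pq[0] * pq[1], -pq[1]))


if __name__ == "__main__":
    print(unroll_factors(28, 28))
    print(unroll_factors(4, 4))
```

For small output planes no pair may satisfy the condition, so a caller would need a fallback such as unrolling the whole tile; that policy is outside what the slide describes.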

  21. Performance Results: Platform: dual-socket Intel Xeon Platinum 8280 (Cascade Lake), 2x28 cores at 2.7 GHz (max 4.0 GHz), with 192 GB memory, 32 KB L1, 1 MB L2, 38.5 MB L3 cache, and 32 512-bit vector registers (icc 18.0.1).

  22. Empirical Stability: A Search using Random Forest (SuRF) [1, 3] autotuner is integrated to traverse the same search space. Figure panels: (a)-(f) VGG, (g)-(i) Overfeat, (j)-(l) Inception. [1] Balaprakash, P. et al. Autotuning in High-Performance Computing Applications. Proceedings of the IEEE, 106(11), 2018. [3] Nelson, T. et al. Generating efficient tensor contractions for GPUs. ICPP’15.

  23. SWIRL++ Top-1 vs. Top-k: Relative speedups of the best among Top-k variants with respect to Top-1.
