SWIRL++ (LCPC'19)

Outline:
  Convolution System Overview
  Model-Guided Optimization
    Code Variants
    Memory Cost
    Space Pruning Heuristics
    Unrolling
  Evaluation
    Performance
    Empirical Stability
  Conclusion
  References
  Appendices
    SWIRL experiments
    SuRF
References I
[1] Balaprakash, P., Dongarra, J., Gamblin, T., Hall, M., Hollingsworth, J.K., Norris, B., Vuduc, R.: Autotuning in high-performance computing applications. Proceedings of the IEEE 106(11), 2068–2083 (Nov 2018). https://doi.org/10.1109/JPROC.2018.2841200
[2] IntelLabs: Latte.py. https://github.com/IntelLabs/Latte.py
[3] Nelson, T., Rivera, A., Balaprakash, P., Hall, M., Hovland, P.D., Jessup, E., Norris, B.: Generating efficient tensor contractions for GPUs. In: 2015 44th International Conference on Parallel Processing.
[4] Venkat, A., Rusira, T., Barik, R., Hall, M., Truong, L.: SWIRL: High-performance many-core CPU code generation for deep neural networks. The International Journal of High Performance Computing Applications (2019). https://doi.org/10.1177/1094342019866247