Towards Compositional and Generative Tensor Optimizations

SLIDE 1

Towards Compositional and Generative Tensor Optimizations

Adilla Susungi¹, Norman A. Rink², Jerónimo Castrillón², Immo Huismann³, Albert Cohen⁴, Claude Tadonki¹, Jörg Stiller³ and Jochen Fröhlich³

¹MINES ParisTech, PSL Research University
²Chair for Compiler Construction, Technische Universität Dresden
³Chair of Fluid Mechanics, Technische Universität Dresden
⁴Inria, École normale supérieure

16th International Conference on Generative Programming: Concepts & Experiences (GPCE'17)
Vancouver, Canada, October 24, 2017

SLIDE 2

Tensor Computations

◮ Underlying data structure: N-dimensional array

Application domains:

◮ Quantum chemistry
◮ Machine learning
◮ Big data
◮ Computational fluid dynamics

SLIDE 3

Frameworks for Optimizing Tensor Computations

Two axes for classifying frameworks:

◮ Expressivity: domain-specific vs. generic
◮ Optimization heuristics: flexible/adaptive vs. hidden and/or rigid

SLIDE 4

Tensors in Computational Fluid Dynamics

Characteristics

◮ 3- to 4-dimensional loop nests
◮ Few iterations per dimension (e.g., 13 iterations)
◮ Tensor contractions, outer products, entrywise multiplications
◮ Same computation for each element of a mesh

Inverse Helmholtz [7]:

t_{ijk} = \sum_{l,m,n} A^T_{kn} \cdot A^T_{jm} \cdot A^T_{il} \cdot u_{lmn}
p_{ijk} = D_{ijk} \cdot t_{ijk}
v_{ijk} = \sum_{l,m,n} A_{kn} \cdot A_{jm} \cdot A_{il} \cdot p_{lmn}
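The three equations above can be checked end to end with NumPy's einsum. This is a sketch for illustration only; the value of N and the random inputs are assumptions, and the einsum formulation is not the paper's implementation:

```python
import numpy as np

N = 13  # few iterations per dimension, as noted above (illustrative)
rng = np.random.default_rng(0)
A = rng.random((N, N))
u = rng.random((N, N, N))
D = rng.random((N, N, N))

# t_ijk = sum_{l,m,n} A^T_kn * A^T_jm * A^T_il * u_lmn
#       = sum_{l,m,n} A_nk  * A_mj  * A_li  * u_lmn
t = np.einsum('nk,mj,li,lmn->ijk', A, A, A, u)

# p_ijk = D_ijk * t_ijk  (entrywise product)
p = D * t

# v_ijk = sum_{l,m,n} A_kn * A_jm * A_il * p_lmn
v = np.einsum('kn,jm,il,lmn->ijk', A, A, A, p)
```

A convenient sanity check: with A the identity matrix both contractions are identities, so v reduces to the entrywise product D * u.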

SLIDE 5

Tensors in Computational Fluid Dynamics

Characteristics

◮ 3- to 4-dimensional loop nests
◮ Few iterations per dimension (e.g., 13 iterations)
◮ Tensor contractions, outer products, entrywise multiplications
◮ Same computation for each element of a mesh

Inverse Helmholtz [7]:

t_{ijk} = \sum_{l,m,n} A^T_{kn} \cdot A^T_{jm} \cdot A^T_{il} \cdot u_{lmn}
p_{ijk} = D_{ijk} \cdot t_{ijk}
v_{ijk} = \sum_{l,m,n} A_{kn} \cdot A_{jm} \cdot A_{il} \cdot p_{lmn}

The search space for optimizations may include:

◮ Evaluation order of tensor contractions
◮ Fusions
◮ Interchanges
◮ Transpositions
◮ Vectorization
◮ Collapsing
◮ Unrolling

SLIDE 6

Implementing CFD Kernels in Existing Frameworks

[Figure: frameworks placed along two axes, expressivity (specific vs. generic) and optimizations (hidden/rigid vs. flexible/adaptive): Chill [6], Pluto [5], TensorFlow [3], TVM [2], Tensor Contraction Engine [4], NumPy [1], Tensor Algebra Compiler [8]]

SLIDE 7

Implementing CFD Kernels in Existing Frameworks

We encounter different levels of limitations:

◮ Unadapted constructs
◮ Unadapted heuristics
◮ No optimization ability
◮ Limited expressivity

SLIDE 8

Our contribution

An intermediate language with building blocks for declaring:

◮ Tensor computations ◮ Optimization heuristics

Arrays, tensor operators, iterators and loop transformations as first-class citizens.

[Figure: pipeline from a source file (C or DSL) through the intermediate language, driven by meta-programming and iterative search, to optimized C]

SLIDE 9

Our contribution

An intermediate language with building blocks for declaring:

◮ Tensor computations ◮ Optimization heuristics

Arrays, tensor operators, iterators and loop transformations as first-class citizens.

[Figure: pipeline from a source file (C or DSL) through the intermediate language, driven by meta-programming and iterative search, to optimized C]

CFD kernels share common tensor operations with other domains

◮ We want enough flexibility and genericity (at least for tensor-based applications) to be reused in other domains.

SLIDE 10

Inverse Helmholtz by Example

t_{ijk} = \sum_{l,m,n} A^T_{kn} \cdot A^T_{jm} \cdot A^T_{il} \cdot u_{lmn}
p_{ijk} = D_{ijk} \cdot t_{ijk}
v_{ijk} = \sum_{l,m,n} A_{kn} \cdot A_{jm} \cdot A_{il} \cdot p_{lmn}

Step 1: Declaring tensor computations

A = array(2, double, [N, N])
u = array(3, double, [N, N, N])
D = array(3, double, [N, N, N])
At = vtranspose(A, 1, 2)
tmp1 = contract(At, u, [2, 1])
tmp2 = contract(At, tmp1, [2, 2])
tmp3 = contract(At, tmp2, [2, 3])
tmp4 = entrywise(D, tmp3)
tmp5 = contract(A, tmp4, [2, 1])
tmp6 = contract(A, tmp5, [2, 2])
v = contract(A, tmp6, [2, 3])
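The slides do not spell out the semantics of contract. Assuming contract(X, Y, [a, b]) sums over axis a of X and axis b of Y (1-based), it can be modeled with NumPy's tensordot, and the Step 1 chain for t then agrees with a direct einsum formulation. This is a sketch under that assumption, not the paper's implementation; N and the random inputs are illustrative:

```python
import numpy as np

def contract(X, Y, axes):
    """Model of the IL's contract: sum over axis axes[0] of X and
    axis axes[1] of Y (1-based indices, as in the slides)."""
    return np.tensordot(X, Y, axes=([axes[0] - 1], [axes[1] - 1]))

N = 13
rng = np.random.default_rng(0)
A = rng.random((N, N))
u = rng.random((N, N, N))

At = A.T                          # models vtranspose(A, 1, 2)
tmp1 = contract(At, u, [2, 1])    # tmp1[i,m,n] = sum_l At[i,l] * u[l,m,n]
tmp2 = contract(At, tmp1, [2, 2]) # tmp2[j,i,n] = sum_m At[j,m] * tmp1[i,m,n]
tmp3 = contract(At, tmp2, [2, 3]) # tmp3[k,j,i] = sum_n At[k,n] * tmp2[j,i,n]

# Under this model the result carries indices (k, j, i):
ref = np.einsum('nk,mj,li,lmn->kji', A, A, A, u)
assert np.allclose(tmp3, ref)
```

Since tensordot prepends the uncontracted axes of its first argument, the intermediate tensors come out with permuted index orders; in the IL, the build step that binds iterators to each computation would account for this.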

SLIDE 11

Inverse Helmholtz by Example

Step 2: Associating iterators to computations

i1 = iterator(0, N, 1)
i2 = iterator(0, N, 1)
# ... other iterator declarations
build(D, [td1, td2, td3])
build(tmp1, [i1, i2, i3, i4])
# Also applies to tmp2, ..., tmp6
build(v, [k12, k22, k32, k42])

SLIDE 12

Inverse Helmholtz by Example

Step 3: Applying transformations

interchange(i4, i3)
interchange(i4, i2)
interchange(j2, j1)
interchange(j1, j4)
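Assuming interchange(a, b) swaps the positions of the two iterators in the loop nest, the effect of the first two transformations on tmp1's loops can be sketched as plain nested loops. The loop bodies, the value of N, and the random inputs here are illustrative assumptions, not the generated C:

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
At = rng.random((N, N))
u = rng.random((N, N, N))

# Original nest for tmp1, loop order (i1, i2, i3, i4);
# i4 is the reduction index.
tmp_before = np.zeros((N, N, N))
for i1 in range(N):
    for i2 in range(N):
        for i3 in range(N):
            for i4 in range(N):
                tmp_before[i1, i2, i3] += At[i1, i4] * u[i4, i2, i3]

# After interchange(i4, i3) the order is (i1, i2, i4, i3); after
# interchange(i4, i2) it becomes (i1, i4, i2, i3): the reduction
# loop is hoisted so the inner loops stream over u's trailing axes.
tmp_after = np.zeros((N, N, N))
for i1 in range(N):
    for i4 in range(N):
        for i2 in range(N):
            for i3 in range(N):
                tmp_after[i1, i2, i3] += At[i1, i4] * u[i4, i2, i3]

assert np.allclose(tmp_before, tmp_after)
```

Both nests compute the same values; only the iteration order, and hence memory-access locality, changes.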

SLIDE 13

Inverse Helmholtz by Example

Example of results from different heuristics

[Figure: bar chart of speed-ups for variants L1, L2, L3 and Pluto1, Pluto2, Pluto3; speed-up axis runs from 7 to 12]

◮ Mesh size: 750; data size: 33.
◮ Baseline: sequential execution.
◮ Machine: 24-core Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (Haswell)

◮ Variant L1: loop interchanges only + parallelization
◮ Variant L2: loop interchanges + data transpositions of tensor A + parallelization
◮ Variant L3: loop interchanges + data transpositions of tensors tmp1, ..., tmp6 + parallelization
◮ Pluto1: loop interchanges + parallelization + vectorization
◮ Pluto2: loop interchanges + partial fusions + vectorization
◮ Pluto3: loop interchanges + maximum fusions + vectorization

SLIDE 14

Conclusion

◮ Cross-domain building blocks
  → One intermediate language to rule them all, flexibly
◮ Possibility to assess different variants
  → Through meta-programming or auto-tuning techniques

Ongoing work:

◮ Syntax refinement
◮ Formal semantics
◮ Applications to other domains

SLIDE 15

References I

[1] NumPy, package for scientific computing with Python. http://www.numpy.org/, 2017.
[2] TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms. https://www.tvmlang.org, 2017.
[3] Abadi, M., et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. http://download.tensorflow.org/paper/whitepaper2015.pdf, 2015.
[4] Baumgartner, G., Auer, A., Bernholdt, D. E., Bibireata, A., Choppella, V., Cociorva, D., Gao, X., Harrison, R. J., Hirata, S., Krishnamoorthy, S., Krishnan, S., Lam, C.-C., Lu, Q., Nooijen, M., Pitzer, R. M., Ramanujam, J., Sadayappan, P., and Sibiryakov, A. Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models. Proceedings of the IEEE 93, 2 (Feb 2005), 276–292.

SLIDE 16

References II

[5] Bondhugula, U., Hartono, A., Ramanujam, J., and Sadayappan, P. A practical automatic polyhedral program optimization system. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (2008).
[6] Chen, C., Chame, J., and Hall, M. CHiLL: A framework for composing high-level loop transformations. Tech. Rep. 08-897, University of Southern California, 2008.
[7] Huismann, I., Stiller, J., and Fröhlich, J. Factorizing the factorization: a spectral-element solver for elliptic equations with linear operation count. Journal of Computational Physics 346 (2017), 437–448.
[8] Kjolstad, F., Kamil, S., Chou, S., Lugato, D., and Amarasinghe, S. The tensor algebra compiler. Proc. ACM Program. Lang. 1, OOPSLA (October 2017).
