Design of a High-Performance GEMM-like Tensor-Tensor Multiplication


  1. Design of a High-Performance GEMM-like Tensor-Tensor Multiplication
     Paul Springer and Paolo Bientinesi
     Aachen Institute for Advanced Study in Computational Engineering Science
     Austin, Sep. 20th 2016
     Slide footer: Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 1 / 19

  2. Outline
     1. Introduction
     2. GEMM-like Tensor-Tensor Multiplication
     3. Tensor Contraction Code Generator
     4. Performance
     5. Conclusion and Future Work

  6. Introduction
     - Tensors can be thought of as higher-dimensional matrices.
     - Tensor contractions can be thought of as higher-dimensional GEMMs.
     - Essentially three approaches:
       - Nested loops
       - Transpose-Transpose-GEMM-Transpose (TTGT)
       - Loops over GEMM (LoG)
     - We propose a novel approach: GETT [1]
       - Akin to a high-performance GEMM implementation
       - Adopts the BLIS methodology: breaking through the BLAS layer
     - The Tensor Contraction Code Generator (TCCG) combines GETT, TTGT and LoG into a unified tool.

     [1] Paul Springer and Paolo Bientinesi. "Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review.
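To make the TTGT idea concrete, here is a minimal pure-Python sketch for one contraction, $C_{m_1,n,m_2} \gets A_{m_1,k,m_2} B_{k,n}$. The function name `ttgt_contract`, the nested-list storage, and the choice of contraction are assumptions made for illustration; they are not taken from the slides or from TCCG.

```python
def ttgt_contract(A, B):
    """C[m1][n][m2] = sum_k A[m1][k][m2] * B[k][n] via transpose, GEMM, transpose."""
    M1, K, M2 = len(A), len(A[0]), len(A[0][0])
    N = len(B[0])
    # Transpose A to (m1, m2, k) order and flatten (m1, m2) into one row index.
    A_mat = [[A[m1][k][m2] for k in range(K)]
             for m1 in range(M1) for m2 in range(M2)]
    # B is already a K x N matrix, so its transpose step is the identity here.
    # GEMM: (M1*M2 x K) times (K x N).
    C_mat = [[sum(a_row[k] * B[k][n] for k in range(K)) for n in range(N)]
             for a_row in A_mat]
    # Unfold the row index back into (m1, m2) and transpose to (m1, n, m2).
    return [[[C_mat[m1 * M2 + m2][n] for m2 in range(M2)]
             for n in range(N)]
            for m1 in range(M1)]


# A has shape 2 x 3 x 2 (m1, k, m2); B has shape 3 x 2 (k, n).
A = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
B = [[1, 0], [0, 1], [1, 1]]
C = ttgt_contract(A, B)
```

The explicit transposes are what TTGT pays for being able to call an ordinary GEMM: both `A_mat` and the final result are extra copies of the data.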

  7. Matrix-Matrix Multiplication
     Let $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$ be 2D tensors:
     $$C_{m,n} \gets \sum_{k} A_{m,k} B_{k,n}$$

  9. Matrix-Matrix Multiplication (Einstein notation)
     Let $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$ be 2D tensors:
     $$C_{m,n} \gets A_{m,k} B_{k,n}$$
     Naive GEMM:

         // N-Loop
         for j = 0 : N - 1
           // M-Loop
           for i = 0 : M - 1
             tmp = 0
             // K-Loop (contracted)
             for k = 0 : K - 1
               tmp += A_{i,k} B_{k,j}
             // update C
             C_{i,j} = α tmp + β C_{i,j}
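The naive GEMM pseudocode translates almost line for line into Python. This is a minimal sketch that assumes matrices stored as nested lists; the name `naive_gemm` and the default values of `alpha` and `beta` are illustrative choices.

```python
def naive_gemm(A, B, C, alpha=1.0, beta=0.0):
    """Update C in place: C[i][j] = alpha * sum_k A[i][k] * B[k][j] + beta * C[i][j]."""
    M, K, N = len(A), len(B), len(B[0])
    for j in range(N):            # N-loop
        for i in range(M):        # M-loop
            tmp = 0.0
            for k in range(K):    # K-loop (contracted)
                tmp += A[i][k] * B[k][j]
            C[i][j] = alpha * tmp + beta * C[i][j]  # update C
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
naive_gemm(A, B, C, alpha=1.0, beta=2.0)
print(C)  # [[21.0, 24.0], [45.0, 52.0]]
```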

  10. Matrix-Matrix Multiplication (Einstein notation)
      High-performance GEMM:

          // N-Loop
          for n = 0 : nc : N - 1
            // K-Loop (contracted)
            for k = 0 : kc : K - 1
              B̂ = identify_submatrix(B, n, k)
              // pack B̂ into B̃
              B̃ = packB(B̂)    // B̃ ∈ R^{kc × nc}
              // M-Loop
              for m = 0 : mc : M - 1
                Â = identify_submatrix(A, m, k)
                // pack Â into Ã
                Ã = packA(Â)    // Ã ∈ R^{mc × kc}
                Ĉ = identify_submatrix(C, m, n)
                // matrix-matrix product: Ã B̃
                macroKernel(Ã, B̃, Ĉ, α, β)
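The blocked loop structure can be sketched in pure Python as follows. This illustrates the loop order and the packing step only, not a tuned implementation. The helper names `pack`, `macro_kernel` and `blocked_gemm`, the tiny default block sizes, and the decision to scale `C` by `beta` once up front (so that each K-block simply accumulates) are assumptions of this sketch rather than details confirmed by the slides.

```python
def pack(X, r0, c0, rows, cols):
    """Copy the submatrix X[r0:r0+rows][c0:c0+cols] into a new contiguous buffer."""
    return [row[c0:c0 + cols] for row in X[r0:r0 + rows]]


def macro_kernel(A_pk, B_pk, C, m0, n0, alpha):
    """Accumulate alpha * A_pk @ B_pk into the block of C that starts at (m0, n0)."""
    for i, a_row in enumerate(A_pk):
        for j in range(len(B_pk[0])):
            acc = 0.0
            for k, a in enumerate(a_row):
                acc += a * B_pk[k][j]
            C[m0 + i][n0 + j] += alpha * acc


def blocked_gemm(A, B, C, alpha=1.0, beta=0.0, mc=2, nc=2, kc=2):
    """C = alpha * A @ B + beta * C, computed block by block."""
    M, K, N = len(A), len(B), len(B[0])
    # Assumption of this sketch: apply beta once so every K-block can accumulate.
    for row in C:
        for j in range(N):
            row[j] *= beta
    for n in range(0, N, nc):              # N-loop
        for k in range(0, K, kc):          # K-loop (contracted)
            B_pk = pack(B, k, n, kc, nc)   # at most kc x nc
            for m in range(0, M, mc):      # M-loop
                A_pk = pack(A, m, k, mc, kc)   # at most mc x kc
                macro_kernel(A_pk, B_pk, C, m, n, alpha)
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
C = [[1.0] * 3 for _ in range(3)]
blocked_gemm(A, B, C, alpha=2.0, beta=3.0)
```

Because Python slices truncate at the matrix boundary, edge blocks smaller than `mc`, `nc` or `kc` need no special handling here; a real kernel would have to treat them explicitly.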

  17. Tensor Contractions
      Tensor contraction examples:
      - $C_{m,n} \gets A_{m,k} B_{k,n}$
      - $C_{m_1,m_2,n} \gets A_{m_1,m_2,k} B_{k,n}$
      - $C_{m_1,n,m_2} \gets A_{m_1,m_2,k} B_{k,n}$
      - $C_{m_1,n_1,n_2,m_2} \gets A_{m_1,m_2,k} B_{n_2,k,n_1}$
      - $C_{m_1,n_1,n_2,m_2} \gets A_{m_1,k_1,m_2,k_2} B_{k_2,n_2,k_1,n_1}$
      - $C_{m_1,n_1,n_2,m_2,n_3} \gets A_{m_1,k_1,m_2,k_2} B_{n_3,k_2,n_2,k_1,n_1}$
      - ...

      ⇒ Quite similar to GEMM.

  19. GETT
      Tensor-Tensor Multiplication (Einstein notation)
      Let the input tensors $A \in \mathbb{R}^{S^A_1 \times S^A_2 \times \dots \times S^A_{r_A}}$ and $B \in \mathbb{R}^{S^B_1 \times S^B_2 \times \dots \times S^B_{r_B}}$ update the output tensor $C \in \mathbb{R}^{S^C_1 \times S^C_2 \times \dots \times S^C_{r_C}}$:
      $$C_{\Pi^C(I_m \cup I_n)} \gets \alpha\, A_{\Pi^A(I_m \cup I_k)} B_{\Pi^B(I_n \cup I_k)} + \beta\, C_{\Pi^C(I_m \cup I_n)}.$$
      The index sets $I_m$, $I_n$ and $I_k$ are critical:
      - $I_m := \{m_1, m_2, \dots, m_\gamma\}$: free indices of $A$
      - $I_n := \{n_1, n_2, \dots, n_\zeta\}$: free indices of $B$
      - $I_k := \{k_1, k_2, \dots, k_\xi\}$: contracted indices
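The three index sets can be derived mechanically from the index labels of $A$, $B$ and $C$. The sketch below uses the hypothetical helpers `classify_indices` and `gemm_like_sizes`, and assumes that every index appears in exactly two of the three tensors. The second helper computes the sizes of the GEMM that the contraction resembles, as the product of the extents in each set.

```python
from math import prod


def classify_indices(idx_A, idx_B, idx_C):
    """Return (I_m, I_n, I_k): free indices of A, free indices of B, contracted indices."""
    set_A, set_B, set_C = set(idx_A), set(idx_B), set(idx_C)
    I_m = [i for i in idx_A if i in set_C and i not in set_B]
    I_n = [i for i in idx_B if i in set_C and i not in set_A]
    I_k = [i for i in idx_A if i in set_B and i not in set_C]
    return I_m, I_n, I_k


def gemm_like_sizes(sizes, I_m, I_n, I_k):
    """Sizes (M, N, K) of the GEMM-like problem: products of the extents in each set."""
    return (prod(sizes[i] for i in I_m),
            prod(sizes[i] for i in I_n),
            prod(sizes[i] for i in I_k))


# C_{m1,n1,n2,m2} <- A_{m1,k1,m2,k2} B_{k2,n2,k1,n1}
idx_A = ["m1", "k1", "m2", "k2"]
idx_B = ["k2", "n2", "k1", "n1"]
idx_C = ["m1", "n1", "n2", "m2"]
I_m, I_n, I_k = classify_indices(idx_A, idx_B, idx_C)
print(I_m, I_n, I_k)  # ['m1', 'm2'] ['n2', 'n1'] ['k1', 'k2']

sizes = {"m1": 2, "m2": 3, "n1": 4, "n2": 5, "k1": 6, "k2": 7}
print(gemm_like_sizes(sizes, I_m, I_n, I_k))  # (6, 20, 42)
```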

  20. GETT
      Naive GETT:

          // N-Loops
          for n_1 = 1 : S_{n_1}
            // ... remaining N-loops omitted ...
            for n_ζ = 1 : S_{n_ζ}
              // M-Loops
              for m_1 = 1 : S_{m_1}
                // ... remaining M-loops omitted ...
                for m_γ = 1 : S_{m_γ}
                  tmp = 0
                  // K-Loops (contracted)
                  for k_1 = 1 : S_{k_1}
                    // ... remaining K-loops omitted ...
                    for k_ξ = 1 : S_{k_ξ}
                      tmp += A_{Π^A(m_1,...,m_γ,k_1,...,k_ξ)} B_{Π^B(k_1,...,k_ξ,n_1,...,n_ζ)}
                  // update C
                  C_{Π^C(m_1,...,m_γ,n_1,...,n_ζ)} = α tmp + β C_{Π^C(m_1,...,m_γ,n_1,...,n_ζ)}
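A generic version of the naive loop nest can be sketched in Python by replacing each group of nested loops with `itertools.product`. The following are assumptions of this sketch, not part of the slides: tensors are stored as dicts keyed by index tuples, indices are 0-based, each index label appears in exactly two of the three tensors, and the function is named `naive_gett`.

```python
from itertools import product


def naive_gett(A, idx_A, B, idx_B, C, idx_C, sizes, alpha=1.0, beta=0.0):
    """Naive contraction C <- alpha * A * B + beta * C over dict-based tensors."""
    I_m = [i for i in idx_A if i in idx_C]   # free indices of A
    I_n = [i for i in idx_B if i in idx_C]   # free indices of B
    I_k = [i for i in idx_A if i in idx_B]   # contracted indices

    def loops(indices):
        # One product() stands in for a whole group of nested loops.
        return product(*(range(sizes[i]) for i in indices))

    for n_vals in loops(I_n):                # N-loops
        for m_vals in loops(I_m):            # M-loops
            env = dict(zip(I_n, n_vals))
            env.update(zip(I_m, m_vals))
            tmp = 0.0
            for k_vals in loops(I_k):        # K-loops (contracted)
                env.update(zip(I_k, k_vals))
                a = A[tuple(env[i] for i in idx_A)]
                b = B[tuple(env[i] for i in idx_B)]
                tmp += a * b
            key = tuple(env[i] for i in idx_C)
            C[key] = alpha * tmp + beta * C[key]   # update C
    return C


# C_{m1,n,m2} <- A_{m1,m2,k} B_{k,n}
sizes = {"m1": 2, "m2": 2, "k": 3, "n": 2}
idx_A, idx_B, idx_C = ("m1", "m2", "k"), ("k", "n"), ("m1", "n", "m2")
A = {key: float(sum(key) + 1) for key in product(range(2), range(2), range(3))}
B = {key: float(key[0] - key[1]) for key in product(range(3), range(2))}
C = {key: 1.0 for key in product(range(2), range(2), range(2))}
naive_gett(A, idx_A, B, idx_B, C, idx_C, sizes, alpha=2.0, beta=0.5)
```

The tuples built from `idx_A`, `idx_B` and `idx_C` play the role of the permutations $\Pi^A$, $\Pi^B$ and $\Pi^C$: they map the loop variables to each tensor's own index order.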
