Design of a High-Performance GEMM-like Tensor-Tensor Multiplication - PowerPoint PPT Presentation

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication Paul Springer and Paolo Bientinesi Aachen Institute for Advanced Study in Computational Engineering Science Austin, Sep. 20th 2016 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 1 / 19

Outline Introduction 1 GEMM-like Tensor-Tensor Multiplication 2 Tensor Contraction Code Generator 3 Performance 4 Conclusion and Future Work 5 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 2 / 19

Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs 1 Paul Springer and Paolo Bientinesi. “Design of a high-performance GEMM-like Tensor-Tensor Multiplication”. In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19

Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) 1 Paul Springer and Paolo Bientinesi. “Design of a high-performance GEMM-like Tensor-Tensor Multiplication”. In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19

Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) We propose a novel approach: GETT 1 Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer 1 Paul Springer and Paolo Bientinesi. “Design of a high-performance GEMM-like Tensor-Tensor Multiplication”. In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19

Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) We propose a novel approach: GETT 1 Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer Tensor Contraction Code Generator (TCCG) combine GETT, TTGT and LoG into a unified tool 1 Paul Springer and Paolo Bientinesi. “Design of a high-performance GEMM-like Tensor-Tensor Multiplication”. In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19

Matrix-Matrix Multiplication Matrix-Matrix Multiplication A ∈ R M × K , B ∈ R K × N and C ∈ R M × N be 2D tensors: C m , n ← � k A m , k B k , n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19

Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) A ∈ R M × K , B ∈ R K × N and C ∈ R M × N be 2D tensors: C m , n ← A m , k B k , n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19

Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) A ∈ R M × K , B ∈ R K × N and C ∈ R M × N be 2D tensors: C m , n ← A m , k B k , n // N-Loop for j = 0 : N − 1 // M-Loop for i = 0 : M − 1 tmp = 0 // K-Loop ( contracted ) for k = 0 : K − 1 tmp += A i , k B k , j // update C C i , j = α tmp + β C i , j Naive GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19

Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) A ∈ R M × K , B ∈ R K × N and C ∈ R M × N be 2D tensors: C m , n ← A m , k B k , n // N-Loop for n = 0 : nc : N − 1 // K-Loop ( contracted) for k = 0 : kc : K − 1 � B = identify_submatrix ( B , n , k ) � � // pack B into B B = packB( � � B ∈ R kc × nc � B ) // // M-Loop // N-Loop for m = 0 : mc : M − 1 for j = 0 : N − 1 � A = identify_submatrix ( A , m , k ) // M-Loop for i = 0 : M − 1 � � // pack A into A tmp = 0 A = packA( � � A ∈ R mc × kc � A ) // // K-Loop ( contracted ) for k = 0 : K − 1 � C = identify_submatrix ( C , m , n ) tmp += A i , k B k , j A � � // matrix -matrix product: B // update C macroKernel ( � A , � B , � C i , j = α tmp + β C i , j C , α, β ) Naive GEMM. High-performance GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n C m 1 , n , m 2 ← A m 1 , m 2 , k B k , n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n C m 1 , n , m 2 ← A m 1 , m 2 , k B k , n C m 1 , n 1 , n 2 , m 2 ← A m 1 , m 2 , k B n 2 , k , n 1 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n C m 1 , n , m 2 ← A m 1 , m 2 , k B k , n C m 1 , n 1 , n 2 , m 2 ← A m 1 , m 2 , k B n 2 , k , n 1 C m 1 , n 1 , n 2 , m 2 ← A m 1 , k 1 , m 2 , k 2 B k 2 , n 2 , k 1 , n 1 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n C m 1 , n , m 2 ← A m 1 , m 2 , k B k , n C m 1 , n 1 , n 2 , m 2 ← A m 1 , m 2 , k B n 2 , k , n 1 C m 1 , n 1 , n 2 , m 2 ← A m 1 , k 1 , m 2 , k 2 B k 2 , n 2 , k 1 , n 1 C m 1 , n 1 , n 2 , m 2 , n 3 ← A m 1 , k 1 , m 2 , k 2 B n 3 , k 2 , n 2 , k 1 , n 1 ... Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

Tensor Contractions Tensor contraction examples: C m , n ← A m , k B k , n C m 1 , m 2 , n ← A m 1 , m 2 , k B k , n C m 1 , n , m 2 ← A m 1 , m 2 , k B k , n C m 1 , n 1 , n 2 , m 2 ← A m 1 , m 2 , k B n 2 , k , n 1 C m 1 , n 1 , n 2 , m 2 ← A m 1 , k 1 , m 2 , k 2 B k 2 , n 2 , k 1 , n 1 C m 1 , n 1 , n 2 , m 2 , n 3 ← A m 1 , k 1 , m 2 , k 2 B n 3 , k 2 , n 2 , k 1 , n 1 ... ⇒ Quite similar to GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19

GETT Tensor-Tensor Multiplication (Einstein notation) Let the input tensors A ∈ R S A 1 × S A 2 × ... S A r A , and B ∈ R S B 1 × S B 2 × ... S B r B update the output tensor C ∈ R S C 1 × S C 2 × ... S C r C : C Π C ( I m ∪ I n ) ← α A Π A ( I m ∪ I k ) B Π B ( I n ∪ I k ) + β C Π C ( I m ∪ I n ) . Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 6 / 19

GETT Tensor-Tensor Multiplication (Einstein notation) Let the input tensors A ∈ R S A 1 × S A 2 × ... S A r A , and B ∈ R S B 1 × S B 2 × ... S B r B update the output tensor C ∈ R S C 1 × S C 2 × ... S C r C : C Π C ( I m ∪ I n ) ← α A Π A ( I m ∪ I k ) B Π B ( I n ∪ I k ) + β C Π C ( I m ∪ I n ) . These index sets I m , I n and I k are critical I m := { m 1 , m 2 , ..., m γ } : free indices of A I n := { n 1 , n 2 , ..., n ζ } : free indices of B I k := { k 1 , k 2 , ..., k ξ } : contracted indices Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 6 / 19

GETT 1 // N-Loops 2 for n 1 = 1 : S n 1 3 // ... remaining N-loops omitted ... 4 for n ζ = 1 : S n ζ 5 // M-Loops 6 for m 1 = 1 : S m 1 7 // ... remaining M-loops omitted ... 8 for m γ = 1 : S m γ 9 tmp = 0 10 // K-Loops ( contracted ) 11 for k 1 = 1 : S k 1 12 // ... remaining K-loops omitted ... 13 for k ξ = 1 : S k ξ 14 tmp += A Π A ( m 1 ,..., m γ , k 1 ,..., k ξ ) B Π B ( k 1 ,..., k ξ, n 1 ,..., n ζ ) 15 // update C 16 C Π C ( m 1 ,..., m γ , n 1 ,..., n ζ ) = α tmp + β C Π C ( m 1 ,..., m γ , n 1 ,..., n ζ ) Naive GETT. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 7 / 19

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication - PowerPoint PPT Presentation

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication Paul Springer and Paolo Bientinesi Aachen Institute for Advanced Study in Computational Engineering Science Austin, Sep. 20th 2016 Paul Springer (AICES) Tensor Contraction

Integer GEMM (under)performance Marat Dukhan Software Engineer on Ca ff e 2 GEMM in Neural

GEMM-Like Operations Applications Richard Michael Veras Platforms Carnegie Mellon Want

Flyte-MM: A Software Based Sub-Floating Point GEMM Richard Veras (Louisiana State University)

8. Tensor Field Visualization Tensor: extension of concept of scalar and vector Tensor data

JET Job Skills Elementary School I Like Rain By Sarah Rogers-Tanner I like rain I dont like

(Some) Challenges in (Some) Challenges in Tensor Mining Tensor Mining Evrim Acar Sandia

Tensor Field Techniques Lecture 11 March 5, 2020 Outline Basics of tensor algebra Tensor

TENSOR ALGEBRA Continuum Mechanics Course (MMC) - ETSECCPB - UPC Introduction to Tensors Tensor

Tensor-Matrix Products with a Compressed Sparse Tensor Shaden Smith George Karypis University

Tensor Field Visualization 9-1 Ronald Peikert SciVis 2007 - Tensor Fields Tensors

GEMM April 17th Vancouver S.Kibsey ,BSc,BEng,MBA,CFA,SIPC VP Equity Risk Management Caisse de

PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose

PROGRAMMING TENSOR CORES: NATIVE VOLTA TENSOR CORES WITH CUTLASS Andrew Kerr, Timmy Liu, Mostafa

TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS Cris Cecka Senior Research Scientist,

Tensor Methods for Signal Processing and Machine Learning Qibin Zhao Tensor Learning Unit RIKEN

and You Tensor network methods Matrix product states (MPS) Projected Entangled Pair States

Network Structure A Hyperbolic Space Analytics Framework for Big Network Data and Their

MA111: Contemporary mathematics . Jack Schmidt University of Kentucky January 20, 2012

Decision Support for Policing Violent Crime Jim Q. Smith & Aditi Shenvi (with Rob Procter,

Discrete Flavor Symmetries and Origin of CP Violation Mu-Chun Chen, University of California at

The Media War at the UN and the DPRK Why Netizen Journalism Matters Ronda Hauben Stony Brook

How Green Was My Valley Joe Conway joe.conway@crunchydata.com mail@joeconway.com Crunchy Data

Overview of New BiDirectional Amplifier Requirements for Hospitals Florida Hospital

E XAMPLE 1 SOL 1 SOL 2 Given: Given: 1. AEC = 90 1. AEC = 90 2. BDA = 90 2. BDA

Sambuz

Useful Links

Newsletter

Mail Us

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication - PowerPoint PPT Presentation

Design of a High-Performance GEMM-like Tensor-Tensor Multiplication Paul Springer and Paolo Bientinesi Aachen Institute for Advanced Study in Computational Engineering Science Austin, Sep. 20th 2016 Paul Springer (AICES) Tensor Contraction

Integer GEMM (under)performance Marat Dukhan Software Engineer on Ca ff e 2 GEMM in Neural

GEMM-Like Operations Applications Richard Michael Veras Platforms Carnegie Mellon Want

Flyte-MM: A Software Based Sub-Floating Point GEMM Richard Veras (Louisiana State University)

8. Tensor Field Visualization Tensor: extension of concept of scalar and vector Tensor data

JET Job Skills Elementary School I Like Rain By Sarah Rogers-Tanner I like rain I dont like

(Some) Challenges in (Some) Challenges in Tensor Mining Tensor Mining Evrim Acar Sandia

Tensor Field Techniques Lecture 11 March 5, 2020 Outline Basics of tensor algebra Tensor

TENSOR ALGEBRA Continuum Mechanics Course (MMC) - ETSECCPB - UPC Introduction to Tensors Tensor

Tensor-Matrix Products with a Compressed Sparse Tensor Shaden Smith George Karypis University

Tensor Field Visualization 9-1 Ronald Peikert SciVis 2007 - Tensor Fields Tensors

GEMM April 17th Vancouver S.Kibsey ,BSc,BEng,MBA,CFA,SIPC VP Equity Risk Management Caisse de

PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose

PROGRAMMING TENSOR CORES: NATIVE VOLTA TENSOR CORES WITH CUTLASS Andrew Kerr, Timmy Liu, Mostafa

TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS Cris Cecka Senior Research Scientist,

Tensor Methods for Signal Processing and Machine Learning Qibin Zhao Tensor Learning Unit RIKEN

and You Tensor network methods Matrix product states (MPS) Projected Entangled Pair States

Network Structure A Hyperbolic Space Analytics Framework for Big Network Data and Their

MA111: Contemporary mathematics . Jack Schmidt University of Kentucky January 20, 2012

Decision Support for Policing Violent Crime Jim Q. Smith &amp; Aditi Shenvi (with Rob Procter,

Discrete Flavor Symmetries and Origin of CP Violation Mu-Chun Chen, University of California at

The Media War at the UN and the DPRK Why Netizen Journalism Matters Ronda Hauben Stony Brook

How Green Was My Valley Joe Conway joe.conway@crunchydata.com mail@joeconway.com Crunchy Data

Overview of New BiDirectional Amplifier Requirements for Hospitals Florida Hospital

E XAMPLE 1 SOL 1 SOL 2 Given: Given: 1. AEC = 90 1. AEC = 90 2. BDA = 90 2. BDA

Sambuz

Useful Links

Newsletter

Mail Us

Decision Support for Policing Violent Crime Jim Q. Smith & Aditi Shenvi (with Rob Procter,