Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and - PowerPoint PPT Presentation

Computer Generation of General Size Linear Transform Libraries Yevgen Voronenko, Frédéric de Mesmay, and Markus Püschel Carnegie Mellon University, Pittsburgh, USA

PROBLEM AND MOTIVATION http://4.bp.blogspot.com/_LkcgSIEg0jw/TOi0XygeC4I/AAAAAAAAAH0/K0FyPn28gtc/s1600/Confuse.jpeg 2

Situation Today Numerical Library hand written library e.g. FFTW Codelet auto generated base cases for fixed size Codelet (Advanced Libraries) Codelet Not used in practice http://en.wikipedia.org/wiki/File:DLL_icon_on_Windows_Vista.png 3

Problem automatic new plattform arrives generation optimize & implement reoptimize & reimplement for spezific plattform on new plattform http://en.wikipedia.org/wiki/File:DLL_icon_on_Windows_Vista.png 4 http://en.wikipedia.org/wiki/Intel

Goal Platform Parameters Cacheline size Optimized #Cores , … Library Generator Transform Description DFT, DCT, … http://www.anilgears.com/images1/gear6.jpg 5

BACKGROUND ON LINEAR TRANSFORMS http://www.dimelo.com/img/style/analytics.jpg 6

Example DFT 𝑧 = 𝐸𝐺𝑈 𝑜 · 𝑦 Discrete Fourier Transform (DFT): 𝑜 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 base case: 𝐸𝐺𝑈 2 = 1 1 −1 1 Recursive Cases 7

Example: Implementation (Naïve) 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 void dft( int n, cpx *y, cpx *x) { int k = choose_factor(n); int m = n/k; cpx *t1 = Permute x with L(n,k); // t2 = (I_k tensor DFT_m)*t1 for ( int i=0; i<k; ++i) memory to dft(m, t2 + m*i, t1 + m*i); memory copy // t3 = diag( d(j) )*t2 for ( int i=0; i<n; ++i) SLOW! t3[i] = d(i) * t2[i]; // y = (DFT_k tensor I_m)*t3, for ( int i=0; i<m; ++i) dft_str(k, m, y + i, t3 + i); } // to be implemented void dft_str( int n, int str, cpx *Y, cpx *X); 8

DFT – Matrix Representation 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 Input Permutation Diagonal Matrix Recursive Cases graphics from paper 9

DFT – Dataflow Representation 1.1 2.1 1 3.1 4.1 1.2 2.2 2 3.2 4.2 1.3 2.3 3 3.3 4.3 1.4 2.4 4 3.4 4.4 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 10

DFT – Improvement Idea 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 𝑜 : A scalar multiplier for each entry 𝑈 𝑛 𝑜 : A permutation function 𝑀 𝑙 Merge 𝑜 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 “Scaled DFT” “ Strided DFT” 11

Example – Implementation (FFTW) n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m void dft( int n, cpx *y, cpx *x) { int k = choose_factor(n); // t1 = (I_k tensor DFT_m)L(n,k)*x for ( int i=0; i < k; ++i) dft_str(m, k, 1, t1 + m*i, x + m*i); // y = (DFT_k tensor I_m) diag(d(j)) for ( int i=0; i < m; ++i) dft_scaled(k, m, precomp_d[i], y + i, t1 + i); } // to be implemented void dft_str( int n, int istr, int ostr, cpx *y, cpx*x); void dft_scaled( int n, int str, cpx *d, cpx *y, cpx*x); Challenge: How to find those recursive functions? 12

Example - Observation Not yet parallelized and vectorized Many other recursive functions needed Optimizations are to be identified and implemented by hand 13 http://bookszeus.files.wordpress.com/2011/09/hands-on-keyboard.jpg

LIBRARY GENERATOR http://www.artistsvalley.com/images/icons/Database%20Application%20Icons/Data%20Generator/ 14 256x256/Data%20Generator.jpg

Transform Description Formulated in special language (SPL) New set of breakdown rules for each transform Can be complicated 15

Library Generation for Transforms dft(), dft_str() dft_scaled() Recursion Steps Breakdown Rules Library 𝐄𝐆𝐔 𝐨 = n 𝐄𝐆𝐔 𝐥 ⊗ I m T m n Base Cases I k ⊗ 𝐄𝐆𝐔 𝐧 L k DFT 2 to DFT 16 Generator http://www.anilgears.com/images1/gear6.jpg 16

From SPL to Σ -SPL DFT n Problem A Linear Transform: Breakdown rules in SPL: Breakdown Rule n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m Rewritten breakdown rules into Σ -SPL Σ -SPL 17

Difference of SPL and Σ -SPL Σ -SPL = Extended SPL + Loop constructs Σ : iterative matrix sum + explicit I/O: gather G(f) and scatter S(f) + index mapping functions M(f) for strides (permutations) How to rewrite SPL to Σ -SPL? 𝑔: 𝐵 → 𝐶 Functions: A: Integer interval [1..n] B: Integer interval [1..n] or ℂ, ℝ Example: Permutation 𝑔 𝑜→𝑜 (𝑐𝑗𝑘𝑓𝑑𝑢𝑗𝑤𝑓) 18

Σ -SPL – Gather and Scatter Matrices Parameterized matrices 1 𝑈 = 𝐻 𝑔 ≔ 𝑓 𝑔 0 , … , 𝑓 𝑔 𝑜−1 ⋱ 1 1 ⋱ 1 𝑧 = 𝐻 𝑔 ⋅ 𝑦 ⇔ 𝑧 𝑗 = 𝑦 𝑔 𝑗 𝑇 𝑔 ≔ 𝐻 𝑔 𝑈 = 𝑧 = 𝑇 𝑔 ⋅ 𝑦 ⇔ 𝑧 𝑔 𝑗 = 𝑦 𝑗 19

Inside the Tensor Product n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m Y = 𝐽 𝑙 ⊗ 𝐵 x 𝑍 𝑌 1 𝐵 ⋯ 0 1 ⋮ ⋮ = ⋮ ⋱ ⋮ 𝑍 𝑌 𝑜 0 ⋯ 𝐵 𝑜 𝑍 𝑌 1 𝐵 ⋯ 0 0 ⋯ 0 1 ⋮ ⋮ = + ⋯ + ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ 𝑍 𝑌 𝑜 0 ⋯ 0 0 ⋯ 𝐵 𝑜 Iterative Matrix Sum 𝑍 𝑗 = 𝐵 ⋅ 𝑌 𝑗 Partition of vectors: 20

Inside the Tensor Product n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m j=0 j=0 0 0 A : : loop j=k-1 j=k-1 k-1 k-1 kernel Y X 𝑙−1 𝐽 𝑙 ⊗ 𝐵 = 𝑇 0𝐵 𝐻 0 + ⋯ + 𝑇 𝑙−1 𝐵 𝐻 𝑙−1 = 𝑇 𝑘 𝐵 𝐻 𝑘 𝑘=0 21

Σ -SPL – Rewriting Rules (Extract) SPL to Σ -SPL Loop merging 22

From SPL to Σ -SPL n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m 𝑙−1 𝑜 = m 𝐽 𝑙 ⊗ 𝐸𝐺𝑈 𝑛 𝑀 𝑙 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘𝑛,1 ) perm l k 𝑘=0 𝑙−1 = 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘,𝑙 ) 𝑘=0 𝑛−1 𝑜 = 𝐸𝐺𝑈 𝑙 ⊗ 𝐽 𝑛 𝑈 𝑇 ℎ 𝑘,𝑛 𝐸𝐺𝑈 𝑙 𝑒𝑗𝑏𝑕(𝑔 ∘ ℎ 𝑗,𝑙 )𝐻(ℎ 𝑘,𝑛 ) 𝑛 𝑘=0 23

Recursion Step Closure Implement Transforms & Breakdown Rules expand & rewrite Parameterize Parametrize Implement find params according to Yes constraints New (Matrix dimensions ) R-Steps ? No 24

Recursion Steps - Example n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m dft Function Tag to mark recursions 𝑙−1 𝑜 = 𝐽 𝑙 ⊗ 𝐸𝐺𝑈 𝑛 𝑀 𝑙 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘,𝑙 ) 𝑘=0 dft_str 𝑛−1 𝑜 = 𝐸𝐺𝑈 𝑙 ⊗ 𝐽 𝑛 𝑈 𝑇 ℎ 𝑘,𝑛 𝐸𝐺𝑈 𝑙 𝑒𝑗𝑏𝑕(𝑔 ∘ ℎ 𝑗,𝑙 )𝐻(ℎ 𝑘,𝑛 ) 𝑛 𝑘=0 dft_scaled Expand tag to scatter / gather 25

Recursion Steps – Graph dft dft_str dft_scale Stride = 1 dft_scale_base 26

Library Generation for Transforms dft(), dft_str() dft_scaled() Recursion Steps Code Generation Breakdown Rules Library Parameter Handling 𝐄𝐆𝐔 𝐨 = n 𝐄𝐆𝐔 𝐥 ⊗ I m T m n Base Cases I k ⊗ 𝐄𝐆𝐔 𝐧 L k DFT 2 to DFT 16 Generator http://www.anilgears.com/images1/gear6.jpg 27

Code Translation (Extract) Σ -SPL Code Rewriting Rules Code ( 𝐻 𝑔 𝑜→𝑂 , 𝑧, 𝑦 ) for(j=0..n-1) y[j] = x[f(j)]; Code ( 𝑒𝑗𝑏𝑕 𝑔 𝑜→ℂ , 𝑧, 𝑦 ) for(j=0..n-1) y[j] = f(j)*x[j]; 𝑙−1 Code ( 𝑁 , 𝑧, 𝑦 ) 𝑘 𝑘=0 for(j=0..k-1) Code ( 𝑁 𝑘 , 𝑧, 𝑦 ) http://www.carlosag.net/images/codeTranslatorLogo.gif 28

Parameter Handling Precomputed Function Parameters Parameters (Cold) (Hot) “Initial - Parameter” “Runtime - parameter” Large Parameter Set u1,u2, u3, u4, u5, u6, u7, u8, f1 𝑣1→𝑣2 𝑣1→ℂ 𝐸𝐺𝑈 𝑣1 = 𝑇 ℎ 𝑣3,1 𝐸𝐺𝑈 𝑔1 ⊗ 𝐽 𝑣1/𝑔1 𝑒𝑗𝑏𝑕 𝑞𝑠𝑓 𝑒 𝑣1/𝑔1 𝑣1 𝐻(ℎ 𝑣7,𝑣8 𝑣1→𝑣6 ) ⋅ 𝐽 𝑔1 ⊗ 𝐸𝐺𝑈 𝑣1/𝑔1 𝑀 𝑔1 29

Library Generation Generated Functions Actual Code Higher-order descriptor Classes Identified recursion DFT, DFT_STR, steps DFT_SCALED Cold parameters Constructor arguments DFT size n DFT(int n,…) Hot parameters Method arguments Vectors x,y DFT->compute(x, y) Base cases Constants DFT2 [1 1; 1 -1] 30

Library Generation: Platform Parallel & vector code New rewriting rules with # Processors, Cache-line size, vector length Additional recursion steps: DFT scalar: 4 DFT vectorized+parallel: 8 31

RESULTS http://britishmastersaf.files.wordpress.com/2011/05/results.jpg 32

Test Plattform CPU 2x dual core 3GHz Intel Xeon 5160 processors Compiler: Intel C/C++ Compiler 10.1 and SUN JDK 1.6.0 Threading: OpenMP pragmas Library Generation Time: between 10 and 60 minutes 33

Scalar Just the base approx. same algorithms cases up to 64 and optimization 34 Source: Paper

4 cores used 1 core used 2 cores used 35 Source: Paper

Extra passes due to conversion into RDFT 36 Source: Paper

Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and - PowerPoint PPT Presentation

Computer Generation of General Size Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and Markus Pschel Carnegie Mellon University, Pittsburgh, USA PROBLEM AND MOTIVATION

Topic 10: The Z Transform o Introduction to Z Transform o Relationship to the Fourier transform o

Fourier Series and Transform Overview Why Fourier transform? Trigonometric functions Who is

SMART GOVERNMENT INVOICING: INVOICE PROCESSING PLATFORM LEAD. TRANSFORM. DELIVER LEAD. TRANSFORM.

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Topic 4: Continuous-Time Fourier Transform (CTFT) o Introduction to Fourier Transform o Fourier

Linear Systems Linear Systems Transform Ax = b into an equivalent but Transform Ax b into

Libraries Jonathan Platt Head of Libraries and Heritage 22 nd July 2014 Libraries 1.

Libraries In C++ its possible to create static libraries and shared libraries Static

Transform Coding - Overview Principle of block-wise transform coding Properties of orthonormal

Fourier Transform for Partial Differential Equations Introduction: Fourier Transform

Topic 5: Discrete-Time Fourier Transform (DTFT) o DT Fourier Transform o Overview of Fourier

Topic 9: The Laplace Transform o Introduction o Laplace Transform & Examples o Region of

Xamarin One platform to rule them all? Erwin de Groot @ 040 coders .NET frameworks WPF UI SL

Continuous-time systems 1 March 2, 2015 Continuous-time systems 1 Linear differential equations

Carnegie Mellon Generating High Performance Generating High Performance General Size Linear

Graphics 2014 Linear Algebra II Linear Maps & Matrices Linear Maps & Matrices CORE

Tsing nghua hua University versity Introduction Deep learning has widely used in lots of

Computatio ion Reuse in in DNNs by Exploiting Input Sim imilarity Marc Riera , Jose Maria

Lecture 20: Inheritance 2016-02-04 Prof. Dr. Andreas Podelski, Dr. Bernd Westphal 20

Normalization by Evaluation and the Foundations of Constructive Mathematics 1972 - 2009 Peter

Probability & Statistics Intro / Review NEU 560 Jonathan Pillow Lecture 6, part II 1

Automatic Performance Tuning and Machine Learning Markus Pschel Computer Science, ETH Zrich

Malware Defense II TDDD17 Information Security, Second Course Alireza Mohammadinodooshan

Holographic models for QCD Elias Kiritsis University of Crete ( APC, Paris ) 1- Collaborators

Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and - PowerPoint PPT Presentation

Computer Generation of General Size Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and Markus Pschel Carnegie Mellon University, Pittsburgh, USA PROBLEM AND MOTIVATION

Topic 10: The Z Transform o Introduction to Z Transform o Relationship to the Fourier transform o

Fourier Series and Transform Overview Why Fourier transform? Trigonometric functions Who is

SMART GOVERNMENT INVOICING: INVOICE PROCESSING PLATFORM LEAD. TRANSFORM. DELIVER LEAD. TRANSFORM.

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

Topic 4: Continuous-Time Fourier Transform (CTFT) o Introduction to Fourier Transform o Fourier

Linear Systems Linear Systems Transform Ax = b into an equivalent but Transform Ax b into

Libraries Jonathan Platt Head of Libraries and Heritage 22 nd July 2014 Libraries 1.

Libraries In C++ its possible to create static libraries and shared libraries Static

Transform Coding - Overview Principle of block-wise transform coding Properties of orthonormal

Fourier Transform for Partial Differential Equations Introduction: Fourier Transform

Topic 5: Discrete-Time Fourier Transform (DTFT) o DT Fourier Transform o Overview of Fourier

Topic 9: The Laplace Transform o Introduction o Laplace Transform &amp; Examples o Region of

Xamarin One platform to rule them all? Erwin de Groot @ 040 coders .NET frameworks WPF UI SL

Continuous-time systems 1 March 2, 2015 Continuous-time systems 1 Linear differential equations

Carnegie Mellon Generating High Performance Generating High Performance General Size Linear

Graphics 2014 Linear Algebra II Linear Maps &amp; Matrices Linear Maps &amp; Matrices CORE

Tsing nghua hua University versity Introduction Deep learning has widely used in lots of

Computatio ion Reuse in in DNNs by Exploiting Input Sim imilarity Marc Riera , Jose Maria

Lecture 20: Inheritance 2016-02-04 Prof. Dr. Andreas Podelski, Dr. Bernd Westphal 20

Normalization by Evaluation and the Foundations of Constructive Mathematics 1972 - 2009 Peter

Probability &amp; Statistics Intro / Review NEU 560 Jonathan Pillow Lecture 6, part II 1

Automatic Performance Tuning and Machine Learning Markus Pschel Computer Science, ETH Zrich

Malware Defense II TDDD17 Information Security, Second Course Alireza Mohammadinodooshan

Holographic models for QCD Elias Kiritsis University of Crete ( APC, Paris ) 1- Collaborators

Topic 9: The Laplace Transform o Introduction o Laplace Transform & Examples o Region of

Graphics 2014 Linear Algebra II Linear Maps & Matrices Linear Maps & Matrices CORE

Probability & Statistics Intro / Review NEU 560 Jonathan Pillow Lecture 6, part II 1