linear transform libraries
play

Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and - PowerPoint PPT Presentation

Computer Generation of General Size Linear Transform Libraries Yevgen Voronenko, Frdric de Mesmay, and Markus Pschel Carnegie Mellon University, Pittsburgh, USA PROBLEM AND MOTIVATION


  1. Computer Generation of General Size Linear Transform Libraries Yevgen Voronenko, Frédéric de Mesmay, and Markus Püschel Carnegie Mellon University, Pittsburgh, USA

  2. PROBLEM AND MOTIVATION http://4.bp.blogspot.com/_LkcgSIEg0jw/TOi0XygeC4I/AAAAAAAAAH0/K0FyPn28gtc/s1600/Confuse.jpeg 2

  3. Situation Today Numerical Library hand written library e.g. FFTW Codelet auto generated base cases for fixed size Codelet (Advanced Libraries) Codelet Not used in practice http://en.wikipedia.org/wiki/File:DLL_icon_on_Windows_Vista.png 3

  4. Problem automatic new plattform arrives generation optimize & implement reoptimize & reimplement for spezific plattform on new plattform http://en.wikipedia.org/wiki/File:DLL_icon_on_Windows_Vista.png 4 http://en.wikipedia.org/wiki/Intel

  5. Goal Platform Parameters Cacheline size Optimized #Cores , … Library Generator Transform Description DFT, DCT, … http://www.anilgears.com/images1/gear6.jpg 5

  6. BACKGROUND ON LINEAR TRANSFORMS http://www.dimelo.com/img/style/analytics.jpg 6

  7. Example DFT 𝑧 = 𝐸𝐺𝑈 𝑜 · 𝑦 Discrete Fourier Transform (DFT): 𝑜 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 base case: 𝐸𝐺𝑈 2 = 1 1 −1 1 Recursive Cases 7

  8. Example: Implementation (Naïve) 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 void dft( int n, cpx *y, cpx *x) { int k = choose_factor(n); int m = n/k; cpx *t1 = Permute x with L(n,k); // t2 = (I_k tensor DFT_m)*t1 for ( int i=0; i<k; ++i) memory to dft(m, t2 + m*i, t1 + m*i); memory copy // t3 = diag( d(j) )*t2 for ( int i=0; i<n; ++i) SLOW! t3[i] = d(i) * t2[i]; // y = (DFT_k tensor I_m)*t3, for ( int i=0; i<m; ++i) dft_str(k, m, y + i, t3 + i); } // to be implemented void dft_str( int n, int str, cpx *Y, cpx *X); 8

  9. DFT – Matrix Representation 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 Input Permutation Diagonal Matrix Recursive Cases graphics from paper 9

  10. DFT – Dataflow Representation 1.1 2.1 1 3.1 4.1 1.2 2.2 2 3.2 4.2 1.3 2.3 3 3.3 4.3 1.4 2.4 4 3.4 4.4 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 10

  11. DFT – Improvement Idea 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑜 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 𝑜 : A scalar multiplier for each entry 𝑈 𝑛 𝑜 : A permutation function 𝑀 𝑙 Merge 𝑜 𝑜 𝐽 𝑙 ⊗ 𝑬𝑮𝑼 𝒏 𝑀 𝑙 𝑬𝑮𝑼 𝒐 = 𝑬𝑮𝑼 𝒍 ⊗ 𝐽 𝑛 𝑈 𝑛 “Scaled DFT” “ Strided DFT” 11

  12. Example – Implementation (FFTW) n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m void dft( int n, cpx *y, cpx *x) { int k = choose_factor(n); // t1 = (I_k tensor DFT_m)L(n,k)*x for ( int i=0; i < k; ++i) dft_str(m, k, 1, t1 + m*i, x + m*i); // y = (DFT_k tensor I_m) diag(d(j)) for ( int i=0; i < m; ++i) dft_scaled(k, m, precomp_d[i], y + i, t1 + i); } // to be implemented void dft_str( int n, int istr, int ostr, cpx *y, cpx*x); void dft_scaled( int n, int str, cpx *d, cpx *y, cpx*x); Challenge: How to find those recursive functions? 12

  13. Example - Observation Not yet parallelized and vectorized Many other recursive functions needed Optimizations are to be identified and implemented by hand 13 http://bookszeus.files.wordpress.com/2011/09/hands-on-keyboard.jpg

  14. LIBRARY GENERATOR http://www.artistsvalley.com/images/icons/Database%20Application%20Icons/Data%20Generator/ 14 256x256/Data%20Generator.jpg

  15. Transform Description Formulated in special language (SPL) New set of breakdown rules for each transform Can be complicated 15

  16. Library Generation for Transforms dft(), dft_str() dft_scaled() Recursion Steps Breakdown Rules Library 𝐄𝐆𝐔 𝐨 = n 𝐄𝐆𝐔 𝐥 ⊗ I m T m n Base Cases I k ⊗ 𝐄𝐆𝐔 𝐧 L k DFT 2 to DFT 16 Generator http://www.anilgears.com/images1/gear6.jpg 16

  17. From SPL to Σ -SPL DFT n Problem A Linear Transform: Breakdown rules in SPL: Breakdown Rule n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m Rewritten breakdown rules into Σ -SPL Σ -SPL 17

  18. Difference of SPL and Σ -SPL Σ -SPL = Extended SPL + Loop constructs Σ : iterative matrix sum + explicit I/O: gather G(f) and scatter S(f) + index mapping functions M(f) for strides (permutations) How to rewrite SPL to Σ -SPL? 𝑔: 𝐵 → 𝐶 Functions: A: Integer interval [1..n] B: Integer interval [1..n] or ℂ, ℝ Example: Permutation 𝑔 𝑜→𝑜 (𝑐𝑗𝑘𝑓𝑑𝑢𝑗𝑤𝑓) 18

  19. Σ -SPL – Gather and Scatter Matrices Parameterized matrices 1 𝑈 = 𝐻 𝑔 ≔ 𝑓 𝑔 0 , … , 𝑓 𝑔 𝑜−1 ⋱ 1 1 ⋱ 1 𝑧 = 𝐻 𝑔 ⋅ 𝑦 ⇔ 𝑧 𝑗 = 𝑦 𝑔 𝑗 𝑇 𝑔 ≔ 𝐻 𝑔 𝑈 = 𝑧 = 𝑇 𝑔 ⋅ 𝑦 ⇔ 𝑧 𝑔 𝑗 = 𝑦 𝑗 19

  20. Inside the Tensor Product n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m Y = 𝐽 𝑙 ⊗ 𝐵 x 𝑍 𝑌 1 𝐵 ⋯ 0 1 ⋮ ⋮ = ⋮ ⋱ ⋮ 𝑍 𝑌 𝑜 0 ⋯ 𝐵 𝑜 𝑍 𝑌 1 𝐵 ⋯ 0 0 ⋯ 0 1 ⋮ ⋮ = + ⋯ + ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ 𝑍 𝑌 𝑜 0 ⋯ 0 0 ⋯ 𝐵 𝑜 Iterative Matrix Sum 𝑍 𝑗 = 𝐵 ⋅ 𝑌 𝑗 Partition of vectors: 20

  21. Inside the Tensor Product n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m j=0 j=0 0 0 A : : loop j=k-1 j=k-1 k-1 k-1 kernel Y X 𝑙−1 𝐽 𝑙 ⊗ 𝐵 = 𝑇 0𝐵 𝐻 0 + ⋯ + 𝑇 𝑙−1 𝐵 𝐻 𝑙−1 = 𝑇 𝑘 𝐵 𝐻 𝑘 𝑘=0 21

  22. Σ -SPL – Rewriting Rules (Extract) SPL to Σ -SPL Loop merging 22

  23. From SPL to Σ -SPL n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m 𝑙−1 𝑜 = m 𝐽 𝑙 ⊗ 𝐸𝐺𝑈 𝑛 𝑀 𝑙 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘𝑛,1 ) perm l k 𝑘=0 𝑙−1 = 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘,𝑙 ) 𝑘=0 𝑛−1 𝑜 = 𝐸𝐺𝑈 𝑙 ⊗ 𝐽 𝑛 𝑈 𝑇 ℎ 𝑘,𝑛 𝐸𝐺𝑈 𝑙 𝑒𝑗𝑏𝑕(𝑔 ∘ ℎ 𝑗,𝑙 )𝐻(ℎ 𝑘,𝑛 ) 𝑛 𝑘=0 23

  24. Recursion Step Closure Implement Transforms & Breakdown Rules expand & rewrite Parameterize Parametrize Implement find params according to Yes constraints New (Matrix dimensions ) R-Steps ? No 24

  25. Recursion Steps - Example n n I k ⊗ 𝐄𝐆𝐔 𝐧 L k 𝐄𝐆𝐔 𝐨 = 𝐄𝐆𝐔 𝐥 ⊗ I m T m dft Function Tag to mark recursions 𝑙−1 𝑜 = 𝐽 𝑙 ⊗ 𝐸𝐺𝑈 𝑛 𝑀 𝑙 𝑇 ℎ 𝑘𝑛,1 𝐸𝐺𝑈 𝑛 𝐻(ℎ 𝑘,𝑙 ) 𝑘=0 dft_str 𝑛−1 𝑜 = 𝐸𝐺𝑈 𝑙 ⊗ 𝐽 𝑛 𝑈 𝑇 ℎ 𝑘,𝑛 𝐸𝐺𝑈 𝑙 𝑒𝑗𝑏𝑕(𝑔 ∘ ℎ 𝑗,𝑙 )𝐻(ℎ 𝑘,𝑛 ) 𝑛 𝑘=0 dft_scaled Expand tag to scatter / gather 25

  26. Recursion Steps – Graph dft dft_str dft_scale Stride = 1 dft_scale_base 26

  27. Library Generation for Transforms dft(), dft_str() dft_scaled() Recursion Steps Code Generation Breakdown Rules Library Parameter Handling 𝐄𝐆𝐔 𝐨 = n 𝐄𝐆𝐔 𝐥 ⊗ I m T m n Base Cases I k ⊗ 𝐄𝐆𝐔 𝐧 L k DFT 2 to DFT 16 Generator http://www.anilgears.com/images1/gear6.jpg 27

  28. Code Translation (Extract) Σ -SPL Code Rewriting Rules Code ( 𝐻 𝑔 𝑜→𝑂 , 𝑧, 𝑦 ) for(j=0..n-1) y[j] = x[f(j)]; Code ( 𝑒𝑗𝑏𝑕 𝑔 𝑜→ℂ , 𝑧, 𝑦 ) for(j=0..n-1) y[j] = f(j)*x[j]; 𝑙−1 Code ( 𝑁 , 𝑧, 𝑦 ) 𝑘 𝑘=0 for(j=0..k-1) Code ( 𝑁 𝑘 , 𝑧, 𝑦 ) http://www.carlosag.net/images/codeTranslatorLogo.gif 28

  29. Parameter Handling Precomputed Function Parameters Parameters (Cold) (Hot) “Initial - Parameter” “Runtime - parameter” Large Parameter Set u1,u2, u3, u4, u5, u6, u7, u8, f1 𝑣1→𝑣2 𝑣1→ℂ 𝐸𝐺𝑈 𝑣1 = 𝑇 ℎ 𝑣3,1 𝐸𝐺𝑈 𝑔1 ⊗ 𝐽 𝑣1/𝑔1 𝑒𝑗𝑏𝑕 𝑞𝑠𝑓 𝑒 𝑣1/𝑔1 𝑣1 𝐻(ℎ 𝑣7,𝑣8 𝑣1→𝑣6 ) ⋅ 𝐽 𝑔1 ⊗ 𝐸𝐺𝑈 𝑣1/𝑔1 𝑀 𝑔1 29

  30. Library Generation Generated Functions Actual Code Higher-order descriptor Classes Identified recursion DFT, DFT_STR, steps DFT_SCALED Cold parameters Constructor arguments DFT size n DFT(int n,…) Hot parameters Method arguments Vectors x,y DFT->compute(x, y) Base cases Constants DFT2 [1 1; 1 -1] 30

  31. Library Generation: Platform Parallel & vector code New rewriting rules with # Processors, Cache-line size, vector length Additional recursion steps: DFT scalar: 4 DFT vectorized+parallel: 8 31

  32. RESULTS http://britishmastersaf.files.wordpress.com/2011/05/results.jpg 32

  33. Test Plattform CPU 2x dual core 3GHz Intel Xeon 5160 processors Compiler: Intel C/C++ Compiler 10.1 and SUN JDK 1.6.0 Threading: OpenMP pragmas Library Generation Time: between 10 and 60 minutes 33

  34. Scalar Just the base approx. same algorithms cases up to 64 and optimization 34 Source: Paper

  35. 4 cores used 1 core used 2 cores used 35 Source: Paper

  36. Extra passes due to conversion into RDFT 36 Source: Paper

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend