Determinant Computation on the GPU using the Condensation Method - PowerPoint PPT Presentation

Determinant Computation on the GPU using the Condensation Method Sardar Anisul Haque Marc Moreno Maza Ontario Research Centre for Computer Algebra University of Western Ontario, London, Ontario AMMCS 2011, Waterloo, July 25, 2011 Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 1 / 1

Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 2 / 1

Plan Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 3 / 1

Dodgson’s condensation Algorithm Example of a condensation step: � � 0 − 1 2 � � � � − 1 − 5 8 � � � � 1 1 − 4 � � = > � � � � � � 0 − 1 − 1 2 � � � � � � � � � � � � − 1 − 5 − 5 8 � � � � � � � � � � � � � � � � − 1 − 5 − 5 8 � � � � � � � � � � � � 1 1 1 − 4 � � � � � � Reference: C. L. Dodgson, Condensation of Determinants , Proceedings of the Royal Society of London, 15(1866), 150-155. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 4 / 1

Dodgson’s condensation Algorithm (cont.) Condensation step (cont.) � � 0 − 1 2 � � � � − 1 − 5 8 � � � � 1 1 − 4 � � = > � � − 1 2 � � � � 4 12 � � = − 20 And the determinant is: − 20 / − 5 = 4. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 5 / 1

Salem and Said’s condensation Algorithm Condensation step with the same example: � � − 1 0 2 � � � � − 1 − 5 8 � � � � 1 1 − 4 � � = > � � � � � � 0 − 1 − 1 2 � � � � � � � � � � � � − 1 − 5 − 5 8 � � � � � � � � � � � � � � � � 0 − 1 − 1 2 � � � � � � � � � � � � 1 1 1 − 4 � � � � � � A formula is needed before concluding (see next slide). Reference: Abdelmalek Salem, and Kouachi Said, Condensation of Determinants , http://arxiv.org/abs/0712.0822. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 6 / 1

Salem and Said’s condensation Algorithm (cont.) The input of a condensation step is a matrix A = ( a i , j | 0 ≤ i , j ≤ n − 1 ) of order n > 2. It produces a matrix B = ( b i , j | 0 ≤ i , j ≤ n − 1 ) of order n − 1 such that a 0 ,ℓ a 0 , j + 1 � � b i , j = � � a i + 1 ,ℓ a i + 1 , j + 1 � � � � for j ≥ ℓ and by b i , j = − a i + 1 , j a 0 ,ℓ for j < ℓ . The key relation between A and B is the following: det ( A ) = det ( B ) / ( a 0 ,ℓ ) n − 2 Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 7 / 1

Salem and Said’s condensation Algorithm (cont.) Returning to our example, we obtain: � � 0 − 1 2 � � � � − 1 − 5 8 � � � � 1 1 − 4 � � = > � � − 1 2 � � � � 1 2 � � = > − 4 So the determinant is: − 4 / ( − 1 ) 3 − 2 = 4. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 8 / 1

Complexity estimates for Salem and Said’s Algorithm For the usual RAM model, In the worst case, the work is n 3 − 3 / 2 n 2 + 1 / 2 n − 3 coefficient operations. Asymptotically, on ( Z , L ) ideal cache, the number of cache misses is in the order of ( n − Z ) n 2 − n + Z 2 − Z + Zn + 1 + 4 L � � L Hence, the ratio between the two is L , similarly to Gaussian Elimination. However, the condensation method is more data-oblivious which is good for the hardware scheduling of a GPU. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 9 / 1

Plan Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 10 / 1

Data mapping Each condensation step is performed by one kernel call. No data 1 copied back to the host until n = 2. At each condensation step, the input A and output B are stored in 2 global memory. Shared memory is used for efficiency issues. Salem and Said’s Algorithm suggest to store A and B in column 3 major layout . We use a 1-D grid of 1-D thread blocks . 4 With T threads per block and t elements of B written per thread, 5 ⌈ ( n − 1 ) 2 / ( Tt ) ⌉ blocks are required. For t = 4 and n > 200 this leads to about 10 , 000 threads in flight. The j -th thread in the i -th block computes-and-writes 6 B [ Ttj + it , Ttj + it + 1 , . . . Ttj + it + t − 1 ] . Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 11 / 1

Finding the ℓ -th column: finite field case A condensation step produces a matrix B = ( b i , j | 0 ≤ i , j ≤ n − 1 ) of order n − 1 such that a 0 ,ℓ a 0 , j + 1 � � b i , j = � � a i + 1 ,ℓ a i + 1 , j + 1 � � � � for j ≥ ℓ and by b i , j = − a i + 1 , j a 0 ,ℓ for j < ℓ . Recall that we have det ( A ) = det ( B ) / ( a 0 ,ℓ ) n − 2 The above formula requires a 0 ,ℓ to be the first non-zero on the first row: we call it the pivot. It is computed by one kernel call. The successive pivots are in the global memory of GPU and used to compute the determinant of the original matrix. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 12 / 1

Finding the ℓ -th column: floating point number Case On the first row, we choose the element p whose absolute value is the closest to 1: we call it the pivot. Then we divide each element of the first row by p and we have det ( A ) = det ( B ) ∗ p The successive pivots are stored in the GPU global memory. After all condensation steps are completed, the pivots are multiplied together so as to avoid overflow/underflow, if possible: Step 1 L 1 := { p ∈ Pivots | − 1 ≤ p ≤ 1 } ; L 2 := { p ∈ Pivots | p �∈ L 1 } ; m := 1; Step 2 While L 1 and L 2 not empty do m := m pop ( L 1 ) pop ( L 1 ) Step 3 Finish wih the non-empty stack, if any. Sardar Anisul Haque (UWO) Determinant Computation on the GPU using the Condensation Method AMMCS 2011 13 / 1

Determinant Computation on the GPU using the Condensation Method - PowerPoint PPT Presentation

Determinant Computation on the GPU using the Condensation Method Sardar Anisul Haque Marc Moreno Maza Ontario Research Centre for Computer Algebra University of Western Ontario, London, Ontario AMMCS 2011, Waterloo, July 25, 2011 Sardar

d i E Determinant of a matrix a l l u d Dr. Abdulla Eid b A College of Science . r D

Status of GPU offloading on Wayland Axel Davy FOSDEM 2014 Status of GPU offloading on Wayland

Motivation to Learn GPGPU Julius Parulek Why to Learn About GPU? Computational power of GPU vs.

Bose-Einstein Condensation Bose-Einstein Condensation and Superfluidity of and Superfluidity of

UNIFIED MEMORY ON PASCAL AND VOLTA Nikolay Sakharnykh - May 10, 2017 1 HETEROGENEOUS

Advancements in V-Ray RT GPU Vlado Koylazov, CTO & Co-founder Blagovest Taskov, RT GPU Team

Performance Evaluation of a Multithreaded GPU Using CUDA GPU architecture GeForce 8800 GPU

Tracking using Goal CONDENSATION: Model-based visual tracking in dense Conditional Density

Tracking using CONDENSATION: Conditional Density Propagation M. Isard and A. Blake,

Determinants Every n n matrix A has an associated scalar value called the determinant of A ,

On Hadamards Maximal Determinant Problem Judy-anne Osborn MSI, ANU April 2009 Judy-anne

d i E Determinant of a matrix a l l u d Dr. Abdulla Eid b A College of Science . r D

Picard categories, determinant functors and K -theory Fernando Muro Universitat de Barcelona,

Use Tesla to provide first GPU VM Service in China Feng Zhu

THEIA GPU Open Source multicore programmable GPU Problem Statement Develop an open source 3D

Super GPU & Super Kernels: Make programming of multi-GPU systems easy Michael Frumkin, May 8,

Chapter 9: Gaussian Elimination ES 240: Scientific and Engineering Computation. Chapter 9:

Elliptic hypergeometric functions and elliptic difference Painlev e equation Masatoshi NOUMI

Toeplitz determinants, Painlev equations, and special functions Part I: an operator approach

The Bipartite Matching Problem Math 482, Lecture 21 Misha Lavrov March 25, 2020 Bipartite graph

Rectangular Kronecker coefficients and plethysms in GCT Christian Ikenmeyer Christian Ikenmeyer

18. Change of variables Question 18.1. What is the area of the ellipse x y 2 2 +

Homework 1: Gibbsian Line Ensembles Lecturer: Ivan Corwin TA: Xuan Wu, Promit Ghosal 1( a ) .

Urban proximity as a determinant of rural self-employment profitability: Evidence from Ethiopia

Determinant Computation on the GPU using the Condensation Method - PowerPoint PPT Presentation

Determinant Computation on the GPU using the Condensation Method Sardar Anisul Haque Marc Moreno Maza Ontario Research Centre for Computer Algebra University of Western Ontario, London, Ontario AMMCS 2011, Waterloo, July 25, 2011 Sardar

d i E Determinant of a matrix a l l u d Dr. Abdulla Eid b A College of Science . r D

Status of GPU offloading on Wayland Axel Davy FOSDEM 2014 Status of GPU offloading on Wayland

Motivation to Learn GPGPU Julius Parulek Why to Learn About GPU? Computational power of GPU vs.

Bose-Einstein Condensation Bose-Einstein Condensation and Superfluidity of and Superfluidity of

UNIFIED MEMORY ON PASCAL AND VOLTA Nikolay Sakharnykh - May 10, 2017 1 HETEROGENEOUS

Advancements in V-Ray RT GPU Vlado Koylazov, CTO &amp; Co-founder Blagovest Taskov, RT GPU Team

Performance Evaluation of a Multithreaded GPU Using CUDA GPU architecture GeForce 8800 GPU

Tracking using Goal CONDENSATION: Model-based visual tracking in dense Conditional Density

Tracking using CONDENSATION: Conditional Density Propagation M. Isard and A. Blake,

Determinants Every n n matrix A has an associated scalar value called the determinant of A ,

On Hadamards Maximal Determinant Problem Judy-anne Osborn MSI, ANU April 2009 Judy-anne

d i E Determinant of a matrix a l l u d Dr. Abdulla Eid b A College of Science . r D

Picard categories, determinant functors and K -theory Fernando Muro Universitat de Barcelona,

Use Tesla to provide first GPU VM Service in China Feng Zhu

THEIA GPU Open Source multicore programmable GPU Problem Statement Develop an open source 3D

Super GPU &amp; Super Kernels: Make programming of multi-GPU systems easy Michael Frumkin, May 8,

Chapter 9: Gaussian Elimination ES 240: Scientific and Engineering Computation. Chapter 9:

Elliptic hypergeometric functions and elliptic difference Painlev e equation Masatoshi NOUMI

Toeplitz determinants, Painlev equations, and special functions Part I: an operator approach

The Bipartite Matching Problem Math 482, Lecture 21 Misha Lavrov March 25, 2020 Bipartite graph

Rectangular Kronecker coefficients and plethysms in GCT Christian Ikenmeyer Christian Ikenmeyer

18. Change of variables Question 18.1. What is the area of the ellipse x y 2 2 +

Homework 1: Gibbsian Line Ensembles Lecturer: Ivan Corwin TA: Xuan Wu, Promit Ghosal 1( a ) .

Urban proximity as a determinant of rural self-employment profitability: Evidence from Ethiopia

Advancements in V-Ray RT GPU Vlado Koylazov, CTO & Co-founder Blagovest Taskov, RT GPU Team

Super GPU & Super Kernels: Make programming of multi-GPU systems easy Michael Frumkin, May 8,