SLIDE 13 Danilo De Donno – University of Salento, Lecce, Italy
Implementation (2a/3)
Supported sparse matrix formats
- CRS (Compressed Row Storage)
- HYB (hybrid ELLpack-COOrdinate format)
Main modifications to the original code
- double precision complex matrix support
- CUDA grid, register number, shared and texture memory exploitation optimized for double precision complex data.
SpMV: this CUDA kernel implements the Bell and Garland algorithm (*), which is the best-performing code currently available for computing the sparse matrix-vector product.
q_i = A p_i   and   q_i = A^H p_i   (A^H: conjugate transpose of A)
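As a plain CPU reference for what the SpMV kernel computes, the following minimal sketch evaluates both products on a CRS matrix with double precision complex entries. It assumes the products on this slide are q = A p and q = A^H p. The function names, array names (`row_ptr`, `col_idx`, `values`) and the 3x3 test matrix are hypothetical and do not come from the slides or from the Bell and Garland code.

```python
# Hypothetical reference sketch (not the authors' CUDA kernel):
# sparse matrix-vector products on a CRS matrix with complex entries.

def crs_spmv(row_ptr, col_idx, values, p):
    """Return q = A p for a matrix A stored in CRS form."""
    n_rows = len(row_ptr) - 1
    q = [0j] * n_rows
    for row in range(n_rows):
        acc = 0j
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * p[col_idx[k]]
        q[row] = acc
    return q


def crs_spmv_hermitian(row_ptr, col_idx, values, p, n_cols):
    """Return q = A^H p without building A^H explicitly."""
    q = [0j] * n_cols
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Entry A[row][col] contributes conj(A[row][col]) * p[row] to q[col].
            q[col_idx[k]] += values[k].conjugate() * p[row]
    return q


# Illustrative matrix:
#     [1+1j   0    2  ]
# A = [ 0     3j   0  ]
#     [ 4     0   5-1j]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1 + 1j, 2 + 0j, 3j, 4 + 0j, 5 - 1j]
p = [1 + 0j, 1j, 2 + 0j]

q = crs_spmv(row_ptr, col_idx, values, p)
qh = crs_spmv_hermitian(row_ptr, col_idx, values, p, n_cols=3)
print(q)
print(qh)
```

The outer loop over rows in `crs_spmv` is the part a GPU kernel would parallelize: Bell and Garland's CSR kernels assign either one thread or one warp to each row. The scatter-style update in `crs_spmv_hermitian` is only one possible way to apply A^H; how the presented code handles that product is not stated on the slide.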
(*) N. Bell and M. Garland: “Implementing sparse matrix-vector multiplication on throughput-oriented processors”, in Supercomputing ’09, Nov. 2009.