
Parallel Eigensolver for Graph Spectral Analysis on GPU

  1. 15-618 Final Project: Parallel Eigensolver for Graph Spectral Analysis on GPU
     Yimin Liu (yiminliu@andrew.cmu.edu), Heran Lin (lin1@andrew.cmu.edu)
     Carnegie Mellon University
     May 11, 2015

  2. Overview
     ◮ Undirected graph G = (V, E)
     ◮ Symmetric square matrix M associated with graph G (adjacency matrix A, graph Laplacian L, etc.)
     ◮ Eigenvalues of M encode interesting properties of the graph: Mx = λx
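
As a small illustration of the matrices named on this slide, the sketch below builds the adjacency matrix A and a graph Laplacian for a tiny undirected graph. It assumes the common unnormalized definition L = D - A (D is the diagonal degree matrix); the slides do not say which Laplacian variant the authors used, and the function names are hypothetical.

```python
def graph_matrices(num_nodes, edges):
    """Dense adjacency matrix A and Laplacian L = D - A (assumed definition)."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1.0   # undirected graph, so A is symmetric
        A[v][u] = 1.0
    degree = [sum(row) for row in A]
    L = [[(degree[i] if i == j else 0.0) - A[i][j] for j in range(num_nodes)]
         for i in range(num_nodes)]
    return A, L


def matvec(M, x):
    """Dense matrix-vector product, used only to check M x = lambda x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Path graph 0 - 1 - 2.
A, L = graph_matrices(3, [(0, 1), (1, 2)])
ones_image = matvec(L, [1.0, 1.0, 1.0])     # all-ones vector lies in the null space of L
fiedler_image = matvec(L, [1.0, 0.0, -1.0]) # this vector is mapped to itself for the path graph
```

For this path graph, the all-ones vector is an eigenvector of L with eigenvalue 0, and (1, 0, -1) is an eigenvector with eigenvalue 1.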

  3. Eigendecomposition Overview
     ◮ Transform M to a symmetric tridiagonal matrix T_m
     ◮ Calculate eigenvalues of T_m
     [Diagram: M ⇒ T_m (Lanczos) ⇒ eigenvalues (easy)]
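
The slides call the second step "easy" without naming a method. One standard option, shown here purely as an assumption, is Sturm-sequence bisection on the diagonal and off-diagonal entries of the tridiagonal matrix. The function names `count_below` and `kth_eigenvalue` are hypothetical.

```python
def count_below(alphas, betas, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix that are < x.

    alphas: diagonal entries; betas: off-diagonal entries (len(alphas) - 1).
    Counts negative pivots of the LDL^T factorization of T - x I.
    """
    count = 0
    d = 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 / d if i > 0 else 0.0
        d = a - x - off
        if d == 0.0:
            d = 1e-30          # nudge an exactly-zero pivot to avoid dividing by zero
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(alphas, betas, k, iters=100):
    """k-th smallest eigenvalue (0-indexed) by bisection inside Gershgorin bounds."""
    n = len(alphas)
    radius = [(abs(betas[i - 1]) if i > 0 else 0.0) +
              (abs(betas[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) > k:
            hi = mid           # more than k eigenvalues below mid: target is left of mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Tridiagonal matrix with diagonal (2, 2, 2) and off-diagonal (1, 1).
largest = kth_eigenvalue([2.0, 2.0, 2.0], [1.0, 1.0], 2)
middle = kth_eigenvalue([2.0, 2.0, 2.0], [1.0, 1.0], 1)
```

Each bisection step costs O(m) for an m × m tridiagonal matrix, which is negligible next to the sparse matrix-vector products of the first step when m is much smaller than the graph size.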

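  4. The Lanczos Algorithm for Tridiagonalization

     $$T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_m \\ & & \beta_m & \alpha_m \end{pmatrix}$$

     1. $v_0 \leftarrow 0$, $v_1 \leftarrow$ norm-1 random vector, $\beta_1 \leftarrow 0$
     2. for $j = 1, \dots, m$
        ◮ $w_j \leftarrow M v_j$
        ◮ $\alpha_j \leftarrow w_j^\top v_j$
        ◮ $w_j \leftarrow w_j - \alpha_j v_j - \beta_j v_{j-1}$
        ◮ $\beta_{j+1} \leftarrow \lVert w_j \rVert_2$
        ◮ $v_{j+1} \leftarrow w_j / \beta_{j+1}$

     Potential parallelism for CUDA: matrix-vector product, dot-product, SAXPY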
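
The iteration on this slide can be sketched on the CPU as follows. This is a plain-Python reference under stated assumptions, not the authors' CUDA code: `dense_matvec` is a hypothetical dense stand-in for the sparse product, the early exit on a tiny β is an added safeguard that the slide does not mention, and no reorthogonalization is performed.

```python
import math
import random


def dot(a, b):
    """Dot product; on the GPU this would be a parallel reduction."""
    return sum(x * y for x, y in zip(a, b))


def dense_matvec(M, v):
    """Dense stand-in for the sparse product w = M v (illustration only)."""
    return [dot(row, v) for row in M]


def lanczos(matvec, n, m, seed=0):
    """Return (alphas, betas): diagonal alpha_1..alpha_k and off-diagonal
    beta_2..beta_k of the tridiagonal matrix T_k, with k <= m."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]              # v_1: norm-1 random vector
    v_prev = [0.0] * n                     # v_0 = 0
    beta = 0.0                             # beta_1 = 0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(v)                      # w_j = M v_j
        alpha = dot(w, v)                  # alpha_j = w_j^T v_j
        w = [wi - alpha * vi - beta * pi   # w_j = w_j - alpha_j v_j - beta_j v_{j-1}
             for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))        # beta_{j+1} = ||w_j||_2
        if j == m - 1 or beta < 1e-12:     # last step, or breakdown (assumed safeguard)
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]   # v_{j+1} = w_j / beta_{j+1}
    return alphas, betas


M = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas = lanczos(lambda v: dense_matvec(M, v), n=3, m=3)
```

When m equals the matrix size and no breakdown occurs, the tridiagonal matrix is orthogonally similar to M, so its trace (the sum of `alphas`) matches the trace of M. That gives a cheap sanity check for small inputs.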

  5. Challenges
     Characteristics of M
     ◮ Really sparse
     ◮ Skewed distribution of non-zero elements
     ◮ Example: power-law node degree distribution in social networks

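  6. Compressed Sparse Row (CSR) Matrix-Vector Multiplication (SPMV)
     [Figure: CSR storage with the column indices of Row 0, Row 1, Row 2, ... laid out contiguously, multiplied by a vector to give the result vector]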
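
For readers unfamiliar with the format, here is a minimal CPU sketch of CSR storage and the matrix-vector product over it. The array names `row_ptr`, `col_idx`, and `values` follow common convention and are assumptions; the slides only show "column index" in the figure.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                col_idx.append(j)      # column of each stored non-zero
                values.append(x)
        row_ptr.append(len(values))    # row i occupies [row_ptr[i], row_ptr[i+1])
    return row_ptr, col_idx, values


def csr_spmv(row_ptr, col_idx, values, x):
    """y = M x for a CSR matrix; each row's sum is independent of the others."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Adjacency matrix of a star graph: node 0 is connected to nodes 1 and 2.
row_ptr, col_idx, values = to_csr([[0.0, 1.0, 1.0],
                                   [1.0, 0.0, 0.0],
                                   [1.0, 0.0, 0.0]])
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Because the outer loop iterations do not depend on each other, rows can be distributed across GPU threads in different ways, and how they are distributed is the design question here.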

  7. Naive Work Assignment
     [Figure: Thread 0, Thread 1, Thread 2, ... each mapped to Row 0, Row 1, Row 2, ...; each thread writes its own row result]
     ◮ Each thread is responsible for one row
     ◮ Work imbalance issues

  8. Warp-based Work Assignment
     [Figure: Warp 0, Warp 1, Warp 2, ... each mapped to Row 0, Row 1, Row 2, ...; per-thread partial sums are reduced into the row result]
     ◮ Each warp (32 threads) is responsible for one row
     ◮ Reduce partial sum in shared memory
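
The warp-per-row scheme can be simulated sequentially to make the data flow concrete. In this sketch (an illustration, not the authors' kernel), each of the 32 lanes accumulates a strided partial sum over the row's non-zeros, and the list `partial` stands in for the shared-memory buffer that a tree reduction collapses into one value. The function name is hypothetical.

```python
WARP_SIZE = 32


def warp_row_dot(row_ptr, col_idx, values, x, row):
    """Simulate one warp computing the dot product of a CSR row with x."""
    start, end = row_ptr[row], row_ptr[row + 1]
    partial = [0.0] * WARP_SIZE              # stand-in for shared memory
    for lane in range(WARP_SIZE):            # on a GPU these lanes run in lockstep
        for k in range(start + lane, end, WARP_SIZE):
            partial[lane] += values[k] * x[col_idx[k]]
    offset = WARP_SIZE // 2
    while offset > 0:                        # tree reduction: 16, 8, 4, 2, 1
        for lane in range(offset):
            partial[lane] += partial[lane + offset]
        offset //= 2
    return partial[0]


# One long row with 100 non-zeros holding the values 1..100.
long_row_ptr = [0, 100]
long_col_idx = list(range(100))
long_values = [float(k + 1) for k in range(100)]
row_sum = warp_row_dot(long_row_ptr, long_col_idx, long_values, [1.0] * 100, 0)
```

Consecutive lanes read consecutive entries of `values` and `col_idx`, which is the access pattern that lets a GPU coalesce memory loads. The cost is that rows with far fewer than 32 non-zeros leave most lanes idle.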

  9. Warp-based Work Assignment for Row Groups
     [Figure: Warp 0, Warp 1, ... each mapped to a group of rows (Row 0, Row 1, Row 2, ...), producing Row 0 Result, Row 1 Result, ...]
     ◮ Each warp is responsible for a group of rows
     ◮ Group size depends on the average row sparsity of the matrix
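
The slides do not give the rule that maps average row sparsity to group size. The sketch below assumes one plausible heuristic: give each row the smallest power-of-two number of lanes that covers the average non-zeros per row (capped at 32), so a warp handles 32 divided by that many rows. Both function names and the heuristic itself are assumptions for illustration.

```python
WARP_SIZE = 32


def threads_per_row(row_ptr):
    """Assumed heuristic: smallest power of two >= average nnz per row, capped at 32."""
    num_rows = len(row_ptr) - 1
    avg_nnz = row_ptr[-1] / max(num_rows, 1)
    t = 1
    while t < WARP_SIZE and t < avg_nnz:
        t *= 2
    return t


def grouped_spmv(row_ptr, col_idx, values, x):
    """Sequential simulation: each warp processes a group of rows, t lanes per row."""
    t = threads_per_row(row_ptr)
    rows_per_warp = WARP_SIZE // t
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for first_row in range(0, num_rows, rows_per_warp):   # one iteration per warp
        for row in range(first_row, min(first_row + rows_per_warp, num_rows)):
            partial = [0.0] * t
            for lane in range(t):                          # t lanes share this row
                for k in range(row_ptr[row] + lane, row_ptr[row + 1], t):
                    partial[lane] += values[k] * x[col_idx[k]]
            y[row] = sum(partial)                          # stands in for the reduction
    return y


# Star graph adjacency matrix in CSR form: node 0 connected to nodes 1 and 2.
g_row_ptr = [0, 2, 3, 4]
g_col_idx = [1, 2, 0, 0]
g_values = [1.0, 1.0, 1.0, 1.0]
lanes = threads_per_row(g_row_ptr)
g_y = grouped_spmv(g_row_ptr, g_col_idx, g_values, [1.0, 2.0, 3.0])
```

With one lane per row this degenerates to the naive assignment, and with 32 lanes per row it becomes the warp-per-row assignment, so the group size interpolates between the two earlier schemes.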

  10. Evaluation Environment
      Amazon Web Services EC2 g2.2xlarge
      ◮ NVIDIA GK104 GPU, 1,536 CUDA cores, with CUDA 7.0 Toolkit installed
      ◮ Intel Xeon E5-2670 CPU, 8 cores, with gcc/g++ 4.8.2 installed, -O3 optimization switched on
      Competitive reference: SPMV implementation in cuSparse (http://docs.nvidia.com/cuda/cusparse/)
      Dataset: generated scale-free networks based on the Barabási-Albert model, using Python NetworkX

  11. float SPMV Performance Similar to cuSparse
      [Figure: speedup of GPU SPMV over CPU (y-axis) versus graph node count in thousands (x-axis: 400, 800, 1,600, 3,200) for Group SPMV, cuSparse SPMV, and Naive SPMV]

  12. double SPMV Performance Better than cuSparse
      [Figure: speedup of GPU SPMV over CPU (y-axis) versus graph node count in thousands (x-axis: 400, 800, 1,600, 3,200) for Group SPMV, cuSparse SPMV, and Naive SPMV]

  13. Real-world Graphs
      ◮ as-Skitter: ∼1,700,000 nodes, ∼11,000,000 edges
      ◮ cit-Patents: ∼3,800,000 nodes, ∼17,000,000 edges
      Converted to symmetric double adjacency matrices
      Data source: SNAP (http://snap.stanford.edu/data/index.html)

  14. SPMV Better than cuSparse on Large Real-world Graphs
      [Figure: bar chart of speedup of GPU SPMV over CPU for Group SPMV, cuSparse SPMV, and Naive SPMV on as-Skitter and cit-Patents; bar labels in the extracted text: 11.6, 10.8, 7.5, 7.5, 7.4, 2.5]

  15. Faster Eigenvalue Solver on GPU
      [Figure: bar chart of running time of eigensolvers (sec) for GPU Eigensolver and CPU Eigensolver on as-Skitter and cit-Patents; bar labels in the extracted text: 31.8, 9, 3.1, 1.6]

  16. Discussion
      SLEPc (http://slepc.upv.es)
      ◮ A state-of-the-art parallel CPU framework using MPI for solving sparse matrix eigenvalue problems
      ◮ Took 84.9 sec to compute the 10 largest eigenvalues of the cit-Patents graph, while we took only 31.8 sec on CPU
      ◮ Unfair to compare?
      ◮ Many variants of the Lanczos algorithm
      ◮ Accuracy vs. performance tradeoff
