1
Running PEPPHER benchmarks on top of the StarPU runtime system
22nd January 2011
Cédric Augonnet Nicolas Collin Nathalie Furmento Raymond Namyst Samuel Thibault
INRIA Bordeaux, LaBRI, Université de Bordeaux
2
Motivations
[Figure: software stack with the layers HPC Applications, Parallel Compilers, Parallel Libraries, Runtime system, Operating System, CPU, GPU, …]
3
Motivations
A = A+B
[Figure: CPUs, GPUs and their memories (M.), with the data A and B placed in different memories]
4
Memory Management
[Figure: software stack with the layers HPC Applications, Parallel Compilers, Parallel Libraries, StarPU, Drivers (CUDA, OpenCL), CPU, GPU, …]
5
Task scheduling
[Figure: software stack with the layers HPC Applications, Parallel Compilers, Parallel Libraries, StarPU, Drivers (CUDA, OpenCL), CPU, GPU, …]
– e.g. CUDA and/or CPU
[Figure labels: cpu, gpu, spu]
6
7
8
Background
9
Productivity
// Sequential Tile Cholesky
FOR k = 0..TILES-1
    DPOTRF(A[k][k])
    FOR m = k+1..TILES-1
        DTRSM(A[k][k], A[m][k])
    FOR n = k+1..TILES-1
        DSYRK(A[n][k], A[n][n])
        FOR m = n+1..TILES-1
            DGEMM(A[m][k], A[n][k], A[m][n])

// Hybrid Tile Cholesky
FOR k = 0..TILES-1
    starpu_Insert_Task(DPOTRF, …)
    FOR m = k+1..TILES-1
        starpu_Insert_Task(DTRSM, …)
    FOR n = k+1..TILES-1
        starpu_Insert_Task(DSYRK, …)
        FOR m = n+1..TILES-1
            starpu_Insert_Task(DGEMM, …)
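To make the sequential loop nest above concrete, here is a minimal pure-Python sketch of the same tile Cholesky factorization. The function names mirror the kernels named on the slide, but their bodies are illustrative assumptions written on lists of lists, not LAPACK/BLAS or StarPU code; the helper split_tiles is hypothetical as well. In the hybrid version each kernel call would be submitted as a task rather than executed directly.

```python
import math

def potrf(a):
    """Assumed stand-in for DPOTRF: in-place lower Cholesky factor of one tile."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strict upper triangle

def trsm(l, b):
    """Assumed stand-in for DTRSM: overwrite B with B * inverse(transpose(L))."""
    for row in b:
        for j in range(len(l)):
            row[j] = (row[j] - sum(row[p] * l[j][p] for p in range(j))) / l[j][j]

def gemm(a, b, c):
    """Assumed stand-in for DGEMM: C <- C - A * transpose(B)."""
    for i in range(len(c)):
        for j in range(len(c[0])):
            c[i][j] -= sum(x * y for x, y in zip(a[i], b[j]))

def syrk(a, c):
    """Assumed stand-in for DSYRK: C <- C - A * transpose(A)."""
    gemm(a, a, c)

def tile_cholesky(A):
    """Same loop nest as the sequential pseudocode; A is a grid of square tiles."""
    tiles = len(A)
    for k in range(tiles):
        potrf(A[k][k])
        for m in range(k + 1, tiles):
            trsm(A[k][k], A[m][k])
        for n in range(k + 1, tiles):
            syrk(A[n][k], A[n][n])
            for m in range(n + 1, tiles):
                gemm(A[m][k], A[n][k], A[m][n])

def split_tiles(matrix, b):
    """Hypothetical helper: cut a square matrix into a grid of b-by-b tiles."""
    t = len(matrix) // b
    return [[[row[c * b:(c + 1) * b] for row in matrix[r * b:(r + 1) * b]]
             for c in range(t)] for r in range(t)]
```

After tile_cholesky returns, the tiles on and below the diagonal hold the lower-triangular factor; tiles above the diagonal are never touched and keep their input values.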
10
11
12
13
14
15
Perspective
16
17
Background
18
Methodology
19
Post-processing
20
Preliminary results
– 1.4 s per iteration
– 0.15 s per iteration
– 53 ms per iteration
– 28 ms per iteration
21
Perspective
22
23