SLIDE 16 ICCS’13 – Optimization techniques for 3D-FWT on systems with manycore GPUs and multicore CPUs 20
Optimization techniques for 3D-FWT
Experiments with 3D-FWT parameters for 3 GPUs
Execution
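Run on 64 frames, each of them of size 512x512, 1024x1024 or 2048x2048.
Times, by frame size, device and block size:

| Frame size | Device        | Block 64 | Block 128 | Block 192 | Block 256 |
|------------|---------------|----------|-----------|-----------|-----------|
| 512x512    | Tesla C870    | 58.68    | 56.28     | 53.51     | 58.68     |
| 512x512    | Tesla C2050   | 35.33    | 53.17     | 32.13     | 33.59     |
| 512x512    | FirePro V5800 | 130.06   | 135.87    | 131.29    | 114.87    |
| 1024x1024  | Tesla C870    | 225.74   | 214.36    | 209.01    | 217.21    |
| 1024x1024  | Tesla C2050   | 122.12   | 115.02    | 110.88    | 113.32    |
| 1024x1024  | FirePro V5800 | 452.95   | 346.29    | 313.35    | 307.54    |
| 2048x2048  | Tesla C870    | 889.83   | 841.47    | 840.14    | 850.26    |
| 2048x2048  | Tesla C2050   | 467.50   | 438.46    | 427.69    | 433.84    |
| 2048x2048  | FirePro V5800 | 2123.60  | 1496.27   | 1284.56   | 1217.59   |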
- The optimization engine studies the problem for different block or
work-group sizes
- Selects 192 for both the Tesla C870 and the Tesla C2050 (Fermi): optimal in each case
- Selects 256 for the ATI FirePro V5800 (optimal)
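
The selection described in the bullets can be checked against the reported times with a short script. This is only a minimal sketch: the slide does not say how the optimization engine decides, so the exhaustive "pick the lowest total time" rule and the names `TIMES`, `best_block_size` and `select_per_device` are assumptions made for illustration. The numbers are copied from the times on this slide.

```python
# Times copied from the slide: device -> frame size -> block size -> time.
TIMES = {
    "Tesla C870": {
        "512x512": {64: 58.68, 128: 56.28, 192: 53.51, 256: 58.68},
        "1024x1024": {64: 225.74, 128: 214.36, 192: 209.01, 256: 217.21},
        "2048x2048": {64: 889.83, 128: 841.47, 192: 840.14, 256: 850.26},
    },
    "Tesla C2050": {
        "512x512": {64: 35.33, 128: 53.17, 192: 32.13, 256: 33.59},
        "1024x1024": {64: 122.12, 128: 115.02, 192: 110.88, 256: 113.32},
        "2048x2048": {64: 467.50, 128: 438.46, 192: 427.69, 256: 433.84},
    },
    "FirePro V5800": {
        "512x512": {64: 130.06, 128: 135.87, 192: 131.29, 256: 114.87},
        "1024x1024": {64: 452.95, 128: 346.29, 192: 313.35, 256: 307.54},
        "2048x2048": {64: 2123.60, 128: 1496.27, 192: 1284.56, 256: 1217.59},
    },
}


def best_block_size(times_by_block):
    """Return the block size with the lowest time (hypothetical helper)."""
    return min(times_by_block, key=times_by_block.get)


def select_per_device(times):
    """Pick, per device, the block size with the lowest total time over all
    frame sizes. Assumed selection rule, not taken from the slide."""
    selection = {}
    for device, by_frame in times.items():
        totals = {}
        for per_block in by_frame.values():
            for block, t in per_block.items():
                totals[block] = totals.get(block, 0.0) + t
        selection[device] = best_block_size(totals)
    return selection


if __name__ == "__main__":
    for device, block in select_per_device(TIMES).items():
        print(device, block)
    # prints:
    # Tesla C870 192
    # Tesla C2050 192
    # FirePro V5800 256
```

With these measurements the same block size also wins for every individual frame size on each device, so a per-frame-size selection would give the same answer.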