SLIDE 25 Results (2/2)
Asynchronous approach implemented in POSIX C on MPPA
Outperforms the OpenCL version by 33%
Twice as fast when using two DDRs (MPPA OpenCL currently supports
Figure: OPAL async vs. OPAL OpenCL on MPPA (duration = 1000; x-axis: cavity size; y-axis: performance in MLUPS)
(a) Single-DDR: cavity sizes 64 to 160, y-axis range 10 to 20 MLUPS
(b) Double-DDR: cavity sizes 64 to 224, y-axis range 10 to 45 MLUPS
Curves in both panels:
OPAL OpenCL, WG = 32x1x1, single DDR
OPAL_async inplace 3-depth: 29% halo BW
OPAL_async inplace 4-depth: 36% halo BW
OPAL_async outplace 3-depth: 36% halo BW
OPAL_async outplace 4-depth: 43% halo BW
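The plots report performance in MLUPS (million lattice updates per second). As a minimal sketch of how such a figure is typically derived, the helper below divides the number of node updates by the wall time. The function name `mlups` is hypothetical and not taken from the OPAL sources. It assumes a cubic cavity whose edge is the plotted "cavity size", and that "duration" counts time steps. Neither assumption is stated on the slide.

```c
#include <stdint.h>

/* Hypothetical helper, not from the OPAL sources.
 * Assumption: the cavity is a cube of edge `cavity_size` lattice nodes.
 * Assumption: `steps` is the number of time steps (the plots' "duration"). */
double mlups(uint32_t cavity_size, uint32_t steps, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0) {
        return 0.0; /* no meaningful rate without a positive wall time */
    }
    /* Work in double: 224^3 * 1000 already exceeds the 32-bit range. */
    double n = (double)cavity_size;
    double updates = n * n * n * (double)steps;
    /* Updates per second, scaled to millions. */
    return updates / (elapsed_seconds * 1.0e6);
}
```

With made-up inputs, a 64-node cavity run for 1000 steps in 10 s of wall time gives `mlups(64, 1000, 10.0)`, roughly 26.2 MLUPS. These numbers are illustrative only and are not read from the figure.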