SLIDE 50 Future work
Future work: Interference-aware scheduling of complex GPU workloads
Figure: Traditional memory model - Avg. execution time
Cell values: avg. execution time in [us]; each value corresponds to the algorithm listed on the y-axis.
Color scale (ticks 1 to 6): avg. execution time ratio.
Each row gives the y-axis algorithm, then the x-axis algorithm; the four columns are the x-axis labels 0 x to 3 x of that algorithm.

y-axis     x-axis       0 x    1 x    2 x    3 x
sqrt norm  sqrt norm   49.0   55.7   63.1   74.7
sqrt norm  sqrt mag    49.0   51.4   67.1   75.1
sqrt norm  conj        49.0   50.7   65.6   74.5
sqrt norm  sum ch      48.9   48.4   49.3   55.6
sqrt norm  mul         48.9   57.7   80.3   98.8
sqrt norm  div         48.9   58.1   78.7   99.1
sqrt norm  add         48.9   53.8   69.9   80.3
sqrt norm  mul c       48.9   51.0   66.0   73.4
sqrt norm  add c       48.9   50.8   66.1   74.2
sqrt norm  elemmul     48.9   59.0   82.3   99.7
sqrt mag   sqrt norm   10.2   11.6   16.1   21.0
sqrt mag   sqrt mag    10.2   21.4   31.4   40.2
sqrt mag   conj        10.2   19.4   29.0   37.1
sqrt mag   sum ch      10.3   11.3   18.7   26.0
sqrt mag   mul         10.2   23.6   35.2   48.5
sqrt mag   div         10.2   22.0   29.8   42.1
sqrt mag   add         10.2   23.6   30.6   38.6
sqrt mag   mul c       10.2   19.0   28.0   35.7
sqrt mag   add c       10.3   19.1   28.5   36.3
sqrt mag   elemmul     10.2   22.7   35.8   48.8
conj       sqrt norm   12.0   13.6   19.5   27.5
conj       sqrt mag    12.0   25.2   38.9   47.9
conj       conj        12.0   24.1   40.7   51.4
conj       sum ch      12.0   13.1   22.6   31.0
conj       mul         12.0   39.1   58.4   75.8
conj       div         12.0   30.6   46.9   67.2
conj       add         12.0   35.3   48.4   61.7
conj       mul c       12.0   24.7   43.4   55.7
conj       add c       12.0   24.1   41.2   53.4
conj       elemmul     12.0   35.2   58.9   74.9
sum ch     sqrt norm    2.9    3.4    3.9    7.9
sum ch     sqrt mag     2.9    5.3   13.3   17.0
sum ch     conj         2.9    4.5   11.9   15.1
sum ch     sum ch       2.9    4.3    6.1   12.2
sum ch     mul          2.9    8.6   15.7   19.3
sum ch     div          2.9    8.1   13.9   17.7
sum ch     add          2.9    7.9   13.7   16.9
sum ch     mul c        2.9    4.4   11.8   14.7
sum ch     add c        2.9    4.3   11.7   14.5
sum ch     elemmul      2.9    8.4   14.8   18.7
mul        sqrt norm   26.1   29.4   36.3   38.1
mul        sqrt mag    26.1   37.7   45.2   54.9
mul        conj        26.1   39.1   48.9   59.8
mul        sum ch      26.1   28.6   35.3   39.4
mul        mul         26.1   52.5   78.1  103.6
mul        div         26.1   46.9   70.0   97.2
mul        add         26.1   41.8   53.9   69.8
mul        mul c       26.1   38.7   47.3   59.8
mul        add c       26.1   39.5   49.3   60.0
mul        elemmul     26.1   56.4   82.6  105.6
div        sqrt norm   30.4   38.7   49.6   52.8
div        sqrt mag    30.4   46.9   52.6   61.7
div        conj        30.4   47.4   57.8   67.9
div        sum ch      30.4   38.3   45.2   47.7
div        mul         30.4   56.4   80.5  105.6
div        div         30.3   55.7   78.5  104.1
div        add         30.4   51.5   61.1   73.8
div        mul c       30.4   48.1   58.1   68.1
div        add c       30.4   48.1   58.7   68.5
div        elemmul     30.4   59.8   84.7  107.9
add        sqrt norm   14.1   20.9   30.3   31.4
add        sqrt mag    14.0   34.3   42.2   50.6
add        conj        14.0   35.0   44.2   54.2
add        sum ch      14.1   23.2   31.8   34.3
add        mul         14.0   42.0   65.0   86.1
add        div         14.0   35.3   54.8   76.8
add        add         14.0   38.7   50.0   62.9
add        mul c       14.0   35.0   43.2   54.1
add        add c       14.0   34.8   44.1   54.9
add        elemmul     14.0   43.3   66.5   85.2
mul c      sqrt norm   12.5   14.2   20.2   28.2
mul c      sqrt mag    12.5   25.7   39.7   48.8
mul c      conj        12.5   24.9   42.0   52.5
mul c      sum ch      12.5   13.8   23.1   31.3
mul c      mul         12.5   39.5   60.4   80.6
mul c      div         12.4   30.8   49.0   70.4
mul c      add         12.4   35.1   46.6   60.0
mul c      mul c       12.4   25.2   41.7   52.1
mul c      add c       12.4   24.9   41.8   52.9
mul c      elemmul     12.4   37.4   61.3   79.9
add c      sqrt norm   12.2   13.8   19.7   28.1
add c      sqrt mag    12.2   25.3   39.3   48.2
add c      conj        12.2   24.1   41.2   51.0
add c      sum ch      12.2   13.3   22.6   31.0
add c      mul         12.2   39.7   59.2   78.0
add c      div         12.2   30.8   47.8   68.7
add c      add         12.2   35.7   46.9   61.0
add c      mul c       12.2   25.0   41.9   53.8
add c      add c       12.2   24.7   41.7   53.0
add c      elemmul     12.2   36.3   60.1   77.3
elemmul    sqrt norm   31.2   34.3   37.7   41.1
elemmul    sqrt mag    31.2   38.0   46.6   55.9
elemmul    conj        31.2   38.2   48.8   59.0
elemmul    sum ch      31.2   33.0   35.3   41.5
elemmul    mul         31.2   57.3   83.0  108.7
elemmul    div         31.2   52.7   78.4  104.6
elemmul    add         31.2   45.3   58.8   73.3
elemmul    mul c       31.1   38.1   49.2   61.0
elemmul    add c       31.1   38.1   49.4   60.3
elemmul    elemmul     31.1   61.7   87.9  110.3
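The avg. execution time ratio can be reproduced from the averages. The following is a minimal Python sketch, assuming the ratio is the average execution time divided by the corresponding 0 x entry; the slide does not state the definition. The numbers are the figure's values for sqrt norm (y-axis) against 0 x to 3 x mul (x-axis), and the function name `slowdown_ratios` is illustrative only.

```python
# Avg. execution times in us for "sqrt norm", taken from the figure:
# keys are the x-axis labels 0 x, 1 x, 2 x, 3 x for "mul".
avg_us = {0: 48.9, 1: 57.7, 2: 80.3, 3: 98.8}


def slowdown_ratios(times_by_count):
    """Divide each average by the 0 x entry (assumed definition of the ratio)."""
    baseline = times_by_count[0]
    return {n: t / baseline for n, t in times_by_count.items()}


ratios = slowdown_ratios(avg_us)
for n in sorted(ratios):
    print(f"{n} x mul: ratio {ratios[n]:.2f}")
```

Under this assumption, the 0 x entry maps to a ratio of 1, and larger ratios mark combinations where the y-axis algorithm runs slower than it does at 0 x.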
Figure: Traditional memory model - Jitter compared to avg. execution time
Cell values: jitter compared to avg. execution time [%]; each value corresponds to the algorithm listed on the y-axis.
Color scale (ticks 10^0 and 10^1): jitter ratio.
Each row gives the y-axis algorithm, then the x-axis algorithm; the four columns are the x-axis labels 0 x to 3 x of that algorithm.

y-axis     x-axis       0 x    1 x    2 x    3 x
sqrt norm  sqrt norm    0.6    6.9    7.9   17.2
sqrt norm  sqrt mag     0.5    3.3    3.1    5.5
sqrt norm  conj         2.3    2.6    5.8    8.0
sqrt norm  sum ch       0.6    1.3    1.4    2.6
sqrt norm  mul          0.6    3.0    3.2    3.4
sqrt norm  div          0.6    3.9    3.4    3.7
sqrt norm  add          0.4    3.0    6.0    9.2
sqrt norm  mul c        0.3    3.5    5.1    5.5
sqrt norm  add c        0.3    3.4    5.0    7.3
sqrt norm  elemmul      0.4    3.3    2.9    2.5
sqrt mag   sqrt norm    6.6    5.5    9.1    9.9
sqrt mag   sqrt mag     4.7    7.3    5.6    5.6
sqrt mag   conj         7.2    9.9   15.9   17.1
sqrt mag   sum ch       5.9    5.7   10.1    9.9
sqrt mag   mul          4.7    7.5    4.5    5.9
sqrt mag   div          4.7    8.0    6.6    7.0
sqrt mag   add          4.7   13.3   14.9   19.5
sqrt mag   mul c        5.3    8.6   11.0   13.9
sqrt mag   add c        5.9    9.9   12.9   15.9
sqrt mag   elemmul      5.0    4.1    4.5    6.2
conj       sqrt norm    4.0    4.7    8.2   11.6
conj       sqrt mag     5.1    7.6    8.1    8.7
conj       conj         5.1    8.0   14.5   12.8
conj       sum ch       4.0    4.6    8.5    9.3
conj       mul          4.0   11.0    3.1    4.2
conj       div          4.0   10.5    5.8    5.2
conj       add          4.0   15.9   14.0   10.8
conj       mul c        4.0    7.9   11.4   10.9
conj       add c        4.0    7.2   12.1   11.7
conj       elemmul      4.0    2.7    3.2    4.5
sum ch     sqrt norm    9.8    7.6    7.4   10.1
sum ch     sqrt mag     9.8    6.0   14.2   17.8
sum ch     conj         9.8   10.0   17.8   21.5
sum ch     sum ch       9.8    6.7    5.2   12.1
sum ch     mul          9.8    8.9   11.2    8.4
sum ch     div          9.8   11.9   13.6   14.2
sum ch     add          9.8   10.1   17.7   15.1
sum ch     mul c        9.8    3.6   17.6   17.4
sum ch     add c        9.8    3.7   15.0   17.6
sum ch     elemmul      9.8    7.2    8.9   10.3
mul        sqrt norm    1.1    2.6    9.6    8.4
mul        sqrt mag     1.1    6.8    8.1    7.4
mul        conj         1.1    9.9   13.0    9.6
mul        sum ch       1.6    3.2    6.9    5.6
mul        mul          1.1    0.9    0.6    0.9
mul        div          1.1    1.7    3.0    3.5
mul        add          1.1    6.9    2.6    4.6
mul        mul c        1.1   10.0    9.5   10.8
mul        add c        1.1    9.7   12.0   11.4
mul        elemmul      1.1    1.4    1.2    1.5
div        sqrt norm    2.6    3.7    9.9   18.5
div        sqrt mag     2.6    4.8    6.0    5.4
div        conj         3.1    6.7   10.0    8.7
div        sum ch       2.6    4.3    5.4    6.0
div        mul          3.0    2.6    1.7    1.3
div        div          2.7    6.7    0.9    1.2
div        add          3.0   10.9    9.6    5.6
div        mul c        2.7    8.1    9.9    9.3
div        add c        3.0    8.3   10.4   10.9
div        elemmul      3.1    1.6    1.1    1.5
add        sqrt norm    3.4    7.7   14.3   13.8
add        sqrt mag     3.4    9.3    9.2    8.3
add        conj         4.3   15.1   13.7   11.8
add        sum ch       3.4    5.5    9.1    7.8
add        mul          3.4    8.3    2.8    3.4
add        div          3.4   12.9    3.5    3.6
add        add          3.4   15.7   15.0   10.9
add        mul c        3.4   16.5   11.1   10.7
add        add c        3.4   15.6   13.8   13.4
add        elemmul      3.4    3.3    2.6    3.8
mul c      sqrt norm    3.8    4.5    8.7   13.5
mul c      sqrt mag     3.9    7.3    8.5    7.9
mul c      conj         3.9    6.4   12.7   10.7
mul c      sum ch       3.9    5.3    6.9    8.7
mul c      mul          3.9    9.7    2.9    3.2
mul c      div          4.6    9.7    4.6    4.2
mul c      add          4.6   18.1   14.1   12.8
mul c      mul c        3.9    6.1   10.1   11.7
mul c      add c        3.9    7.4   12.0   12.0
mul c      elemmul      3.9    2.1    2.6    4.3
add c      sqrt norm    3.9    5.3    7.8   13.3
add c      sqrt mag     3.9    7.6    8.5    8.6
add c      conj         4.7    6.0   13.0   10.5
add c      sum ch       4.5    4.3    7.8   10.3
add c      mul          3.9    9.8    3.4    4.1
add c      div          4.7   12.5    4.7    4.9
add c      add          4.7   18.4   15.7   12.3
add c      mul c        4.7    7.7   11.8   13.1
add c      add c        2.9    7.0   13.8   13.4
add c      elemmul      4.7    3.1    2.9    4.2
elemmul    sqrt norm    0.5    0.9    2.5    4.7
elemmul    sqrt mag     0.5    1.6    3.7    5.2
elemmul    conj         0.5    2.0    7.5    8.6
elemmul    sum ch       0.5    0.8    1.2    4.7
elemmul    mul          0.7    1.1    0.7    0.8
elemmul    div          0.5    1.8    2.7    2.7
elemmul    add          0.7    2.8    2.2    3.1
elemmul    mul c        0.8    1.3    7.9    7.6
elemmul    add c        0.5    1.1    8.9    9.0
elemmul    elemmul      0.5    0.8    0.5    0.7
Experiments for Time-Predictable Execution of GPU Kernels OSPERT19 21 / 21