Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs
ISPASS 2017
25th April
Santa Rosa, California
Saumay Dublish, Vijay Nagarajan, Nigel Topham
The University of Edinburgh
2 Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs 25/04/2017
Multithreading on GPUs
[Figure: host CPU, GPU hardware scheduler, four cores]
Cores hide memory latencies with concurrent execution
[Figure: host CPU, GPU hardware scheduler, four cores]
Bandwidth bottleneck
Memory latencies appear in the critical path
[Figure: four cores with bandwidth filtering at the caches]
High cache miss rates, small caches, high multithreading
Distributed bandwidth bottleneck
[Figure: four cores with bandwidth filtering at the caches]
L2 roundtrip latency (2-3x higher)
Distributed bandwidth bottleneck
High cache miss rates, small caches, high multithreading
[Figure: four cores with bandwidth filtering at the caches]
L2 roundtrip latency
Identify and mitigate bottlenecks across the memory hierarchy
High cache miss rates, small caches, high multithreading
[Figure: performance versus latency curve for memory-intensive benchmarks. Regions: performance plateau (latency tolerance); latency appears in the critical path]
Performance plateau: [120 cycles, 220 cycles]
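As a rough illustration of why a performance plateau can exist, the sketch below uses a simple warp-interleaving model. It is not the model or simulator behind these slides: the function name, the warp count, and the compute cycles per memory access are hypothetical values chosen only to show the shape of the curve. While the other warps have enough compute to cover the memory latency, performance stays flat; once latency exceeds that cover, it falls off.

```python
def relative_performance(latency_cycles, warps, compute_cycles):
    """Fraction of peak throughput for one core (illustrative model only).

    Assumption: each warp alternates `compute_cycles` of work with one
    memory access that takes `latency_cycles`, and the core can switch
    between warps at no cost.
    """
    busy = warps * compute_cycles             # useful cycles available per period
    period = compute_cycles + latency_cycles  # one warp's compute + memory round trip
    return min(1.0, busy / period)


# Hypothetical configuration: 8 warps, 10 compute cycles per memory access.
for latency in (30, 70, 150, 310):
    perf = relative_performance(latency, warps=8, compute_cycles=10)
    print(f"latency={latency:4d} cycles  relative performance={perf:.2f}")
```

In this toy configuration the plateau ends at 70 cycles; the plateau range quoted on the slide comes from the authors' measurements, not from this model.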
Performance plateau: [120 cycles, 220 cycles]
Baseline memory latencies are critically higher than:
1. performance plateau latencies
2. ideal access latencies to L2/DRAM
Far from saturation (theoretically possible)
Practically possible to improve performance
1x
2.37x
2.37x 1.15x
[Figure: core, L1 access queue, L2 access queue, DRAM access queue]
[Figure: core, L1 MSHR, L2 MSHR]
[Figure: core, L1 MSHR, L2 MSHR. Labels: structural hazard, FULL, MISS, HIT?]
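The structural hazard named on this slide can be sketched with a toy MSHR (miss status holding register) table. This is a minimal illustration under stated assumptions, not the hardware or simulator evaluated in the talk: the class name `Mshr`, its methods, and the merging of misses to the same cache line are all hypothetical choices made for the example. The point is only that once every entry is occupied, a new miss cannot be accepted and has to stall.

```python
class Mshr:
    """Toy MSHR table (hypothetical; for illustration only)."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.pending = {}  # cache line address -> list of waiting request ids

    def on_miss(self, line_addr, req_id):
        # A miss to a line that is already outstanding merges into its entry
        # (assumed behaviour; not stated on the slide).
        if line_addr in self.pending:
            self.pending[line_addr].append(req_id)
            return "MERGED"
        # No free entry: structural hazard, the request has to stall.
        if len(self.pending) >= self.num_entries:
            return "STALL"
        self.pending[line_addr] = [req_id]
        return "ALLOCATED"

    def on_fill(self, line_addr):
        # The response arrived: free the entry and wake the waiting requests.
        return self.pending.pop(line_addr, [])


mshr = Mshr(num_entries=2)
print(mshr.on_miss(0x100, 1))  # first miss takes an entry
print(mshr.on_miss(0x100, 2))  # same line, merged
print(mshr.on_miss(0x180, 3))  # second entry taken, table now full
print(mshr.on_miss(0x200, 4))  # no free entry
print(mshr.on_fill(0x100))     # fill frees an entry
print(mshr.on_miss(0x200, 4))  # the retried miss can now allocate
```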
[Figure: core, L2 MSHR. Labels: STALL, independent compute?]
L1 MSHR L2 MSHR 11% 41%
L1 MSHR L2 MSHR 48% 48%
1. L1 MSHR: 41% (structural hazards)
2. L2 back pressure: 48% (back pressure)
Core
L1 MSHR L2 MSHR 42% 42%
1. Crossbar (response path): 42% (back pressure)
2. DRAM: 35% (back pressure)
Core
L1 MSHR L2 MSHR
Core
L1 MSHR L2 MSHR
4%
4% 4%
59%
11%
69% 59% 59% 4%
212% 226%
69% 11%
76%
90% 90%
Core
Point-to-point wiring (bytes): 32 bytes request, 16 bytes request, 32 bytes response, 48 bytes request
Control (8 bytes), cache line (128 bytes)
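The packet and wiring sizes on this slide lend themselves to a small worked calculation of how many transfers a packet needs on a link of a given width. The sketch below is an assumption-laden illustration, not the authors' crossbar model: it assumes a read request carries only the 8-byte control field and a read response carries control plus one 128-byte cache line, and the helper name `flits_needed` is hypothetical. The link widths tried are the ones that appear on the slide.

```python
CONTROL_BYTES = 8       # from the slide
CACHE_LINE_BYTES = 128  # from the slide


def flits_needed(packet_bytes, link_bytes):
    """Number of link-width transfers needed to send one packet (ceiling division)."""
    return -(-packet_bytes // link_bytes)


# Assumed packet layouts (not stated on the slide).
read_request = CONTROL_BYTES                      # 8 bytes
read_response = CONTROL_BYTES + CACHE_LINE_BYTES  # 136 bytes

for width in (16, 32, 48):
    print(
        f"link={width:2d} B  "
        f"request flits={flits_needed(read_request, width)}  "
        f"response flits={flits_needed(read_response, width)}"
    )
```

Under these assumptions a request fits in a single transfer at every width shown, while a response needs several, so the two directions of the crossbar load their wiring very differently.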
23%
25% 29%
11%
25% 25% 29%
Saumay Dublish saumay.dublish@ed.ac.uk http://homepages.inf.ed.ac.uk/s1433370/