Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System
Presentation by Dongjin Kim Ph.D. Student, CORE lab., Electrical Engineering, KAIST djkim@core.kaist.ac.kr October 1st, 2013 @ P2S2-2013
Dongjin Kim and Kyu-Ho Park
2013-10-01 2 / 33 P2S2 2013
① Performance heterogeneity
② Explicit memory copy needed
③ GPGPUs expect a larger input than CPUs
[Figure: CPU and GPU computing system. Multi-core CPUs with CPU memory, and multiple multi-core GPUs, each with its own GPU memory, attached over PCI Express.]
utilization
tile (E)
[Figure: grid of tiles labeled by operation: T (Triangulation), E (Elimination), and the corresponding update steps UT and UE.]
faster than Triangulation or Elimination
many more tiles to be calculated
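To make the tile counts concrete, the sketch below enumerates the operations of a tiled QR decomposition in the commonly used sequential order and counts each kind. It is only an illustration, not the presenters' implementation: it assumes that UT and UE denote the updates that follow a Triangulation and an Elimination, and the function names `tiled_qr_ops` and `count_ops` are hypothetical.

```python
from collections import Counter


def tiled_qr_ops(p, q):
    """Yield (op, row, col) for a p-by-q grid of tiles in a valid sequential order.

    T  = Triangulation of the diagonal tile
    E  = Elimination of a tile below the diagonal
    UT = update following a Triangulation (assumed meaning)
    UE = update following an Elimination (assumed meaning)
    """
    for k in range(min(p, q)):
        yield ("T", k, k)
        for j in range(k + 1, q):
            yield ("UT", k, j)  # tiles to the right of the diagonal tile
        for i in range(k + 1, p):
            yield ("E", i, k)
            for j in range(k + 1, q):
                yield ("UE", i, j)  # tiles to the right of the eliminated tile


def count_ops(p, q):
    """Count how many tiles each kind of operation touches."""
    return Counter(op for op, _, _ in tiled_qr_ops(p, q))


counts = count_ops(4, 4)
updates = counts["UT"] + counts["UE"]
t_and_e = counts["T"] + counts["E"]
print(updates, "update tiles vs", t_and_e, "T/E tiles")  # prints "20 update tiles vs 10 T/E tiles"
```

Even on a 4 x 4 grid the update steps cover twice as many tiles as Triangulation and Elimination together, and the gap widens as the grid grows.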
<Single tile operation on GTX680> <The number of tiles to be operated>
speed, …
<Single tile operation on GTX680> <The number of tiles to be operated>
parallel threads vs. comm.
<Total operation time>
devices
decomposition
Main Others
Finish job early T/E UT/UE
The number of tiles distributed to each device
Time taken for each step
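One simple way to relate these two quantities is to give each device a share of the tiles proportional to its speed, so that all devices finish a step at about the same time. The sketch below is an assumption for illustration, not the formula from the slides: the device names, the per-tile times, and the functions `distribute_tiles` and `expected_step_time` are all hypothetical.

```python
def distribute_tiles(total_tiles, time_per_tile):
    """Split total_tiles among devices in proportion to speed (1 / time per tile).

    time_per_tile maps a hypothetical device name to its seconds per tile.
    Returns a dict of device name -> tile count; the counts sum to total_tiles.
    """
    speeds = {dev: 1.0 / t for dev, t in time_per_tile.items()}
    total_speed = sum(speeds.values())
    ideal = {dev: total_tiles * s / total_speed for dev, s in speeds.items()}
    counts = {dev: int(share) for dev, share in ideal.items()}
    # Hand out the tiles lost to rounding down, largest fractional part first.
    leftover = total_tiles - sum(counts.values())
    by_fraction = sorted(ideal, key=lambda dev: ideal[dev] - counts[dev], reverse=True)
    for dev in by_fraction[:leftover]:
        counts[dev] += 1
    return counts


def expected_step_time(counts, time_per_tile):
    """A step ends when the slowest device finishes its share."""
    return max(counts[dev] * time_per_tile[dev] for dev in counts)


# Hypothetical per-tile times in seconds, not measurements.
per_tile = {"gpu0": 1.0, "gpu1": 2.0, "cpu": 4.0}
shares = distribute_tiles(14, per_tile)
print(shares, expected_step_time(shares, per_tile))
```

With these made-up numbers the fastest device receives 8 of the 14 tiles and every device needs 8 seconds, which is the balanced case this kind of distribution aims for.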
Expected time for main computing device
Expected time for
time
The number of tiles to be transferred
Time taken for each step
Transfer speed
time
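The relation between the number of tiles to be transferred, the transfer speed, and the resulting time can be sketched as a plain bandwidth model. This is an assumed model for illustration only: it presumes square tiles, a fixed element size, and an optional fixed latency term, and the function name `transfer_time` and the bandwidth figure are hypothetical rather than taken from the slides.

```python
def transfer_time(num_tiles, tile_size, bytes_per_element, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimate the seconds needed to copy num_tiles square tiles between devices.

    tile_size is the tile edge length in elements; latency_s is an assumed
    fixed cost per transfer batch.
    """
    tile_bytes = tile_size * tile_size * bytes_per_element
    return latency_s + num_tiles * tile_bytes / bandwidth_bytes_per_s


# 16 tiles of 256 x 256 single-precision values over a hypothetical 4 GiB/s link.
estimate = transfer_time(16, 256, 4, 4 * 2**30)
print(estimate)  # prints 0.0009765625
```

Under this model the transfer cost grows linearly with the tile count, so a device only helps when the computation it takes over outweighs the copy.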
Expected time for Triangulation and Elimination
MT: Result Q matrices of Triangulation
2MT: Result Q matrices of Elimination
time
Expected time for next column tiles
time
performance
be processed in a fixed time
participating devices, distribute tiles, and migrate dependent data
threads for parallel
Total operation time proportionally decreases
device