Real-Time GPU Management
Heechul Yun
This Week
Topic: General-Purpose Graphics Processing Unit (GPGPU) management

Today
GPU architecture
GPU programming model
Challenges
Real-time GPU management

History GPU
[Figure: discrete GPU system. GPU side: 4992 GPU cores, graphics DRAM. CPU side: 4 CPU cores, host DRAM. Interconnect: PCIe 3.0.]
GPUSync: A Framework for Real-Time GPU Management
[Figure: NVIDIA Tegra K1. Core1, Core2, Core3, Core4, and the GPU cores share one memory controller and one DRAM.]
Image credit: T. Amert et al., “GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed,” RTSS'17
“From Shader Code to a Teraflop: How GPU Shader Cores Work”, Kayvon Fatahalian, Stanford University
Source: http://www.sdsc.edu/us/training/assets/docs/NVIDIA-02-BasicsOfCUDA.pdf
[Figure: data movement between CPU and GPU. Labels: user buffer, kernel buffer. GPU side: 4992 GPU cores, graphics DRAM. CPU side: 4 CPU cores, host DRAM. Interconnect: PCIe 3.0.]
[Figure: discrete GPU system with bandwidths. GPU side: 4992 GPU cores, graphics DRAM at 480 GB/s. CPU side: 4 CPU cores, host DRAM at 25 GB/s. Interconnect: PCIe 3.0 at 16 GB/s.]
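To make the bandwidth gap concrete, here is a rough worked example. It assumes the three figures above are peak rates for graphics DRAM (480 GB/s), host DRAM (25 GB/s), and PCIe 3.0 (16 GB/s), and that a hypothetical 1 GB buffer is moved once over each path. Real transfers achieve less than peak, so these are lower bounds on time, not measurements.

```latex
\begin{align*}
t_{\text{PCIe}} &= \frac{1\ \text{GB}}{16\ \text{GB/s}} = 62.5\ \text{ms} \\
t_{\text{host DRAM}} &= \frac{1\ \text{GB}}{25\ \text{GB/s}} = 40\ \text{ms} \\
t_{\text{graphics DRAM}} &= \frac{1\ \text{GB}}{480\ \text{GB/s}} \approx 2.1\ \text{ms}
\end{align*}
```

Under these assumptions the PCIe link is 480 / 16 = 30 times slower than graphics DRAM, so for a kernel that touches its data only once, the host-to-device copy is likely to dominate end-to-end time.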
PTask: Operating System Abstractions To Manage GPUs as Compute Devices, SOSP'11
[Figure: a four-stage pipeline (capture, xform, filter, detect) built from separate processes. Each stage uses read() and write(), and the GPU stages add a copy to GPU and a copy from GPU, each a PCI transfer. Components shown: camdrv, GPU driver, HIDdrv, OS executive, IRP, GPU.]
#> capture | xform | filter | detect &
Acknowledgement: This slide is from the paper authors' slides.
GPUSync: A Framework for Real-Time GPU Management
[Figure: NVIDIA Tegra X2. Core1, Core2, Core3, and Core4, each with a PMC, share one memory controller and one DRAM with the GPU cores. Bandwidth label: 16 GB/s.]
[Figure: a GPU app on the GPU with co-runners 1, 2, and 3 on the CPU.]
Waqar Ali, Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. Euromicro Conference on Real-Time Systems (ECRTS), 2018.
Acknowledgement: This slide is from the paper authors' slides. Gdev: First-Class GPU Resource Management in the Operating System. In USENIX ATC, 2012.
ents,” in USENIX ATC, 2011
(*) GPES: A Preemptive Execution System for GPGPU Computing, RTAS'14
http://www.ece.ucr.edu/~hyoseung/pdf/rtcsa17-gpu-server-slides.pdf
(*) AnandTech, “Preemption Improved: Fine-Grained Preemption for Time-Critical Tasks”
cudaMalloc(...)
cudaMemcpy(...)
cudaMemcpy(...)
kernel<<<...>>>(...)
cudaFree(...)

cudaLaunch()
cudaSynchronize()
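The call sequence above can be sketched as a small, complete CUDA program. This is a minimal illustration, not code from the cited work: the `vec_add` kernel, the `CHECK` macro, the buffer names, and the problem size are all hypothetical. The slide's `cudaLaunch()` and `cudaSynchronize()` look like shorthand, so the sketch uses the `<<<...>>>` launch syntax and the runtime API's `cudaDeviceSynchronize()` instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    // Host buffers (in host DRAM).
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_c) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // 1. cudaMalloc: allocate device buffers (in GPU-visible memory).
    float *d_a, *d_b, *d_c;
    CHECK(cudaMalloc((void **)&d_a, bytes));
    CHECK(cudaMalloc((void **)&d_b, bytes));
    CHECK(cudaMalloc((void **)&d_c, bytes));

    // 2. cudaMemcpy: copy inputs from host to device.
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    // 3. Kernel launch: asynchronous with respect to the host.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CHECK(cudaGetLastError());             // catches launch errors

    // 4. Synchronize: block the host until the kernel has finished.
    CHECK(cudaDeviceSynchronize());

    // 5. cudaMemcpy: copy the result from device back to host.
    CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));

    // Verify on the host: 1.0f + 2.0f is exactly 3.0f.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_c[i] != 3.0f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    // 6. cudaFree: release device buffers.
    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFree(d_c));
    free(h_a); free(h_b); free(h_c);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Assuming the file is saved as `vec_add.cu` (a hypothetical name), it can be built with `nvcc vec_add.cu -o vec_add`. The launch and synchronize steps are kept separate because they mark where a kernel starts and ends from the host's point of view, which is where a GPU management layer would most naturally hook in.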