Understanding The Security of Discrete GPUs
Zhiting Zhu¹, Sangman Kim¹, Yuri Rozhanski², Yige Hu¹, Emmett Witchel¹, Mark Silberstein²
¹ The University of Texas at Austin
² Technion-Israel Institute of Technology
Outline
○ PixelVault
○ Attacking PixelVault
○ GPU driver attack
○ GPU microcode attack
○ IOMMU mitigation
Can GPUs improve the security of a computing system?
Motivation: Dedicated hardware resources
○ Independent computational resources
○ Independent memory system
○ Physically partitioned from CPU
[Figure: a CPU connected over the PCIe bus to a GPU with multiple SMs, registers, and global memory]
Can discrete GPUs enhance the security of a computing system?
[Figure: a CPU connected over the PCIe bus to a GPU; secret data is held in GPU registers and in GPU global memory]
… encryption in GPU.
Keys are encrypted by a master key and are stored in GPU memory.
The master key is kept in a GPU register.
… from accessing registers.
[Figure: the CPU sends plaintext to the GPU and receives ciphertext; the master key sits in a GPU register and the keys labeled "Enc key" sit in GPU global memory]
PixelVault execution environment on GPU.
○ Execute code at any privilege.
○ Has access to all platform hardware.
Security guarantees depend on several NVIDIA GPU characteristics.
○ Experimentally verify.
Assumption | PixelVault safety property | Attack
A running GPU kernel cannot be stopped and debugged. | Secure register contents from CPU-based debugger. | Debugger API.
GPU registers can’t be read after kernel termination. | Cannot get the master key after kernel termination. | Concurrent kernel.
Can’t replace code of GPU kernel executing from instruction cache. | Cannot replace PixelVault code without stopping the kernel. | Flush instruction cache using MMIO registers.
Assumption: A running GPU kernel cannot be stopped and debugged.
CUDA 4.2: … support. … is running.
CUDA 5.0 and newer: Stop a running kernel and inspect all GPU registers via debugger API.
Assumption: GPU registers can’t be read after kernel termination.
Attack: If GPU kernel B is invoked in parallel with running kernel A, A’s register state can be retrieved using the debugger API even after A terminates, as long as B is still running.
[Figure: stream A and stream B (a data transfer stream and a computation stream) running on the GPU, each with its own registers; the CPU reads the registers through the debugger API]
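The condition stated in the attack can be illustrated with a toy model. The sketch below is not the CUDA debugger API: `ToyGpu`, `launch`, `terminate`, and `debugger_read` are hypothetical names, and the rule that register state is cleared only once no kernel is running is an assumption chosen to mirror the statement above.

```python
class ToyGpu:
    """Toy model of a GPU whose register state survives while any kernel runs."""

    def __init__(self):
        self.running = set()   # names of kernels currently executing
        self.registers = {}    # kernel name -> register contents

    def launch(self, name, regs):
        self.running.add(name)
        self.registers[name] = dict(regs)

    def terminate(self, name):
        self.running.discard(name)
        # Assumption of this model: register state is only wiped
        # once the GPU has no running kernel left.
        if not self.running:
            self.registers.clear()

    def debugger_read(self, name):
        """Hypothetical stand-in for a debugger register dump."""
        return self.registers.get(name)


def read_after_termination(concurrent_kernel):
    """Return what a debugger sees of kernel A after A has terminated."""
    gpu = ToyGpu()
    gpu.launch("A", {"r0": "master key"})
    if concurrent_kernel:
        gpu.launch("B", {})    # attacker's kernel in a second stream
    gpu.terminate("A")
    return gpu.debugger_read("A")
```

In this model, `read_after_termination(False)` finds nothing, while `read_after_termination(True)` still returns A's registers because kernel B keeps the GPU busy.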
[Figure: CPU, PCIe bus, chipset, and GPU; a program is held in GPU global memory and in the GPU instruction cache]
… which is poorly (often purposefully) documented in public.
○ Some MMIO registers that flush the GPU instruction cache are not documented as flushing the cache.
○ Private debugger API.
… software and what’s implemented in hardware across generations.
○ Debugger API
… security of systems based on GPUs.
… system.
1. Threat Model
2. GPU driver attack
3. GPU microcode attack
4. IOMMU mitigation
Attacker:
… unprivileged access after the malware is installed.
… device.
… compared to other DMA-capable devices.
[Figure: DMA without an IOMMU. The device address in a DMA request equals the physical address, so an IO device such as the GPU can reach kernel data structures in memory]
… physical addresses.
[Figure: DMA with an IOMMU. The IO device issues a device address; on an IOTLB hit the cached physical address is used, and on a miss the IO page table supplies the physical address]
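The hit/miss path in the IOMMU diagram can be sketched as a small model. This is an illustrative simplification, not any vendor's implementation: the `Iommu` class, its method names, and the 4 KiB page size are assumptions introduced here.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Iommu:
    """Toy IOMMU: an IO page table with an IOTLB caching its entries."""

    def __init__(self):
        self.io_page_table = {}  # device page number -> physical page number
        self.iotlb = {}          # cached copies of IO page table entries
        self.hits = 0
        self.misses = 0

    def map(self, device_page, physical_page):
        self.io_page_table[device_page] = physical_page

    def translate(self, device_addr):
        """Translate a device address to a physical address."""
        page, offset = divmod(device_addr, PAGE_SIZE)
        if page in self.iotlb:
            self.hits += 1       # hit: use the cached mapping
        else:
            self.misses += 1     # miss: consult the IO page table
            if page not in self.io_page_table:
                raise PermissionError(f"no mapping for device page {page}")
            self.iotlb[page] = self.io_page_table[page]
        return self.iotlb[page] * PAGE_SIZE + offset
```

The first access to a mapped page misses and fills the IOTLB; later accesses to the same page hit and never look at the IO page table again, which is the property the later slides rely on.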
… IO page table by hardware.
… flush the cached mappings when they are removed from the IO page table.
Mode | Characteristics
Disable |
Pass through |
Deferred | Default mode when IOMMU enabled.
Strict | IOMMU enabled.
Security: not secure (Disable) to secure (Strict)
Performance: fast (Disable) to slow (Strict)
Clear the entry in IO page table
IOTLB Flush | Deferred Mode | Strict Mode
Strategy | Flush entire IOTLB. | Flush individual entry in given domain.
Timing | When deferred list is full or 10 ms after the first entry, whichever comes first. | Immediately after unmapping entry from IO page table.
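The two flush policies in the table can be compared with a minimal simulation. The 10 ms timeout comes from the table; the class name `FlushingIommu`, the explicit `now_ms` clock, and the deferred-list capacity of 4 are assumptions made for illustration (the slides do not give the list size).

```python
class FlushingIommu:
    """Toy IOMMU comparing strict and deferred IOTLB flushing."""

    def __init__(self, strict, deferred_capacity=4, timeout_ms=10):
        self.strict = strict
        self.capacity = deferred_capacity  # hypothetical list size
        self.timeout_ms = timeout_ms       # 10 ms, as in the table
        self.io_page_table = {}
        self.iotlb = {}
        self.deferred = []                 # unmapped pages awaiting a flush
        self.first_deferred_at = None

    def map(self, page, phys):
        self.io_page_table[page] = phys

    def access(self, page):
        """Device access: fill the IOTLB on a miss, return the mapping or None."""
        if page not in self.iotlb and page in self.io_page_table:
            self.iotlb[page] = self.io_page_table[page]
        return self.iotlb.get(page)

    def unmap(self, page, now_ms):
        self.io_page_table.pop(page, None)  # clear the IO page table entry
        if self.strict:
            self.iotlb.pop(page, None)      # flush this entry immediately
            return
        if not self.deferred:
            self.first_deferred_at = now_ms
        self.deferred.append(page)
        if len(self.deferred) >= self.capacity:
            self._flush_all()               # deferred list is full

    def tick(self, now_ms):
        """Advance the clock; flush if the oldest deferred entry timed out."""
        if self.deferred and now_ms - self.first_deferred_at >= self.timeout_ms:
            self._flush_all()

    def _flush_all(self):
        self.iotlb.clear()                  # deferred mode flushes the entire IOTLB
        self.deferred.clear()
        self.first_deferred_at = None
```

In deferred mode a device can keep using an unmapped page until the list fills or the timer fires; in strict mode the mapping is gone as soon as `unmap` returns, at the cost of a flush per unmap.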
1. Write a malicious IO page table entry.
2. Launch a GPU kernel that accesses the device address of the mapping, causing the entry to be cached in the IOTLB.
3. Overwrite the IO page table.
[Figure: a GPU kernel, the IOTLB, the IO page table, and memory; the kernel's first access misses in the IOTLB]
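The three attack steps can be traced in a toy model where the IOTLB is a cache in front of the IO page table. All page numbers below are made-up values, and the choice to overwrite the entry with a benign mapping in step 3 is an assumption; the slides only say the IO page table is overwritten.

```python
DEVICE_PAGE = 0x10       # hypothetical device page used by the attacker
BENIGN_PAGE = 0x200      # hypothetical physical page of an ordinary buffer
KERNEL_DATA_PAGE = 0x999 # hypothetical physical page holding kernel data


def run_attack():
    io_page_table = {}
    iotlb = {}

    def gpu_access(device_page):
        # A miss loads the entry from the IO page table; a hit reuses the cache.
        if device_page not in iotlb:
            iotlb[device_page] = io_page_table[device_page]
        return iotlb[device_page]

    # Step 1: write a malicious IO page table entry.
    io_page_table[DEVICE_PAGE] = KERNEL_DATA_PAGE

    # Step 2: a GPU kernel touches the device address, caching the entry.
    gpu_access(DEVICE_PAGE)

    # Step 3: overwrite the IO page table so it looks harmless again.
    # The write bypasses any unmap path, so nothing flushes the IOTLB here.
    io_page_table[DEVICE_PAGE] = BENIGN_PAGE

    return {
        "page_table_entry": io_page_table[DEVICE_PAGE],
        "gpu_translation": gpu_access(DEVICE_PAGE),
    }
```

After the three steps, an inspection of the IO page table shows only the benign mapping, while the GPU still translates through the stale cached entry until something flushes the IOTLB.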
Workload | Bit rate | Stale period
Idle ssh connection | 10 bps | 1 day
Web radio | 130 Kbps | 1 hour
Web video: Auto (480p) | 2 Mbps | 1 min
… coprocessor.