Understanding The Security of Discrete GPUs



slide-1
SLIDE 1

Understanding The Security of Discrete GPUs

Zhiting Zhu (1), Sangman Kim (1), Yuri Rozhanski (2), Yige Hu (1), Emmett Witchel (1), Mark Silberstein (2)

1. The University of Texas at Austin
2. Technion-Israel Institute of Technology

slide-2
SLIDE 2

Outline

  • Can GPUs improve the security of a computing system?
    ○ PixelVault
    ○ Attacking PixelVault

  • Can GPUs subvert the security of a computing system?
    ○ GPU driver attack
    ○ GPU microcode attack
    ○ IOMMU mitigation

slide-3
SLIDE 3

Can GPUs improve the security of a computing system?

Motivation: Dedicated hardware resources

  • Independent computational resources
  • Independent memory system
  • Physically partitioned from CPU

[Figure: the CPU connects over the PCIe bus to the GPU, which contains multiple SMs, registers, and global memory.]

slide-7
SLIDE 7

Can discrete GPUs enhance the security of a computing system?

[Figure: the CPU connects over the PCIe bus to the GPU; secret data is kept in GPU registers and in GPU global memory.]

slide-11
SLIDE 11

PixelVault (CCS '14)

  • Runs AES/RSA encryption in GPU.
  • Encryption (Enc) keys are encrypted by a master key and are stored in GPU memory.
  • Master key is stored in a GPU register.
  • Prevents any adversary from accessing registers.

[Figure: plaintext and ciphertext pass between the CPU and the GPU; the GPU register holds the master key and GPU global memory holds the Enc keys.]
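The key hierarchy on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not PixelVault's implementation: `ToyVault`, `xor_cipher`, and `keystream` are hypothetical names, a private attribute stands in for the GPU register, a dictionary stands in for GPU global memory, and a SHA-256-derived XOR keystream stands in for AES only to keep the sketch dependency-free. It is not a secure cipher.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` bytes from key and nonce (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES: XOR the data with the keystream (same call decrypts)."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


class ToyVault:
    def __init__(self, master_key: bytes):
        self._register = master_key   # models the GPU register holding the master key
        self.global_memory = {}       # models GPU global memory: key id -> (nonce, wrapped key)

    def store_key(self, key_id: str, enc_key: bytes) -> None:
        """Wrap an Enc key with the master key before it reaches global memory."""
        nonce = secrets.token_bytes(16)
        self.global_memory[key_id] = (nonce, xor_cipher(self._register, nonce, enc_key))

    def encrypt(self, key_id: str, nonce: bytes, plaintext: bytes) -> bytes:
        """Unwrap the Enc key inside the vault only, then encrypt the plaintext."""
        wrap_nonce, wrapped = self.global_memory[key_id]
        enc_key = xor_cipher(self._register, wrap_nonce, wrapped)
        return xor_cipher(enc_key, nonce, plaintext)
```

In this model, reading `global_memory` alone yields only wrapped keys; everything hinges on `_register` staying unreadable, which is the property the rest of the talk examines.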

slide-17
SLIDE 17

Threat model

  • System boots from a trusted configuration and sets up the PixelVault execution environment on GPU.
  • After setup, attacker can have full control over the platform.
    ○ Can execute code at any privilege level.
    ○ Has access to all platform hardware.
  • Attack goal: Steal keys from GPU.

slide-19
SLIDE 19

Threat model

Security guarantees depend on several NVIDIA GPU characteristics.

  • Some of these characteristics are well known and confirmed.
  • Some are experimentally validated.
  • Others are only assumed to be correct.
    ○ Experimentally verify these.

slide-20
SLIDE 20

Assumptions about NVIDIA GPUs

  • Assumption: A running GPU kernel cannot be stopped and debugged.
    ○ PixelVault safety property: Secure register contents from CPU-based debugger.
    ○ Attack: Debugger API.
  • Assumption: GPU registers can’t be read after kernel termination.
    ○ PixelVault safety property: Cannot get the master key after kernel termination.
    ○ Attack: Concurrent kernel.
  • Assumption: Can’t replace code of GPU kernel executing from instruction cache.
    ○ PixelVault safety property: Cannot replace PixelVault code without stopping the kernel.
    ○ Attack: Flush instruction cache using MMIO registers.

slide-21
SLIDE 21

Assumption: A running GPU kernel cannot be stopped and debugged.

  • CUDA 4.2
    ○ Compiled with explicit debug support.
    ○ Insert breakpoints before kernel is running.
  • CUDA 5.0 and newer
    ○ Stop a running kernel and inspect all GPU registers via debugger API.

slide-24
SLIDE 24

CUDA Stream

  • An operation sequence on a GPU device.
  • Every CUDA kernel is invoked on an independent stream.
  • Streams share the same address space.

slide-25
SLIDE 25

PixelVault

[Figure: the CPU drives two streams on the GPU, a data transfer stream and a computation stream, each with its own registers.]

slide-26
SLIDE 26

Assumption: GPU registers can’t be read after kernel termination.

Attack: If GPU kernel B is invoked in parallel with running kernel A, A’s register state can be retrieved using the debugger API even after A terminates, as long as B is still running.

[Figure: kernels A and B run on stream A and stream B, each with its own registers; the debugger API reads the registers.]

slide-28
SLIDE 28

Loading a program into the GPU

[Figure: the program moves from the CPU through the chipset and over the PCIe bus into GPU global memory, and from there into the GPU instruction cache.]

slide-32
SLIDE 32

If CPU writes to GPU instructions in memory while the GPU is running

[Figure: the program is present both in GPU global memory and in the GPU instruction cache while the CPU writes over the PCIe bus.]

slide-34
SLIDE 34

No public API for flushing the instruction cache.
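A minimal sketch of why this matters, assuming a toy model in which the GPU fetches instructions through a cache that does not observe later writes to global memory. `ToyGpu` and its methods are hypothetical names for illustration; `flush_icache` stands in for the MMIO-register flush named in the assumptions list and does not correspond to any real API.

```python
class ToyGpu:
    """Toy model: instruction fetches go through a cache that ignores later memory writes."""

    def __init__(self, program):
        self.global_memory = list(program)  # code as loaded into GPU global memory
        self.icache = {}                    # address -> cached instruction

    def fetch(self, addr):
        """Return the instruction the GPU would execute at `addr`."""
        if addr not in self.icache:         # miss: fill the cache from global memory
            self.icache[addr] = self.global_memory[addr]
        return self.icache[addr]            # hit: memory is not consulted again

    def cpu_write(self, addr, instr):
        """CPU overwrites code in global memory; the cache is left untouched."""
        self.global_memory[addr] = instr

    def flush_icache(self):
        """Hypothetical stand-in for the MMIO-register flush."""
        self.icache.clear()
```

In this model a CPU write alone changes nothing the running kernel executes; only after a flush does the next fetch pick up the replaced instruction.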


slide-36
SLIDE 36

Discussion

  • Security guarantees rely on proprietary hardware and software that are poorly (often purposefully) documented publicly.
    ○ Some MMIO registers that flush the GPU instruction cache are not documented as flushing the cache.
    ○ Private debugger API.
  • Manufacturers are free to change what’s implemented in software and what’s implemented in hardware across generations.
    ○ Debugger API
  • Manufacturers can change the architecture in ways that invalidate the security of GPU-based systems.
  • Discrete GPUs cannot enhance the security of the computing system.

slide-40
SLIDE 40

GPU as a host for stealthy malware

1. Threat Model
2. GPU driver attack
3. GPU microcode attack
4. IOMMU mitigation

slide-41
SLIDE 41

Threat model

Attacker:

  • Can load and unload kernel modules via module loading capability.
  • Can access the GPU control interface, i.e., MMIO register regions.
  • Loses the module loading capability and is allowed only unprivileged access after the malware is installed.

Stealthiness

  • Attacks originate with the GPU reading and writing CPU memory.

slide-42
SLIDE 42

DMA attack

  • GPU is a programmable device.
  • Easier to launch DMA attack compared to other DMA-capable devices.
  • GPU driver attack.
  • GPU microcode attack.

[Figure: an IO device (the GPU) issues a DMA request to memory; the device address equals the physical address, so the request can reach a kernel data structure.]

slide-43
SLIDE 43

IOMMU

  • Hardware
  • Software management
  • IOMMU attack


slide-44
SLIDE 44

IOMMU

  • Maps device addresses to CPU physical addresses.
  • Checks access permissions.

[Figure: IO devices reach memory, including kernel data structures, through the IOMMU and its IOTLB.]

slide-45
SLIDE 45

IOTLB

  • Not kept coherent with the IO page table by hardware.
  • Software must explicitly flush the cached mappings when they are removed from the IO page table.

[Figure: an IO device presents a device address to the IOTLB; on a hit the physical address goes to memory, on a miss the IO page table is consulted.]
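The hit/miss behaviour and the lack of hardware coherence can be illustrated with a small model. This is a sketch under simplifying assumptions: `ToyIommu` and its methods are hypothetical names, pages are plain integers, and a `KeyError` stands in for a translation fault.

```python
class ToyIommu:
    """Toy IOMMU: an IO page table in memory plus an IOTLB that caches translations."""

    def __init__(self):
        self.io_page_table = {}  # device page -> physical page
        self.iotlb = {}          # cached translations, not kept coherent automatically

    def map(self, dev_page, phys_page):
        self.io_page_table[dev_page] = phys_page

    def unmap(self, dev_page, flush):
        """Remove a mapping; the cached copy disappears only if software flushes it."""
        del self.io_page_table[dev_page]
        if flush:
            self.iotlb.pop(dev_page, None)

    def translate(self, dev_page):
        if dev_page in self.iotlb:            # hit: the page table is not consulted
            return self.iotlb[dev_page]
        phys = self.io_page_table[dev_page]   # miss: walk the table (KeyError = fault)
        self.iotlb[dev_page] = phys
        return phys
```

Calling `unmap(..., flush=False)` leaves a stale entry behind, so a later `translate` still succeeds even though the IO page table no longer contains the mapping.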

slide-46
SLIDE 46

IOMMU configurations

Security ranges from not secure to secure, and performance from fast to slow, across the modes below.

  • Disable
    ○ Default configuration for many Linux distributions.
    ○ Enabling the IOMMU reduces IO performance.
    ○ The IOMMU is incompatible with certain devices and features.
  • Pass through
    ○ Hardware IOMMU is turned off.
    ○ Device address is used as CPU physical address.
  • Deferred
    ○ Default mode when IOMMU is enabled.
  • Strict
    ○ IOMMU is enabled.

slide-49
SLIDE 49

When system memory is unmapped from IO devices:

Clear the entry in the IO page table, then flush the IOTLB:

  • Deferred mode
    ○ Strategy: Flush entire IOTLB.
    ○ Timing: When deferred list is full or 10 ms after the first entry, whichever comes first.
  • Strict mode
    ○ Strategy: Flush individual entry in given domain.
    ○ Timing: Immediately after unmapping entry from IO page table.
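The two flush policies can be compared with a small simulation. It is a sketch only: `DeferredFlusher`, `StrictFlusher`, and the `capacity` parameter are hypothetical, time is passed in explicitly in milliseconds, and only the 10 ms timeout and the "list full" trigger come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StrictFlusher:
    flushes: int = 0

    def unmap(self, dev_page, now_ms):
        self.flushes += 1  # flush the single entry immediately after unmapping


@dataclass
class DeferredFlusher:
    capacity: int                      # hypothetical size of the deferred list
    timeout_ms: float = 10.0           # 10 ms after the first queued entry
    pending: list = field(default_factory=list)
    first_entry_ms: Optional[float] = None
    flushes: int = 0

    def unmap(self, dev_page, now_ms):
        if not self.pending:
            self.first_entry_ms = now_ms   # start the timer on the first entry
        self.pending.append(dev_page)
        if len(self.pending) >= self.capacity:
            self._flush_all()              # list full: flush the entire IOTLB

    def tick(self, now_ms):
        """Called periodically; flushes once the timeout has elapsed."""
        if self.pending and now_ms - self.first_entry_ms >= self.timeout_ms:
            self._flush_all()

    def _flush_all(self):
        self.pending.clear()
        self.first_entry_ms = None
        self.flushes += 1
```

Between an `unmap` and the next flush, every page in `pending` is still translatable through the IOTLB; in the strict model that window does not exist, at the cost of one flush per unmap.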

slide-52
SLIDE 52

IOMMU attack

1. Write a malicious IO page table entry.
2. Launch a GPU kernel which accesses the device address of the mapping, causing the entry to be cached in IOTLB.
3. Overwrite the IO page table.

[Figure: the GPU kernel's access misses in the IOTLB and is resolved through the IO page table in memory; the translation is then cached in the IOTLB.]
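The three steps can be traced in a toy simulation. This is a sketch under simplifying assumptions: the function name, the page numbers, and the page-table check are all hypothetical, and dictionaries stand in for the IO page table and the IOTLB.

```python
def simulate_iotlb_attack(victim_page=0x1000, benign_page=0x2000, dev_page=0x42):
    """Return (page-table entries pointing at the victim page, page the GPU still reaches)."""
    io_page_table = {}  # lives in memory
    iotlb = {}          # hardware cache of translations

    def gpu_access(page):
        if page not in iotlb:                  # miss: walk the IO page table
            iotlb[page] = io_page_table[page]
        return iotlb[page]                     # hit: the cached translation is used

    io_page_table[dev_page] = victim_page      # step 1: write a malicious entry
    gpu_access(dev_page)                       # step 2: GPU kernel touches it, caching it
    io_page_table[dev_page] = benign_page      # step 3: overwrite the IO page table

    # A check that inspects only the IO page table, as a monitoring tool might.
    flagged = [d for d, p in io_page_table.items() if p == victim_page]
    return flagged, gpu_access(dev_page)
```

After step 3 the page table shows only the benign mapping, while the stale IOTLB entry in this model still resolves the device address to the victim page until it is flushed or evicted.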

slide-57
SLIDE 57

How long can a stale entry last in IOTLB?

Workload                  Bit rate    Stale period
Idle ssh connection       10 bps      1 day
Web radio                 130 Kbps    1 hour
Web video: Auto (480p)    2 Mbps      1 min

slide-58
SLIDE 58

Stealthiness

  • IOTLB entry is not accessible by software.
  • IO page table can be monitored by security tools.


slide-59
SLIDE 59

Conclusion

  • Discrete GPUs are not an appropriate choice for a secure coprocessor.
  • Discrete GPUs pose a security threat to the computing platform.
