Memory Defenses: The Elevation from Obscurity to Headlines - PowerPoint PPT Presentation



SLIDE 1

Memory Defenses: The Elevation from Obscurity to Headlines

Rajeev Balasubramonian School of Computing, University of Utah

SLIDE 2

Image sources: pinterest, gizmodo

SLIDE 3

Spectre Overview

if (x < array1_size) y = array2[array1[x]];

Victim Code

  • x is controlled by the attacker
  • array1[ ] holds the secret
  • The access pattern of array2[ ] betrays the secret
  • Thanks to branch prediction (bpred), x can be anything
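The victim gadget above can be sketched in C alongside one common software mitigation, branchless index masking (in the spirit of the Linux kernel's array_index_nospec). The names victim, victim_masked, and clamp_index are illustrative, not code from the talk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16
uint8_t array1[ARRAY1_SIZE];
uint8_t array2[256 * 64];   /* probe array: one cache line per byte value */

/* vulnerable victim: under branch misprediction, array1[x] may be read
 * speculatively even when x >= ARRAY1_SIZE, and the dependent load into
 * array2 leaves a cache footprint indexed by the secret byte */
uint8_t victim(size_t x) {
    if (x < ARRAY1_SIZE)
        return array2[array1[x] * 64];
    return 0;
}

/* speculation-safe index masking: clamp x with a branchless mask so even
 * a mispredicted path cannot read out of bounds */
static inline size_t clamp_index(size_t x, size_t size) {
    size_t mask = (size_t)0 - (x < size);   /* all ones iff x < size */
    return x & mask;
}

uint8_t victim_masked(size_t x) {
    if (x < ARRAY1_SIZE)
        return array2[array1[clamp_index(x, ARRAY1_SIZE)] * 64];
    return 0;
}
```

The mask is computed with arithmetic rather than a branch, so it holds even on a speculative path where the bounds check was mispredicted.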

SLIDE 4

What Did We Learn?

Speculation + Specific code + No side-channel defenses

SLIDE 5

The Wake-Up Call

Say Yes to Side-Channel Defenses

SLIDE 6

Overview

  • Memory timing channels
      • The Fixed Service memory controller [MICRO 2015]
  • Memory access patterns
      • Near-data ORAM [HPCA 2018]
  • Memory integrity
      • Improving SGX with VAULT [ASPLOS 2018]

SLIDE 7

Memory Timing Channels

[Diagram: two VMs, an attacker and a victim, each on its own core, sharing a memory controller (MC) and memory channel]

SLIDE 8

Possible Attacks

[Diagram: two VMs on separate cores sharing a memory controller]

  • Attack 1: Bits in a key influence memory accesses
  • Attack 2: A victim can betray secrets through memory activity
  • Attack 3: A covert channel attack
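Attack 3 can be illustrated with a toy contention model (the names and latency constants here are made up for illustration): the sender modulates its memory traffic, and the receiver decodes bits by timing its own accesses through the shared memory controller:

```c
#include <assert.h>

#define IDLE_LAT      50   /* made-up cycles: receiver's load latency, idle channel */
#define CONTENDED_LAT 90   /* made-up cycles: latency when the sender hammers the MC */

/* toy contention model: the receiver's memory latency depends only on
 * whether the co-located sender is generating traffic in that interval */
int observed_latency(int sender_active) {
    return sender_active ? CONTENDED_LAT : IDLE_LAT;
}

/* the receiver recovers one bit per interval by thresholding its latency */
int decode_bit(int latency) {
    return latency > (IDLE_LAT + CONTENDED_LAT) / 2;
}

/* the sender "transmits" bits by toggling its memory activity */
void transmit(const int *bits, int n, int *rx) {
    for (int i = 0; i < n; i++)
        rx[i] = decode_bit(observed_latency(bits[i]));
}
```

No data ever crosses the channel directly; the bit is carried entirely by contention-induced timing, which is what a fixed-service schedule is designed to remove.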

SLIDE 9

Covert Channel Attack

[Diagram: a covert channel between a 3rd-party document reader handling electronic health records and a conspirator VM, via the shared memory controller]

SLIDE 10

Fixed Service Memory Controller

  • VM-1 has its data in Rank-1, VM-2 in Rank-2, …, VM-8 in Rank-8

[Timeline: VM-1 begins a memory access at cycle 0, VM-2 at cycle 7, …, VM-8 at cycle 49, and VM-1 again at cycle 56]

SLIDE 11

Fixed Service Details

  • Deterministic schedule
  • No resource contention
  • Dummy accesses if nothing is pending
  • Lower bandwidth, higher latency
  • Why 7? It is the worst case implied by DRAM timing parameters
  • Rank partitioning: 7-cycle gap
  • Bank partitioning: 15-cycle gap
  • No partitioning: 43-cycle gap
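A minimal sketch of the fixed-service idea, assuming one VM per rank and the 7-cycle rank-partitioned gap from the slides (function names are illustrative, not from the paper):

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VMS  8   /* one VM per rank, as on the slide */
#define SLOT_GAP 7   /* worst-case cycles per turn with rank partitioning */

/* which VM owns the channel at a given cycle: a fixed round-robin that is
 * independent of the actual request stream, so service timing leaks nothing */
int slot_owner(unsigned long cycle) {
    return (int)((cycle / SLOT_GAP) % NUM_VMS);
}

/* in its slot, a VM's pending request is issued; otherwise the controller
 * issues a dummy access so bus activity stays constant */
bool issues_real_access(bool pending, unsigned long cycle, int vm) {
    return slot_owner(cycle) == vm && pending;
}
```

Because the schedule is a pure function of the cycle count, one VM's observed latency carries no information about any other VM's traffic.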

SLIDE 12

Overcoming Worst-Case

  • In one batch of requests, schedule all reads followed by all writes (the worst case is encountered once per batch)
  • Impose constraints on the banks that can be accessed: triple bank alternation

[Timeline: turns 1–7 spaced 15 cycles apart alternate among three bank groups (bank-id mod 3 = 0, 1, 2, shown in red, blue, green), so same-group accesses are 3x15 = 45 cycles apart, exceeding the 43-cycle worst case]
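The arithmetic behind triple alternation can be checked directly (constants from the slides; the function names are illustrative):

```c
#include <assert.h>

#define GAP        15   /* cycles between consecutive turns with bank partitioning */
#define WORST_CASE 43   /* worst-case same-bank turnaround with no partitioning */

/* bank group (bank-id mod 3) permitted to issue in a given turn */
int allowed_group(int turn) { return turn % 3; }

/* with triple alternation, the same bank group recurs only every third
 * turn, so same-group accesses are at least 3 * GAP cycles apart */
int min_same_group_gap(void) { return 3 * GAP; }
```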

SLIDE 13

Results

Increased OS complexity

[Chart: performance relative to a non-secure baseline (1.0) for FS, FS with read/write reordering, and FS with triple alternation, under rank, bank, and no partitioning; reported values range from 0.20 to 0.74]

SLIDE 14

Overview

  • Memory timing channels
      • The Fixed Service memory controller [MICRO 2015]
  • Memory access patterns
      • Near-data ORAM [HPCA 2018]
  • Memory integrity
      • Improving SGX with VAULT [ASPLOS 2018]


SLIDE 16

Oblivious RAM

  • Assumes that addresses are exposed
  • PHANTOM [CCS’13]: Memory bandwidth overhead of 2560x (about 280x today)

Image sources: vice.com

SLIDE 17

Path-ORAM

[Diagram: Path ORAM binary tree in memory, with a stash held in the secure processor]
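For concreteness, here is a toy Path ORAM in C with one block per bucket and a tiny tree; real designs such as PHANTOM use multi-block buckets and cryptographic integrity, so this only sketches the read-path / remap / write-path flow:

```c
#include <assert.h>
#include <stdlib.h>

#define HEIGHT 3                  /* levels below the root */
#define LEAVES (1 << HEIGHT)      /* 8 leaves */
#define NODES  (2 * LEAVES - 1)   /* 15 buckets; 1 block per bucket here */
#define NBLOCK LEAVES             /* logical blocks stored in the ORAM */

typedef struct { int id; int data; } block_t;   /* id < 0 marks an empty bucket */

static block_t tree[NODES];   /* untrusted memory: the ORAM tree */
static int     pos[NBLOCK];   /* position map (block -> leaf), kept in the processor */
static block_t stash[64];     /* small trusted buffer in the processor */
static int     stash_n;

/* index of the bucket at `level` on the root-to-`leaf` path */
static int path_node(int leaf, int level) {
    int node = 0;
    for (int l = 0; l < level; l++)
        node = 2 * node + 1 + ((leaf >> (HEIGHT - 1 - l)) & 1);
    return node;
}

static void oram_init(void) {
    for (int i = 0; i < NODES; i++) tree[i].id = -1;
    for (int b = 0; b < NBLOCK; b++) pos[b] = rand() % LEAVES;
}

/* one access: read a whole path, touch the block, remap it, rewrite the path */
static int oram_access(int id, int write, int val) {
    int leaf = pos[id];
    pos[id] = rand() % LEAVES;            /* fresh random leaf for next time */
    for (int l = 0; l <= HEIGHT; l++) {   /* 1. pull the path into the stash */
        int n = path_node(leaf, l);
        if (tree[n].id >= 0) stash[stash_n++] = tree[n];
        tree[n].id = -1;
    }
    int out = 0, found = 0;               /* 2. serve the request from the stash */
    for (int i = 0; i < stash_n; i++)
        if (stash[i].id == id) {
            if (write) stash[i].data = val;
            out = stash[i].data; found = 1;
        }
    if (!found) { stash[stash_n++] = (block_t){ id, write ? val : 0 }; out = write ? val : 0; }
    for (int l = HEIGHT; l >= 0; l--) {   /* 3. evict stash blocks as deep as legal */
        int n = path_node(leaf, l);
        for (int i = 0; i < stash_n; i++)
            if (path_node(pos[stash[i].id], l) == n) {
                tree[n] = stash[i];
                stash[i] = stash[--stash_n];
                break;
            }
    }
    return out;
}
```

Every access reads and rewrites exactly one root-to-leaf path and remaps the block to a fresh random leaf, which is why the address trace reveals nothing about the access pattern — and also why the bandwidth overhead is so large.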

SLIDE 18

A Distributed ORAM

[Diagram: processor and MC connected over exposed buses to SDIMMs, each with an authenticated buffer chip; buffer chip to processor communication is encrypted]

ORAM operations shift from the processor to the SDIMM. The ORAM traffic pattern shifts from the memory bus to on-SDIMM “private” buses.

SLIDE 19

The Independent ORAM Protocol

  • 1. Each SDIMM handles a subtree of the ORAM tree.
  • 2. The only traffic on the shared memory channel: CPU requests and leaf-id re-assignments.
  • 3. As much parallelism as the number of SDIMMs.

SLIDE 20

The Split ORAM Protocol

  • 1. Each SDIMM handles a subset of every node.
  • 2. Only metadata is sent to the processor.
  • 3. The processor tells the SDIMMs how to shuffle data.
  • 4. Lower latency per ORAM request, but lower parallelism as well.
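The two placement policies can be sketched as index maps (NUM_SDIMMS, TREE_LEVELS, and the function names are illustrative assumptions, not the paper's interfaces):

```c
#include <assert.h>

#define NUM_SDIMMS  4
#define TREE_LEVELS 10   /* illustrative ORAM tree height */

/* Independent protocol: each SDIMM owns the subtree under one node two
 * levels below the root (4 subtrees for 4 SDIMMs), so the whole path to
 * any leaf is served by a single SDIMM */
int sdimm_for_path(unsigned leaf) {
    return (int)(leaf >> (TREE_LEVELS - 2));
}

/* Split protocol: every node's blocks are striped across all SDIMMs, so
 * each SDIMM serves one slice of every path */
int sdimm_for_slot(unsigned slot) {
    return (int)(slot % NUM_SDIMMS);
}
```

Subtree ownership maximizes parallelism across independent requests; striping lets all SDIMMs cooperate on one request for lower latency, matching the trade-off on the slide.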

SLIDE 21

ORAM Results Summary

  • Can combine the Independent and Split protocols to find the best balance of latency and parallelism
  • Bandwidth demands are reduced from 280x to 35x; execution time overheads from 5.2x to 2.7x
  • Reduces memory energy by 2.5x

SLIDE 22

Overview

  • Memory timing channels
      • The Fixed Service memory controller [MICRO 2015]
  • Memory access patterns
      • Near-data ORAM [HPCA 2018]
  • Memory integrity
      • Improving SGX with VAULT [ASPLOS 2018]

SLIDE 23

Intel SGX Basics

[Diagram: Enclaves 1…N running on Intel SGX; memory is divided into a 96 MB EPC, a non-EPC sensitive region, and a non-EPC non-sensitive region]

  • 1. Enclave data is protected from a malicious OS/operator.
  • 2. A per-block integrity tree protects the EPC.
  • 3. A per-page integrity tree protects non-EPC sensitive data.
  • 4. This keeps the bandwidth and capacity overheads of the integrity tree low.
  • 5. It entails frequent paging between the EPC and non-EPC.

SLIDE 24


VAULT: Unify the EPC and non-EPC to reduce paging. A new integrity tree for low bandwidth. Better metadata for capacity.

SLIDE 25

SGX Overheads


SLIDE 26

Bonsai Merkle Tree

[Diagram: Bonsai Merkle Tree. Leaf and intermediate hash blocks cover 512-bit data blocks, each MACed together with a 64-bit counter. Eight 64-bit counters per 512-bit block give arity 8; a shared 64-bit global counter plus 64 7-bit local counters pack 512 bits for 64 counters, giving arity 64. The root block is kept in the processor.]

SLIDE 27

VAULT

  • 1. Small linkage counters → a high-arity, compact/shallow tree with better cacheability.
  • 2. Variable counter width to manage overflow.
  • 3. Reduces the bandwidth overhead of integrity verification.
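A sketch of split-counter arithmetic in the spirit of the 64-counter blocks on the Bonsai Merkle Tree slide (VAULT's variable-width counters differ in detail; all names here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOCALS 64   /* 64 7-bit local counters + one 64-bit global counter
                       fit in a single 512-bit metadata block -> arity 64 */

typedef struct {
    uint64_t global;
    uint8_t  local[LOCALS];   /* each logically 7 bits wide */
} split_counter_t;

/* effective per-block counter (the encryption/MAC version number) */
uint64_t counter_value(const split_counter_t *c, int i) {
    return (c->global << 7) | c->local[i];
}

/* bump block i's counter; on 7-bit overflow, bump the shared global counter
 * and reset every local counter, which forces the group to be re-MACed */
bool bump(split_counter_t *c, int i) {
    if (c->local[i] < 127) { c->local[i]++; return false; }
    c->global++;
    memset(c->local, 0, sizeof c->local);
    return true;
}
```

Packing 64 small counters per block is what raises the tree's arity and makes it shallow; the price is the occasional group-wide reset on local-counter overflow.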

SLIDE 28

VAULT+SMC

  • 1. MAC storage and bandwidth overheads are high.
  • 2. Sharing a MAC among 4 blocks reduces storage, but increases bandwidth.
  • 3. Compressing a block and embedding the MAC in it reduces both bandwidth and storage.
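The embedded-MAC idea in point 3 can be sketched as follows; the trailing-zero "compressor" is a toy stand-in for a real scheme such as FPC or BDI, and the names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 64
#define MAC_BYTES   8

/* toy compressor: counts trailing zero bytes (real designs would use an
 * FPC/BDI-style scheme); returns the compressed payload size */
size_t compressed_size(const uint8_t *block) {
    size_t n = BLOCK_BYTES;
    while (n > 0 && block[n - 1] == 0) n--;
    return n;
}

/* the MAC can ride inside the block when compression frees >= 8 bytes,
 * so one 64B read returns both the data and its MAC */
bool mac_fits_inline(const uint8_t *block) {
    return compressed_size(block) + MAC_BYTES <= BLOCK_BYTES;
}
```

Blocks that do not compress enough fall back to a separate MAC fetch, so the scheme only pays the extra access for incompressible data.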

SLIDE 29

Integrity Results Summary

  • 3.7x performance improvement over SGX, primarily because of lower paging overheads
  • A large effective EPC is palatable: 4.7% storage overhead and a more scalable tree (34% better than the SGX tree)

SLIDE 30

Big Finish

  • Memory defenses were purely academic pursuits
  • Integrity trees now a part of Intel SGX: overheads of 2x – 40x
  • VAULT improves integrity overhead to 1.5x – 2.5x
  • FS eliminates timing channels with overhead of 2x
  • SDIMM improves ORAM overhead to 2.7x
  • An array of memory defenses is now commercially viable

… and strategic given latent vulnerabilities


Acks: Ali Shafiee, Meysam Taassori, Akhila Gundu, Manju Shevgoor, Mohit Tiwari, Feifei Li, NSF, Intel.