
The Ascend Secure Processor (Christopher Fletcher, MIT)



  1. The Ascend Secure Processor. Christopher Fletcher, MIT

  2. Joint work with • Srini Devadas, Marten van Dijk • Ling Ren, Albert Kwon, Xiangyao Yu • Elaine Shi & Emil Stefanov • David Wentzlaff & Princeton Team (Mike, Tri, Jonathan, Alexey, Yaosheng) • Omer Khan

  3. Last talk: Intel SGX. Diagram labels: Data, Integrity, MEE, Address, Timing.

  4. This talk: Ascend Processor. Diagram labels: Data, Integrity, Memory controller, Address, Timing.

  5. Outline • Motivation + Oblivious RAM (ORAM) primer • ORAM in Hardware • Demo

  6. Example program: if (secret variable) { … scan memory … }. Observed trace (Op, Address, Time): (R, 0, 1), (W, 1, 10), (R, 5, 15), (R, 6, 16), (R, 7, 17). Another example: binary search. • SGX broken through page faults [Xu et al. ’15] • Shared library usage [Zhuang et al. ’04] • Search queries [Islam et al. ’12]
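
To make the binary search example concrete, here is a minimal sketch (not from the talk) that records which array indices a binary search touches. The function name `binary_search_trace` is hypothetical. An observer who sees only the indices, never the data, can still tell different keys apart.

```python
def binary_search_trace(sorted_values, key):
    """Return the list of indices probed while searching for key."""
    trace = []
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        trace.append(mid)              # this address is visible off-chip
        if sorted_values[mid] == key:
            break
        if sorted_values[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return trace


values = list(range(8))
print(binary_search_trace(values, 0))  # probes move left
print(binary_search_trace(values, 7))  # probes move right
```

The two traces differ after the first probe, so the address sequence alone reveals which half of the array the key lies in.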

  7. Oblivious RAM (ORAM) [Goldreich-Ostrovsky ’96]. Diagram labels: On-chip, Cache miss, ORAM Controller, Chip pins, Shuffled. Provably removes all access pattern leakage.

  8. ORAM security definition • Access is a 3-tuple: (op = read/write, address, data) • Consider access sequences A and A’: A = [ (op_1, address_1, data_1), (op_2, address_2, data_2), … ], A’ = [ (op_1’, address_1’, data_1’), (op_2’, address_2’, data_2’), … ] • If |A| == |A’| then ORAM(A) ≈ ORAM(A’)

  9. Path ORAM [CCS’13]: “The ORAM”. Diagram labels: ORAM Controller (on-chip), Chip pins, Read/writes.

  10. Block assigned to random path. Block lives on that path. Diagram: PosMap entries (A, 2), (B, 3); off-chip tree with paths 1 to 4 holding blocks (A, 2) and (B, 3).

  11. Empty space = dummy encryptions. Diagram labels: Chip pins, Not Encrypted, Encrypted, Off-chip. PosMap entries (A, 2), (B, 3); tree with paths 1 to 4 holding (A, 2), (B, 3), and dummy blocks.

  12. Path ORAM Access: Read+write the path the block is assigned to.

  13. Path ORAM Access: Read+write the path the block is assigned to. Diagram: PosMap, Stash, and off-chip tree with paths 1 to 4; entries shown include (A, 2), (A, 4), (B, 3), and dummy blocks.

  14. Typically, 4 slots per bucket (“Z=4”). Z=1 shown for simplicity. Diagram: off-chip tree with paths 1 to 4 holding (B, 3) and dummy blocks.

  15. Too big! Diagram labels: Off-chip; entries (A, 4), (B, 3).

  16. Map recursion [Shi et al. ’11]. Diagram labels: PosMap ORAM, PosMap ORAM 2, Block, On-chip, Map, Map’, (A, 4), (B, 3), Smaller, Small enough.

  17. Map recursion [Shi et al. ’11]. Diagram labels: On-chip PosMap, PosMap ORAM 1, PosMap ORAM 2, Data ORAM.
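
The recursion on the two slides above can be sketched as simple arithmetic: each PosMap ORAM packs several position-map entries into one block, so each level is smaller than the one it describes, and recursion stops once the remaining map fits on-chip. The function name and all three parameters below are hypothetical values chosen for illustration, not figures from the talk.

```python
def recursion_sizes(n_blocks, entries_per_block, onchip_entries):
    """Block counts for the data ORAM followed by each PosMap ORAM.

    The last element is also the number of entries the on-chip
    position map must hold.
    """
    sizes = [n_blocks]
    while sizes[-1] > onchip_entries:
        # ceil division: one PosMap block covers entries_per_block entries
        sizes.append(-(-sizes[-1] // entries_per_block))
    return sizes


# Assumed example: 2**20 data blocks, 16 entries per PosMap block,
# on-chip storage for 2**12 entries.
print(recursion_sizes(2**20, 16, 2**12))
```

With these assumed numbers the result has three entries: the data ORAM, two PosMap ORAMs, and an on-chip map sized for the last one, which matches the shape of the diagram.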

  18. Path ORAM summary: Blocks assigned to paths. Access block: read+write path. Adversary sees: random paths.
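
The access procedure summarized above can be sketched as a toy simulator. This is a minimal illustration under stated assumptions, not the Ascend controller: there is no encryption or dummy padding, the position map is a plain dictionary with no recursion, the stash is unbounded, and the class and method names (`ToyPathORAM`, `access`) are hypothetical.

```python
import random


class ToyPathORAM:
    """Toy Path ORAM: binary tree of buckets, position map, and stash."""

    def __init__(self, levels=3, z=4, seed=None):
        self.levels = levels              # leaves sit at depth `levels`
        self.z = z                        # slots per bucket
        self.leaves = 2 ** levels
        self.rng = random.Random(seed)
        # Heap layout: root is node 1, children of i are 2i and 2i+1.
        self.tree = {i: [] for i in range(1, 2 ** (levels + 1))}
        self.posmap = {}                  # address -> assigned leaf
        self.stash = {}                   # address -> data
        self.trace = []                   # leaves touched (adversary's view)

    def _path(self, leaf):
        """Node indices from the given leaf up to the root."""
        node = self.leaves + leaf
        path = []
        while node >= 1:
            path.append(node)
            node //= 2
        return path

    def access(self, op, addr, data=None):
        old_leaf = self.posmap.get(addr, self.rng.randrange(self.leaves))
        self.posmap[addr] = self.rng.randrange(self.leaves)  # remap block
        self.trace.append(old_leaf)
        path = self._path(old_leaf)
        for node in path:                 # read the whole path into the stash
            self.stash.update(self.tree[node])
            self.tree[node] = []
        value = self.stash.get(addr)
        if op == "write":
            self.stash[addr] = data
        for node in path:                 # write back, deepest bucket first
            depth = node.bit_length() - 1
            fits = [a for a in self.stash
                    if (self.leaves + self.posmap[a]) >> (self.levels - depth) == node]
            for a in fits[: self.z]:
                self.tree[node].append((a, self.stash.pop(a)))
        return value


oram = ToyPathORAM(levels=3, z=4, seed=1)
oram.access("write", 5, "hello")
print(oram.access("read", 5))
```

Each access appends exactly one leaf to `trace`, and that leaf was drawn uniformly at random when the block was last remapped, which is the property the summary slide describes.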

  19. ORAM in Hardware

  20. First ORAM in silicon. Ascend in silicon. • Collaboration with David Wentzlaff’s group @ Princeton • MIT Team

  21. First silicon fully functional @ 500 MHz & 0.9 V. Design (Verilog) open source.

  22. Blocks must live on assigned path or in stash. Can overflow. Diagram: PosMap entries (A, 4), (B, 2); Stash; off-chip tree with paths 1 to 4 holding (A, 4) and (B, 2).

  23. Blocks must live on assigned path or in stash. Bottleneck in prior work [Maas et al. ’13]. Causes 3X avg. slowdown on SPEC. Diagram: PosMap entries (A, 4), (B, 2); Stash; off-chip tree with paths 1 to 4 holding (A, 4) and (B, 2).

  24. Bit blasted stash eviction [FCCM’15]. Bit vectors: Path_block, Path_evict, occ. Can be pipelined. def evict(Path_block, Path_evict, occ): t1 = Path_block ⊕ Path_evict; t2 = bit_reverse(((t1 ∧ −t1) − 1) ∧ occ); ret bit_reverse(t2 ∧ −t2). Slide annotations: “evict() ~ circuit”; “occ’ ≈ greatest common prefix”.
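
A runnable sketch in the shape of the slide's formula follows. Several details are assumptions made for this illustration and are not confirmed by the slide: leaf labels are read least-significant bit first (bit i selects the branch from level i to level i+1), the XOR is shifted left by one so that bit 0 stands for the root, the third argument `free` has bit i set when the bucket at level i on the eviction path has an empty slot, and `levels` is an extra parameter giving the label width.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    out = 0
    for i in range(width):
        if (x >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def evict(path_block, path_evict, free, levels):
    """One-hot mask of the deepest level on the eviction path that can
    hold the block, or 0 if no shared level has a free slot."""
    width = levels + 1                        # tree levels 0..levels
    t1 = (path_block ^ path_evict) << 1       # bit 0 = root, always shared
    # (t1 & -t1) isolates the first level where the paths diverge;
    # subtracting 1 yields a mask of all levels before that point.
    common = ((t1 & -t1) - 1) & ((1 << width) - 1)
    t2 = bit_reverse(common & free, width)    # deepest candidate becomes lowest bit
    return bit_reverse(t2 & -t2, width)       # isolate it and flip back


# Paths 0b101 and 0b001 share levels 0, 1, and 2; all buckets have room.
print(bin(evict(0b101, 0b001, 0b1111, levels=3)))
```

Every step is a fixed-width bitwise operation with no data-dependent loop in `evict` itself, which is why a formula of this shape maps naturally onto a pipelined circuit.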

  25. Simple design, no performance bottleneck.

  26. Integrity protection for ORAM. Diagram labels: Cache miss, ORAM, Shuffled. Overlay hash tree.

  27. FPGA prototype. Diagram labels: ORAM logic, Test harness, SHA-3 (one SHA-3 unit).

  28. Cheap Integrity Scheme [ASPLOS’15] • Per-block MAC: { Block data, Hash(Block data, Block addr, counter) } • Good: Hash 1 block, NOT path • Bad: Need to store counters on-chip • Replace entries in map with counters!
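
The per-block MAC idea can be sketched with Python's standard library. This is an illustration only: it uses HMAC-SHA-256 as a stand-in keyed hash (the prototype slide mentions a SHA-3 unit), the function names `seal` and `verify` are hypothetical, and the 8-byte big-endian encoding of address and counter is an assumption.

```python
import hashlib
import hmac


def seal(key, addr, counter, data):
    """Return (data, tag), where the tag binds data to its address and counter."""
    msg = counter.to_bytes(8, "big") + addr.to_bytes(8, "big") + data
    return data, hmac.new(key, msg, hashlib.sha256).digest()


def verify(key, addr, counter, data, tag):
    """Check a block against the counter the controller expects."""
    _, expected = seal(key, addr, counter, data)
    return hmac.compare_digest(tag, expected)


key = b"k" * 32
data, tag = seal(key, addr=7, counter=3, data=b"payload")
print(verify(key, 7, 3, data, tag))   # fresh block
print(verify(key, 7, 4, data, tag))   # stale block replayed after an update
```

Only the one block being returned is hashed, not the whole path. The counter is what defeats replay of an old (data, tag) pair, which is why the slide notes that counters must be kept somewhere trusted.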

  29. Want: Path P = PosMap[A]. Algorithm: Given A: derive A’, A’’, A’’’. P’ = PRF(A’ || PosMap[A’]), where PosMap[A’] = Counter. P’’ = PRF(A’’ || ORAMAccess(A’’, P’)). P = PRF(A’’’ || ORAMAccess(A’’’, P’’)). Data = ORAMAccess(A, P). Block MAC: MAC_K(Counter || A’’ || data). Problem: |C| > |P|. More schemes … to get |C| < |P|. Diagram labels: On-chip PosMap (root of trust), PosMap ORAM 1, PosMap ORAM 2, Data ORAM, Integrity checks, Counter for A’’’, Counter for A’’’ + 1, Block A’’, Block A’’’, (A, D).
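
One step of the slide's algorithm, deriving a path from an address and a counter with a PRF, might look like the sketch below. HMAC-SHA-256 is used here as a stand-in PRF; that choice, the name `derive_leaf`, the `levels` parameter, and the 8-byte encodings are all assumptions for illustration.

```python
import hashlib
import hmac


def derive_leaf(key, addr, counter, levels):
    """Map (addr, counter) to a leaf index in a tree with 2**levels leaves."""
    msg = addr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # The leaf count is a power of two, so the modulo keeps the result uniform.
    return int.from_bytes(digest[:8], "big") % (2 ** levels)


key = b"k" * 32
leaf_now = derive_leaf(key, addr=42, counter=0, levels=20)
leaf_next = derive_leaf(key, addr=42, counter=1, levels=20)
print(leaf_now, leaf_next)
```

Because the path is recomputed from the counter, the map only has to store the counter: incrementing it both remaps the block and provides the freshness value for the block's MAC.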

  30. Cheap Integrity Scheme [ASPLOS’15]. Result: hashing decreased by 68X, simple design.

  31. ORAM randomizes data layout. Computer architecture assumes data locality. Subtree size = row size [ISCA’13].

  32. # Row misses: ~ tree height (flat layout) vs. ~ tree height / subtree height (subtree layout).

  33. Row misses: 60% overhead vs. 13% overhead.

  34. Chip floorplan labels: 460M transistors; 25 tiles (Tile 0 to Tile 24); PLL; ORAM with sub-blocks AES rounds, Stash, evict(), Encryption, Recursion, PLB, Integrity, Hash unit; dimensions marked .5 mm, 2 mm, 6 mm, 6 mm.

  35. Setup: 2 DRAM channels, in-order core, 2-level cache hierarchy, 1 MByte last-level cache. ORAM = 1208 cycles / tree lookup. Plot: slowdown (X) vs. insecure, y-axis from 1 to 11, for tpcc, ycsb, astar, bzip2, gcc, gob, h264, libq, mcf, omnet, perl, sjeng, avg.

  36. Demo

  37. Backup
