Secure Computation of MIPS Machine Code (Gordon, Katz, McIntosh, Wang) - PowerPoint PPT Presentation

SLIDE 1

Secure Computation of MIPS Machine Code

Gordon, Katz, McIntosh, Wang

SLIDE 2

Efficiency vs. Generality

[Diagram: a spectrum trading generality against efficiency. At the efficiency end, constructions tailored towards particular applications; in the middle, domain-specific languages that approximate certain high-level languages; at the generality end, machine code / legacy code.]

SLIDE 3

Legacy Code

Moving to the RAM model offers the possibility of securely emulating real architectures.

In theory, we can support “real” languages, their existing libraries, and existing compilers. What would this take in practice?

Ideal world: the programmer has never heard the words “secure computation”.

SLIDE 4

Oblivious RAM [GO96,…]

The client outsources its data (v1, d1), (v2, d2), …, (vn, dn) to the server.

access pattern 1: (r, v5), (r, v2), (w, v2, d1), …, (w, v7, d2)
access pattern 2: (r, v1), (r, v1), (r, v1), …, (r, v1)

ANY 2 access patterns are indistinguishable to the server.
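The security goal can be illustrated with the simplest possible construction: a toy linear-scan "ORAM" in which every logical access touches every physical slot, so any two request sequences of the same length leave identical server-visible traces. This sketch is for illustration only; the hierarchical scheme of [GO96] achieves the same indistinguishability with only polylogarithmic overhead.

```python
# Toy "trivial ORAM": every logical access scans EVERY physical slot,
# so the server-observed access pattern is identical for ANY two
# request sequences of the same length.

class LinearScanORAM:
    def __init__(self, n):
        self.store = [0] * n          # server-held memory
        self.trace = []               # what the server observes

    def access(self, op, index, value=None):
        result = None
        for i in range(len(self.store)):   # touch every slot, every time
            self.trace.append(i)           # server sees the same full scan
            if i == index:
                if op == "r":
                    result = self.store[i]
                else:                      # op == "w"
                    self.store[i] = value
        return result

oram = LinearScanORAM(8)
oram.access("w", 3, 42)
assert oram.access("r", 3) == 42

# Two different request sequences leave identical server traces:
a, b = LinearScanORAM(8), LinearScanORAM(8)
a.access("r", 5); a.access("w", 2, 7)
b.access("r", 1); b.access("r", 1)
assert a.trace == b.trace
```

The price, of course, is a linear scan per access; real ORAM constructions exist precisely to avoid this cost while keeping the same indistinguishability guarantee.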

SLIDE 5

ORAM in Secure Computation

[Diagram: Alice holds x, Bob holds y; together they compute F(x, y), with an ORAM holding the data.]

Who should hold the ORAM? Recall, the client fetches items from the server. Alice shouldn’t see Bob’s items, and Bob shouldn’t see Alice’s.

SLIDE 6

ORAM in Secure Computation

[Diagram: Alice holds x, Bob holds y; together they compute F(x, y), with an ORAM holding the data.]

But even if Alice sees only which of her own items are fetched, she learns something about y. (Consider a binary search for y among the items in x.)
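A minimal sketch of that parenthetical: if Alice could observe which of her sorted items a binary search probes, the probe sequence would vary with Bob's value y, so the access pattern alone leaks information. The function and data below are hypothetical, purely for illustration.

```python
# If Alice sees which of her items a binary search touches, the probe
# sequence depends on Bob's value y -- so the access pattern leaks.

def probe_sequence(arr, y):
    lo, hi, probes = 0, len(arr) - 1, []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)        # the index Alice would observe
        if arr[mid] == y:
            break
        if arr[mid] < y:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes

x = [2, 3, 5, 7, 11, 13, 17, 19]                       # Alice's sorted items
assert probe_sequence(x, 2) != probe_sequence(x, 19)   # pattern varies with y
```

Distinct values of y can induce distinct probe sequences, which is exactly why the items themselves must also sit inside an oblivious structure.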


SLIDE 7

Oblivious RAM (abstraction)

[Diagram: a READ of address v is answered by log n sequential ORAM lookups at positions v(1), …, v(log n). The i-th lookup takes v(i) and state(i-1) and produces state(i); exactly 1 of the log n lookups will match, and it outputs the data D.]

SLIDE 8

CCS 2012

[Diagram: the two parties hold XOR “secret shares” of everything: v1 ⊕ v2 = v and state1 ⊕ state2 = state. For each of the log n ORAM lookups, the parties feed their shares of v(i) and state(i-1) into a Yao garbled circuit, which outputs fresh shares of state(i). After the last lookup, the parties hold shares D1 and D2 of the result D.]
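The sharing here can be sketched as plain XOR secret sharing of machine words; `share` and `reconstruct` below are hypothetical helper names for illustration, not an API from the paper.

```python
import secrets

# XOR secret sharing of a 32-bit word: v1 xor v2 == v, and either share
# alone is uniformly random, revealing nothing about v.

def share(v, bits=32):
    v1 = secrets.randbits(bits)   # uniformly random share for party 1
    v2 = v ^ v1                   # party 2's share completes the value
    return v1, v2

def reconstruct(v1, v2):
    return v1 ^ v2

v = 0xDEADBEEF
v1, v2 = share(v)
assert reconstruct(v1, v2) == v

# Sharing is linear: the XOR of shares is a sharing of the XOR, which is
# why a garbled circuit can consume shares and emit fresh shares of the
# next state without either party ever seeing a value in the clear.
s, t = share(0b1010), share(0b0110)
assert reconstruct(s[0] ^ t[0], s[1] ^ t[1]) == (0b1010 ^ 0b0110)
```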

SLIDE 9

CCS 2012

RAM PROGRAM (BINARY SEARCH)

[Diagram: the shared-ORAM chain of the previous slide, driven by a RAM program. The parties start with shares (state1(0), inst1(0)) and (state2(0), inst2(0)); each program step runs the log n ORAM lookups through Yao circuits and ends with shares (state1(log n), D1) and (state2(log n), D2), which determine the next instruction.]

SLIDE 10

Current Work (DARPA: PROCEED)

[Diagram: the MIPS architecture emulated step by step. A Yao circuit for INSTRUCTION FETCH maps the progCounter to the new instruction; a Yao circuit for the CPU takes the instruction and the 32 registers and outputs the new 32 registers and new progCounter; a Yao circuit for LOAD/STORE WORD accesses memory. Built with ObliVM.]
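In the clear, the per-step work those Yao circuits realize is an ordinary fetch/execute/memory cycle. A minimal Python sketch, using a hypothetical handful of MIPS-like instructions rather than the actual 15-instruction subset the system supports:

```python
# Plain (insecure) sketch of one emulation step: fetch the instruction at
# progCounter, run the ALU over the 32 registers, and do an optional
# load/store. The instruction encoding here is hypothetical.

def step(regs, mem, pc, program):
    op, a, b, c = program[pc]              # INSTRUCTION FETCH
    if op == "add":                        # ALU: regs[a] = regs[b] + regs[c]
        regs[a] = (regs[b] + regs[c]) & 0xFFFFFFFF
        pc += 1
    elif op == "addi":                     # ALU: regs[a] = regs[b] + imm
        regs[a] = (regs[b] + c) & 0xFFFFFFFF
        pc += 1
    elif op == "lw":                       # LOAD WORD
        regs[a] = mem[regs[b] + c]
        pc += 1
    elif op == "sw":                       # STORE WORD
        mem[regs[b] + c] = regs[a]
        pc += 1
    elif op == "beq":                      # branch sets the new progCounter
        pc = c if regs[a] == regs[b] else pc + 1
    return regs, mem, pc

regs, mem, pc = [0] * 32, [0] * 16, 0
program = [("addi", 1, 0, 5), ("addi", 2, 0, 7),
           ("add", 3, 1, 2), ("sw", 3, 0, 0)]
while pc < len(program):
    regs, mem, pc = step(regs, mem, pc, program)
assert mem[0] == 12
```

In the secure version, every one of these steps runs as a garbled circuit over secret-shared registers, progCounter, and memory, so neither party sees which branch of the dispatch was taken.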

SLIDE 11

Current Work

[Diagram: MIPS CPU step computed by Yao circuits, as on the previous slide.]

SLIDE 12

Current Work

[Diagram: MIPS CPU step computed by Yao circuits, as before.]

Why MIPS?

  • Fixed register space = fixed circuit.
  • With approximately 15 instructions, we can compute: Dijkstra, longest common substring, set intersection, stable marriage, binary search, decision trees…
  • Easy to implement!
  • 15 instructions = small circuit.
  • We first proposed LLVM, but instructions in LLVM are polymorphic objects.

On the other hand:

  • ARM or x86 would give bigger circuits, but smaller programs. Ultimately, I don’t know which is best.

SLIDE 13

Current Work

[Diagram: MIPS CPU step computed by Yao circuits, as before.]

SLIDE 14

Component Run-Times

ALU, 15 instructions: ~7K AND gates
Memory fetch from 1024 32-bit words: ~43K AND gates

SLIDE 15

Improvement #1: Instruction Mapping

Divide all instructions into separate “banks”. Bank i contains the instructions that could possibly be executed in the i-th cycle.

SLIDE 16

Instruction Mapping

If (x > 5)
    instr1
    instr2
else
    instr3
    instr4

If x is tainted: instr1 and instr3 must go in the same ORAM bank, and instr2 and instr4 must go in the same ORAM bank.

for (i = 1 to x)
    instr1
    instr2
end for
instr3
instr4
instr5

t = 1: instr1
t = 2: instr2
t = 3: instr1 or instr3
t = 4: instr2 or instr4
t = 5: instr1 or instr3 or instr5

For a loop of size t in a program of length n: t banks, each of size n/t.
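The bank construction can be sketched by simulating which program counters are reachable at each time step. The program shape below (a loop of length t followed by straight-line code, run at least once) mirrors the slide's example; the function is a hypothetical illustration, not the paper's analysis.

```python
# For each time step, compute the set of instructions (program counters,
# 0-indexed) that could execute, for a program whose first t instructions
# form a loop executed an unknown number (>= 1) of times.
# Example program: loop {instr1, instr2}, then instr3, instr4, instr5.

def banks(n, t, steps):
    reachable = [{0}]                     # at time step 1, only instr1
    for _ in range(steps - 1):
        nxt = set()
        for pc in reachable[-1]:
            if pc == t - 1:
                nxt.update({0, t})        # loop back, or fall through
            elif pc < n - 1:
                nxt.add(pc + 1)
        reachable.append(nxt)
    return reachable

r = banks(n=5, t=2, steps=5)
assert r[0] == {0}                        # t = 1: instr1
assert r[2] == {0, 2}                     # t = 3: instr1 or instr3
assert r[4] == {0, 2, 4}                  # t = 5: instr1, instr3, or instr5

# Every step's candidates share an offset mod t, so the whole program
# splits into t banks of roughly n/t instructions each.
assert all(len({pc % 2 for pc in s}) == 1 for s in r)
```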

SLIDE 17

Instruction Mapping

[Diagram: the CPU step annotated with three goals: skip the LOAD/STORE circuit on MANY steps, reduce the number of instructions the ALU handles, and wade through fewer instructions at instruction fetch.]

SLIDE 18

Instruction Mapping

[Diagram: the ALU circuit. Goal: reduce the number of instructions!]

Set Intersection: reduces the average ALU size from 6727 to 1848 AND gates (3.6X).

SLIDE 19

Instruction Mapping

[Diagram: the instruction-fetch circuit. Goal: wade through fewer instructions!]

Set Intersection: The full program has about 150 instructions. The largest instruction bank after mapping has 31 instructions. More than half the instruction banks have fewer than 20 instructions.

SLIDE 20

Instruction Mapping

[Diagram: the load/store circuit. Goal: skip this step on MOST time steps!]

Unfortunately, even after instruction mapping, load/store operations still might occur in almost every time step.

SLIDE 21

Improvement #2: Padding

for (i = 1 to x)
    If (x > 5)
        instr1
        instr2
    else
        instr3
        instr4
        instr5

If the two branch lengths are relatively prime, one of length k1 and the other k2, then in fewer than k1·k2 time steps the candidate set grows to cover the entire loop. By padding branches so that their lengths share common factors (are not relatively prime), we can greatly reduce the number of instructions per bank: for set intersection, we go from ~40 down to 4.
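A small simulation of the padding argument: track which (branch, offset) pairs are possible at each time step of a loop whose body is an if/else with branch lengths k1 and k2. This model is a simplification invented for illustration (it ignores the loop test itself), but it shows the effect: coprime lengths smear execution across the loop, while equal (padded) lengths keep every step's candidate set tiny.

```python
from math import gcd

# Loop body = one if/else; branch A takes k1 steps, branch B takes k2.
# Track the set of possible (branch, offset) states per time step and
# report the largest such set (the worst instruction-bank size).

def max_bank(k1, k2, steps=200):
    states = {("A", 0), ("B", 0)}         # entering either branch at t = 0
    worst = 0
    for _ in range(steps):
        worst = max(worst, len(states))
        nxt = set()
        for br, off in states:
            length = k1 if br == "A" else k2
            if off + 1 < length:
                nxt.add((br, off + 1))    # continue inside this branch
            else:                          # branch done: re-enter either one
                nxt.update({("A", 0), ("B", 0)})
        states = nxt
    return worst

assert gcd(3, 4) == 1                      # relatively prime branch lengths
assert max_bank(3, 4) > max_bank(4, 4)     # padding A with one NOP helps
```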

SLIDE 22

Padding

[Diagram: the load/store circuit. Goal: skip this step on MOST time steps!]

We padded 2 of the 3 branches that appear in the main loop, using a total of 6 NOP instructions. Before padding, we found that a load/store operation might be executed in almost every time step. After padding, we find that only about 1/10 of all time steps require a load/store operation.

SLIDE 23

Set Intersection

[Chart: run-time decomposition for computing set intersection when each party's input consists of 64 32-bit integers.]
[Chart: run-time decomposition for computing set intersection when each party's input consists of 1024 32-bit integers.]

SLIDE 24

Set Intersection

SLIDE 25

Binary Search

Comparing the performance of secure binary search. One party holds an array of 32-bit integers, while the other holds a value to search for.

SLIDE 26

Decision Trees

SLIDE 27

A True Universal Circuit

One more benefit of the general approach: we have a true universal circuit!

1. Compile the private input function to MIPS.
2. Supply a function pointer as input to the emulator.
3. Our optimizations no longer apply: the analysis leaks information.

SLIDE 28

Thanks!