Verifiable ASICs: trustworthy hardware with untrusted components
Riad S. Wahby◦⋆, Max Howald†⋆, Siddharth Garg⋆, abhi shelat‡, and Michael Walfish⋆
◦Stanford University
⋆New York University †The Cooper Union ‡The University of Virginia
May 25th, 2016
[Figure: V and P, with labels: input, x, y, proof that y = F(x).]
F is expressed as a generalized boolean circuit over F_p: ∧ → ×, ∨ → +
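To make the gate substitution concrete, here is a minimal sketch of evaluating a small layered arithmetic circuit over F_p. The modulus, the gate encoding, and the names `eval_layer` and `eval_circuit` are assumptions made for illustration; the slides do not fix any of them.

```python
# Hypothetical illustration: a layered arithmetic circuit over F_p.
# P is an assumed prime modulus (2^61 - 1); the slides do not specify p.
P = 2**61 - 1

def eval_layer(gates, values, p=P):
    """Evaluate one layer. Each gate is (op, left_index, right_index)."""
    out = []
    for op, i, j in gates:
        if op == "mul":      # multiplication stands in for boolean AND
            out.append(values[i] * values[j] % p)
        elif op == "add":    # addition stands in for boolean OR
            out.append((values[i] + values[j]) % p)
        else:
            raise ValueError(f"unknown gate {op!r}")
    return out

def eval_circuit(layers, inputs, p=P):
    """Feed the inputs through the layers in order, input side first."""
    values = [x % p for x in inputs]
    for gates in layers:
        values = eval_layer(gates, values, p)
    return values

# Two layers computing (x0 * x1) + (x2 * x3).
layers = [
    [("mul", 0, 1), ("mul", 2, 3)],
    [("add", 0, 1)],
]
print(eval_circuit(layers, [3, 4, 5, 6]))  # [42]
```

Every gate in a layer depends only on the previous layer, so all gates in one layer could be evaluated in parallel.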
Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
+ F with RAM, complex control flow
+ Little V-P communication
Unsuited to hardware implementation
IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice
– “Quasi–straight line” F
– Lots of V-P communication
Suited to hardware implementation
P evaluates F(x) and returns output y.
V questions P about the last layer, and ends up with a claim about the second-last layer.
V iterates, layer by layer, until it ends up with a claim about the inputs.
V then checks that claim for consistency with the inputs.
V’s work ≈ O(depth · log width), so it saves work when width ≫ depth.
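The cited IPs [GKR08, CMT12] build each layer's questioning on the sum-check protocol. The sketch below is a simplified illustration of that building block only, not the full layered protocol. It runs sum-check on a multilinear polynomial given by its table of values, with prover and verifier in one function and a prover that always sends honest messages. The modulus and the names `fold`, `mle_eval`, and `sumcheck` are assumptions.

```python
import random

P = 2**61 - 1  # assumed prime field modulus; not taken from the slides

def fold(table, r, p=P):
    """Fix the first variable of a multilinear table to the field element r."""
    half = len(table) // 2
    return [(table[i] + r * (table[half + i] - table[i])) % p for i in range(half)]

def mle_eval(table, point, p=P):
    """Evaluate the multilinear extension of `table` at `point`."""
    for r in point:
        table = fold(table, r, p)
    return table[0]

def sumcheck(table, claimed_sum, p=P, rng=random):
    """Return True if V accepts the claim sum(table) == claimed_sum (mod p).

    len(table) must be a power of two.
    """
    original = list(table)
    claim = claimed_sum % p
    point = []
    while len(table) > 1:
        half = len(table) // 2
        g0 = sum(table[:half]) % p   # P's message: round polynomial at 0
        g1 = sum(table[half:]) % p   # P's message: round polynomial at 1
        if (g0 + g1) % p != claim:   # V's consistency check for this round
            return False
        r = rng.randrange(p)         # V's random challenge
        claim = (g0 + r * (g1 - g0)) % p  # new claim: round polynomial at r
        table = fold(table, r, p)
        point.append(r)
    # Final check: one evaluation at a random point. In the layered protocol,
    # this final evaluation is what becomes the claim about the previous layer.
    return claim == mle_eval(original, point, p)

values = [3, 1, 4, 1, 5, 9, 2, 6]
print(sumcheck(values, sum(values)))      # True
print(sumcheck(values, sum(values) + 1))  # False
```

V does a constant amount of work per round, and the number of rounds is the log of the table size, which is where the log width factor in V's cost comes from.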
Can V and P interact about all of F’s layers at once? No: the questions must come in the correct order, or P can cheat!
But: Zebra uses pipelining to parallelize several Fs.
V questions P about F(x1)’s output layer. Simultaneously, P returns F(x2).
V questions P about F(x1)’s next layer, and F(x2)’s output layer. Meanwhile, P returns F(x3).
This process continues until the pipeline is full. V and P can then complete one proof per step.
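The pipelining steps can be sketched as a schedule that lists, for each time step, which layer of which instance V is questioning P about. This is a hypothetical model: the function name `pipeline_schedule`, the 0-based numbering (instance 0 plays the role of x1, layer 0 is the output layer), and the assumption that each layer takes exactly one step are all illustrative.

```python
def pipeline_schedule(num_instances, depth):
    """Return, per time step, the (instance, layer) pairs V questions P about.

    Instance k enters the pipeline at step k, and its layers are questioned
    strictly in order: layer 0 (output) first, layer depth - 1 last.
    """
    steps = []
    for t in range(num_instances + depth - 1):
        active = [(k, t - k) for k in range(num_instances) if 0 <= t - k < depth]
        steps.append(active)
    return steps

for t, active in enumerate(pipeline_schedule(num_instances=4, depth=3)):
    print(t, active)
# 0 [(0, 0)]
# 1 [(0, 1), (1, 0)]
# 2 [(0, 2), (1, 1), (2, 0)]
# 3 [(1, 2), (2, 1), (3, 0)]
# 4 [(2, 2), (3, 1)]
# 5 [(3, 2)]
```

From step 2 onward, one instance reaches its last layer at every step, while the order of questions within each instance is never violated.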
Extract parallelism
e.g., pipelined proving
Exploit locality
e.g., no RAM: data is kept close to places it is needed
e.g., latency-insensitive design: distributed state machine avoids bottlenecks associated with central controller
Reduce, reuse, recycle
e.g., computation: save energy by adding memoization to P
e.g., hardware: save chip area by reusing the same circuits
Interaction between V and P requires a lot of bandwidth
✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration
Protocol requires input-independent precomputation [Allspice13]
✓ Zebra amortizes precomputations over many V-P pairs
Several other details (see paper)
software or hardware V’s interactions with P
Baseline: direct implementation of F in same technology as V
Metrics: energy, chip size per throughput (see paper)
Measurements: based on circuit synthesis and simulation, published chip designs, and CMOS scaling models
Charge for V, P, communication; retrieving and decrypting precomputations; PRNG; Operator communicating with V
Constraints: trusted fab = 350 nm; untrusted fab = 7 nm; 200 mm² max chip area; 150 W max total power
350 nm: 1997 (Pentium II)
7 nm: ≈ 2017 [TSMC]
≈ 20 year gap between trusted and untrusted fab
[Figure: ratio of baseline energy to Zebra energy (higher is better) versus log2(NTT size), for log2 sizes 6 to 13; ratio axis spans 0.1 to 3.]
Curve25519: a commonly used elliptic curve
Point multiplication: primitive used for ECDH
[Figure: ratio of baseline energy to Zebra energy (higher is better) versus number of parallel Curve25519 point multiplications (84, 170, 340, 682, 1147); ratio axis spans 0.1 to 3.]
and trusted fab for V
Common to essentially all built proof systems
Applies to IPs, but not arguments
Design principle         IPs                       Arguments
                         [GKR08, CMT12, VSBW13]    [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism      ✓                         ✓
Exploit locality         ✓                         ✗
Reduce, reuse, recycle   ✓                         ✗
Argument protocols seem unfriendly to hardware:
P computes over entire AC at once ⇒ need RAM
P does crypto for every gate in AC ⇒ special crypto circuits
+ Verifiable ASICs: a new approach to building trustworthy hardware under a strong threat model
+ First hardware design for a probabilistic proof protocol
+ Improves performance compared to trusted baseline
– Improvement compared to the baseline is modest
– Applicability is limited:
  precomputations must be amortized
  computation needs to be “big enough”
  large gap between trusted and untrusted technology
  does not apply to all computations
https://www.pepper-project.org/