Verifiable ASICs: trustworthy hardware with untrusted components


SLIDE 1

Verifiable ASICs: trustworthy hardware with untrusted components

Riad S. Wahby◦⋆, Max Howald†⋆, Siddharth Garg⋆, abhi shelat‡, and Michael Walfish⋆

◦Stanford University ⋆New York University †The Cooper Union ‡The University of Virginia

June 10th, 2016

SLIDE 2

Setting: ASICs with mutually distrusting designer and manufacturer

The Principal (a government or chip designer) sends a chip design to the Manufacturer (a "foundry" or "fab").
SLIDE 3

Setting: ASICs with mutually distrusting designer and manufacturer

The Principal (a government or chip designer) sends a chip design to the Manufacturer (a "foundry" or "fab").

Here we are thinking about ASICs, not CPUs:

[Diagram: a CPU, with RAM, cache, register file, and ALU, versus an ASIC: inputs in[0] . . . in[n] feeding fixed logic and D-Q flip-flops]

SLIDE 4

Setting: ASICs with mutually distrusting designer, manufacturer

Firewall

e.g., a network firewall appliance, with a custom chip for packet processing

SLIDE 5

Untrusted manufacturers can craft hardware Trojans

Firewall

What if our packet processing chip has a back door?

SLIDE 6

Untrusted manufacturers can craft hardware Trojans

Firewall

What if our packet processing chip has a back door?

Threat: incorrect execution of the packet filter

(Other concerns, e.g., secret state, are important but orthogonal)


SLIDE 8

Untrusted manufacturers can craft hardware Trojans

Firewall

US DoD controls supply chain with trusted foundries.

SLIDE 9

Trusted fabs are the only way to get strong guarantees

For example, stealthy trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013]

SLIDE 10

Trusted fabs are the only way to get strong guarantees

For example, stealthy trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013]

But trusted fabrication is not a panacea:

✗ Only 5 countries have cutting-edge fabs on-shore
✗ Building a new fab takes $$$$$$, years of R&D

SLIDE 11

Trusted fabs are the only way to get strong guarantees

For example, stealthy trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013]

But trusted fabrication is not a panacea:

✗ Only 5 countries have cutting-edge fabs on-shore
✗ Building a new fab takes $$$$$$, years of R&D
✗ Semiconductor scaling: chip area and energy go with the square and cube of transistor length ("critical dimension")
✗ So using an old fab means an enormous performance hit

e.g., India's best on-shore fab is 10⁸× behind state of the art

SLIDE 12

Trusted fabs are the only way to get strong guarantees

For example, stealthy trojans can thwart post-fab detection [A2: Analog Malicious Hardware, Yang et al., IEEE S&P 2016; Stealthy Dopant-Level Trojans, Becker et al., CHES 2013]

But trusted fabrication is not a panacea:

✗ Only 5 countries have cutting-edge fabs on-shore
✗ Building a new fab takes $$$$$$, years of R&D
✗ Semiconductor scaling: chip area and energy go with the square and cube of transistor length ("critical dimension")
✗ So using an old fab means an enormous performance hit

e.g., India's best on-shore fab is 10⁸× behind state of the art

Can we get trust more cheaply?
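To put the scaling penalty in numbers (applying the quadratic/cubic rule above to the 350 nm and 7 nm nodes that appear later in this talk): the ratio of critical dimensions is s = 350/7 = 50, so a trusted 350 nm part pays roughly 50² = 2,500× in chip area and 50³ = 125,000× in energy relative to a 7 nm part.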

SLIDE 13

Verifiable ASICs

Principal

F → designs for P, V

SLIDE 14

Verifiable ASICs

Untrusted fab (fast) builds P; trusted fab (slow) builds V

Principal

F → designs for P, V

SLIDE 15

Verifiable ASICs

Untrusted fab (fast) builds P; trusted fab (slow) builds V

Principal

F → designs for P, V

[Diagram: an Integrator combines V and P]

SLIDE 16

Verifiable ASICs

Untrusted fab (fast) builds P; trusted fab (slow) builds V

Principal

F → designs for P, V

[Diagram: an Integrator combines V and P; the operator supplies input and receives output]
SLIDE 17

Verifiable ASICs

Untrusted fab (fast) builds P; trusted fab (slow) builds V

Principal

F → designs for P, V

[Diagram: an Integrator combines V and P; V sends x to P, and P returns y and a proof that y = F(x); the operator supplies input and receives output]
SLIDE 18

Can we build Verifiable ASICs?

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Makes sense if V + P are cheaper than trusted F

SLIDE 19

Can we build Verifiable ASICs?

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Makes sense if V + P are cheaper than trusted F. Reasons for hope:

  • running time of V < running time of F (asymptotically)

Babai85 GMR85 BCC86 BFLS91 FGLSS91 Kilian92 ALMSS92 AS92 Micali94 BG02 GOS06 IKO07 GKR08 KR09 GGP10 Groth10 GLR11 Lipmaa11 BCCT12 GGPR13 BCCT13 KRR14 . . .

SLIDE 20

Can we build Verifiable ASICs?

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Makes sense if V + P are cheaper than trusted F. Reasons for hope:

  • running time of V < running time of F (asymptotically)
  • Implementations exist

Babai85 GMR85 BCC86 BFLS91 FGLSS91 Kilian92 ALMSS92 AS92 Micali94 BG02 GOS06 IKO07 GKR08 KR09 GGP10 Groth10 GLR11 Lipmaa11 BCCT12 GGPR13 BCCT13 KRR14 . . .
SBW11 CMT12 SMBW12 TRMP12 SVPBBW12 SBVBPW13 VSBW13 PGHR13 Thaler13 BCGTV13 BFRSBW13 BFR13 DFKP13 BCTV14a BCTV14b BCGGMTV14 FL14 KPPSST14 FTP14 WSRHBW15 BBFR15 CFHKNPZ15 CTV15 KZMQCPPsS15

SLIDE 21

Can we build Verifiable ASICs?

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Makes sense if V + P are cheaper than trusted F. Reasons for hope:

  • running time of V < running time of F (asymptotically)
  • Implementations exist
  • P overheads are massive, but using an advanced fab might offset these costs

Babai85 GMR85 BCC86 BFLS91 FGLSS91 Kilian92 ALMSS92 AS92 Micali94 BG02 GOS06 IKO07 GKR08 KR09 GGP10 Groth10 GLR11 Lipmaa11 BCCT12 GGPR13 BCCT13 KRR14 . . .
SBW11 CMT12 SMBW12 TRMP12 SVPBBW12 SBVBPW13 VSBW13 PGHR13 Thaler13 BCGTV13 BFRSBW13 BFR13 DFKP13 BCTV14a BCTV14b BCGGMTV14 FL14 KPPSST14 FTP14 WSRHBW15 BBFR15 CFHKNPZ15 CTV15 KZMQCPPsS15

[Chart: worker's cost normalized to native C, for matrix multiplication (m = 128), on a log scale from 10¹ to 10¹³, for Pepper, Ginger, Pinocchio, Zaatar, CMT, Allspice, TinyRAM, Thaler, and native C; overheads are roughly 10⁸×]

SLIDE 22

Can we build Verifiable ASICs?

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Makes sense if V + P are cheaper than trusted F. Reasons for caution:

  • Theory is silent about feasibility
  • Onus is heavier than in prior work
  • Hardware issues: energy, chip area
  • Need physically realizable circuit design
  • Need V to save costs at plausible computation sizes

Babai85 GMR85 BCC86 BFLS91 FGLSS91 Kilian92 ALMSS92 AS92 Micali94 BG02 GOS06 IKO07 GKR08 KR09 GGP10 Groth10 GLR11 Lipmaa11 BCCT12 GGPR13 BCCT13 KRR14 . . .
SBW11 CMT12 SMBW12 TRMP12 SVPBBW12 SBVBPW13 VSBW13 PGHR13 Thaler13 BCGTV13 BFRSBW13 BFR13 DFKP13 BCTV14a BCTV14b BCGGMTV14 FL14 KPPSST14 FTP14 WSRHBW15 BBFR15 CFHKNPZ15 CTV15 KZMQCPPsS15

SLIDE 23

Zebra: a hardware design that saves costs

SLIDE 24

A qualified success: Zebra, a hardware design that saves costs. . . sometimes.

SLIDE 25

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

F must be expressed as an arithmetic circuit (AC)
AC satisfiable ⟺ F was executed correctly
P convinces V that the AC is satisfiable
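To fix ideas, here is a minimal sketch (in Python, not part of Zebra) of a layered arithmetic circuit over F_p and its evaluation; the modulus and the example circuit are illustrative choices, not the paper's parameters.

# Minimal sketch (not Zebra's code): a layered arithmetic circuit over F_p.
# Each gate is ('add' | 'mul', left, right), indexing the previous layer.
P = 2**61 - 1   # illustrative prime modulus

def eval_layered_ac(layers, inputs):
    """Evaluate layer by layer; returns every layer's wire values."""
    wires = [list(inputs)]
    for layer in layers:
        prev = wires[-1]
        wires.append([(prev[l] + prev[r]) % P if op == 'add'
                      else (prev[l] * prev[r]) % P
                      for op, l, r in layer])
    return wires

# F(x0, x1, x2, x3) = x0*x1 + x2*x3 as a two-layer AC:
layers = [
    [('mul', 0, 1), ('mul', 2, 3)],   # multiplication layer
    [('add', 0, 1)],                  # output layer
]
print(eval_layered_ac(layers, [3, 4, 5, 6])[-1])   # [42]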

SLIDE 26

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice

SLIDE 27

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice

What about other schemes? e.g., FHE [GGP10], MIP+FHE [BC12], MIP [BTWV14], PCIP [RRR16], IOP [BCS16], PIR [BHK16], . . .

SLIDE 28

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice

What about other schemes? e.g., FHE [GGP10], MIP+FHE [BC12], MIP [BTWV14], PCIP [RRR16], IOP [BCS16], PIR [BHK16], . . . These all seem a bit further from practicality.

SLIDE 29

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
  + nondeterministic ACs, arbitrary connectivity
  + few rounds (≤ 3)

IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice
  – deterministic ACs; layered, low depth
  – many rounds

SLIDE 30

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
  + nondeterministic ACs, arbitrary connectivity
  + few rounds (≤ 3)
  Unsuited to hardware implementation

IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice
  – deterministic ACs; layered, low depth
  – many rounds

SLIDE 31

Probabilistic proof protocols, briefly

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14], e.g., Zaatar, Pinocchio, libsnark
  + nondeterministic ACs, arbitrary connectivity
  + few rounds (≤ 3)
  ✗ Unsuited to hardware implementation

IPs [GKR08, CMT12, VSBW13], e.g., Muggles, CMT, Allspice
  – deterministic ACs; layered, low depth
  – many rounds
  ✓ Suited to hardware implementation

SLIDE 32

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

F must be expressed as a layered arithmetic circuit.

SLIDE 33

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs

[Diagram: V sends x to P]

SLIDE 34

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates

[Diagram: V sends x to P; P is thinking. . .]


SLIDE 37

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y

[Diagram: V sends x to P; P returns y]

SLIDE 38

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y
  3. V constructs polynomial relating y to last layer's input wires

[Diagram: V holds x and y; V is thinking. . .]

SLIDE 39

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y
  3. V constructs polynomial relating y to last layer's input wires
  4. V engages P in a sum-check

[Diagram: V and P run a sum-check [LFKN90]]

SLIDE 40

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y
  3. V constructs polynomial relating y to last layer's input wires
  4. V engages P in a sum-check, gets claim about second-last layer

[Diagram: V and P run a sum-check [LFKN90]]

SLIDE 41

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y
  3. V constructs polynomial relating y to last layer's input wires
  4. V engages P in a sum-check, gets claim about second-last layer
  5. V iterates

[Diagram: a sum-check [LFKN90], then more sum-checks for earlier layers]


SLIDE 44

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

  1. V sends inputs
  2. P evaluates, returns output y
  3. V constructs polynomial relating y to last layer's input wires
  4. V engages P in a sum-check, gets claim about second-last layer
  5. V iterates, gets claim about inputs, which it can check

[Diagram: a sum-check [LFKN90], then more sum-checks for earlier layers]

SLIDE 45

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

Soundness error ∝ p⁻¹ (where p is the field size)

[Diagram: a sum-check [LFKN90], then more sum-checks for earlier layers]

SLIDE 46

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

Soundness error ∝ p⁻¹ (where p is the field size)
Cost to execute F directly: O(depth · width)
V's sequential running time: O(depth · log width + |x| + |y|) (assuming precomputed queries)

[Diagram: a sum-check [LFKN90], then more sum-checks for earlier layers]

SLIDE 47

Zebra builds on IPs of GKR [GKR08, CMT12, VSBW13]

Soundness error ∝ p⁻¹ (where p is the field size)
Cost to execute F directly: O(depth · width)
V's sequential running time: O(depth · log width + |x| + |y|) (assuming precomputed queries)
P's sequential running time: O(depth · width · log width)

[Diagram: a sum-check [LFKN90], then more sum-checks for earlier layers]
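To make the sum-check interaction concrete, here is a minimal, self-contained sketch in Python (illustrative, not Zebra's code): P proves a claim about Σ over b ∈ {0,1}ⁿ of f(b) for a polynomial f of degree ≤ 2 in each variable; each round, P sends the three evaluations H[0], H[1], H[2] of a univariate restriction, V checks H[0] + H[1] against its running claim, draws a random coin r, and updates the claim by interpolation.

# Minimal sum-check sketch over F_p (illustrative; not Zebra's code).
import random
from itertools import product

P = 2**61 - 1
INV2 = pow(2, P - 2, P)  # modular inverse of 2

def sumcheck(f, n, seed=0):
    rng = random.Random(seed)
    claim = sum(f(bits) for bits in product((0, 1), repeat=n)) % P
    prefix = []
    for _ in range(n):
        # P: send the degree-<=2 restriction H(k), summed over remaining bits
        rest = n - len(prefix) - 1
        H = [sum(f(prefix + [k] + list(b))
                 for b in product((0, 1), repeat=rest)) % P
             for k in (0, 1, 2)]
        # V: check consistency with the running claim, then send a coin
        assert (H[0] + H[1]) % P == claim, "P cheated"
        r = rng.randrange(P)
        # interpolate H through (0,H[0]), (1,H[1]), (2,H[2]) and evaluate at r
        claim = (H[0]*(r-1)*(r-2)*INV2 - H[1]*r*(r-2) + H[2]*r*(r-1)*INV2) % P
        prefix.append(r)
    # V: one oracle query to f at the random point finishes the check
    assert f(prefix) % P == claim
    return True

# Example: f(x) = (x0 + 2*x1) * (x2 + 1), degree 1 in each variable
f = lambda x: ((x[0] + 2*x[1]) * (x[2] + 1)) % P
print(sumcheck(f, 3))  # True

In GKR-style protocols, V's final oracle query is replaced by a claim about the next layer's wires, which is then settled by that layer's own sum-check; that layer-by-layer structure is what the pipeline on the following slides exploits.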

SLIDE 48

Extracting parallelism in Zebra

P executing AC: layers are sequential, but all gates at a layer can be executed in parallel

SLIDE 49

Extracting parallelism in Zebra

P executing the AC: layers are sequential, but all gates at a layer can be executed in parallel.

Proving step: can V and P interact about all of F's layers at once?

SLIDE 50

Extracting parallelism in Zebra

P executing the AC: layers are sequential, but all gates at a layer can be executed in parallel.

Proving step: can V and P interact about all of F's layers at once?

No. V must ask questions in order, or soundness is lost.
SLIDE 51

Extracting parallelism in Zebra

P executing the AC: layers are sequential, but all gates at a layer can be executed in parallel.

Proving step: can V and P interact about all of F's layers at once?

No. V must ask questions in order, or soundness is lost.

But: there is still parallelism to be extracted. . .

SLIDE 52

Extracting parallelism in Zebra’s P

V questions P about F(x1)’s output layer.

F(x1)

SLIDE 53

Extracting parallelism in Zebra’s P

V questions P about F(x1)’s output layer. Simultaneously, P returns F(x2).

F(x1) F(x2)

SLIDE 54

Extracting parallelism in Zebra’s P

V questions P about F(x1)’s next layer

F(x1)

SLIDE 55

Extracting parallelism in Zebra’s P

V questions P about F(x1)’s next layer, and F(x2)’s output layer.

F(x1) F(x2)

SLIDE 56

Extracting parallelism in Zebra’s P

V questions P about F(x1)’s next layer, and F(x2)’s output layer. Meanwhile, P returns F(x3).

F(x1) F(x2) F(x3)

SLIDE 57

Extracting parallelism in Zebra’s P

This process continues. . .

F(x1) F(x2) F(x3) F(x4)

SLIDE 58

Extracting parallelism in Zebra’s P

This process continues. . .

F(x1) F(x2) F(x3) F(x4) F(x5)

SLIDE 59

Extracting parallelism in Zebra's P

This process continues until V and P interact about every layer simultaneously, but for different computations. V and P can complete one proof in each time step.

F(x1) F(x2) F(x3) F(x4) F(x5) F(x6) F(x7) F(x8)

SLIDE 60

Extracting parallelism in Zebra's P with pipelining

[Diagram: input x enters the sub-prover for layer 0; sub-provers for layers 0, 1, . . . , d − 1 each run "prove" and exchange queries and responses with V; output y emerges from the final stage]

This approach is just a standard hardware technique, pipelining; it is possible because the protocol is naturally staged.

SLIDE 61

Extracting parallelism in Zebra's P with pipelining

[Diagram: input x enters the sub-prover for layer 0; sub-provers for layers 0, 1, . . . , d − 1 each run "prove" and exchange queries and responses with V; output y emerges from the final stage]

This approach is just a standard hardware technique, pipelining; it is possible because the protocol is naturally staged. There are other opportunities to leverage the protocol's structure.
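A toy schedule makes the pipelining concrete (a sketch, not Zebra's RTL): at time step t, the sub-prover for layer L works on computation t − L, so after an initial d-step latency the pipeline finishes one proof per step.

# Toy schedule for Zebra's pipelined proving (illustrative only).
def pipeline_schedule(num_layers, num_steps):
    for t in range(num_steps):
        # layer L is busy with computation t - L once that computation exists
        yield t, [(L, t - L) for L in range(num_layers) if t - L >= 0]

for t, active in pipeline_schedule(num_layers=3, num_steps=5):
    print(f"step {t}:", ", ".join(f"layer {L} proves F(x{c + 1})" for L, c in active))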

SLIDE 62

Per-layer computations

For each sum-check round, P sums over each gate in a layer

SLIDE 63

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

SLIDE 64

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)
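The pseudocode above, transcribed into runnable Python (illustrative only: the modulus is arbitrary, and delta() below is a stand-in for the real per-gate polynomial, which GKR-style protocols derive from the layer's wiring and wire values):

# Runnable transcription of the per-layer pseudocode (illustrative).
P = 2**61 - 1

def delta(g, k, state, coeffs):
    a, b, c = coeffs[g]                  # stand-in per-gate quadratic in k
    return state[g] * ((a * k * k + b * k + c) % P) % P

def sumcheck_round(state, coeffs, r_j):
    # compute H[0], H[1], H[2]: one pass over the layer's gates per point
    H = [sum(delta(g, k, state, coeffs) for g in range(len(state))) % P
         for k in (0, 1, 2)]
    # update the lookup table with V's random coin r_j
    for g in range(len(state)):
        state[g] = delta(g, r_j, state, coeffs)
    return H

coeffs = [(1, 2, 3), (0, 1, 1), (2, 0, 5), (1, 1, 1)]   # a layer of 4 gates
state = [1, 1, 1, 1]
print(sumcheck_round(state, coeffs, r_j=12345))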

SLIDE 65

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: one gate prover per gate, each computing δ(g, 0) in parallel]

SLIDE 66

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: parallel gate provers computing δ(g, 0), with gate state held in a RAM]

SLIDE 67

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: parallel gate provers computing δ(g, 0), with gate state held in a RAM; an adder tree sums their outputs]

SLIDE 68

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: parallel gate provers computing δ(g, 0) and δ(g, 1), with gate state held in a RAM; an adder tree sums their outputs]

SLIDE 69

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: parallel gate provers computing δ(g, 0), δ(g, 1), and δ(g, 2), with gate state held in a RAM; an adder tree sums their outputs]

SLIDE 70

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: parallel gate provers computing δ(g, 0), δ(g, 1), δ(g, 2), and δ(g, rj), with gate state held in a RAM; an adder tree sums their outputs]


SLIDE 72

Per-layer computations

For each sum-check round, P sums over each gate in a layer, evaluating H[k], k ∈ {0, 1, 2}:

H[k] = Σ_{g ∈ layer} δ(g, k)

In software:

// compute H[0], H[1], H[2]
for k ∈ {0, 1, 2}:
    H[k] ← 0
    for g ∈ layer:
        H[k] ← H[k] + δ(g, k)   // δ uses state[g]

// update lookup table with V's random coin
for g ∈ layer:
    state[g] ← δ(g, rj)

In hardware:

[Diagram: each gate prover keeps state[g] in a local register, replacing the RAM, and computes δ(g, 0), δ(g, 1), δ(g, 2), and δ(g, rj); an adder tree sums their outputs]

SLIDE 73

Zebra's design approach

✓ Extract parallelism
  e.g., pipelined proving
  e.g., parallel evaluation of δ by gate provers

✓ Exploit locality: distribute data and control
  e.g., no RAM: data is kept close to the places it is needed

SLIDE 74

Zebra's design approach

✓ Extract parallelism
  e.g., pipelined proving
  e.g., parallel evaluation of δ by gate provers

✓ Exploit locality: distribute data and control
  e.g., no RAM: data is kept close to the places it is needed
  e.g., latency-insensitive design: localized control

SLIDE 75

Zebra's design approach

✓ Extract parallelism
  e.g., pipelined proving
  e.g., parallel evaluation of δ by gate provers

✓ Exploit locality: distribute data and control
  e.g., no RAM: data is kept close to the places it is needed
  e.g., latency-insensitive design: localized control

✓ Reduce, reuse, recycle
  e.g., computation: save energy by adding memoization to P
  e.g., hardware: save chip area by reusing the same circuits

SLIDE 76

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area

SLIDE 77

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

SLIDE 78

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

Protocol requires input-independent precomputation [VSBW13]

SLIDE 79

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

Protocol requires input-independent precomputation [VSBW13]

✓ Zebra amortizes precomputations over many V-P pairs

SLIDE 80

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

Protocol requires input-independent precomputation [VSBW13]

✓ Zebra amortizes precomputations over many V-P pairs

Precomputations need secrecy, integrity

✗ Give V trusted storage? Cost would be prohibitive

[Diagram: V sends x to P; P returns y and a proof that y = F(x); V loads precomputation prei]

SLIDE 81

Architectural challenges

Interaction between V and P requires a lot of bandwidth

✗ V and P on circuit board? Too much energy, circuit area
✓ Zebra uses 3D integration

Protocol requires input-independent precomputation [VSBW13]

✓ Zebra amortizes precomputations over many V-P pairs

Precomputations need secrecy, integrity

✗ Give V trusted storage? Cost would be prohibitive
✓ Zebra uses untrusted storage + authenticated encryption

[Diagram: V sends x to P; P returns y and a proof that y = F(x); V loads Ek(prei) from untrusted storage]
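One way to realize "untrusted storage + authenticated encryption" is sketched below with AES-GCM (a sketch under assumed details: the blob layout, key handling, and the use of Python's cryptography package are illustrative, not Zebra's actual mechanism). Binding each blob to its index as associated data stops the storage from swapping or replaying entries.

# Sketch: precomputations on untrusted storage, sealed with AES-GCM.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # known only to the principal/V
aead = AESGCM(key)

def seal(index: int, pre_i: bytes) -> bytes:
    # bind the blob to its index so untrusted storage cannot swap entries
    nonce = os.urandom(12)
    return nonce + aead.encrypt(nonce, pre_i, index.to_bytes(8, "big"))

def open_(index: int, blob: bytes) -> bytes:
    # raises InvalidTag if the blob was tampered with or mis-indexed
    nonce, ct = blob[:12], blob[12:]
    return aead.decrypt(nonce, ct, index.to_bytes(8, "big"))

blob = seal(7, b"precomputed queries for instance 7")
assert open_(7, blob) == b"precomputed queries for instance 7"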

SLIDE 82

Implementation

Zebra's implementation includes:

  • a compiler that produces synthesizable Verilog for P
  • two V implementations: hardware (Verilog) and software (C++)
  • a library to generate V's precomputations
  • Verilog simulator extensions to model software or hardware V's interactions with P

SLIDE 83

. . . and it seemed to work really well!

Zebra can produce 10k–100k proofs per second, while existing systems take tens of seconds per proof!

SLIDE 84

. . . and it seemed to work really well!

Zebra can produce 10k–100k proofs per second, while existing systems take tens of seconds per proof!

But that’s not a serious evaluation. . .

SLIDE 85

Evaluation method

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Baseline: direct implementation of F in the same technology as V

SLIDE 86

Evaluation method

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Baseline: direct implementation of F in the same technology as V

Metrics: energy, chip size per throughput (discussed in paper)

SLIDE 87

Evaluation method

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Baseline: direct implementation of F in the same technology as V

Metrics: energy, chip size per throughput (discussed in paper)

Measurements: based on circuit synthesis and simulation, published chip designs, and CMOS scaling models

Charge for V, P, communication; retrieving and decrypting precomputations; PRNG; operator communicating with V

SLIDE 88

Evaluation method

[Diagram: V sends x to P; P returns y and a proof that y = F(x); compared against F built directly in a trusted fab]

Baseline: direct implementation of F in the same technology as V

Metrics: energy, chip size per throughput (discussed in paper)

Measurements: based on circuit synthesis and simulation, published chip designs, and CMOS scaling models

Charge for V, P, communication; retrieving and decrypting precomputations; PRNG; operator communicating with V

Constraints: trusted fab = 350 nm; untrusted fab = 7 nm; 200 mm² max chip area; 150 W max total power

350 nm: 1997 (Pentium II); 7 nm: ≈ 2017 [TSMC]; ≈ 20-year gap between trusted and untrusted fab

SLIDE 89

Application #1: number theoretic transform

NTT: a Fourier transform over Fp. Widely used, e.g., in computer algebra.
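For reference, here is a compact radix-2 NTT sketch in Python (the tiny modulus p = 17 and root of unity are chosen only so the example can be checked by hand; Zebra's field and sizes are much larger):

# Minimal recursive radix-2 NTT over F_p (illustrative parameters).
# p = 17 has a primitive 8th root of unity: 2 (since 2^8 ≡ 1, 2^4 ≡ -1 mod 17).
P_MOD = 17

def ntt(a, omega):
    n = len(a)
    if n == 1:
        return a
    # Cooley-Tukey split: evens and odds use omega^2, a primitive (n/2)th root
    even = ntt(a[0::2], omega * omega % P_MOD)
    odd = ntt(a[1::2], omega * omega % P_MOD)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % P_MOD
        out[i] = (even[i] + t) % P_MOD
        out[i + n // 2] = (even[i] - t) % P_MOD
        w = w * omega % P_MOD
    return out

print(ntt([1, 2, 3, 4, 0, 0, 0, 0], omega=2))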

SLIDE 90

Application #1: number theoretic transform

[Chart: ratio of baseline energy to Zebra energy (higher is better) versus log₂(NTT size) from 6 to 13; y-axis from 0.1 to 3]

SLIDE 91

Application #2: Curve25519 point multiplication

Curve25519: a commonly-used elliptic curve. Point multiplication: a primitive, e.g., for ECDH.
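As a reminder of what this chip computes, here is a textbook x-only Montgomery ladder for Curve25519 in Python (a reference sketch of the primitive following RFC 7748's formulas; it omits scalar clamping and byte-level encoding, and it is not Zebra's hardware design):

# Reference sketch: x-only Montgomery ladder for Curve25519.
P25519 = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4

def ladder(k: int, u: int) -> int:
    x1 = u
    x2, z2 = 1, 0
    x3, z3 = u, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:  # conditional swap (constant-time in real implementations)
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a = (x2 + z2) % P25519
        b = (x2 - z2) % P25519
        aa, bb = a * a % P25519, b * b % P25519
        e = (aa - bb) % P25519
        c = (x3 + z3) % P25519
        d = (x3 - z3) % P25519
        da, cb = d * a % P25519, c * b % P25519
        x3 = (da + cb) * (da + cb) % P25519
        z3 = x1 * (da - cb) * (da - cb) % P25519
        x2 = aa * bb % P25519
        z2 = e * (aa + A24 * e) % P25519
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P25519 - 2, P25519) % P25519  # x2/z2 mod p

print(ladder(7, 9))  # multiply the base point (u = 9) by the scalar 7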

SLIDE 92

Application #2: Curve25519 point multiplication

[Chart: ratio of baseline energy to Zebra energy (higher is better) versus the number of parallel Curve25519 point multiplications: 84, 170, 340, 682, 1147; y-axis from 0.1 to 3]

SLIDE 93

A qualified success: Zebra, a hardware design that saves costs. . . sometimes.

SLIDE 94

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit
SLIDE 95

Summary of Zebra's applicability

Applies to IPs, but not arguments:

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit
SLIDE 96

Arguments versus IPs, redux

Design principle          IPs [GKR08, CMT12, VSBW13]    Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism       ✓                             ✓
Exploit locality          ✓
Reduce, reuse, recycle    ✓

Argument protocols seem friendly to hardware?

SLIDE 97

Arguments versus IPs, redux

Design principle          IPs [GKR08, CMT12, VSBW13]    Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism       ✓                             ✓
Exploit locality          ✓                             ✗
Reduce, reuse, recycle    ✓

Argument protocols seem unfriendly to hardware: P computes over the entire AC at once ⇒ need RAM

SLIDE 98

Arguments versus IPs, redux

Design principle          IPs [GKR08, CMT12, VSBW13]    Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism       ✓                             ✓
Exploit locality          ✓                             ✗
Reduce, reuse, recycle    ✓                             ✗

Argument protocols seem unfriendly to hardware:
P computes over the entire AC at once ⇒ need RAM
P does crypto for every gate in the AC ⇒ special crypto circuits

SLIDE 99

Arguments versus IPs, redux

Design principle          IPs [GKR08, CMT12, VSBW13]    Arguments [GGPR13, SBVBPW13, PGHR13, BCTV14]
Extract parallelism       ✓                             ✓
Exploit locality          ✓                             ✗
Reduce, reuse, recycle    ✓                             ✗

Argument protocols seem unfriendly to hardware:
P computes over the entire AC at once ⇒ need RAM
P does crypto for every gate in the AC ⇒ special crypto circuits

. . . but we hope these issues are surmountable!

SLIDE 100

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

Common to essentially all built proof systems

SLIDE 102

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

System                                  Amortization regime                       Advice
Zebra                                   many V-P pairs                            short
Allspice [VSBW13]                       batch of instances of a particular F      short
Bootstrapped SNARKs [BCTV14a, CTV15]    all computations                          long
BCTV [BCTV14b]                          all computations of the same length       long
Pinocchio [PGHR13]                      all future instances of a particular F    long
Zaatar [SBVBPW13]                       batch of instances of a particular F      long

Exception: [CMT12] with logspace-uniform ACs

SLIDE 104

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:
V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU

SLIDE 105

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:
V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU
⇒ break-even point ≥ 16 × 10⁶ CPU ops

SLIDE 106

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:
V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU
⇒ break-even point ≥ 16 × 10⁶ CPU ops
With 32 GB RAM, libsnark handles ACs with ≤ 16 × 10⁶ gates

SLIDE 107

Summary of Zebra's applicability

  1. Computation F must have a layered, shallow, deterministic AC
  2. Must have a wide gap between cutting-edge fab (for P) and trusted fab (for V)
  3. Amortizes precomputations over many instances
  4. Computation F must be very large for V to save work
  5. Computation F must be efficient as an arithmetic circuit

For example, libsnark [BCTV14b], a highly optimized implementation of [GGPR13] and Pinocchio [PGHR13]:
V's work: 6 ms + (|x| + |y|) · 3 µs on a 2.7 GHz CPU
⇒ break-even point ≥ 16 × 10⁶ CPU ops
With 32 GB RAM, libsnark handles ACs with ≤ 16 × 10⁶ gates
⇒ breaking even requires > 1 CPU op per AC gate, e.g., computations over Fp rather than machine integers

SLIDE 108

Recap

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

+ Verifiable ASICs: a new approach to building trustworthy hardware under a strong threat model
+ First hardware design for a probabilistic proof protocol
+ Improves performance compared to trusted baseline

SLIDE 109

Recap

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

+ Verifiable ASICs: a new approach to building trustworthy hardware under a strong threat model
+ First hardware design for a probabilistic proof protocol
+ Improves performance compared to trusted baseline
– Improvement compared to the baseline is modest
– Applicability is limited:
    precomputations must be amortized
    computation needs to be "big enough"
    large gap between trusted and untrusted technology
    does not apply to all computations

SLIDE 110

Recap

[Diagram: V sends x to P; P returns y and a proof that y = F(x)]

+ Verifiable ASICs: a new approach to building trustworthy hardware under a strong threat model
+ First hardware design for a probabilistic proof protocol
+ Improves performance compared to trusted baseline
– Improvement compared to the baseline is modest
– Applicability is limited:
    precomputations must be amortized
    computation needs to be "big enough"
    large gap between trusted and untrusted technology
    does not apply to all computations

Bottom line: Zebra is plausible when it applies

https://www.pepper-project.org/