SLIDE 1 Measuring Long Wire Leakage with Ring Oscillators in Cloud FPGAs
Ilias Giechaskiel†‡, Kasper B. Rasmussen†, Jakub Szefer‡
9 September 2019
†University of Oxford ‡Yale University
SLIDE 2
Cloud FPGAs
FPGAs now offered by cloud providers
→ Virtex UltraScale+ on Alibaba, Amazon, Huawei
→ Kintex UltraScale on Baidu, Tencent
→ Intel Arria 10 on Alibaba, OVH
What about malicious designs?
→ Hide physical aspects (DRAM, PCIe, Clock, ...)
→ Prohibit combinatorial loops (e.g., ring oscillators)
SLIDE 3
Latch-Based RO
Latches are level-sensitive, so they act as buffers: when the gate G is active, the output Q mirrors the input D.
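The buffering behaviour described above can be sketched with a small behavioural model. This is a minimal illustration, not the authors' circuit: it assumes a single inverter feeding the latch, unit delays, and discrete time steps, and the function names `latch` and `simulate_latch_ro` are hypothetical.

```python
def latch(d: int, g: int, q_prev: int) -> int:
    """Level-sensitive latch: transparent (Q = D) while G is 1, otherwise holds Q."""
    return d if g else q_prev


def simulate_latch_ro(steps: int, gate: int = 1) -> list[int]:
    """Simulate an inverter whose output passes through a latch back to its input.

    Assumption: one inverter plus one latch, each evaluated once per step.
    """
    q = 0
    trace = []
    for _ in range(steps):
        d = 1 - q              # inverter drives the latch input
        q = latch(d, gate, q)  # latch acts as a buffer while the gate is active
        trace.append(q)
    return trace


if __name__ == "__main__":
    print(simulate_latch_ro(6))          # gate active: the loop toggles every step
    print(simulate_latch_ro(4, gate=0))  # gate inactive: the latch holds its value
```

With the gate held active the loop oscillates even though the feedback path contains a storage element rather than a purely combinatorial chain.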
SLIDE 4
Flip-Flop-Based RO
For a flip-flop-based buffer, use a flip-flop with asynchronous preset PRE: when PRE is high, the output Q is also high. When the clock C rises, Q mirrors the input D.
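The two rules on this slide can be written as a next-state function. This is only a behavioural sketch of the primitive as described here; it does not model how the flip-flop is wired into the ring oscillator, which the slide does not spell out. The name `fdpe_next` and its argument order are hypothetical.

```python
def fdpe_next(q_prev: int, d: int, rising_edge: bool, pre: int) -> int:
    """Next output of a flip-flop with asynchronous preset.

    - PRE high forces Q high regardless of the clock.
    - Otherwise, on a rising clock edge, Q takes the value of D.
    - Otherwise, Q keeps its previous value.
    """
    if pre:
        return 1
    if rising_edge:
        return d
    return q_prev


if __name__ == "__main__":
    q = 0
    q = fdpe_next(q, d=0, rising_edge=False, pre=1)  # preset asserted
    print(q)
    q = fdpe_next(q, d=0, rising_edge=True, pre=0)   # clock rises, Q samples D
    print(q)
```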
SLIDE 5
Long Wire Leakage
Earlier work: Virtex 5 & 6, Artix & Spartan 7 covert channels
This work: Virtex UltraScale+ leakage (on the cloud!)
SLIDE 6 Latch-Based Results
Experiments with 1 Local, 8 Amazon, 2 Huawei FPGAs
[Figure: latch per-long delay difference Δd_L^LD (fs) for each FPGA board (AWS 0 to AWS 7, Huawei 0, Huawei 1, VCU118), by Super Logic Region]
→ Δd_L^LD > 0 ⟹ leakage detectable on all FPGAs
→ Process variations between FPGAs
→ Variations within FPGAs (between Super Logic Regions)
SLIDE 7 Flip-Flop-Based Results
Estimates with flip-flop ROs are very close:
[Figure: flip-flop (register) per-long delay difference Δd_L^FF (fs) for each FPGA board (AWS 0 to AWS 7, Huawei 0, Huawei 1, VCU118), by Super Logic Region]
Same with Lookup-Table ROs (all within 10%)
SLIDE 8
Conclusions
→ Latch-based and flip-flop-based ROs can overcome combinatorial loop restrictions
→ Virtex UltraScale+ FPGA long wires differ from earlier generations, but still leak information about their state
→ The three RO designs provide closely matching leakage estimates (all within 10%)
→ Comparison among 33 super logic regions in local, Amazon, and Huawei FPGAs revealed process variations
→ Questions? ilias.giechaskiel@cs.ox.ac.uk
SLIDE 9
Super Logic Regions
SLIDE 10
Routing Example
SLIDE 11
Virtex UltraScale+ Leakage Example
[Figure: ring oscillator count c_i against sample i (about 1,400 samples; counts between roughly 7,235,500 and 7,238,000), marked by transmitted value]
SLIDE 12 Virtex UltraScale+ Leakage Characterization
[Figure: absolute delay difference Δd_RO (s, ×10^−14) against number of buffer longs v_t (1 to 9), one series per number of RO longs v_r (1 to 9)]
Femtosecond-scale change in delay is proportional to the overlap between the receiver and the transmitter
SLIDE 13 Flip-Flop- and Lookup-Table-Based Ratios
Flip-flop: Δd_L^FDPE / Δd_L^LD
LUT: Δd_L^LUT / Δd_L^LD
[Figure: ring oscillator per-long delay ratio (axis from 0.85 to 1.10) for each FPGA board (AWS 0 to AWS 7, Huawei 0, Huawei 1, VCU118), by Super Logic Region]
SLIDE 14
Property: Virtex 5 | Virtex 6 | Series 7 | Virtex US+
Node Size (nm): 65 | 40 | 28 | 16
VLONG Length: 18 | 16 | 18 | 12
VLONG Taps: 2 1 1
VLONG Bidirectional?
VLONGs/CLB: 2 | 2 | 2 | 2 × 8
SLIDE 15 Metrics
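\[ \Delta RC = \frac{C_{RO}^{1} - C_{RO}^{0}}{C_{RO}^{1}} \tag{1} \]
\[ \Delta d_{RO} = \frac{1}{2}\left(\frac{1}{f_{RO}^{0}} - \frac{1}{f_{RO}^{1}}\right) = \frac{f_{RO}^{1} - f_{RO}^{0}}{2 f_{RO}^{0} f_{RO}^{1}} \tag{2} \]
\[ \Delta d_{L} = \frac{\Delta d_{RO}}{n} = \frac{1}{n} \cdot \frac{C_{CLK}}{2 f_{CLK}} \cdot \frac{C_{RO}^{1} - C_{RO}^{0}}{C_{RO}^{0} C_{RO}^{1}} \tag{3} \]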
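The three metrics can be evaluated directly from raw counts, as in the sketch below. It assumes that the superscripts 0 and 1 denote the value driven on the transmitter, that the RO frequency is recovered from a count as f_RO = C_RO · f_CLK / C_CLK (which is what makes the frequency form of (2) agree with the count form of (3)), and that n is the number of overlapping longs. All function names and all numeric values are hypothetical and chosen only for illustration.

```python
def relative_count_difference(c0: float, c1: float) -> float:
    """Equation (1): relative change in RO count between transmitted 0 and 1."""
    return (c1 - c0) / c1


def ro_delay_difference_from_freq(f0: float, f1: float) -> float:
    """Equation (2): half the difference between the two RO periods, in seconds."""
    return 0.5 * (1.0 / f0 - 1.0 / f1)


def ro_delay_difference(c0: float, c1: float, c_clk: float, f_clk: float) -> float:
    """Equation (2) rewritten in counts, assuming f_RO = C_RO * f_CLK / C_CLK."""
    return c_clk / (2.0 * f_clk) * (c1 - c0) / (c0 * c1)


def per_long_delay_difference(c0: float, c1: float, c_clk: float,
                              f_clk: float, n: int) -> float:
    """Equation (3): RO delay difference divided by the number of longs n."""
    return ro_delay_difference(c0, c1, c_clk, f_clk) / n


if __name__ == "__main__":
    # Hypothetical measurement: counts, clock ticks per window, clock frequency, longs.
    c0, c1 = 7_236_000, 7_237_500
    c_clk, f_clk, n = 2_000_000, 100e6, 5

    print("Delta RC  :", relative_count_difference(c0, c1))
    print("Delta d_RO:", ro_delay_difference(c0, c1, c_clk, f_clk), "s")
    print("Delta d_L :", per_long_delay_difference(c0, c1, c_clk, f_clk, n), "s")
```

Working in counts rather than frequencies avoids an intermediate conversion, and the count form makes clear that the result depends only on the two counts and the length of the measurement window C_CLK / f_CLK.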
SLIDE 16 Relative Count Difference
[Figure: relative count difference ΔRC (×10^−5, axis from 0.0 to 3.5) against number of transmitter longs v_t (1 to 9), one series per number of receiver longs v_r (1 to 9)]
SLIDE 17
Countermeasures
→ Routing Restrictions: Enforce physical isolation between users and potentially malicious cores.
→ Design Rule Checks: Place restrictions on the generated bitstreams, including prohibiting combinatorial loops, latches, and non-shell clocks.
→ Runtime Protections: Gate clocks and clear the FPGA in response to detected malicious designs.