SW/HW Codesign of the Post-Quantum Cryptography Algorithm


slide-1
SLIDE 1

SW/HW Codesign of the Post-Quantum Cryptography Algorithm NTRUEncrypt Using HLS and RTL Design Methodologies

Farnoud Farahmand, Duc Tri Nguyen, Viet B. Dang*, Ahmed Ferozpuri and Kris Gaj George Mason University

slide-2
SLIDE 2

Post-Quantum Cryptography (PQC)

Ongoing NIST PQC standardization process

  • 69 submissions in total in Round 1; 26 submissions qualified for Round 2

Challenges

  • Mathematical complexity
  • Large amount of man-power
  • New types of basic operations
  • Constant-time implementations
  • Need for new SCA (Side-Channel Attack) countermeasures against power and electromagnetic analysis

slide-3
SLIDE 3

Risks of Early Hardware Implementations

  • GMU implementation of DAGS developed in Fall 2017-Spring 2018.
  • Preliminary results presented at the Code-Based Cryptography (CBC) workshop in April 2018.
  • Attack against DAGS announced on May 16, 2018.
  • DAGS did not qualify for Round 2.

slide-4
SLIDE 4

Software/Hardware Codesign

[Diagram: Software, with the most time-critical operation offloaded to RTL or HLS-generated Hardware]

slide-5
SLIDE 5

SW/HW Codesign for PQC: Advantages

  • Focus on a few (typically 1-3) major operations, known to be easily parallelizable
  • Much shorter development time (at least by a factor of 10)
  • Guaranteed substantial speed-up
  • Insight regarding performance of future instruction set extensions of modern microprocessors
  • Possibility of implementing multiple candidates by the same research group, eliminating the influence of different
      • design skills
      • operation subset (e.g., including or excluding key generation)
      • interface & protocol
      • optimization target
      • platform

slide-6
SLIDE 6

Two Major Types of Platforms

FPGA Fabric & Hard-core Processors

Examples:

  • Xilinx Zynq 7000 System on Chip (SoC)
  • Xilinx Zynq UltraScale+ MPSoC
  • Intel Arria 10 SoC FPGAs
  • Intel Stratix 10 SoC FPGAs

FPGA Fabric, including Soft-core Processors

Examples:

  • Xilinx Virtex UltraScale+ FPGAs
  • Intel Stratix 10 FPGAs

Soft-core processors include:

  • Xilinx MicroBlaze
  • Intel Nios II
  • RISC-V, originally UC Berkeley

[Diagrams: Processor w/ Memory & I/O alongside FPGA Fabric; FPGA Fabric containing a Soft-core Processor]

slide-7
SLIDE 7

Selected Platform

  • FPGA Family: Xilinx Zynq UltraScale+ MPSoC
  • Device: XCZU9EG-2FFVB1156E
  • Prototyping Board: ZCU102 Evaluation Kit from Xilinx
  • Processing System: Quad-core ARM Cortex-A53 Application Processing Unit, running at a frequency of 1.2 GHz (only one core used for benchmarking)
  • Programmable Logic: Configurable Logic Blocks (CLB), Block RAMs, DSP units

slide-8
SLIDE 8

Experimental Setup

[Block diagram]

  • Blocks: Zynq Processing System, AXI DMA, Input FIFO, Hardware Accelerator, Output FIFO, AXI Timer, Clocking wizard
  • Interfaces: AXI Lite, AXI Full, AXI Stream, FIFO Interface, IRQ
  • Clock signals: Main Clock, UUT_clk, rd_clk, wr_clk, clk

slide-9
SLIDE 9

Selected Algorithm

  • NTRUEncrypt is one of the most well-known PQC algorithms that has withstood cryptanalysis.
  • The speed of NTRUEncrypt in software, especially on embedded software platforms, is limited by the long execution time of polynomial multiplication.
  • We implement two variants of the NIST Round 1 PQC candidate NTRUEncrypt, ntru-pke-443 and ntru-pke-743, in bare-metal mode.
  • Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: RTL and HLS.
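To make the software bottleneck concrete, here is a minimal sketch of polynomial multiplication in the ring Z_q[x]/(x^N - 1), written the schoolbook way. The function name `poly_mul_schoolbook` and the toy inputs are hypothetical. The slides do not state the ring parameters; N = 443 or 743 with q = 2048 is assumed here for the two variants. The point is only that the cost grows as N*N coefficient products when done sequentially.

```python
def poly_mul_schoolbook(a, b, q):
    """Multiply a and b in Z_q[x]/(x^N - 1) using N*N coefficient products.

    a and b are lists of N coefficients, lowest degree first.
    """
    n = len(a)
    assert len(b) == n, "both operands must have N coefficients"
    c = [0] * n
    for i in range(n):
        for j in range(n):
            # x^(i+j) wraps around because x^N = 1 in this ring
            k = (i + j) % n
            c[k] = (c[k] + a[i] * b[j]) % q
    return c


# Toy example with hypothetical values: N = 3, q = 2048
# (1 + 2x + 3x^2) * (4 + 5x + 6x^2) reduced modulo x^3 - 1
print(poly_mul_schoolbook([1, 2, 3], [4, 5, 6], 2048))
```

For N = 743 the double loop performs 743 * 743 = 552,049 multiply-accumulate steps, which suggests why this operation is the natural candidate for offloading to hardware.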

slide-10
SLIDE 10

Accelerator Design

Target: Minimum Execution Time

RTL: Register-Transfer Level methodology with VHDL

  • Block diagram of the Datapath and Algorithmic State Machine (ASM) chart of the Controller

HLS: High Level Synthesis methodology with C

  • Goal: The same or comparable number of clock cycles as in the Register-Transfer Level (manual) implementation in VHDL
  • Attempt 1: Reference implementation based on the grade school algorithm for multiplication (a.k.a. schoolbook, paper-and-pencil, etc.)
  • Attempt 2: Optimized implementation based on rotation
  • Multiple attempts at optimization using Vivado HLS directives (pragmas) and minor code changes
  • Outcome 1: Tens of thousands of clock cycles, compared to the expected n=743 clock cycles
  • Solution: Rewriting the code in C in such a way as to match the block diagram used to generate the VHDL code
  • Outcome 2: Expected functionality; execution time of around n clock cycles
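The rotation-based approach and the "around n clock cycles" figure can be illustrated with a small software model. This is a sketch under stated assumptions, not the authors' datapath: it assumes one operand is ternary (coefficients in {-1, 0, 1}), as is typical for NTRU-type schemes, and that the hardware updates all N accumulator coefficients in parallel in one clock cycle. The function name `poly_mul_rotation` and the cycle counter are hypothetical.

```python
def poly_mul_rotation(a, t, q):
    """Multiply a by a ternary polynomial t in Z_q[x]/(x^N - 1).

    Each loop iteration models one clock cycle of a hypothetical datapath:
    every accumulator coefficient is updated at once (a list comprehension
    here, parallel adders in hardware), then the operand register is
    rotated by one position, which is multiplication by x.
    Returns the product and the number of modelled cycles.
    """
    n = len(a)
    assert len(t) == n, "both operands must have N coefficients"
    acc = [0] * n
    rot = list(a)              # rot holds a * x^i at the start of cycle i
    cycles = 0
    for ti in t:               # one ternary coefficient consumed per cycle
        if ti == 1:
            acc = [(c + r) % q for c, r in zip(acc, rot)]
        elif ti == -1:
            acc = [(c - r) % q for c, r in zip(acc, rot)]
        elif ti != 0:
            raise ValueError("t must have coefficients in {-1, 0, 1}")
        rot = [rot[-1]] + rot[:-1]   # rotate: multiply by x modulo x^N - 1
        cycles += 1
    return acc, cycles


# Toy example with hypothetical values: (1 + 2x + 3x^2) * (1 - x^2), q = 2048
product, cycles = poly_mul_rotation([1, 2, 3], [1, 0, -1], 2048)
print(product, cycles)
```

In this model the cycle count equals N regardless of the coefficient values, which is consistent with the target of roughly n clock cycles; a sequential schoolbook loop needs on the order of N*N steps instead.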

slide-11
SLIDE 11

Speed-up achieved for Polynomial Multiplication

[Bar chart: speed-up of RTL and HLS accelerators over software]

  • ntru-pke-443 ENC: RTL 89.1, HLS 81.9
  • ntru-pke-443 DEC: RTL 82.8, HLS 76.1
  • ntru-pke-743 ENC: RTL 128.5, HLS 106.8
  • ntru-pke-743 DEC: RTL 119.8, HLS 99.6

slide-12
SLIDE 12

Total Speed-up achieved for entire ENC/DEC

[Bar chart: total speed-up of RTL and HLS designs over software]

  • ntru-pke-443 ENC: RTL 2.4, HLS 2.3
  • ntru-pke-443 DEC: RTL 4, HLS 4
  • ntru-pke-743 ENC: RTL 3.9, HLS 3.9
  • ntru-pke-743 DEC: RTL 6.8, HLS 6.8
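The gap between the large speed-up of the accelerated operation and the much smaller total speed-up is what Amdahl's law predicts when only part of the run time is accelerated. The sketch below is an illustration only: the function name `total_speedup` and the input numbers are hypothetical, they are not taken from the measurements, and the model ignores data-transfer overhead between the processor and the programmable logic.

```python
def total_speedup(fraction_accelerated, kernel_speedup):
    """Amdahl's law: overall speed-up when only part of the run time is accelerated.

    fraction_accelerated: share of the original software run time spent in
                          the offloaded operation (0.0 to 1.0)
    kernel_speedup:       speed-up of that operation alone
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - fraction_accelerated) + fraction_accelerated / kernel_speedup
    return 1.0 / remaining


# Hypothetical inputs: 60% of the software run time in the offloaded
# operation, and that operation accelerated 90x in hardware.
print(round(total_speedup(0.60, 90.0), 2))
```

With these illustrative inputs the unaccelerated 40% of the run time dominates, so the total speed-up stays in the low single digits even though the kernel itself runs about two orders of magnitude faster.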

slide-13
SLIDE 13

Resource Utilization

[Bar chart: LUTs, FFs, Slices, and BRAMs per design]

  • RTL ntru-pke-443: 44,257 LUTs, 29,655 FFs, 7,802 Slices, 1 BRAM
  • HLS ntru-pke-443: 51,953 LUTs, 49,293 FFs, 9,413 Slices, 1 BRAM
  • RTL ntru-pke-743: 76,972 LUTs, 49,674 FFs, 11,425 Slices, 1 BRAM
  • HLS ntru-pke-743: 95,329 LUTs, 82,221 FFs, 16,686 Slices, 1 BRAM

slide-14
SLIDE 14

Q&A

Questions? Comments? Suggestions?

Thank You!

CERG: http://cryptography.gmu.edu
ATHENa: http://cryptography.gmu.edu/athena