Conference on Cryptographic Hardware and Embedded Systems 2018 | Amsterdam, The Netherlands | 09-10-18
DATA FLOW ORIENTED HARDWARE DESIGN OF RNS-BASED POLYNOMIAL MULTIPLICATION FOR SHE ACCELERATION
Joël Cathébras, Alexandre Carbon, Peter Milder, Renaud Sirdey, Nicolas Ventroux
IMPLEMENTATION PROBLEMATIC FOR RLWE-BASED LEVELED-FHE SCHEMES
- Handling polynomials of $R = \mathbb{Z}[X]/(F(X))$ and $R_q = R/qR$:
  - Modulus $q$ ~ several hundred bits
  - $\deg(F)$ ~ several thousand
  - Both are impacted by security and multiplicative depth.
- Residue Number System: $q = \prod_{i=1}^{k} q_i$
  - A product in $R_q$ splits into $k$ independent products in the $R_{q_i}$'s.
- Bajard et al. in 2016, further simplified by Halevi et al. in 2018:
  - RNS-compatible $\mathrm{FV.Dec}_{RNS}$ and $\mathrm{FV.Mult\&Relin}_{RNS}$.
  - New $\mathrm{rlk}_{RNS}$: pair of $k \times k$ matrices with elements in $R_{q_i}$ for $i$ in $1, \dots, k$.
  - Performance bottleneck: Residue Polynomial Multiplication (products in the $R_{q_i}$'s)
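The Residue Number System decomposition can be illustrated on plain integers (one coefficient at a time). The following is a minimal sketch, not the authors' design: the toy primes and the helper names `to_rns` and `from_rns` are assumptions chosen for illustration, and real parameters use primes of about 30 bits.

```python
from math import prod

def to_rns(x, primes):
    """Split an integer into its residues modulo each q_i."""
    return [x % qi for qi in primes]

def from_rns(residues, primes):
    """Recombine residues with the Chinese Remainder Theorem."""
    q = prod(primes)
    total = 0
    for r, qi in zip(residues, primes):
        m = q // qi                       # product of the other primes
        total += r * m * pow(m, -1, qi)   # pow(m, -1, qi): modular inverse (Python 3.8+)
    return total % q

# Toy RNS basis (hypothetical values), q = 97 * 193 * 257
primes = [97, 193, 257]
q = prod(primes)
a, b = 123456, 654321

# One product modulo q becomes k independent small products modulo q_i.
channel_products = [(ra * rb) % qi
                    for ra, rb, qi in zip(to_rns(a, primes), to_rns(b, primes), primes)]
assert from_rns(channel_products, primes) == (a * b) % q
```

Each channel only manipulates words of the size of one q_i, which is what makes the k residue products independent and hardware friendly.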
- Negative Wrapped Convolution (NWC) over $R_{q_i} = \mathbb{Z}_{q_i}[X]/(F(X))$:
  - No polynomial modular reduction.
  - Restricts the choice of $F(X) = X^n + 1$ with $n$ a power of 2.
  - Restricts the choice of $q_i$: $q_i \equiv 1 \bmod 2n$.
  - $2kn$ precomputed values: $(\psi_i^j)_{0 \le j < 2n}$, where $\psi_i$ is an $n$-th primitive root of $-1$ in $\mathbb{Z}_{q_i}$.
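The Negative Wrapped Convolution can be sketched as follows, assuming toy parameters n = 8, q = 17 and psi = 3 (all hypothetical, chosen so that q ≡ 1 mod 2n and psi^n ≡ -1 mod q). A naive O(n²) transform stands in for the fast NTT to keep the sketch short; the function names are illustrative only.

```python
def ntt_naive(a, omega, q):
    """Quadratic-time number theoretic transform, used here in place of a fast NTT."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q for k in range(n)]

def nwc(a, b, psi, q):
    """Product of a and b modulo (X^n + 1, q) via psi-scaling and NTTs."""
    n = len(a)
    omega = psi * psi % q                 # n-th primitive root of 1
    psi_inv = pow(psi, -1, q)
    n_inv = pow(n, -1, q)
    a_s = [a[j] * pow(psi, j, q) % q for j in range(n)]   # pre-scaling by psi^j
    b_s = [b[j] * pow(psi, j, q) % q for j in range(n)]
    prod_ntt = [x * y % q for x, y in zip(ntt_naive(a_s, omega, q),
                                          ntt_naive(b_s, omega, q))]
    c_s = ntt_naive(prod_ntt, pow(omega, -1, q), q)       # inverse transform, unscaled
    # post-scaling by n^-1 * psi^-j
    return [c_s[j] * n_inv % q * pow(psi_inv, j, q) % q for j in range(n)]

def negacyclic_schoolbook(a, b, q):
    """Reference product modulo (X^n + 1, q), using X^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c

Q, PSI = 17, 3
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert nwc(a, b, PSI, Q) == negacyclic_schoolbook(a, b, Q)
```

No reduction modulo X^n + 1 is ever performed explicitly: the scaling by powers of psi folds the sign flip into the cyclic convolution computed by the transforms.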
RELATED WORKS (HARDWARE ACCELERATION)
- Migliore et al. 2018: Karatsuba rather than NWC (no RNS)
  - Finer choice of $F(X)$ allowing batching of binary messages.
  - Asymptotic complexity in $O(n^{1.585})$ vs $O(n \log n)$: turning point ($n = 6144$, $\log_2 q = 512$).
  → Not sufficient to target large multiplicative depth.
- Öztürk et al. 2015: RNS and NTT approach for LTV scheme (no NWC)
  - Memory-access iterative NTT.
  - External pre-computation of NTT twiddle factors.
  → Uses communication bandwidth for non-payload data.
- Cousins et al. 2017: RNS and NTT approach for LTV scheme
  - Dataflow oriented pipelined NTT.
  - Local storage of all twiddle factors at compile time.
  → Storage cost in $O(kn)$, dependent on RNS basis size.
- Sinha Roy et al. 2015: RNS and NTT (no NWC) approach for RLWE-based scheme
  - Memory-access iterative NTT.
  - Local storage of a subset of the twiddle factors, and on-the-fly computation of the others.
  → Better storage in $O(k \log n)$, but still dependent on RNS basis size.
This work: dataflow oriented NWC with on-the-fly computation of twiddle factors.
NWC ARCHITECTURE PRINCIPLE
One NWC over $R$ ⟺ $O(k)$ smaller NWCs over the $R_{q_i}$'s: $D_i = \mathrm{NWC}_i(B_i, C_i)$
- Required values for $\mathrm{NWC}_i$:
  - $\psi_i$: an $n$-th primitive root of $-1$ over $\mathbb{Z}_{q_i}$ ⟹ $\omega_i = \psi_i^2 \bmod q_i$ is an $n$-th primitive root of 1 over $\mathbb{Z}_{q_i}$
- Architecture principle:
  [Figure: NWC block diagram. Twiddle flow: blocks GEN TW, GEN ITW and GEN PCTW, fed with $q_i$, $w_i$ and the seeds $\psi_i^1, \dots, \psi_i^x$, emit the tuples $(q_i, w_i, \Psi_i)$, $(q_i, w_i, \Omega_i)$, $(q_i, w_i)$, $(q_i, w_i, \Omega_i^{-1})$ and $(q_i, w_i, n_i^{-1} \cdot \Psi_i^{-1})$. Data flow: inputs $B_i$, $C_i$ go through VEC PW MM, VEC NTT, PW MM, NTT and PW MM to produce $D_i$.]
- Twiddle flow:
  - $kx$ seeds ≪ $kn$ twiddles ($x$: streaming width, in words per cycle).
  - Generation of $\Psi_i = (\psi_i^j)_{j=0}^{n-1}$. One set every $T = n/x$ cycles.
  - Computation of $\Psi_i^{-1} = (\psi_i^{-j})_{j=0}^{n-1}$: $\Psi_i^{-1} = \mathrm{Reorder}(q_i - \Psi_i)$, since $q_i - \psi_i^j = \psi_i^{j-n} \bmod q_i$.
  - Scale $\Psi_i^{-1}$ by $n_i^{-1}$ ($n_i^{-1} = n^{-1} \bmod q_i$).
  - $\Omega_i \subset \Psi_i$ and $\Omega_i^{-1} \subset \Psi_i^{-1}$ ($\omega_i = \psi_i^2 \bmod q_i$).
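The relations between the twiddle sets can be checked with a small sketch. It assumes toy parameters (n = 8, q = 17, psi = 3) and a hypothetical helper `twiddle_sets`; the handling of the exponent-0 entry in the reorder step is an assumption of this sketch, since negating psi^0 yields psi^-n rather than psi^0.

```python
def twiddle_sets(psi, n, q):
    """Derive the inverse and omega twiddles from the forward psi powers only."""
    Psi = [pow(psi, j, q) for j in range(n)]          # psi^0 .. psi^(n-1)
    negated = [(q - v) % q for v in Psi]              # -psi^j = psi^(j-n) mod q
    # Reorder: psi^-j = -psi^(n-j) for j >= 1; index 0 is set to 1 explicitly.
    Psi_inv = [1] + [negated[n - j] for j in range(1, n)]
    n_inv = pow(n, -1, q)
    Psi_inv_scaled = [v * n_inv % q for v in Psi_inv] # scaling by n^-1 mod q
    Omega = Psi[0::2]                                 # omega^j = psi^(2j)
    Omega_inv = Psi_inv[0::2]                         # omega^-j = psi^(-2j)
    return Psi, Psi_inv, Psi_inv_scaled, Omega, Omega_inv

Psi, Psi_inv, Psi_inv_scaled, Omega, Omega_inv = twiddle_sets(3, 8, 17)
assert Psi_inv == [pow(3, -j, 17) for j in range(8)]
assert Omega == [pow(9, j, 17) for j in range(4)]
```

Only subtractions and a reordering are needed to obtain the inverse powers, so no extra modular multiplier is spent on them.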
AUTOMATIC GENERATION OF MULTI FIELD NTT DESIGN (1)
- SPIRAL tool: DFT hardware generator.
  - Design space exploration.
  - Complex arithmetic → $\mathbb{Z}_{q_i}$ modular arithmetic.
    $q_i$ ⟵ NFLlib prime selection. Barrett modular reduction ($w_i = \lfloor 2^{2(t+2)} / q_i \rfloor \bmod 2^{s+2}$).
  - Modifying twiddle factor handling.
Example of NTT data path ($r = 2$, $n = 16$, $x = 4$):
[Figure: streaming radix-2 NTT data path. Init Perm, then Stage 0 to Stage 3, each built from NTT 2 blocks followed by Perm blocks; the stages receive $(q_i, w_i)$ and their twiddles.]
Characteristics:
- $s = \log_r n$ stages.
- $x$ words per cycle.
- One transform every $T = n/x$ cycles.
Twiddle Bank (TWB), one per (stage, way): (1,0) (2,0)(2,1) (3,0)(3,1)
- Contents: $\omega_i^4$ for (1,0); $\omega_i^0, \omega_i^4$ for (2,0); $\omega_i^2, \omega_i^6$ for (2,1); $\omega_i^0, \omega_i^2, \omega_i^4, \omega_i^6$ for (3,0); $\omega_i^1, \omega_i^3, \omega_i^5, \omega_i^7$ for (3,1).
- RNS channel specific
- Reprogrammable
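Barrett reduction replaces the division by q_i with a multiplication by a precomputed constant and a shift. The sketch below is the generic textbook variant, not NFLlib's exact algorithm nor the w_i formula of the slide; the prime 12289 and the function names are assumptions chosen for illustration.

```python
def barrett_setup(q):
    """Precompute the shift k and the constant mu = floor(2^(2k) / q)."""
    k = q.bit_length()
    return k, (1 << (2 * k)) // q

def barrett_mulmod(a, b, q, k, mu):
    """Return a * b mod q for 0 <= a, b < q, without dividing by q."""
    x = a * b                       # x < q^2 < 2^(2k)
    est = (x * mu) >> (2 * k)       # quotient estimate, never above floor(x / q)
    r = x - est * q
    while r >= q:                   # small correction for the estimate error
        r -= q
    return r

q = 12289                           # toy prime (hypothetical choice)
k, mu = barrett_setup(q)
assert barrett_mulmod(12288, 12288, q, k, mu) == (12288 * 12288) % q
```

In hardware, the multiplication by mu and the shift map onto DSP slices and wiring, and the constant is loaded per RNS channel along with q_i.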
AUTOMATIC GENERATION OF MULTI FIELD NTT DESIGN (2)
[Figure: multi-field NTT block diagram. NTT DP (Init Perm, NTT 2 and Perm blocks) with data_in and data_out of $x$ words each, next_in and next_out; twiddle banks TWB 1 … TWB $H$; CTRL; INTERCONNECT DP; PRG with next_prg and a twiddles input of $x/2$ words; INTERCONNECT PRG; GA. Legend: read addresses, write addresses, write enables, twiddle flow.]
- Cyclic access and reprogramming of TWBs:
  - $u$: way index (in $0, \dots, x/2 - 1$)
  - $s$: stage index
  - $H$ twiddle banks, with $H = \mathrm{Lat}/T + 1$ for the latency shown in the figure.
- Reprogramming a TWB:
  [Figure: TWB $s$ with a register reg$(s, u)$ and a memory mem$(s, u)$ per way; signals prg_tw_*, tw_$(s, u)$, we_$(s, u)$, rd_addr_$s$, wr_addr_$(s, u)$.]
  - Twiddles arrive at $x/2$ words per cycle (prg_tw_0, prg_tw_1).
  - Select from the flow → update we_$(s, u)$ and wr_addr_$(s, u)$.
- Example of reprogram counters ($r = 2$, $n = 16$, $x = 4$):
  - prg_tw_0 carries $\omega_i^0, \omega_i^2, \omega_i^4, \omega_i^6$; prg_tw_1 carries $\omega_i^1, \omega_i^3, \omega_i^5, \omega_i^7$.
  - Counter for mem(3,1): offset 0, step 1, index 1 → $\omega_i^1, \omega_i^3, \omega_i^5, \omega_i^7$
  - Counter for mem(2,1): offset 1, step 2, index 0 → $\omega_i^2, \omega_i^6$
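The reprogram counters can be read as a strided selection from one of the incoming twiddle flows. The sketch below follows that reading, which is an assumption: `index` picks the flow, `offset` the first selected word and `step` the stride. Twiddles are represented by their exponents of omega, and the names are hypothetical.

```python
def select_from_flow(flows, index, offset, step):
    """Words of flow `index` that a counter (offset, step) would write to its memory."""
    return flows[index][offset::step]

# Exponents of omega carried by prg_tw_0 and prg_tw_1 for r = 2, n = 16, x = 4
flows = [
    [0, 2, 4, 6],   # prg_tw_0
    [1, 3, 5, 7],   # prg_tw_1
]

mem_3_1 = select_from_flow(flows, index=1, offset=0, step=1)
mem_2_1 = select_from_flow(flows, index=0, offset=1, step=2)
assert mem_3_1 == [1, 3, 5, 7]
assert mem_2_1 == [2, 6]
```

Under this reading, each memory only needs a small counter and a write enable to pick its twiddles out of the shared flow, with no addressing of a global twiddle table.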
RPM CHARACTERIZATION PROOF-OF-CONCEPT INTEGRATION (1)
[Figure: system integration. RPM inside WRAP (AXI + FIFOs), BCHI, DS, DMA 0, DMA 1, DMA 2, PCIe 3 x8.]
- Preliminary integration ($n = 2^{12}$, $\log_2 q_i = 30$, $x = 2$):
  - Alpha-Data ADM-PCIE 7v3.
  - Xilinx Virtex 7: XC7VX690T-2-FFG1157C.
  - PCIe Gen3, 8 lanes.
  - Vivado 2016.3: placed and routed.
  - $f_{clk}$ = 200 MHz. PCIe test OK.
- Resource utilization (two charts):
             LUT     LUTRAM  FF      BRAM    DSP
  Chart 1    12.5%   8.3%    7.7%    14.1%   14.4%
  Chart 2    6.4%    3.1%    4.6%    10.4%   1.3%
- Most constraining resources for the RPM:
  - BRAM slices
  - DSP slices
  - PCIe bandwidth
- How does RPM scale in SHE context?
RPM CHARACTERIZATION PROJECTIONS (1)
- Impact of the polynomial degree $n$ ($x = 2$ and $\log_2 q_i = 30$):
  [Figure: DSP, BRAM and required bandwidth ($f$ = 200 MHz) versus $n$, on Xilinx Virtex 7 XC7VX690T-2-FFG1157C, against the resource limitation (FPGA / PCIe Gen3 x8).]
  - Slight increase in DSP utilization.
  - BRAM is restrictive for $n > 2^{15}$ ([58-65]% for NTT permutations).
  - Required bandwidth is achievable.
RPM CHARACTERIZATION PROJECTIONS (2)
- Impact of the streaming width $x$ ($n = 2^{14}$ and $\log_2 q_i = 30$):
  [Figure: DSP, BRAM and required bandwidth ($f$ = 200 MHz) versus $x$, same device and limits.]
  - Great increase in DSP utilization.
  - Increase of BRAM utilization.
  - Required bandwidth is prohibitive.
RPM CHARACTERIZATION PROJECTIONS (3)
- Impact of the RNS prime size $\log_2 q_i$ ($n = 2^{14}$ and $x = 2$):
  [Figure: DSP, BRAM and required bandwidth ($f$ = 200 MHz) versus $\log_2 q_i$, same device and limits.]
  - Balanced impact on DSP and BRAM utilization.
  - Required bandwidth may become restrictive.
PERFORMANCE PROJECTIONS: FV-RNS APPLICATION
- Performance projection @200MHz:
  With respect to timing from [HPS18] ($\lambda > 128$)
- Raw performances:
  - Required bandwidth: ~ $3x \cdot f_{clk} \cdot \log_2 q_i$
  - RPM / s: $f_{clk} / (n/x)$
- Scalability w.r.t. multiplicative depth:
  - Speedup (su) is scalable.
  - Realistic bandwidth usage.
  - Timing after RPM speedup:
    - Basis ext. & Scaling: [77-86] %
    - RPMs: [9-16] %
  - RPM vs NTT implementation?
- Increasing parallelism:
  - Greatly improves speedup.
  - Bandwidth and DSPs are quickly restrictive.
- Increasing prime size:
  - Slightly improves speedup.
  - Balanced cost on DSP and BRAM usage.
  - Bandwidth may be restrictive.
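The two raw-performance expressions can be evaluated for the proof-of-concept parameters (200 MHz, n = 2^12, streaming width 2, 30-bit primes). This is a back-of-the-envelope sketch; the function names are hypothetical, and reading the factor 3 as two input streams plus one output stream is an assumption.

```python
def rpm_per_second(f_clk_hz, n, x):
    """One residue polynomial multiplication completes every T = n / x cycles."""
    return f_clk_hz * x / n

def required_bandwidth_bits_per_s(f_clk_hz, x, log2_qi):
    """Three streams of x words of log2(q_i) bits per clock cycle."""
    return 3 * x * log2_qi * f_clk_hz

f_clk = 200_000_000
rate = rpm_per_second(f_clk, n=2**12, x=2)
bandwidth = required_bandwidth_bits_per_s(f_clk, x=2, log2_qi=30)
print(f"{rate:.2f} RPM/s, {bandwidth / 1e9:.1f} Gbit/s")
```

Doubling the streaming width doubles both the RPM rate and the required bandwidth, which matches the observation that bandwidth quickly becomes restrictive when parallelism increases.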
CONCLUSION & PERSPECTIVES
- Hardware implementation for SHE should be flexible:
  - Refinement of parameter range still in progress.
  - Multiplicative depth has significant impact on both $n$ and $\log_2 q$.
- Our response:
  - Dataflow RNS-based NWC with on-the-fly generation of twiddles.
  - Exploiting DSP knowledge on DFT implementation.
  - Minimize the impact of $\log_2 q$ on hardware design.
- Research perspectives:
  - NTT vs RPM?
  - Proper system integration
  - Design space exploration with SPIRAL
- Application perspectives:
  - Hybrid architecture for SHE acceleration
Thanks! Questions?
Joël Cathébras
Centre de Saclay Nano-Innov PC 172 - 91191 Gif sur Yvette Cedex
INTRODUCTION: HOMOMORPHIC ENCRYPTION
Homomorphic encryption has to be secure … and correct!
[Figure: a message $m \in \mathcal{M}$ (message space) is encoded into the cleartext space, encrypted into the ciphertext space, then decrypted and decoded back to $m$.]
- Decryption function is a homomorphism:
  - $c_1, c_2$ two ciphertexts such that $c_1 = \mathrm{Enc}(m_1)$ and $c_2 = \mathrm{Enc}(m_2)$
  - $m_1 \circ m_2$ ⟺ $c_1 \circ c_2$, with $\mathrm{Dec}(c_1) \circ \mathrm{Dec}(c_2) = \mathrm{Dec}(c_1 \circ c_2)$ for $\circ \in \{+, \times\}$
  - Addition of ciphertexts gives $c_{sum}$, multiplication gives $c_{mul}$.
- Semantic security: noise in ciphertexts
  - Error $e$ drawn from an error distribution $\chi_{err}$, usually $\chi_{err} = \mathcal{N}(0, \sigma^2)$
MODULAR ARITHMETIC
- Modular Addition:
- Modular Subtraction:
- Modular Multiplication (NFLlib):
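Modular addition and subtraction are commonly implemented with one conditional correction instead of a division. The sketch below shows that usual form, assuming operands already reduced to [0, q); it is not taken from the slides, whose formulas did not survive, and the function names are hypothetical.

```python
def mod_add(a, b, q):
    """a + b mod q for 0 <= a, b < q, with a single conditional subtraction."""
    s = a + b
    return s - q if s >= q else s

def mod_sub(a, b, q):
    """a - b mod q for 0 <= a, b < q, with a single conditional addition."""
    d = a - b
    return d + q if d < 0 else d

assert mod_add(16, 5, 17) == 4
assert mod_sub(3, 5, 17) == 15
```

Modular multiplication needs a real reduction step, which is where a Barrett-style constant such as w_i comes in.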
GENERATION OF TWIDDLE FACTORS (1)
- Twiddle generation issues:
  - Data dependencies.
  - Modular multiplication latency.
  - Required throughput: one set every $T = n/x$ cycles.
- Example of recurrence relation:
  - $\psi^{2j} = \psi^j \cdot \psi^j$ and $\psi^{2j+1} = \psi^j \cdot \psi^{j+1}$
  - Intermediate storage in $O(n/4)$
  - Compute "at the earliest"
- Example of $\Psi$ generation ($n = 32$, $x = 2$):
  [Figure: schedule of the products over time, starting from the inputs 1 and 2 held in local storage $B_0$, $B_1$ (first products $B_0 \cdot B_0$ and $B_0 \cdot B_1$), with each result available after the modular multiplier latency $\mathrm{Lat}_{MM}$.]
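The recurrence can be sketched in software as below. It only shows the data dependencies, not the "at the earliest" hardware schedule nor the multiplier latency; the function name and the toy values (base 5, modulus 193, n = 32) are assumptions.

```python
def gen_powers(psi, n, q):
    """Powers psi^0 .. psi^(n-1) mod q from the recurrence
    psi^(2j) = psi^j * psi^j and psi^(2j+1) = psi^j * psi^(j+1); n must be even."""
    p = [0] * n
    p[0] = 1 % q
    p[1] = psi % q
    for j in range(1, n // 2):
        p[2 * j] = p[j] * p[j] % q          # even exponent: a squaring
        p[2 * j + 1] = p[j] * p[j + 1] % q  # odd exponent: product of two neighbours
    return p

powers = gen_powers(5, 32, 193)
assert powers == [pow(5, j, 193) for j in range(32)]
```

Every new power depends on entries of at most half its index, so multiplications for different exponents can overlap in a pipelined multiplier instead of forming one long dependency chain.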
GENERATION OF TWIDDLE FACTORS (2)
- Data flow twiddle generation:
  [Figure: CTRL COMPUTE, INTERCONNECT IN, generation handlers GH 1 … GH $I$, MMB, INTERCONNECT OUT, buffers BUF 1 … BUF $I$, CTRL SORT (num valid). Inputs: $q_i$, $w_i$ and the seeds $\psi_i^1, \dots, \psi_i^x$. Signals: next_in, next_out, twiddles.]
  - Sequential access to MMB ($x$ MMs) with cyclic priority order.
  - $I = \mathrm{Lat}/T + 1$ generation handlers ($I = 3$ when $T \gg \mathrm{Lat}_{MM}$).
- Minimize Generation Handler local storage:
  - The twiddle set is handled as a sequence of bunches of $x$ consecutive powers, bunch$_u$ = $(\psi^{u+1}, \psi^{u+2}, \dots, \psi^{u+x})$.
  - A later bunch is obtained by multiplying an earlier bunch by a fixed power of $\psi$; the index offset between the two is a multiple of $x$.
  - The offset is upper bounded by a design parameter.