DATA FLOW ORIENTED HARDWARE DESIGN OF RNS-BASED POLYNOMIAL MULTIPLICATION FOR SHE ACCELERATION


  1. DATA FLOW ORIENTED HARDWARE DESIGN OF RNS-BASED POLYNOMIAL MULTIPLICATION FOR SHE ACCELERATION
     Joël Cathébras, Alexandre Carbon, Peter Milder, Renaud Sirdey, Nicolas Ventroux
     Conference on Cryptographic Hardware and Embedded Systems 2018 | Amsterdam, The Netherlands | 09-10-18

  5. IMPLEMENTATION PROBLEMATIC FOR RLWE-BASED LEVELED-FHE SCHEMES
     • Handling polynomials of $R = \mathbb{Z}[X]/(F(X))$ and $R_q = R/qR$:
       • Modulus $q$ ~ several hundred bits
       • $\deg(F)$ ~ several thousand
       • Both parameters impact security and multiplicative depth.
     • Residue Number System, with $q = \prod_{i=1}^{k} q_i$:
       $r = a \times b \bmod q \;\Leftrightarrow\; r_{q_i} = a_{q_i} \times b_{q_i} \bmod q_i$ for $i = 1, \dots, k$
     • Bajard et al. in 2016, further simplified by Halevi et al. in 2018:
       • RNS-compatible $\mathrm{FV.Dec}_{RNS}$ and $\mathrm{FV.Mult\&Relin}_{RNS}$.
       • New $\mathrm{rlk}_{RNS}$: a pair of $k \times k$ matrices with elements in $R_{q_i}$ for $i$ in $1, \dots, k$.
       • Performance bottleneck: Residue Polynomial Multiplication (products in the $R_{q_i}$'s)
     • Negative Wrapped Convolution (NWC) over $R_{q_i} = \mathbb{Z}_{q_i}[X]/(F(X))$:
       • No polynomial modular reduction.
       • Restricts the choice of $F$: $F(X) = X^n + 1$ with $n$ a power of 2.
       • Restricts the choice of the $q_i$: $q_i \equiv 1 \bmod 2n$.
       • $2nk$ precomputed values: $(\psi_i^j)_{0 \le j < 2n}$, where $\psi_i$ is an $n$-th primitive root of $-1$ in $\mathbb{Z}_{q_i}^*$.
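The RNS decomposition and the NWC on this slide can be checked on toy parameters. The following is a minimal Python sketch, not the hardware design of the talk: the function names, the toy moduli (17, 97, 113) and $n = 8$ are illustrative assumptions, and the transform is a naive $O(n^2)$ NTT kept simple for readability. It multiplies two polynomials modulo $X^n + 1$ once directly modulo $q$ and once per residue with an NWC, then recombines the residues with the CRT.

```python
from math import prod

def negacyclic_schoolbook(a, b, q):
    """Reference product in Z_q[X]/(X^n + 1), O(n^2), valid for any modulus q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:  # X^n = -1, so wrapped terms are subtracted
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c

def find_psi(n, q):
    """Search psi with psi^n = -1 mod q (q prime, q = 1 mod 2n, n a power of 2)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive root of -1 found")

def ntt(x, omega, q):
    """Naive O(n^2) number-theoretic transform (illustration only)."""
    n = len(x)
    return [sum(x[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def nwc(a, b, q, psi):
    """Negative wrapped convolution: product in Z_q[X]/(X^n + 1) via NTT."""
    n = len(a)
    omega = psi * psi % q                      # n-th primitive root of 1
    psi_inv, omega_inv, n_inv = pow(psi, -1, q), pow(omega, -1, q), pow(n, -1, q)
    a_hat = ntt([a[j] * pow(psi, j, q) % q for j in range(n)], omega, q)
    b_hat = ntt([b[j] * pow(psi, j, q) % q for j in range(n)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    c = ntt(c_hat, omega_inv, q)               # inverse transform, unscaled
    return [c[j] * n_inv * pow(psi_inv, j, q) % q for j in range(n)]

def crt(residues, moduli):
    """Recombine residues modulo the product of pairwise coprime moduli."""
    q = prod(moduli)
    total = 0
    for r, qi in zip(residues, moduli):
        qi_hat = q // qi
        total += r * qi_hat * pow(qi_hat, -1, qi)
    return total % q

def rns_multiply(a, b, moduli):
    """One product modulo q computed as k independent residue NWCs."""
    n = len(a)
    per_residue = [nwc([x % qi for x in a], [x % qi for x in b],
                       qi, find_psi(n, qi)) for qi in moduli]
    return [crt([c[j] for c in per_residue], moduli) for j in range(n)]

# Toy parameters (assumptions): n = 8, primes congruent to 1 mod 16.
N = 8
MODULI = [17, 97, 113]
Q = prod(MODULI)
A = [(12345 * (j + 1)) % Q for j in range(N)]
B = [(6789 * (j + 3)) % Q for j in range(N)]
print(rns_multiply(A, B, MODULI) == negacyclic_schoolbook(A, B, Q))
```

The modular inverses use the three-argument `pow` with a negative exponent, which needs Python 3.8 or later. A hardware design would replace the naive transform with a fast NTT; the sketch only illustrates that the residue products are independent of each other.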

  6. RELATED WORKS (HARDWARE ACCELERATION)
     • Migliore et al. 2018: Karatsuba rather than NWC (no RNS)
       • Finer choice of $F(X)$, allowing batching of binary messages.
       • Asymptotic complexity in $O(n^{1.585})$ vs $O(n \log n)$: turning point at ($n = 6144$, $\log_2 q = 512$).
       • Not sufficient to target large multiplicative depth.
     • Öztürk et al. 2015: RNS and NTT approach for LTV scheme (no NWC)
       • Memory-access iterative NTT.
       • External pre-computation of NTT twiddle factors.
       • Uses communication bandwidth for non-payload data.
     • Cousins et al. 2017: RNS and NTT approach for LTV scheme
       • Dataflow oriented pipelined NTT.
       • Local storage of all twiddle factors at compile time.
       • Storage cost in $O(kn)$, dependent on RNS basis size.
     • Sinha Roy et al. 2015: RNS and NTT (no NWC) approach for RLWE-based scheme
       • Memory-access iterative NTT.
       • Local storage of a subset of the twiddle factors, and on-the-fly computation of the others.
       • Better storage in $O(k \log n)$, but still dependent on RNS basis size.
     • This work: dataflow oriented NWC with on-the-fly computation of twiddle factors.

  9. NWC ARCHITECTURE PRINCIPLE
     • Architecture principle: one NWC over $R$ ⟺ $O(k)$ smaller NWCs over the $R_{q_i}$'s: $C_i = \mathrm{NWC}_i(A_i, B_i)$
     • Required values for $\mathrm{NWC}_i$:
       • $\psi_i$: an $n$-th primitive root of $-1$ over $\mathbb{Z}_{q_i}^*$ ⇒ $\omega_i = \psi_i^2 \bmod q_i$ is an $n$-th primitive root of $1$ over $\mathbb{Z}_{q_i}^*$
     • Generation of $\Psi_i = (\psi_i^j)_{j=0}^{n-1}$:
       • $O(w)$ seeds ≪ $O(n)$ twiddles.
       • One set every $T = n/w$ cycles.
     • Computation of $\Psi_i^{-1} = (\psi_i^{-j})_{j=0}^{n-1}$:
       • $\Psi_i^{-1} = \mathrm{Reorder}(q_i - \Psi_i)$, since $q_i - \psi_i^j = \psi_i^{-(n-j)} \bmod q_i$.
     • $\Omega_i \subset \Psi_i$ and $\Omega_i^{-1} \subset \Psi_i^{-1}$ (because $\omega_i = \psi_i^2 \bmod q_i$).
     [Figure: $\mathrm{NWC}_i$ pipeline with a twiddle flow and a data flow. Inputs: $n_i^{-1}$, $q_i$, $v_i$ and the seeds $1, \psi_i, \dots$ Twiddle generators: GEN TW, GEN ITW, GEN PCTW. Data path from $A_i$, $B_i$ to $C_i$ through VEC, PW MM and NTT blocks. Parameter tuples on the blocks: $(q_i, v_i, \Psi_i)$, $(q_i, v_i, \Omega_i)$, $(q_i, v_i, \Omega_i^{-1})$, $(q_i, v_i, n_i^{-1} \cdot \Psi_i^{-1})$, $(q_i, v_i)$.]
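The twiddle generation and the Reorder identity can be illustrated in software. This is a hedged sketch, not the generator circuit of the talk: the seed layout (the $w$ powers $\psi^0, \dots, \psi^{w-1}$ plus the step $\psi^w$) and the function names are assumptions chosen to match the "$O(w)$ seeds, one set every $n/w$ cycles" statement. The inverse set relies only on $\psi^n = -1$, which gives $\psi^{-j} = q - \psi^{n-j}$ for $1 \le j < n$.

```python
def generate_twiddles(psi, n, w, q):
    """Produce Psi = (psi^j) for j in 0..n-1, w values per simulated cycle.

    Assumed seed layout: w lane seeds psi^0..psi^(w-1) and one step psi^w.
    """
    assert n % w == 0
    lanes = [pow(psi, j, q) for j in range(w)]   # the w lane seeds
    step = pow(psi, w, q)                        # multiplier applied each cycle
    twiddles = []
    for _ in range(n // w):                      # T = n / w cycles
        twiddles.extend(lanes)
        lanes = [x * step % q for x in lanes]    # advance every lane by psi^w
    return twiddles

def inverse_twiddles(twiddles, q):
    """Derive Psi^-1 from Psi by negation and reordering, no inversion needed.

    Uses psi^(-j) = q - psi^(n-j) for 1 <= j < n, and psi^0 = 1 for j = 0.
    """
    n = len(twiddles)
    return [1] + [q - twiddles[n - j] for j in range(1, n)]

# Toy parameters (assumptions): q = 17, n = 8, psi = 3 since 3^8 = -1 mod 17.
Q, N, W, PSI = 17, 8, 2, 3
PSI_SET = generate_twiddles(PSI, N, W, Q)
PSI_INV_SET = inverse_twiddles(PSI_SET, Q)
print(PSI_SET)
print(all(x * y % Q == 1 for x, y in zip(PSI_SET, PSI_INV_SET)))
```

With this layout the inverse twiddles cost one subtraction each, so no separate set of inverse seeds has to be stored.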
