
SIDH on ARM: Faster Modular Multiplications for Faster Post-Quantum Supersingular Isogeny Key Exchange



  1. SIDH on ARM: Faster Modular Multiplications for Faster Post-Quantum Supersingular Isogeny Key Exchange. Hwajeong Seo (Hansung University), Zhe Liu (Nanjing University of Aeronautics and Astronautics), Patrick Longa (Microsoft Research), Zhi Hu (Central South University)

  2. Outline
     • Short Overview
       • Post-quantum supersingular isogeny Diffie-Hellman (SIDH) key exchange
       • Supersingular isogeny key encapsulation (SIKE) protocol
     • Our implementation
       • Optimized implementations for 32-bit ARMv7
       • Optimized implementations for 64-bit ARMv8
     • Implementation results
     • Conclusion

  3. Post-Quantum Cryptography (Isogeny)
     • RSA and ECC rely on integer factorization and the ECDLP.
       • Both hard problems can be solved by Shor’s algorithm on a quantum computer.
     • Quantum-resistant cryptography
       • NIST launched the post-quantum cryptography standardization project: “The goal of this process is to select a number of acceptable candidate cryptosystems for standardization.”
       • Candidate families: code, lattice, hash, multivariate, isogeny, …
     • Isogeny-based cryptography: (conjectured to be) hard for quantum computers
       • Supersingular isogeny Diffie-Hellman (SIDH) key exchange was proposed by Jao and De Feo in 2011.
       • Among all the submitted post-quantum candidates, SIDH uses the smallest keys.

  4. Mobile Platform (32-bit/64-bit ARM)
     Platform               ARM Cortex-A15     ARM Cortex-A53   ARM Cortex-A72
     Architecture           32-bit ARMv7       64-bit ARMv8     64-bit ARMv8
     Frequency              2.0 GHz            1.512 GHz        1.992 GHz
     No. registers          15                 31               31
     No. registers (NEON)   16                 32               32
     Application            Wearable devices   Smartphones      Smartphones

  5. Previous Works
     • Hardware implementation
       • FPGA: Koziel et al. [INDOCRYPT’16, TCAS’17]
     • Software implementation
       • 64-bit Intel processor: Costello et al. [CRYPTO’16, EUROCRYPT’17], Faz-Hernández et al. [ToC’17], Zanon et al. [PQCrypto’18]
       • 64-bit ARM processor: Jalali et al. [SAC’17] → this work [CHES’18]
       • 32-bit ARM processor: Koziel et al. [CANS’16] → this work [CHES’18]

  6. Motivation
     Type           Algorithm       Advantage                               Disadvantage
     Code           McEliece        Fast computation                        Long key size
     Hash           XMSS, SPHINCS   Security proof                          Long signature size
     Lattice        (ring)-LWE      Fast computation                        Difficulty of parameter selection
     Multivariate   UOV, Rainbow    Short signature size, fast computation  Long key size
     Isogeny        SIDH, SIKE      Short key size                          Slow computation
     • All PQC candidates have their own pros and cons.
     • The disadvantage of SIDH/SIKE is slow computation.
     • In this talk, we address this problem on 32-bit and 64-bit ARM processors.

  7. Contribution
     • Unified ARM/NEON multiplication: instruction-level parallelism
     • New Montgomery reduction: “UMAAL” + “hybrid-scanning”
     • Efficient implementation of SIDH:
       • p503 (88 msec) / p751 (292 msec) on 32-bit ARMv7-A @ 2.0 GHz
       • p503 (45 msec) on 64-bit ARMv8-A @ 1.992 GHz


  9. Post-quantum key exchange algorithm
     • Supersingular Isogeny Diffie-Hellman (SIDH)
       • Shared key generation between two parties over an insecure communication channel.
       • SIDH works with the set of supersingular elliptic curves over $\mathbb{F}_{p^2}$ and their isogenies.
     • [Figure: SIDH commutative diagram with nodes $E_0$, $\phi_A(E_0)$, $\phi_B(E_0)$ and the shared curve]
       $E_{AB} = \phi'_A(\phi_B(E_0)) \cong E_0/\langle P_A + s_A Q_A,\ P_B + s_B Q_B \rangle \cong E_{BA} = \phi'_B(\phi_A(E_0))$

  10. Supersingular Isogeny Key Encapsulation (SIKE)
      • SIDH is not secure when keys are reused (Galbraith-Petit-Shani-Ti 2016).
      • SIKE (Costello – De Feo – Jao – Longa – Naehrig – Renes 2017)
        • IND-CCA secure key encapsulation based on SIDH.
        • Uses a variant of the Hofheinz – Hövelmanns – Kiltz (HHK) transform: IND-CPA PKE → IND-CCA KEM
      • Starting curve $E_0/\mathbb{F}_{p^2}: y^2 = x^3 + x$, where $p = 2^{e_A} 3^{e_B} - 1$
      Scheme (SIKEp + $\lceil \log_2 p \rceil$)   ($e_A$, $e_B$)   Classical sec.   Quantum sec.   Security level
      SIKEp503                                    (250, 159)       126 bits         84 bits        AES-128 (NIST level 1)
      SIKEp751                                    (372, 239)       188 bits         125 bits       AES-192 (NIST level 3)
      SIKEp964                                    (486, 301)       241 bits         161 bits       AES-256 (NIST level 5)
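The parameter sets on this slide can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the dictionary `SIKE_PARAMS` and the helper `sike_prime` are hypothetical names, and the exponent pairs are copied from the slide. It shows that each prime of the form 2^e_A · 3^e_B − 1 has the bit length that appears in the scheme name.

```python
# Exponent pairs (e_A, e_B) as listed on the slide (names here are illustrative).
SIKE_PARAMS = {
    "SIKEp503": (250, 159),
    "SIKEp751": (372, 239),
    "SIKEp964": (486, 301),
}


def sike_prime(e_a, e_b):
    """Return p = 2^e_A * 3^e_B - 1 as a Python integer."""
    return (1 << e_a) * 3**e_b - 1


for name, (e_a, e_b) in SIKE_PARAMS.items():
    p = sike_prime(e_a, e_b)
    # The bit length of p matches the number in the scheme name,
    # and p is congruent to 3 modulo 4.
    print(name, p.bit_length(), p % 4)
```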


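  12. Multiplication Instruction (32-bit ARMv7)
      • ARM (32-bit registers): R0 = a0, R1 = b0, R2 = c0, R3 = d0
        • UMULL: (R3, R2) ← a0·b0 (64 bits)
        • UMAAL: (R3, R2) ← a0·b0 + c0 + d0 (64 bits)
      • NEON (vector registers): V0 = (a3, a2, a1, a0), V1 = (b3, b2, b1, b0)
        • V2 ← (a1·b0, a0·b0), two 64-bit products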
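The arithmetic behind the two ARM instructions can be modelled in Python. This is a sketch of the instruction semantics only, not of the authors' assembly; the function names `umull` and `umaal` and their argument order are chosen for illustration. The useful property of UMAAL is that the sum a·b + c + d always fits in 64 bits, so no carry flag is needed.

```python
MASK32 = 0xFFFFFFFF


def umull(a, b):
    """Model of UMULL: 32x32 -> 64-bit product, returned as (hi, lo) words."""
    t = (a & MASK32) * (b & MASK32)
    return t >> 32, t & MASK32


def umaal(lo, hi, a, b):
    """Model of UMAAL: a*b + lo + hi, returned as (hi, lo) 32-bit words."""
    t = (a & MASK32) * (b & MASK32) + (lo & MASK32) + (hi & MASK32)
    return t >> 32, t & MASK32


# Worst case: every input is 0xFFFFFFFF. The sum is exactly 2^64 - 1,
# so the result still fits in two 32-bit words.
hi, lo = umaal(MASK32, MASK32, MASK32, MASK32)
print(hex(hi), hex(lo))
```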

  13. Previous Multiprecision Multiplication (32-bit ARMv7)
      • [Figure: partial-product diagrams from A[0]B[0] to A[7]B[7] with result words C[0] to C[14]: Consecutive Operand Caching (COC) for ARM; Cascade Operand Scanning (COS) for NEON]
      Bitlength   Method   Instruction    Timings [cc]
      256-bit     COC      ARM (UMAAL)    158 (best)
                  COS      NEON (UMULL)   188
      512-bit     COC      ARM (UMAAL)    596 (best)
                  COS      NEON (UMULL)   632
      Target processor: 32-bit ARM Cortex-A15
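To make the role of UMAAL in multiprecision multiplication concrete, here is a plain row-wise (operand-scanning) multiplication over 32-bit words in Python. It is deliberately not COC or COS, whose scheduling details are not given in this transcript; it is only a minimal sketch showing that a single UMAAL-style step per partial product handles both accumulation and carry. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def umaal(lo, hi, a, b):
    """Model of UMAAL: a*b + lo + hi, returned as (hi, lo) 32-bit words."""
    t = a * b + lo + hi
    return t >> 32, t & MASK32


def to_words(x, n):
    """Split integer x into n little-endian 32-bit words."""
    return [(x >> (32 * i)) & MASK32 for i in range(n)]


def from_words(words):
    """Rebuild an integer from little-endian 32-bit words."""
    return sum(w << (32 * i) for i, w in enumerate(words))


def mul_operand_scanning(a, b):
    """Multiply two n-word operands row by row, one umaal per partial product."""
    n = len(a)
    c = [0] * (2 * n)
    for i in range(n):
        carry = 0
        for j in range(n):
            # c[i+j] is the running accumulator word, carry is the high word.
            carry, c[i + j] = umaal(c[i + j], carry, a[j], b[i])
        c[i + n] = carry
    return c


x = 0x0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
y = 0xFEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210
product = from_words(mul_operand_scanning(to_words(x, 8), to_words(y, 8)))
print(product == x * y)
```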

  14. Proposed Multiprecision Multiplication (32-bit ARMv7)
      • Instruction-level parallelism
        • ARM and NEON instructions are issued together
      • Karatsuba multiplication: m-bit multiplication
        $A_H \cdot B_H \cdot 2^m + (A_H \cdot B_H + A_L \cdot B_L - (A_H - A_L) \cdot (B_H - B_L)) \cdot 2^{m/2} + A_L \cdot B_L$
        • Two m/2-bit multiplications in ARM
        • One m/2-bit multiplication in NEON
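The Karatsuba identity on this slide can be checked with a one-level sketch in Python. The function name `karatsuba_one_level` and the variable names are illustrative, and Python integers stand in for the multi-word operands. The sketch does not model which of the three half-size products runs on ARM and which on NEON; it only shows that three m/2-bit multiplications suffice.

```python
def karatsuba_one_level(a, b, m):
    """One level of subtractive Karatsuba for m-bit operands a and b."""
    half = m // 2
    mask = (1 << half) - 1
    a_l, a_h = a & mask, a >> half
    b_l, b_h = b & mask, b >> half

    high = a_h * b_h                  # first half-size multiplication
    low = a_l * b_l                   # second half-size multiplication
    diff = (a_h - a_l) * (b_h - b_l)  # third one, on operand differences

    # high + low - diff equals a_h*b_l + a_l*b_h, which is never negative.
    middle = high + low - diff
    return (high << m) + (middle << half) + low


a = (1 << 512) - 0x1234567
b = (1 << 511) + 0x89ABCDEF
print(karatsuba_one_level(a, b, 512) == a * b)
```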

  15. [Figure: proposed multiplication, step 1: operand subtraction on ARM and operand passing to NEON]

  16. [Figure: proposed multiplication, steps 1 to 4: after operand subtraction and operand passing, ARM and NEON compute partial products from A[0]B[0] to A[7]B[7] in parallel, producing result words C[0] to C[14]]

  17. [Figure: proposed multiplication, steps 1 to 5: as in the previous slide, followed by result passing and result accumulation (step 5)]

  18. Proposed Multiprecision Multiplication (32-bit ARMv7)
      Bitlength   Method      Instruction   Timings [cc]   Speedup
      512-bit     COC         ARM           596
                  COS         NEON          632
                  GMP-6.1.2   ARM           1,138
                  This work   ARM/NEON      470            1.26x (vs. COC)
      768-bit     GMP-6.1.2   ARM           2,408
                  This work   ARM/NEON      912            2.64x (vs. GMP-6.1.2)
      Target processor: 32-bit ARM Cortex-A15

  19. Proposed Modular Reduction (32-bit ARMv7)
      • m-bit modular reduction using Montgomery reduction
        • Two m/2-bit multiplications in ARM
        • Two m/2-bit multiplications in NEON
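For readers unfamiliar with Montgomery reduction, a textbook word-by-word version is sketched below in Python. It is not the ARM/NEON split described on the slide; the function name `montgomery_reduce`, its parameters, and the use of Python integers are assumptions made for illustration. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_reduce(t, m, n_words, w=32):
    """Return t * R^-1 mod m, where R = 2^(w * n_words).

    Assumes m is odd and 0 <= t < m * R.
    """
    base = 1 << w
    # Montgomery constant m' = -m^-1 mod 2^w.
    m_prime = (-pow(m, -1, base)) % base
    for _ in range(n_words):
        # Choose q so that the low word of t + q*m becomes zero.
        q = ((t % base) * m_prime) % base
        t = (t + q * m) >> w
    # One conditional subtraction brings the result into [0, m).
    return t - m if t >= m else t


m = (1 << 250) * 3**159 - 1   # p503, used here only as an example odd modulus
x, y = m - 12345, m - 67890
r = montgomery_reduce(x * y, m, 16)
print((r << 512) % m == (x * y) % m)
```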

  20. [Figure: proposed modular reduction, operand passing between ARM and NEON]

  21. [Figure: proposed modular reduction, steps 1 to 4: after operand passing, ARM and NEON compute partial products from Q[0]M[0] to Q[7]M[7] in parallel on intermediate words T[0] to T[14]]

  22. [Figure: proposed modular reduction, steps 1 to 5: as in the previous slide, followed by result passing and result accumulation (step 5)]

  23. Modular Reduction for SIDH
      • Efficient Montgomery reduction: Montgomery-friendly modulus
        • The lower word of the modulus is $2^w - 1$ → the Montgomery constant is equal to 1.
        • Multiplications with an all-ones word ($T \times \mathtt{0xFFFFFFFF}$ → $T \times 2^{32} - T$): shifts and subtractions
        • (e.g., $p503 = 2^{250} 3^{159} - 1$)
          0x4066F541811E1E6045C6BDDA77A4D01B9BF6C87B7E7DAF13085BDA2211E7A0AB FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF (in hexadecimal)
      • Using the modulus M+1 turns the lower part of the modulus into all-zero words
        • (e.g., $p503 + 1 = 2^{250} 3^{159}$)
          0x4066F541811E1E6045C6BDDA77A4D01B9BF6C87B7E7DAF13085BDA2211E7A0AC 00000000000000000000000000000000000000000000000000000000000000 (in hexadecimal)
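Both observations on this slide can be reproduced in a short Python sketch. It assumes a modulus whose lowest w-bit word is all ones, as p503 is for w = 32; the names `mul_all_ones` and `montgomery_reduce_friendly` are hypothetical, and Python integers replace the word arrays of a real implementation. Because the Montgomery constant is 1, the quotient word is just the low word of the running value, and q·p is computed as q·(p+1) − q so that only the non-zero upper words of p+1 take part in multiplications.

```python
def mul_all_ones(t, w=32):
    """Multiply t by the all-ones word 2^w - 1 using a shift and a subtraction."""
    return (t << w) - t


def montgomery_reduce_friendly(t, p, n_words, w=32):
    """Return t * R^-1 mod p for R = 2^(w * n_words).

    Assumes p mod 2^w == 2^w - 1 (so -p^-1 mod 2^w == 1) and 0 <= t < p * R.
    """
    base = 1 << w
    p_plus_1 = p + 1              # low words of p + 1 are all zero
    for _ in range(n_words):
        q = t % base              # Montgomery constant is 1, so q is the low word
        t = (t + q * p_plus_1 - q) >> w   # t + q*p, written as q*(p+1) - q
    return t - p if t >= p else t


p503 = (1 << 250) * 3**159 - 1
t = (p503 - 12345) * (p503 - 67890)
r = montgomery_reduce_friendly(t, p503, 16)
print((-pow(p503, -1, 1 << 32)) % (1 << 32), (r << 512) % p503 == t % p503)
```

Python 3.8 or later is needed for `pow(p503, -1, 1 << 32)`; the first printed value is the Montgomery constant for a 32-bit word size.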

  24. Proposed Modular Reduction for SIDH (32-bit ARMv7)
      • m-bit modular reduction using Montgomery reduction
        • One m/2-bit multiplication in ARM
        • One m/2-bit multiplication in NEON

  25. [Figure: proposed modular reduction for SIDH, operand passing between ARM and NEON]

  26. [Figure: proposed modular reduction for SIDH, steps 1 and 2: ARM and NEON compute the partial products from Q[0]M[3] to Q[7]M[7] in parallel on intermediate words T[3] to T[14]]

  27. [Figure: proposed modular reduction for SIDH, steps 1 to 3: as in the previous slide, followed by result accumulation (step 3)]


  29. Multiplication Instruction (64-bit ARMv8)
      • Operands in 64-bit registers: X0 = a0, X1 = b0
        • MUL: X2 ← lower 64 bits of a0·b0
        • UMULH: X3 ← upper 64 bits of a0·b0
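On ARMv8 a full 64x64-bit product therefore takes two instructions. The following Python model illustrates that split; the function names `mul` and `umulh` mirror the mnemonics but are otherwise illustrative, and the model covers only the arithmetic, not register allocation or timing.

```python
MASK64 = (1 << 64) - 1


def mul(a, b):
    """Model of MUL: lower 64 bits of the product of two 64-bit words."""
    return ((a & MASK64) * (b & MASK64)) & MASK64


def umulh(a, b):
    """Model of UMULH: upper 64 bits of the unsigned 128-bit product."""
    return ((a & MASK64) * (b & MASK64)) >> 64


a0 = 0xFEDCBA9876543210
b0 = 0x0123456789ABCDEF
# Combining both halves recovers the full 128-bit product.
full = (umulh(a0, b0) << 64) | mul(a0, b0)
print(full == a0 * b0)
```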
