Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC
Karim Bigou and Arnaud Tisserand
CNRS, IRISA, INRIA Centre Rennes - Bretagne Atlantique and Univ. Rennes 1
CHES 2015, Sept. 13 – 16

Context
Design efficient hardware implementations of asymmetric cryptosystems using fast arithmetic techniques:
- RSA [RSA78]
- Discrete logarithm cryptosystems: Diffie-Hellman (DH) [DH76], ElGamal [Elg85]
- Elliptic curve cryptography (ECC) [Mil85] [Kob87]
The residue number system (RNS) is a representation that enables fast computations for cryptosystems requiring large integers or $\mathbb{F}_P$ elements.

Residue Number System (RNS) [SV55] [Gar59]
X, a large integer of $\ell$ bits ($\ell \approx 160$ to $4096$), is represented by:
$\vec{X} = (x_1, \ldots, x_n) = (X \bmod m_1, \ldots, X \bmod m_n)$
RNS base $B = (m_1, \ldots, m_n)$: $n$ pairwise co-prime moduli of $w$ bits, with $n \times w \geq \ell$
[Figure: n parallel w-bit channels; channel i takes x_i and y_i and computes z_i = x_i (±, ×) y_i mod m_i]
RNS relies on the Chinese remainder theorem (CRT).
EMM = w-bit elementary modular multiplication in one channel
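The representation above can be illustrated with a minimal Python sketch. The toy base and the helper names (`to_rns`, `rns_mul`, `from_rns`) are assumptions chosen for illustration and do not come from the slides; real implementations use w-bit moduli and hardware channels.

```python
from math import prod

def to_rns(x, base):
    """Residues of x in each channel of the base."""
    return [x % m for m in base]

def rns_mul(xs, ys, base):
    """Channel-wise product: one EMM per channel, no carry between channels."""
    return [(x * y) % m for x, y, m in zip(xs, ys, base)]

def from_rns(residues, base):
    """CRT reconstruction of the unique integer in [0, M), M = product of moduli."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse needs Python 3.8+
    return total % M

base = [13, 17, 19, 23]   # toy pairwise co-prime moduli (assumption)
X, Y = 123, 456
Z = from_rns(rns_mul(to_rns(X, base), to_rns(Y, base), base), base)
```

The product is recovered exactly here because X × Y is smaller than M = 13 × 17 × 19 × 23.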

RNS Properties
Pros:
- Carry-free between channels: each channel is independent
- Fast parallel +, −, × and some exact divisions: computations over all channels can be performed in parallel; an RNS multiplication requires n EMMs
- Flexibility for hardware implementations: the number of hardware channels and logical channels can be different; various area/time trade-offs and multi-size support
- Non-positional number system: randomization of internal computations (SCA countermeasures)
Cons:
- Non-positional number system: comparison, modular reduction and division are much harder
- Modular reduction: RNS version of Montgomery reduction (MR)

Montgomery and Pseudo-Mersenne Reductions in RNS
Classical binary positional representation: in practice, standards use special primes to perform faster reduction, the pseudo-Mersenne primes $P = 2^{\ell} - c$, where $c < 2^{\ell/2}$ has a small Hamming weight. Reduction is fast using $2^{\ell} \equiv c \bmod P$.
In RNS, there is no equivalent to pseudo-Mersenne numbers in the state of the art.
Approaches in the RNS literature to speed up modular arithmetic:
- reduce the number of MRs (e.g. [BDE13, BT13]): for instance, computing patterns of the form $AB + CD \bmod P$
- improve MR in specific contexts (e.g. [Gui10, GLP+12, BT14]): for example RSA or ECC
- carefully choose some parameters of the representation to reduce the internal computation cost of MRs [BKP09, BM14, YFCV14]
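As a reminder of why pseudo-Mersenne primes are attractive in the binary representation, here is a small sketch of the folding reduction based on $2^{\ell} \equiv c \bmod P$. The function name and the choice of $P = 2^{255} - 19$ as a test value are assumptions for illustration only.

```python
def pseudo_mersenne_reduce(x, ell, c):
    """Reduce x modulo P = 2**ell - c by folding the high part: 2**ell ≡ c (mod P)."""
    P = (1 << ell) - c
    mask = (1 << ell) - 1
    while x >> ell:                 # while x has bits above position ell
        hi, lo = x >> ell, x & mask
        x = hi * c + lo             # hi * 2**ell + lo ≡ hi * c + lo (mod P)
    while x >= P:                   # final conditional subtraction
        x -= P
    return x

ell, c = 255, 19
P = (1 << ell) - c
a, b = P - 12345, P - 67890
r = pseudo_mersenne_reduce(a * b, ell, c)
```

Only shifts, a multiplication by the small constant c, and additions are needed, with no division by P.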

RNS Montgomery Reduction (MR) [PP95]
Input: $\vec{X}, \vec{X'}$ with $X < \alpha P^2 < PM$ and $2P < M'$
Output: $(\vec{\omega}, \vec{\omega'})$ with $\omega \equiv X \times M^{-1} \bmod P$ and $0 \leq \omega < 2P$
1. $\vec{Q} \leftarrow \vec{X} \times (\overrightarrow{-P^{-1}})$ (in base $B$)
2. $\vec{Q'} \leftarrow \mathrm{BE}(\vec{Q}, B, B')$ ($n \times n$ EMMs)
3. $\vec{S'} \leftarrow \vec{X'} + \vec{Q'} \times \vec{P'}$ (in base $B'$)
4. $\vec{\omega'} \leftarrow \vec{S'} \times \overrightarrow{M^{-1}}$ (in base $B'$)
5. $\vec{\omega} \leftarrow \mathrm{BE}(\vec{\omega'}, B', B)$ ($n \times n$ EMMs)
where $M = \prod_{i=1}^{n} m_i$
BE: base extension (i.e. conversion)
MR cost: $2n^2 + O(n)$ EMMs
Note: MM = 1 RNS mult. + MR
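The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the base extension is done here by an exact CRT reconstruction (hardware designs use an approximate, channel-parallel BE instead), the bases and P are toy values, and all helper names are hypothetical.

```python
from math import prod

def to_rns(x, base):
    return [x % m for m in base]

def crt(residues, base):
    """Exact CRT reconstruction in [0, product of moduli)."""
    M = prod(base)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, base)) % M

def base_extend(residues, src, dst):
    """Exact base extension (assumption: stands in for the hardware BE)."""
    return to_rns(crt(residues, src), dst)

def rns_montgomery_reduce(x_b, x_b2, P, base, base2):
    M = prod(base)
    # 1. Q = X * (-P^-1) in base B
    q = [(x * -pow(P, -1, m)) % m for x, m in zip(x_b, base)]
    # 2. Q' = BE(Q, B, B')
    q2 = base_extend(q, base, base2)
    # 3. S' = X' + Q' * P' in base B'
    s2 = [(x + qi * (P % m)) % m for x, qi, m in zip(x_b2, q2, base2)]
    # 4. omega' = S' * M^-1 in base B'
    w2 = [(s * pow(M, -1, m)) % m for s, m in zip(s2, base2)]
    # 5. omega = BE(omega', B', B)
    w = base_extend(w2, base2, base)
    return w, w2

base, base2 = [13, 17, 19], [23, 29, 31]   # toy bases B and B' (assumption)
P, X = 101, 9900                           # X < P * M and 2P < M'
M = prod(base)
w, w2 = rns_montgomery_reduce(to_rns(X, base), to_rns(X, base2), P, base, base2)
omega = crt(w2, base2)
```

With an exact BE, Q lies in [0, M), so ω = (X + QP) / M is below 2P whenever X < PM.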

Size of Elements Using MM
[Figure: X and Y are represented in both bases B and B'; the RNS product XY costs 2n EMMs, and the RNS Montgomery reduction MR costs $2n^2 + O(n)$ EMMs, giving $Z = |XY|_P$]

A New RNS Modular Multiplication

First Step: Changing the Representation
We split field elements into 2 parts of the same size.
How? Using half-bases: $B = B_{a|b}$, with $n \times w = \ell$ bits for the whole base and $n/2 \times w$ bits for each half-base $B_a$ and $B_b$.
Using $M_a = \prod_{i=1}^{n_a} m_{a,i}$, we split $\vec{X}$ into $(\vec{K_x}, \vec{R_x})$ such that:
$\vec{X} = \vec{K_x} \, \vec{M_a} + \vec{R_x}$
$K_x$ and $R_x$ are $\ell/2$ bits long.
$\mathbb{F}_P$ elements are now represented by $(K, R)$: we add a little positional information.
We call Split the function to get $(\vec{K_x}, \vec{R_x})$ from $\vec{X}$.

Decomposition with Split
Algorithm
Input: $\vec{X}_{a|b}$
Precomp.: $\overrightarrow{(M_a^{-1})_b}$
Output: $\overrightarrow{(K_x)_{a|b}}, \overrightarrow{(R_x)_{a|b}}$ with $\vec{X}_{a|b} = \overrightarrow{(K_x)_{a|b}} \times \overrightarrow{(M_a)_{a|b}} + \overrightarrow{(R_x)_{a|b}}$
1. $\overrightarrow{(R_x)_b} \leftarrow \mathrm{BE}(\overrightarrow{(R_x)_a}, B_a, B_b)$ ($\frac{n}{2} \times \frac{n}{2}$ EMMs)
2. $\overrightarrow{(K_x)_b} \leftarrow (\vec{X}_b - \overrightarrow{(R_x)_b}) \times \overrightarrow{(M_a^{-1})_b}$
3. if $\overrightarrow{(K_x)_b} = \overrightarrow{-1}$ then /* with Kawamura BE correction [KKSS00] */
     $\overrightarrow{(K_x)_b} \leftarrow \vec{0}$
     $\overrightarrow{(R_x)_b} \leftarrow \overrightarrow{(R_x)_b} - \overrightarrow{(M_a)_b}$
4. $\overrightarrow{(K_x)_a} \leftarrow \mathrm{BE}(\overrightarrow{(K_x)_b}, B_b, B_a)$ ($\frac{n}{2} \times \frac{n}{2}$ EMMs)
5. return $\overrightarrow{(K_x)_{a|b}}, \overrightarrow{(R_x)_{a|b}}$
Note: the cost of Split is dominated by the 2 BEs on half-bases: $\frac{n^2}{2} + O(n)$ EMMs when $n_a = n_b = n/2$
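A minimal sketch of Split on toy half-bases is given below. It assumes an exact, CRT-based base extension, so the Kawamura correction branch (the test against −1) never triggers and is omitted; it also uses the fact that R = X mod M_a has the same residues as X in base B_a. The half-bases and helper names are assumptions for illustration.

```python
from math import prod

def to_rns(x, base):
    return [x % m for m in base]

def crt(residues, base):
    M = prod(base)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, base)) % M

def base_extend(residues, src, dst):
    """Exact base extension (assumption: replaces the approximate hardware BE)."""
    return to_rns(crt(residues, src), dst)

def split(x_a, x_b, base_a, base_b):
    """Return ((K_a, K_b), (R_a, R_b)) with X = K * M_a + R and 0 <= R < M_a."""
    M_a = prod(base_a)
    r_a = list(x_a)                          # R = X mod M_a: same residues as X in B_a
    r_b = base_extend(r_a, base_a, base_b)   # first BE, B_a -> B_b
    # K = (X - R) / M_a is an exact division, done with M_a^-1 in base B_b
    k_b = [((x - r) * pow(M_a, -1, m)) % m for x, r, m in zip(x_b, r_b, base_b)]
    k_a = base_extend(k_b, base_b, base_a)   # second BE, B_b -> B_a (needs K < M_b)
    return (k_a, k_b), (r_a, r_b)

base_a, base_b = [3, 5], [7, 11, 13]         # toy half-bases, M_a = 15 (assumption)
X = 200                                      # 200 = 13 * 15 + 5
(k_a, k_b), (r_a, r_b) = split(to_rns(X, base_a), to_rns(X, base_b), base_a, base_b)
K, R = crt(k_b, base_b), crt(r_a, base_a)
```

The two calls to `base_extend` correspond to the two half-base BEs that dominate the cost of Split.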

A New Choice for P
Second step: we propose the form $P = M_a^2 - c$ with $P$ prime and $c$ small.
Some remarks:
- $P = M_a^2 - 1$ is never prime; in practice, we choose $P = M_a^2 - 2$ with $M_a$ odd, i.e. $M_a^2 \equiv 2 \bmod P$
- One can find a lot of $P$ for a given size (probabilistic primality tests using isprime from Maple, for instance generating 10 000 $P$ of 512 bits in 15 s)
- $P$ is an equivalent for RNS to pseudo-Mersenne numbers for the radix 2 standard representation (for instance $P = 2^{521} - 1$)
Our Single Base Modular Multiplication (SBMM) combines:
- $P = M_a^2 - 2$
- the $(K_x, R_x)$ representation
- the Split function
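The search for such primes can be sketched on toy sizes as follows. This is a hypothetical illustration, not the authors' Maple procedure: it uses trial division instead of a probabilistic test, tiny candidate moduli instead of w-bit ones, and helper names invented for the example.

```python
from itertools import combinations
from math import gcd, isqrt, prod

def is_prime(n):
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, isqrt(n) + 1, 2))

def find_half_bases(candidates, n_a):
    """List (moduli, P) with pairwise co-prime moduli, M_a odd, P = M_a**2 - 2 prime."""
    found = []
    for moduli in combinations(candidates, n_a):
        if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
            continue
        M_a = prod(moduli)
        P = M_a * M_a - 2
        if M_a % 2 == 1 and is_prime(P):
            found.append((moduli, P))
    return found

results = find_half_bases([3, 5, 7, 11, 13], 2)
```

For instance, the half-base (3, 5) gives M_a = 15 and P = 223, which is prime and satisfies M_a² ≡ 2 mod P.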

SBMM Algorithm
Parameters: $B_a$ such that $M_a^2 = P + 2$ and $B_b$ such that $M_b > 6 M_a$
Input: $\overrightarrow{(K_x)_{a|b}}, \overrightarrow{(R_x)_{a|b}}, \overrightarrow{(K_y)_{a|b}}, \overrightarrow{(R_y)_{a|b}}$ with $K_x, R_x, K_y, R_y < M_a$
Output: $\overrightarrow{(K_z)_{a|b}}, \overrightarrow{(R_z)_{a|b}}$ with $K_z < 5 M_a$ and $R_z < 6 M_a$
1. $\vec{U}_{a|b} \leftarrow \overrightarrow{2 K_x K_y + R_x R_y}$
2. $\vec{V}_{a|b} \leftarrow \overrightarrow{K_x R_y + R_x K_y}$
3. $(\overrightarrow{(K_u)_{a|b}}, \overrightarrow{(R_u)_{a|b}}) \leftarrow \mathrm{Split}(\vec{U}_{a|b})$
4. $(\overrightarrow{(K_v)_{a|b}}, \overrightarrow{(R_v)_{a|b}}) \leftarrow \mathrm{Split}(\vec{V}_{a|b})$ (steps 3 and 4 in parallel)
5. $(\overrightarrow{(K_z)_{a|b}}, \overrightarrow{(R_z)_{a|b}}) \leftarrow (\overrightarrow{(K_u + R_v)_{a|b}}, \overrightarrow{(2 \cdot K_v + R_u)_{a|b}})$
6. return $(\overrightarrow{(K_z)_{a|b}}, \overrightarrow{(R_z)_{a|b}})$
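The arithmetic of SBMM can be checked with a small sketch on plain integers. This is an assumption-laden model, not the hardware algorithm: Split is replaced by an exact `divmod` by M_a, the RNS channels are not represented, and the values M_a = 15, P = 223 are toy parameters.

```python
def split(x, M_a):
    """Integer model of Split: x = K * M_a + R with 0 <= R < M_a."""
    return divmod(x, M_a)

def sbmm(kx, rx, ky, ry, M_a):
    """Return (K_z, R_z) with K_z * M_a + R_z ≡ X * Y (mod M_a**2 - 2)."""
    u = 2 * kx * ky + rx * ry    # uses M_a**2 ≡ 2 (mod P)
    v = kx * ry + rx * ky
    ku, ru = split(u, M_a)
    kv, rv = split(v, M_a)
    return ku + rv, 2 * kv + ru

M_a = 15
P = M_a * M_a - 2                # 223, prime
X, Y = 200, 150
kx, rx = split(X, M_a)           # (13, 5)
ky, ry = split(Y, M_a)           # (10, 0)
kz, rz = sbmm(kx, rx, ky, ry, M_a)
Z = (kz * M_a + rz) % P
```

The result stays in the (K, R) representation, so it can be fed directly into the next multiplication; the final reduction modulo P is only performed here to check the value.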

SBMM Principle 1/2
[Figure: X = (K_x, R_x) and Y = (K_y, R_y) over the half-bases B_a and B_b; the channel-wise products K_x K_y and R_x R_y cost 2n EMMs, and the cross products K_x R_y and R_x K_y cost another 2n EMMs]
$XY \equiv 2 K_x K_y + (K_x R_y + K_y R_x) M_a + R_x R_y \equiv U + V M_a \bmod P$

SBMM Principle 2/2
$XY \equiv U + V M_a \equiv (K_u + R_v) M_a + (R_u + 2 K_v) \equiv K_z M_a + R_z \bmod P$
[Figure: U = 2 K_x K_y + R_x R_y and V = K_x R_y + R_x K_y each go through Split; the two Splits cost $2 \left( \frac{n^2}{2} + O(n) \right) = n^2 + O(n)$ EMMs; the outputs are combined as K_u + R_v = K_z and R_u + 2 K_v = R_z]
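For reference, the two congruences of the SBMM principle can be expanded step by step. The derivation below only uses $X = K_x M_a + R_x$, $Y = K_y M_a + R_y$, the Split outputs $U = K_u M_a + R_u$ and $V = K_v M_a + R_v$, and $M_a^2 \equiv 2 \bmod P$:

```latex
\begin{align*}
XY &= (K_x M_a + R_x)(K_y M_a + R_y) \\
   &= K_x K_y M_a^2 + (K_x R_y + R_x K_y) M_a + R_x R_y \\
   &\equiv \underbrace{2 K_x K_y + R_x R_y}_{U}
      + \underbrace{(K_x R_y + R_x K_y)}_{V} M_a \pmod{P} \\
   &= (K_u M_a + R_u) + (K_v M_a + R_v) M_a \\
   &= K_v M_a^2 + (K_u + R_v) M_a + R_u \\
   &\equiv \underbrace{(K_u + R_v)}_{K_z} M_a
      + \underbrace{(R_u + 2 K_v)}_{R_z} \pmod{P}
\end{align*}
```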

SBMM Architecture with n/2 Rowers
[Figure: block diagram with a CTRL unit, a cox unit, and rowers 1 to n/2; inputs x_i, y_i for channels 1, 2, ..., n/2 + 1, ..., n arrive on w-bit buses; w-bit output]
