Hybrid Position-Residues Number System Karim Bigou and Arnaud - - PowerPoint PPT Presentation




SLIDE 1

Hybrid Position-Residues Number System

Karim Bigou and Arnaud Tisserand

CNRS, IRISA, INRIA Centre Rennes - Bretagne Atlantique and Univ. Rennes 1

ARITH 23, July 10 – 13

Karim Bigou and Arnaud Tisserand HPR Representation ARITH 23, July 10 – 13 1 / 25

SLIDE 2

Context

Work on the design of efficient hardware implementations of asymmetric cryptosystems using advanced arithmetic techniques:

  • RSA [RSA78]
  • Discrete Logarithm Cryptosystems: Diffie-Hellman [DH76] (DH), ElGamal [Elg85]
  • Elliptic Curve Cryptography (ECC) [Mil85] [Kob87]

The residue number system (RNS) is a representation which enables fast computations for cryptosystems requiring large integers or $\mathbb{F}_P$ elements through internal parallelism.

SLIDE 3

Residue Number System (RNS) [SV55] [Gar59]

X, a large integer of ℓ bits (ℓ > 200), is represented by:

$X = (x_1, \dots, x_n) = (X \bmod m_1, \dots, X \bmod m_n)$

RNS base $B = (m_1, \dots, m_n)$: n pairwise co-prime moduli of w bits, with $n \times w \ge \ell$

[Figure: n independent channels; channel i takes the w-bit residues $x_i$ and $y_i$ and computes $z_i = x_i \,(\pm, \times)\, y_i \bmod m_i$, so that X and Y give Z.]

RNS relies on the Chinese remainder theorem (CRT)

EMM = w-bit elementary modular multiplication in one channel
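The channel structure above can be illustrated with a minimal Python sketch. The function names (`to_rns`, `rns_mul`, `from_rns`) are hypothetical helpers introduced here, and the six small moduli are toy values, far smaller than the w-bit moduli a real design would use.

```python
from math import prod

def to_rns(x, base):
    """Residues of x, one per channel of the base."""
    return tuple(x % m for m in base)

def rns_mul(xs, ys, base):
    """Channel-wise modular multiplication: one EMM per channel, no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, base))

def from_rns(xs, base):
    """CRT reconstruction of the unique value in [0, M), where M is the product of the moduli."""
    M = prod(base)
    total = 0
    for x, m in zip(xs, base):
        Mi = M // m
        total += x * Mi * pow(Mi, -1, m)   # pow(.., -1, m) is the modular inverse (Python 3.8+)
    return total % M

base = (2, 7, 13, 3, 5, 11)                # pairwise coprime toy moduli, M = 30030
X, Y = 141, 101
Z = rns_mul(to_rns(X, base), to_rns(Y, base), base)
assert from_rns(Z, base) == X * Y          # 14241 < 30030, so the product is recovered exactly
```

The product is only recovered exactly because it stays below the product of the moduli; this is the role of the condition relating n × w to ℓ.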

SLIDE 4

RNS vs Positional Number Systems

[Figure: eight w-bit digit cells (+, ×), weighted $\beta^0, \dots, \beta^7$ in the positional system and attached to the moduli $m_0, \dots, m_7$ in RNS.]

  • Positional: involves data dependencies
  • RNS: involves hard access to positional information

Remark: here, one assumes a high radix positional representation of w bits

SLIDE 5

RNS vs Positional Number Systems

Operation/feature          RNS          Positional representation
multiplication             easier       harder
modular reduction          harder       easier
modular multiplication     equivalent   equivalent
expansion of values        harder       easier
comparisons                harder       easier
parallelism                easier       harder
flexibility                easier       harder
internal randomization     easier       harder

SLIDE 6

Proposed Representation: Hybrid Position-Residues HPR

SLIDE 7

Main principle of HPR (finite field case)

[Figure: the same eight digit cells grouped in four ways. Positional: weights $\beta^0, \dots, \beta^7$. HPR with d = 4: four words of two residues each, weights $(m_0 m_1)^0, \dots, (m_0 m_1)^3$. HPR with d = 2: two words of four residues each, weights $(m_0 m_1 m_2 m_3)^0$ and $(m_0 m_1 m_2 m_3)^1$. RNS: moduli $m_0, \dots, m_7$.]

SLIDE 8

Hybrid Position-Residues Representation HPR

Formally:

$$X_{HPR} = \left( (X_{d-1})_{a|b}, \dots, (X_0)_{a|b} \right)_{HPR} \quad \text{with} \quad X = \sum_{i=0}^{d-1} X_i M_a^i$$

where $B_a = (m_{a,0}, \dots, m_{a,\frac{n}{d}-1})$ and $B_b = (m_{b,0}, \dots, m_{b,\frac{n}{d}-1})$, $M_a = \prod_{i=0}^{\frac{n}{d}-1} m_{a,i}$, and each word $X_i$ lies between $\beta_{\min} M_a$ and $\beta_{\max} M_a$ ($\beta_{\max} - \beta_{\min} > 1$).

2 RNS bases are required to contain temporary sub-products of HPR words during a full multiplication

Remark 1: conversions are made using classical methods (radix conversions and RNS conversions)

Remark 2: internal conversions between both RNS bases are made using state-of-the-art base extension methods (e.g. using CRT)
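As a rough illustration of Remark 1, the sketch below converts an integer to HPR by a classical radix-$M_a$ decomposition followed by an RNS conversion of each word, and converts back. The helper names (`to_hpr`, `from_hpr`, `crt`) are hypothetical, the bases are toy values, and the words are kept canonical ($0 \le X_i < M_a$), which is only one choice within the redundant range that HPR allows.

```python
from math import prod

def crt(res, base):
    """Exact CRT reconstruction in [0, prod(base))."""
    M = prod(base)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(res, base)) % M

def to_hpr(x, Ba, Bb, d):
    """Split x into d radix-Ma words, then store each word by its residues in Ba and in Bb."""
    Ma = prod(Ba)
    words = []
    for _ in range(d):
        x, w = divmod(x, Ma)               # radix conversion, least significant word first
        words.append((tuple(w % m for m in Ba), tuple(w % m for m in Bb)))
    if x:
        raise ValueError("value does not fit in d words")
    return words

def from_hpr(words, Ba, Bb):
    """Rebuild X = sum(X_i * Ma**i); each word is recovered by CRT over Ba and Bb together."""
    Ma = prod(Ba)
    return sum(crt(ra + rb, Ba + Bb) * Ma**i for i, (ra, rb) in enumerate(words))

Ba, Bb = (2, 7, 13), (3, 5, 11)            # toy bases: Ma = 182, Mb = 165
assert to_hpr(141, Ba, Bb, 1) == [((1, 1, 11), (0, 1, 9))]
assert from_hpr(to_hpr(33118, Ba, Bb, 2), Ba, Bb) == 33118
```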

SLIDE 9

Example for 1 HPR-word Multiplication (1/2)

Parameters: $B_a = (2, 7, 13)$, $B_b = (3, 5, 11)$, $M_a = 182$, $M_b = 165$

Inputs: X = 141, Y = 101

$X_{HPR} = (1, 1, 11, 0, 1, 9)_{a|b}$

$Y_{HPR} = (1, 3, 10, 2, 1, 2)_{a|b}$

$X \times Y = 14241$

$X_{HPR} \times Y_{HPR} = (1 \times 1, 1 \times 3, 11 \times 10, 0 \times 2, 1 \times 1, 9 \times 2)_{a|b} = (1, 3, 6, 0, 1, 7)_{a|b}$

The high part of the product must be propagated:

$14241 = 78 \times 182 + 45 = 78 \times M_a + 45$

SLIDE 10

Decomposition algorithm (Split)

Split decomposes a double word value into 2 HPR-words (i.e. radix $M_a$)

Input: $(X)_{a|b}$ with $X < (\beta_{\max} M_a)^2$ and $M_b > \beta_{\max}^2 M_a$
Precomp.: $(M_a^{-1})_b$
Output: $\left( (Q)_{a|b}, (R)_{a|b} \right)$

    $(R)_a \leftarrow (X)_a$    (virtual operation)
    $(R)_b \leftarrow BE((R)_a, B_a, B_b)$    (n/d) × (n/d) EMMs
    $(Q)_b \leftarrow ((X)_b - (R)_b) \times (M_a^{-1})_b$
    if $(Q)_b = (-1)_b$ then    /* using Kawamura BE [KKSS00] */
        $(Q)_b \leftarrow (0)_b$
        $(R)_b \leftarrow (R)_b - (M_a)_b$
    $(Q)_a \leftarrow BE((Q)_b, B_b, B_a)$    (n/d) × (n/d) EMMs
    return $(Q)_{a|b}, (R)_{a|b}$

Split becomes faster when d increases (but it reduces the parallelism)
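The Split steps can be sketched in Python under one loudly stated simplification: the base extension here is an exact CRT reconstruction, not the approximate Kawamura base extension cited on the slide, so the correction for $(Q)_b = (-1)_b$ never arises and is omitted. The names `crt`, `base_extend` and `split` are hypothetical; the inputs are the residues of 14241 = 141 × 101 in the toy bases of the worked example.

```python
from math import prod

def crt(res, base):
    """Exact CRT reconstruction in [0, prod(base))."""
    M = prod(base)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(res, base)) % M

def base_extend(res, src, dst):
    """Exact base extension: residues in src -> residues in dst (value must be < prod(src))."""
    v = crt(res, src)
    return tuple(v % m for m in dst)

def split(Xa, Xb, Ba, Bb):
    """Return ((Qa, Qb), (Ra, Rb)) with X = Q * Ma + R and 0 <= R < Ma; requires Q < Mb."""
    Ma = prod(Ba)
    Ra = Xa                                    # X mod Ma is already available in base Ba
    Rb = base_extend(Ra, Ba, Bb)
    Qb = tuple((x - r) * pow(Ma, -1, m) % m    # exact division by Ma, carried out in base Bb
               for x, r, m in zip(Xb, Rb, Bb))
    Qa = base_extend(Qb, Bb, Ba)
    return (Qa, Qb), (Ra, Rb)

Ba, Bb = (2, 7, 13), (3, 5, 11)
Q, R = split((1, 3, 6), (0, 1, 7), Ba, Bb)     # residues of 14241
assert Q == ((0, 1, 0), (0, 3, 1))             # Q = 78
assert R == ((1, 3, 6), (0, 0, 1))             # R = 45
```

The division by $M_a$ is only possible in base $B_b$, because $M_a$ is zero in every channel of $B_a$; this is why two bases and two base extensions are needed.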

SLIDE 11

“High” part propagation in HPR

This algorithm uses Split to propagate the high parts ("MSBs" in radix $M_a$) of subproducts

Input: $X_{HPR} = (X_{d-1}, \dots, X_0)$ with $X_i < (\beta_{\max} M_a)^2$
Output: $X_{HPR} = (X_d, \dots, X_0)$ with $X_i < (\beta_{\max}^2 + 1) M_a$

    $C_{-1} \leftarrow 0$, $X_d \leftarrow 0$
    for i from 0 to d − 1 parallel do
        $(C_i, X_i) \leftarrow Split(X_i)$
    for i from 0 to d parallel do
        $X_i \leftarrow X_i + C_{i-1}$
    return $(X_d, \dots, X_0)$

Remark: to propagate a carry after an addition, we use a small carry propagation algorithm (details in the paper)
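A minimal integer-level model of this propagation is sketched below. It is an assumption-laden simplification: each HPR word is held as a plain Python integer and Split is modelled by `divmod` by $M_a$, so the RNS residues and base extensions are not shown. The function names and the input words are hypothetical.

```python
def propagate_high_parts(words, Ma):
    """words[i] is the integer value of word X_i (least significant first), possibly a double word.
    Returns d + 1 words representing the same total value."""
    d = len(words)
    # Step 1: every word is split independently (these splits run in parallel in hardware).
    carries, lows = zip(*(divmod(w, Ma) for w in words))
    # Step 2: each low part receives the high part of its lower neighbour; carries do not ripple.
    out = [lows[0]]
    out += [lows[i] + carries[i - 1] for i in range(1, d)]
    out.append(carries[d - 1])
    return out

def value(words, Ma):
    """Integer represented by a list of radix-Ma words."""
    return sum(w * Ma**i for i, w in enumerate(words))

Ma = 182
before = [14241, 20000, 33000]
after = propagate_high_parts(before, Ma)
assert after == [45, 240, 167, 181]
assert value(after, Ma) == value(before, Ma)
```

Note that the output word 240 exceeds $M_a$ = 182: the result stays redundant, which is what the output bound on $X_i$ allows and what removes the sequential carry chain.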

SLIDE 12

Example for 1 HPR-Word Multiplication (2/2)

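$X \times Y = 14241 = 78 \times 182 + 45 = 78 \times M_a + 45$

$X_{HPR} \times Y_{HPR} = (1 \times 1, 1 \times 3, 11 \times 10, 0 \times 2, 1 \times 1, 9 \times 2)_{a|b} = (1, 3, 6, 0, 1, 7)_{a|b}$

After high part propagation: $\left( (0, 1, 0, 0, 3, 1)_{a|b}, (1, 3, 6, 0, 0, 1)_{a|b} \right)$

  • Using BE, convert 45 from $B_a$ to $B_b$: $(1, 3, 6)_a \to (0, 0, 1)_b$
  • In $B_b$, perform the division by $M_a$:

$$\frac{(XY)_b - (|XY|_{M_a})_b}{(M_a)_b} = \left( (0, 1, 7)_b - (0, 0, 1)_b \right) \times (2, 3, 2)_b = (0, 1, 6)_b \times (2, 3, 2)_b = (0, 3, 1)_b$$

  • Finally, one performs another BE from $B_b$ to $B_a$: $(0, 3, 1)_b \to (0, 1, 0)_a$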

SLIDE 13

Application 1: A New Modular Multiplication Algorithm

SLIDE 14

Principle

Proposition: well-chosen finite fields $\mathbb{F}_P$ for fast modular multiplications

  • example of application: finite field for ECC
  • P prime with $P = Q(M_a)$ and $Q(X) = X^d - Q'(X)$ where $Q'$ is sparse
  • $\mathbb{F}_P$ is a $d \times (n/d) \times w = nw$ bits finite field
  • $M_a^d \equiv Q'(M_a) \bmod P$

toy example 1: $P_1 = (2 \times 7 \times 13)^2 - 5 = M_a^2 - 5 = 33119$ is prime

toy example 2: $P_2 = (3 \times 5 \times 11)^3 - 2 = M_b^3 - 2 = 4492123$ is prime

Main Idea: Adapt pseudo-Mersenne modular multiplication for HPR representation

SLIDE 15

HPR Modular Multiplication

Positional Modular Reduction: reduction using $M_a^d \equiv Q'(M_a) \bmod P$

example: $Q' = 2$, then $Z_i = Z_i + 2 Z_{i+d}$ for $i \in [0, d - 1]$

[Figure: case d = 3; the words of weight $M_a^3$, $M_a^4$, $M_a^5$ are multiplied by 2 and added to the words of weight $M_a^0$, $M_a^1$, $M_a^2$.]

Parameters: $B_a$ with $P = Q(M_a)$ and Q of degree d
Input: $X_{HPR}$, $Y_{HPR}$
Output: $Z_{HPR}$ with $Z = XY \bmod P$

    $Z_{HPR} \leftarrow$ HPR Product($X_{HPR}$, $Y_{HPR}$)    $d^2 \cdot 2(n/d) = 2nd$ EMMs
    $Z_{HPR} \leftarrow$ Positional Modular Reduction($Z_{HPR}$, Q)    (n EMAs)
    $Z_{HPR} \leftarrow$ HPR "High" Part Propagation($Z_{HPR}$)    $\frac{2n^2}{d} + 2n$ EMMs
    $Z_{HPR} \leftarrow$ Positional Modular Reduction($Z_{HPR}$, Q)    (n/d EMAs)
    $Z_{HPR} \leftarrow$ HPR Small Carry Propagation($Z_{HPR}$)    2n EMMs
    return $Z_{HPR}$
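The data flow of this algorithm can be followed on an integer-level model, sketched below for a modulus of the form $P = M_a^d - c$ (that is, $Q'$ is the constant c). This is an illustration under stated assumptions, not the hardware algorithm: words are plain Python integers, Split is modelled by `divmod`, the final small carry propagation is left out, and the names `fold` and `hpr_modmul_model` are hypothetical. The result is a redundant d-word value congruent to XY modulo P.

```python
def fold(words, d, c):
    """Positional modular reduction: since Ma**d ≡ c (mod P), word i+d is folded onto word i.
    Expects at most 2*d words."""
    low = list(words[:d])
    for j, w in enumerate(words[d:]):
        low[j] += c * w
    return low

def hpr_modmul_model(X, Y, Ma, d, c):
    """Return (words, P) where words is a d-word radix-Ma value congruent to X*Y mod P."""
    P = Ma**d - c
    xs = [(X // Ma**i) % Ma for i in range(d)]
    ys = [(Y // Ma**i) % Ma for i in range(d)]
    z = [0] * (2 * d - 1)
    for i in range(d):                          # schoolbook product: d*d word products
        for j in range(d):
            z[i + j] += xs[i] * ys[j]
    z = fold(z, d, c)                           # first positional reduction
    carries, lows = zip(*(divmod(w, Ma) for w in z))        # high part propagation
    z = [lows[0]] + [lows[i] + carries[i - 1] for i in range(1, d)] + [carries[-1]]
    z = fold(z, d, c)                           # second positional reduction
    return z, P

# Toy example 1 of the slides: Ma = 182, d = 2, P = 182**2 - 5 = 33119.
z, P = hpr_modmul_model(20000, 30000, 182, 2, 5)
assert P == 33119
assert sum(w * 182**i for i, w in enumerate(z)) % P == (20000 * 30000) % P
```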

SLIDE 16

Cost of modular multiplication in RNS and HPR for various fixed d

Operation cost: trade-off between HPR product and HPR High part propagation

[Figure: operation cost in EMMs (500 to 3000) versus number of moduli n (5 to 70), with curves for RNS and for HPR with d = 2, d = 3, d = 4 and d = 8.]

SLIDE 17

Impact of d for n and the field size fixed

Using schoolbook multiplication, d = √n is the best trade-off

[Figure: operation cost in EMMs (400 to 2400) versus d (1 to 16), with curves for n = 32, 24, 20, 16, 12 and 8.]

d             2            4               8               16
cost (EMM)    n² + 8n      n²/2 + 12n      n²/4 + 20n      n²/8 + 36n
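The table entries follow from summing the per-step EMM counts of the HPR modular multiplication (product 2nd, high part propagation 2n²/d + 2n, small carry propagation 2n), which gives 2nd + 2n²/d + 4n. A short check of that sum, using a hypothetical helper `hpr_modmul_cost` and exact fractions to avoid rounding:

```python
from fractions import Fraction

def hpr_modmul_cost(n, d):
    """EMM count: product (2nd) + high part propagation (2n^2/d + 2n) + small carry propagation (2n)."""
    n, d = Fraction(n), Fraction(d)
    return 2 * n * d + (2 * n * n / d + 2 * n) + 2 * n

n = 16
table = {2: n * n + 8 * n, 4: n * n // 2 + 12 * n, 8: n * n // 4 + 20 * n, 16: n * n // 8 + 36 * n}
for d, expected in table.items():
    assert hpr_modmul_cost(n, d) == expected

# Among these d, the cheapest for n = 16 is d = 4, i.e. sqrt(n).
best = min((1, 2, 4, 8, 16), key=lambda d: hpr_modmul_cost(n, d))
assert best == 4
```

Treating d as a real variable, the sum 2nd + 2n²/d is minimal where 2n = 2n²/d², which is d = √n.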

SLIDE 18

Sources of Parallelism

Two main sources of parallelism:

  • parallelism due to digits in RNS: decreases while d increases
  • parallelism due to HPR algorithms

Assuming n hardware channels (as in usual RNS architectures):

  • High part propagation: $2\frac{n}{d} + 2$ EMMs per parallel channel
  • HPR multiplication (schoolbook): 2d EMMs per parallel channel
  • ... but the number of additions increases with d, increasing dependencies between the sub-products

Summary:

  • when d is small, our algorithm is as parallel as RNS algorithms
  • in practice, d is small (≈ √n)

SLIDE 19

Application 2: A New Exponentiation Algorithm

SLIDE 20

Application 2: Modular Exponentiation

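Idea: take advantage of the positional information of HPR to accelerate some specific, but usual, computation patterns

Input: $k = (k_{\ell-1}, \dots, k_1, k_0)_2$, $G \in \mathbb{Z}/N\mathbb{Z}$
Output: $G^k \bmod N$

    $Z \leftarrow 1$
    for i from ℓ − 1 to 0 do
        $Z \leftarrow Z^2 \bmod N$
        if $k_i = 1$ then $Z \leftarrow Z \cdot G \bmod N$
    return Z

One can observe (here for d = 2, with $Z = Z_1 M_a + Z_0$):

$$Z^2 G \equiv \left( Z_1^2 M_a^2 + 2 Z_1 Z_0 M_a + Z_0^2 \right) G \bmod N$$
$$\equiv Z_1^2 \, |M_a^2 G|_N + Z_1 Z_0 \, |2 M_a G|_N + Z_0^2 \, |G|_N \bmod N$$
$$\equiv Z_1 \left( Z_1 |M_a^2 G|_N + Z_0 |2 M_a G|_N \right) + Z_0^2 \, |G|_N \bmod N$$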
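The observation can be tried out with a small integer-level sketch for d = 2. It is an illustration only: words are plain integers, the final reduction uses Python's `%` where the slides rely on RNS modular reduction, and the names `constants`, `fused_step` and `modexp` are hypothetical. The three constants $|M_a^2 G|_N$, $|2 M_a G|_N$ and $|G|_N$ depend only on G and N, so they are computed once; using G = 1 gives a plain squaring with the same pattern.

```python
def constants(G, N, Ma):
    """Precomputed constants (|Ma^2 G|_N, |2 Ma G|_N, |G|_N)."""
    return ((Ma * Ma * G) % N, (2 * Ma * G) % N, G % N)

def fused_step(Z, consts, N, Ma):
    """Compute Z^2 * G mod N from the two radix-Ma words of Z and the precomputed constants."""
    c2, c1, c0 = consts
    Z1, Z0 = divmod(Z, Ma)
    return (Z1 * (Z1 * c2 + Z0 * c1) + Z0 * Z0 * c0) % N

def modexp(G, k, N, Ma):
    """Left-to-right exponentiation where every iteration is one fused step."""
    with_G = constants(G, N, Ma)        # bit = 1: Z <- Z^2 * G
    without_G = constants(1, N, Ma)     # bit = 0: Z <- Z^2
    Z = 1
    for bit in bin(k)[2:]:
        Z = fused_step(Z, with_G if bit == "1" else without_G, N, Ma)
    return Z

assert fused_step(20000, constants(12345, 33119, 182), 33119, 182) == (20000**2 * 12345) % 33119
assert modexp(12345, 65537, 33119, 182) == pow(12345, 65537, 33119)
```

Both branches perform the same sequence of operations and differ only in the constants used, which is the sense in which the resulting exponentiation is regular.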

SLIDE 21

Proposed Regular Modular Exponentiation

For a general d:

$$|Z^2 G|_N \equiv \sum_{k=0}^{2(d-1)} \sum_{i=0}^{d-1} Z_i Z_{k-i} \, |M_a^k G|_N$$

Parameters: $B_a$, $B_b$, $B_c$ with $n_a = n_b = n/d$ and $n_c = n$
Input: $(G)_{a|b|c}$, e the exponent
Output: $(Z)_{a|b|c}$ with $Z = G^e \bmod N$, $Z < 3P$

    $(Z)_{a|b|c} \leftarrow (|M_{a|b}|_P)_{a|b|c}$
    for i from ℓ − 1 to 0 do
        $Z_{HPR} \leftarrow$ RNStoHPR($(Z)_{a|b|c}$, $B_a$, $B_b$, $B_c$)    O(n²)
        if $e_i = 0$ then
            $(Z)_{a|b|c} \leftarrow$ SubProducts($Z_{HPR}$, $|M_{a|b}|_P$)    O(d²)
            $(Z)_{a|b|c} \leftarrow$ RNS-MR($(Z)_{a|b|c}$, $B_{a|b}$, $B_c$)    O(n²)
        else
            $(Z)_{a|b|c} \leftarrow$ SubProducts($Z_{HPR}$, G)    O(d²)
            $(Z)_{a|b|c} \leftarrow$ RNS-MR($(Z)_{a|b|c}$, $B_{a|b}$, $B_c$)    O(n²)
    return $(Z)_{a|b|c}$

Remark: conversions from HPR to RNS are implicit
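The general-d sum can be checked numerically with the sketch below. It is an integer-level model, not the SubProducts routine itself: words are plain integers, $Z_{k-i}$ is taken as 0 when k − i falls outside [0, d − 1], and the function name `z2g_general` is hypothetical.

```python
def z2g_general(Z, G, N, Ma, d):
    """Evaluate Z^2 * G mod N as a double sum over the d radix-Ma words of Z (needs Z < Ma**d)."""
    zs = [(Z // Ma**i) % Ma for i in range(d)]
    consts = [(Ma**k * G) % N for k in range(2 * d - 1)]   # |Ma^k G|_N, precomputable for fixed G
    total = 0
    for k in range(2 * d - 1):
        for i in range(d):
            if 0 <= k - i < d:                             # Z_{k-i} is 0 outside [0, d-1]
                total += zs[i] * zs[k - i] * consts[k]
    return total % N

Z, G, N, Ma, d = 4000000, 12345, 33119, 182, 3             # Z < 182**3 = 6028568
assert z2g_general(Z, G, N, Ma, d) == (Z * Z * G) % N
```

The inner loop touches d² word products in total, which matches the O(d²) annotation on SubProducts.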

SLIDE 22

Comparison with state-of-the-art RNS exponentiation

[Figure: cost ratio ours/state-of-the-art (0.6 to 1.3) versus number of moduli n (10 to 120), with curves for d = 2, d = 3 and d = 4.]

SLIDE 23

Conclusion

SLIDE 24

HPR Conclusion and Further Work

The HPR representation:

  • reduces the cost of some RNS modular arithmetic algorithms with a high level of parallelism (for small d)
  • makes it possible to use positional properties (such as the extensibility of the representation) or tricks (such as pseudo-Mersenne-like numbers)
  • provides more flexibility with a lot of new trade-off possibilities

Examples of applications:

  • modular multiplication: HPR offers a reduction of computation cost of 40 to 60% (for ECC 256 – 512)
  • modular multiplication: HPR offers a reduction of computation cost of 20 to 40% (for RSA 2048 – 4096)

Further work:

  • adaptation of other usual arithmetic algorithms (e.g. Montgomery or Barrett reduction algorithms)
  • application to very large values (e.g. homomorphic encryption)

SLIDE 25

Thank you for your attention

SLIDE 26

References I

[DH76] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, November 1976.

[Elg85] T. Elgamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4):469–472, July 1985.

[Gar59] H. L. Garner. The residue number system. IRE Transactions on Electronic Computers, EC-8(2):140–147, June 1959.

[KKSS00] S. Kawamura, M. Koike, F. Sano, and A. Shimbo. Cox-Rower architecture for fast parallel Montgomery multiplication. In Proc. 19th International Conference on the Theory and Application of Cryptographic Techniques (EUROCRYPT), volume 1807 of LNCS, pages 523–538. Springer, May 2000.

[Kob87] N. Koblitz. Elliptic curve cryptosystems. Mathematics of Computation, 48(177):203–209, 1987.

SLIDE 27

References II

[Mil85] V. Miller. Use of elliptic curves in cryptography. In Proc. 5th International Cryptology Conference (CRYPTO), volume 218 of LNCS, pages 417–426. Springer, 1985.

[RSA78] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, February 1978.

[SV55] A. Svoboda and M. Valach. Operátorové obvody (operator circuits, in Czech). Stroje na Zpracování Informací (Information Processing Machines), 3:247–296, 1955.

SLIDE 28

Small Carry Propagation

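Input: $X_{HPR}$ with $X_i < (m_\gamma - 2) M_a$ $\forall i \in [0, d - 1]$
Parameters: $Q'$ such that $P = Q(M_a)$ with $Q = X^d - Q'$
Precomp.: $M_a^{-1}$
Output: $X_{HPR}$, $X_i < 2 M_a + (m_\gamma - 2)$ $\forall i \in [0, d - 1]$

    for i from 0 to d − 1 do
        $|R_i|_{m_\gamma} \leftarrow BE((X_i)_a, B_a, m_\gamma)$
        $|C_i|_{m_\gamma} \leftarrow \left| (X_i - R_i) M_a^{-1} \right|_{m_\gamma}$
        if $|C_i|_{m_\gamma} = m_\gamma - 1$ then $|C_i|_{m_\gamma} = 0$
        $(X_i)_b \leftarrow (X_i)_b - |C_{i,H}|_{m_\gamma} \times (M_a)_b$
    for i from 1 to d − 1 parallel do
        $(X_i)_{a|b} \leftarrow (X_i)_{a|b} + (C_{i-1})_{a|b}$
    return $X_{HPR}$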
