
RNS Arithmetic for Linear Algebra of Discrete Logarithm Computations Using Parallel Architectures
Hamza Jeljeli, CARAMEL project-team, LORIA, INRIA / CNRS / Université de Lorraine, Hamza.Jeljeli@loria.fr
RAIM 2015, Rennes, April 8th, 2015


1. Title slide, signed with the CARAMEL team's obfuscated C program:

```c
/* EPI CARAMEL */                 C,A,
/* Cryptologie, Arithmétique : */ R,a,
/* Matériel et Logiciel */        M,E,
L,i=5,e,d[5],Q[999]={0};main(N){for(;i--;e=scanf("%" "d",d+i));
for(A=*d;++i<A;++Q[i*i%A],R=i[Q]?R:i);
for(;i--;)for(M=A;M--;N+=!M*Q[E%A],e+=Q[(A+E*E-R*L*L%A)%A])
for(E=i,L=M,a=4;a;C=i*E+R*M*L,L=(M*E+i*L)%A,E=C%A+a--[d]);
printf("%d" "\n",(e+N*N)/2
/* cc caramel.c; echo f3 f2 f1 f0 p | ./a.out */             -A);}
```

2. Discrete Logarithm Problem (DLP)

Discrete logarithm: given a cyclic group G = ⟨g⟩ written multiplicatively, the discrete logarithm of h ∈ G is the unique k ∈ [0, #G − 1] such that h = g^k.

In some groups, the DLP is computationally hard, while the inverse problem (discrete exponentiation) is easy. The security of cryptographic primitives relies on the difficulty of the DLP:
- key agreement: Diffie–Hellman key exchange,
- encryption: ElGamal encryption,
- signature: DSA signature,
- pairing-based cryptography, ...

Evaluating the security level of these primitives ⇒ mounting DLP attacks.
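To make the asymmetry concrete, here is a toy sketch (tiny hypothetical parameters, nothing from the talk; real groups are hundreds of bits): square-and-multiply evaluates g^k mod p in O(log k) multiplications, while the only generic way to invert it is exhaustive search over the group.

```c
/* Toy illustration of the DLP asymmetry (hypothetical parameters).
 * Needs GCC/Clang for the __uint128_t extension. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Easy direction: g^k mod p by square-and-multiply, O(log k) steps. */
static uint64_t powmod(uint64_t g, uint64_t k, uint64_t p) {
    uint64_t r = 1 % p;
    for (g %= p; k; k >>= 1) {
        if (k & 1) r = (uint64_t)((__uint128_t)r * g % p);
        g = (uint64_t)((__uint128_t)g * g % p);
    }
    return r;
}

/* Hard direction (generic): exhaustive search, O(#G) steps. */
static uint64_t dlog_bruteforce(uint64_t g, uint64_t h, uint64_t p) {
    uint64_t x = 1 % p, k = 0;
    while (x != h) {
        x = (uint64_t)((__uint128_t)x * g % p);
        k++;
    }
    return k;
}

int main(void) {
    uint64_t p = 1000003, g = 2;        /* assumed prime p, generator g */
    uint64_t h = powmod(g, 123456, p);  /* fast */
    printf("found k = %" PRIu64 "\n", dlog_bruteforce(g, h, p)); /* slow */
    return 0;
}
```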

3. Linear Algebra Arising from DLP Attacks

Focus on the DLP in multiplicative subgroups of finite fields GF(q). To attack the DLP in finite fields, index-calculus methods:
- solve the DLP in time sub-exponential or quasi-polynomial in the size of the finite field;
- require solving large sparse systems of linear equations over finite fields.

Linear algebra problem. Inputs: a prime ℓ that divides q − 1 and a matrix A. Output: a non-trivial vector w such that Aw ≡ 0 (mod ℓ).

Linear algebra for factorization: arithmetic over GF(2); about 10% of the overall time.
Linear algebra for DLP: arithmetic over GF(ℓ); about 50% of the overall time, and a bottleneck for the computation.

4. Characteristics of the Inputs

- ℓ between 100 and 1000 bits.
- A is an N-by-N matrix; N ranges from 10^5 to 10^8.
- A is sparse: each row of A contains ~100 non-zero coefficients. The very first columns are relatively dense, then the column density decreases gradually; the row density does not change significantly.
- Non-zero coefficients lie in GF(ℓ).

Example: resolution of the DLP in GF(2^619)^×: size of ℓ: 217 bits; size of matrix (N): 650k; average row weight: 100.

5. Linear Algebra

Harder linear algebra ⇒ heavy computations; exploit parallelism at four levels:
1. Algorithmic level: sparse linear algebra algorithms. Wiedemann: a sequence of O(N) iterative sparse-matrix-vector products (SpMV): ᵗxy, ᵗxAy, ᵗxA²y, ..., ᵗxA^(2N)y (a sketch follows this list). Block Wiedemann: distribute the work into many parallel sequences. [Euro-Par 2014]
2. SpMV level: parallelize each SpMV over many nodes.
3. Per-node level: which hardware (GPU, multi-core CPU, many-core, ...)? Which format for the sparse matrix? How to map the partial SpMV onto the architecture? [WAIFI 2014]
4. Arithmetic level: arithmetic over GF(ℓ). Which representation: Residue Number System (RNS) or multi-precision? Accelerate the arithmetic on SIMD architectures.
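A minimal sketch of step 1, under simplifying assumptions not made in the talk: ℓ fits a machine word (the real ℓ is multi-word), A is stored in CSR form, and we compute the plain rather than block Wiedemann sequence.

```c
/* Wiedemann-sequence sketch: s_i = x^T A^i y mod ell, i = 0 .. 2N.
 * Illustrative assumptions: word-sized ell (< 2^60 or so, so that a row of
 * ~100 products fits a 128-bit accumulator), CSR storage, entries in [0, ell). */
#include <stdint.h>
#include <stdlib.h>

typedef struct {               /* CSR sparse matrix */
    uint64_t n;
    const uint64_t *row_ptr;   /* length n+1 */
    const uint64_t *col;       /* column indices of non-zeros */
    const uint64_t *val;       /* coefficients, already reduced mod ell */
} csr_t;

/* v <- A u mod ell: one SpMV, with one lazy reduction per row. */
static void spmv(const csr_t *A, const uint64_t *u, uint64_t *v, uint64_t ell) {
    for (uint64_t i = 0; i < A->n; i++) {
        __uint128_t acc = 0;
        for (uint64_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            acc += (__uint128_t)A->val[k] * u[A->col[k]];
        v[i] = (uint64_t)(acc % ell);
    }
}

/* Fill s[0 .. 2n] with the scalars x^T A^i y mod ell (y is overwritten). */
static void wiedemann_sequence(const csr_t *A, const uint64_t *x,
                               uint64_t *y, uint64_t *s, uint64_t ell) {
    uint64_t *t = malloc(A->n * sizeof *t);
    for (uint64_t i = 0; i <= 2 * A->n; i++) {
        __uint128_t dot = 0;                  /* s_i = <x, A^i y> */
        for (uint64_t j = 0; j < A->n; j++)
            dot += (__uint128_t)x[j] * y[j] % ell;
        s[i] = (uint64_t)(dot % ell);
        spmv(A, y, t, ell);                   /* y <- A y */
        for (uint64_t j = 0; j < A->n; j++) y[j] = t[j];
    }
    free(t);
}
```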

6. Table of Contents
1. SpMV: v ← Au mod ℓ
2. RNS for SpMV over Parallel Architectures
3. Experimental Results

7. Nature of the Coefficients of A

FFS-like matrices:
- A is sparse.
- All coefficients are "small" (|a_ij| ∈ [0, 2^10]).
- ~90% are ±1.

NFS-like matrices, composed of 2 parts:
- A_0: a sparse N-by-(N − r) sub-matrix containing "small" coefficients (a majority of ±1).
- A_1: a dense N-by-r sub-matrix composed of "large" coefficients (in [0, ℓ)).
- r is between 0 and 10.

8. Required Operations for SpMV

SpMV level: v ← Au mod ℓ. Row i level:
- NFS-like matrices: v_i ← (Σ_{j=1}^{N−r} a_ij · u_j + Σ_{j=N−r+1}^{N} a_ij · u_j) mod ℓ.
- FFS-like matrices: v_i ← (Σ_{j=1}^{N} a_ij · u_j) mod ℓ.

Elementary operations, for both matrix types unless noted (a sketch of the per-row accumulation follows):
- v_i ← v_i ± u_j (a_ij = ±1): frequent.
- v_i ← v_i + a_ij × u_j (|a_ij| < 2^10): less frequent.
- v_i ← v_i + a_ij × u_j (0 ≤ a_ij < ℓ): less frequent (NFS-like only).
- v_i ← v_i mod ℓ (lazy reduction): not frequent.
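A sketch of the per-row accumulation with lazy reduction, using a 128-bit accumulator as a word-sized stand-in for the multi-precision/RNS values of the talk (names hypothetical):

```c
/* Per-row accumulation for an FFS-like row: additions for coefficients +-1,
 * one multiplication for small coefficients, a single reduction at row end.
 * Stand-in assumptions: ell < 2^64, ~100 non-zeros per row. */
#include <stdint.h>

static uint64_t row_accumulate(const int32_t *c,    /* coefficients, |c[k]| < 2^10 */
                               const uint64_t *idx, /* column indices */
                               uint64_t nnz, const uint64_t *u, uint64_t ell) {
    __uint128_t pos = 0, neg = 0;          /* defer mod ell until the row ends */
    for (uint64_t k = 0; k < nnz; k++) {
        uint64_t uj = u[idx[k]];
        if (c[k] == 1)       pos += uj;                              /* frequent */
        else if (c[k] == -1) neg += uj;                              /* frequent */
        else if (c[k] > 0)   pos += (__uint128_t)(uint64_t)c[k] * uj;  /* rarer */
        else                 neg += (__uint128_t)(uint64_t)(-c[k]) * uj;
    }
    /* lazy reduction: one mod ell per row instead of one per coefficient */
    return (uint64_t)((pos % ell + ell - neg % ell) % ell);
}
```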

9. Table of Contents
1. SpMV: v ← Au mod ℓ
2. RNS for SpMV over Parallel Architectures
3. Experimental Results

10. A Brief Reminder on the Residue Number System (RNS)

RNS basis: a set of n co-prime integers (p_1, ..., p_n); P = ∏_{i=1}^{n} p_i.
RNS representation of x ∈ [0, P − 1]: x⃗ = (|x|_{p_1}, ..., |x|_{p_n}).

Usual operations in RNS (see the sketch below):
- Addition: x + y → (|x_1 + y_1|_{p_1}, ..., |x_n + y_n|_{p_n}).
- Multiplication by a scalar λ < p_i: x × λ → (|x_1 × λ|_{p_1}, ..., |x_n × λ|_{p_n}).
- Multiplication: x × y → (|x_1 × y_1|_{p_1}, ..., |x_n × y_n|_{p_n}).

⚠ Operations are mod P (the final result must not exceed P).
⇒ Fully independent parallel computations on the components.
Comparison and division in RNS are trickier.

Each p_i is chosen of pseudo-Mersenne form 2^k − c_i to speed up the reduction |·|_{p_i}:
- 2^k the size of a machine word: 2^32, 2^64, ...;
- c_i small compared to 2^k.
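A minimal RNS sketch under assumed parameters: four hypothetical pseudo-Mersenne moduli 2^62 − c_i (believed co-prime; the talk only fixes the general shape 2^k − c_i), chosen at 62 bits here so that products fit in 128 bits.

```c
/* Minimal RNS sketch. The moduli below are illustrative assumptions. */
#include <stdint.h>

#define RNS_N 4
#define K 62
static const uint64_t C[RNS_N] = {57, 87, 117, 143};  /* p_i = 2^K - C[i] */
static uint64_t P(int i) { return ((uint64_t)1 << K) - C[i]; }

/* |x|_{p_i} for x < 2^(2K+few): fold the high bits, 2^K = C[i] (mod p_i). */
static uint64_t red(__uint128_t x, int i) {
    while (x >> K)
        x = (x & ((((__uint128_t)1) << K) - 1)) + (x >> K) * C[i];
    uint64_t r = (uint64_t)x, p = P(i);
    return r >= p ? r - p : r;
}

/* x -> (|x|_{p_1}, ..., |x|_{p_n})  (small x, for demonstration) */
static void to_rns(uint64_t x, uint64_t r[RNS_N]) {
    for (int i = 0; i < RNS_N; i++) r[i] = x % P(i);
}

/* Component-wise, fully independent operations: */
static void rns_add(const uint64_t a[RNS_N], const uint64_t b[RNS_N],
                    uint64_t z[RNS_N]) {
    for (int i = 0; i < RNS_N; i++) z[i] = red((__uint128_t)a[i] + b[i], i);
}
static void rns_mul(const uint64_t a[RNS_N], const uint64_t b[RNS_N],
                    uint64_t z[RNS_N]) {
    for (int i = 0; i < RNS_N; i++) z[i] = red((__uint128_t)a[i] * b[i], i);
}
```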

11. RNS Addition and Multiplication: Algorithms

z = x + y requires 2 × (ℓ − 1) < P − 1:
    Input: x⃗, y⃗: RNS representations of x, y ∈ Z/ℓZ.
    Output: z⃗: RNS representation of z = x + y.
    for each component i do z_i ← |x_i + y_i|_{p_i}

z = x + λ × y with λ < 2^10 requires 2^10 × (ℓ − 1) < P − 1:
    Input: x⃗, y⃗: RNS representations of x, y ∈ Z/ℓZ, and λ ∈ [2, 2^10).
    Output: z⃗: RNS representation of z = x + λ × y.
    for each component i do z_i ← |x_i + λ × y_i|_{p_i}

z = x + λ × y with λ < ℓ requires ℓ × (ℓ − 1) < P − 1:
    Input: x⃗, y⃗, λ⃗: RNS representations of x, y, λ ∈ Z/ℓZ.
    Output: z⃗: RNS representation of z = x + λ × y.
    for each component i do z_i ← |x_i + λ_i × y_i|_{p_i}
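Continuing the sketch above (RNS_N, P(), red() as defined there; the plain addition is rns_add() from that sketch), the two fused multiply-add variants might look like:

```c
/* Fused RNS operations, continuing the illustrative sketch above. */
#include <stdint.h>

/* z = x + lambda*y for a small integer scalar lambda < 2^10
 * (valid while 2^10 * (ell - 1) < P - 1) */
static void rns_axpy_small(const uint64_t x[RNS_N], uint64_t lambda,
                           const uint64_t y[RNS_N], uint64_t z[RNS_N]) {
    for (int i = 0; i < RNS_N; i++)
        z[i] = red((__uint128_t)lambda * y[i] + x[i], i);
}

/* z = x + lambda*y for lambda < ell, itself given in RNS
 * (valid while ell * (ell - 1) < P - 1) */
static void rns_axpy_large(const uint64_t x[RNS_N], const uint64_t lambda[RNS_N],
                           const uint64_t y[RNS_N], uint64_t z[RNS_N]) {
    for (int i = 0; i < RNS_N; i++)
        z[i] = red((__uint128_t)lambda[i] * y[i] + x[i], i);
}
```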

12. RNS Reduction Modulo ℓ [Bernstein 94]

Problem: given x in RNS, compute x mod ℓ.

Chinese Remainder Theorem (CRT) reconstruction:
x = Σ_{i=1}^{n} γ_i · P_i mod P, where P_i = P / p_i and γ_i = |x_i · |P_i^{−1}|_{p_i}|_{p_i}.
Equivalently, x = Σ_{i=1}^{n} γ_i · P_i − α·P, where α = ⌊(Σ_{i=1}^{n} γ_i · P_i) / P⌋.

If α is known ⇒ z := Σ_{i=1}^{n} γ_i · |P_i|_ℓ − |αP|_ℓ, and z satisfies:
- z ≡ x (mod ℓ),
- z ∈ [0, ℓ · Σ_{i=1}^{n} p_i).

⇒ Full RNS computation of z. Note that z is not the exact reduction of x; however, this approximate reduction guarantees that the intermediate results of the SpMV computation do not exceed a bound that we impose, less than P.

13. RNS Approximate Reduction Modulo ℓ: Algorithm

Precomputation:
- the vector (|P_i^{−1}|_{p_i}) for i ∈ {1, ..., n};
- a table of the RNS representations of |P_j|_ℓ for j ∈ {1, ..., n};
- a table of the RNS representations of |αP|_ℓ for α ∈ {1, ..., n − 1}.

Input: x⃗: RNS representation of x, with 0 ≤ x < P.
Output: z⃗: RNS representation of z ≡ x (mod ℓ), with z < ℓ · Σ_{i=1}^{n} p_i.

for each component i do
    γ_i ← |x_i × |P_i^{−1}|_{p_i}|_{p_i}    /* 1 RNS product */
broadcast γ_i
compute α    /* addition of n s-bit terms */
for each component i do
    z_i ← |Σ_{j=1}^{n} γ_j × ||P_j|_ℓ|_{p_i}|_{p_i}    /* (n − 1) RNS additions & n RNS products */
    z_i ← |z_i − ||αP|_ℓ|_{p_i}|_{p_i}    /* 1 RNS subtraction */
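A word-sized sketch of the whole routine, continuing the illustrative RNS code above. The table names are hypothetical, and α is computed here from Σ γ_i / p_i in double precision, a stand-in for the fixed-point s-bit sum of the slide:

```c
/* Approximate reduction: z = sum_j gamma_j |P_j|_ell - |alpha P|_ell, in RNS.
 * Uses RNS_N, P(), red() from the sketches above. Hypothetical tables,
 * assumed filled elsewhere:
 *   inv_Pi[i]        = |P_i^-1|_{p_i}
 *   Pj_mod_ell[j][i] = component i of the RNS representation of |P_j|_ell
 *   aP_mod_ell[a][i] = component i of the RNS representation of |a*P|_ell */
#include <stdint.h>

extern const uint64_t inv_Pi[RNS_N];
extern const uint64_t Pj_mod_ell[RNS_N][RNS_N];
extern const uint64_t aP_mod_ell[RNS_N][RNS_N];   /* row 0 unused */

static void rns_approx_reduce(const uint64_t x[RNS_N], uint64_t z[RNS_N]) {
    uint64_t gamma[RNS_N];
    double frac = 0.0;
    for (int i = 0; i < RNS_N; i++) {
        gamma[i] = red((__uint128_t)x[i] * inv_Pi[i], i);  /* 1 RNS product */
        frac += (double)gamma[i] / (double)P(i);
    }
    /* alpha = floor(sum_i gamma_i / p_i) < n; the double sum is exact unless
     * x/P is within ~n*2^-53 of an integer (the real code keeps a margin). */
    int alpha = (int)frac;
    for (int i = 0; i < RNS_N; i++) {
        __uint128_t acc = 0;          /* (n-1) RNS additions & n RNS products */
        for (int j = 0; j < RNS_N; j++)
            acc += (__uint128_t)gamma[j] * Pj_mod_ell[j][i] % P(i);
        uint64_t s = red(acc, i);
        uint64_t t = alpha ? aP_mod_ell[alpha][i] : 0;
        z[i] = s >= t ? s - t : s + P(i) - t;     /* 1 RNS subtraction */
    }
}
```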

14. Required Operations for SpMV in RNS

SpMV level: v ← Au mod ℓ. Row i level, with the same sums as before:
- NFS-like matrices: v_i ← (Σ_{j=1}^{N−r} a_ij · u_j + Σ_{j=N−r+1}^{N} a_ij · u_j) mod ℓ.
- FFS-like matrices: v_i ← (Σ_{j=1}^{N} a_ij · u_j) mod ℓ.

In RNS, for both matrix types unless noted:
- v_i ← v_i ± u_j (a_ij = ±1): frequent and easy.
- v_i ← v_i + a_ij × u_j (|a_ij| < 2^10): less frequent and easy.
- v_i ← v_i + a_ij × u_j (0 ≤ a_ij < ℓ; NFS-like only): less frequent and easy, but binding for the choice of P.
- v_i ← v_i mod ℓ (lazy reduction): not frequent and hard.

15. How Long Is the RNS Basis?

FFS-like matrices:
1. Take a basis B(n, k) that handles the product by A. Let s be the maximal norm of the rows of A; we need s·ℓ·Σ_{i=1}^{n} p_i < P (recall that Wiedemann is iterative).

NFS-like matrices:
1. Take a minimal-length basis B(n, k) when multiplying by A_0.
2. Extend it to a larger basis B‖B̂(n + n̂, k) when multiplying by A_1.
Conditions:
- s·ℓ·(Σ_{i=1}^{n} p_i + Σ_{i=1}^{n̂} p̂_i) < P (product by A_0);
- r·ℓ × s·ℓ·(Σ_{i=1}^{n} p_i + Σ_{i=1}^{n̂} p̂_i) < P·P̂ (product by A_1).

Basis extension: the approach is similar to the reduction modulo ℓ. For each modulus p̂_j of the new basis (see the sketch below):
x̂_j = |x|_{p̂_j} = | Σ_{i=1}^{n} γ_i · |P_i|_{p̂_j} − |αP|_{p̂_j} |_{p̂_j}.
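The extension reuses the γ_i and α already computed during the reduction. A sketch in the same illustrative setting as the code above, with hypothetical extension moduli p̂_j = 2^62 − ĉ_j and hypothetical precomputed tables:

```c
/* Base extension to new moduli phat_j, continuing the sketches above
 * (RNS_N, K). Hypothetical tables, assumed filled elsewhere:
 *   Pi_mod_phat[i][j] = |P_i|_{phat_j},  P_mod_phat[j] = |P|_{phat_j}. */
#include <stdint.h>

#define RNS_NHAT 2                                  /* illustrative size */
static const uint64_t CHAT[RNS_NHAT] = {167, 195};  /* phat_j = 2^K - CHAT[j] */
static uint64_t PHAT(int j) { return ((uint64_t)1 << K) - CHAT[j]; }

extern const uint64_t Pi_mod_phat[RNS_N][RNS_NHAT];
extern const uint64_t P_mod_phat[RNS_NHAT];

/* xhat_j = | sum_i gamma_i |P_i|_{phat_j} - alpha |P|_{phat_j} |_{phat_j} */
static void rns_base_extend(const uint64_t gamma[RNS_N], int alpha,
                            uint64_t x_hat[RNS_NHAT]) {
    for (int j = 0; j < RNS_NHAT; j++) {
        __uint128_t acc = 0;
        for (int i = 0; i < RNS_N; i++)
            acc += (__uint128_t)gamma[i] * Pi_mod_phat[i][j] % PHAT(j);
        uint64_t s = (uint64_t)(acc % PHAT(j));
        uint64_t t = (uint64_t)((__uint128_t)(uint64_t)alpha
                                * P_mod_phat[j] % PHAT(j));
        x_hat[j] = s >= t ? s - t : s + PHAT(j) - t;
    }
}
```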
