fast integer multiplication with sch onhage strassen s
play

Fast Integer Multiplication with Sch onhage-Strassens Algorithm - PowerPoint PPT Presentation

Fast Integer Multiplication with Sch onhage-Strassens Algorithm Alexander Kruppa CACAO team at LORIA, Nancy s eminaire Algorithms INRIA Rocquencourt Contents Contents Contents of this talk: 1. Basics of Integer Multiplication (a)


  1. Fast Integer Multiplication with Sch¨ onhage-Strassen’s Algorithm Alexander Kruppa CACAO team at LORIA, Nancy s´ eminaire Algorithms INRIA Rocquencourt

  2. Contents Contents Contents of this talk: 1. Basics of Integer Multiplication (a) by polynomial multiplication (b) Evaluation/Interpolation (c) Karatsuba’s method (d) Toom-Cook method (e) FFT and weighted transforms (f) Sch¨ onhage-Strassen’s Algorithm Alexander Kruppa 2 INRIA Rocquencourt

  3. Contents 2. Motivation for Improving SSA 3. Sch¨ onhage-Strassen’s Algorithm 4. High-Level Improvements (a) Mersenne Transform (b) CRT Reconstruction √ 2 Trick (c) 5. Low-Level Improvements (a) Arithmetic modulo 2 n + 1 (b) Cache Locality (c) Fine-Grained Tuning 6. Timings, Comparisons and Untested Ideas Alexander Kruppa 3 INRIA Rocquencourt

  4. Integer Multiplication Integer Multiplication • Problem: given two n -word (word base β ) integers n − 1 � a i β i , a = i =0 0 ≤ a i < β and likewise for b , compute 2 n − 1 � c i β i c = ab = i =0 n − 1 n − 1 � � a i b j β i + j , = i =0 j =0 with 0 ≤ c i < β . Alexander Kruppa 4 INRIA Rocquencourt

  5. by Polynomial Multiplication by Polynomial Multiplication • We can rewrite the problem as polynomial arithmetic: n − 1 � a i x i A ( x ) = i =0 so that a = A ( β ) , likewise for B ( x ) , then C ( x ) = A ( x ) B ( x ) n − 1 n − 1 � � a i b j x i + j = i =0 j =0 so that c = ab = C ( β ) . • Double sum has complexity O ( n 2 ) (Grammar School Algorithm), we can do much better Alexander Kruppa 5 INRIA Rocquencourt

  6. Evaluation/Interpolation Evaluation/Interpolation • Unisolvence Theorem: Polynomial of degree d − 1 is determined uniquely by values at d distinct points • Since C ( k ) = A ( k ) B ( k ) for all k ∈ R for ring R , reduce the polynomial multiplication to: 1. Evaluate A ( x ) , B ( x ) at 2 n − 1 points k 0 , . . . , k 2 n − 2 2. Pairwise multiply to get C ( k i ) = A ( k i ) B ( k i ) 3. Interpolate C ( x ) from its values C ( k i ) Alexander Kruppa 6 INRIA Rocquencourt

  7. Karatsuba’s Method Karatsuba’s Method • First algorithm to use this principle (Karatsuba and Ofman, 1962) • Multiplies polynomials of degree 1 : A ( x ) = a 0 + a 1 x • Suggested points of evaluation: 0 , 1 , ∞ • A (0) = a 0 , A (1) = a 0 + a 1 , A ( ∞ ) = a 1 (same for B ( x ) ) • C (0) = a 0 b 0 , C (1) = ( a 0 + a 1 )( b 0 + b 1 ) , C ( ∞ ) = a 1 b 1 • c 0 = C (0) , c 2 = C ( ∞ ) , c 1 = C (1) − c 0 − c 2 • Product of 2 n words computed with 3 pointwise multiplications of n words each, applied recursively: O ( n log 2 (3) ) = O ( n 1 . 585 ) Alexander Kruppa 7 INRIA Rocquencourt

  8. Toom-Cook Method Toom-Cook Method • Generalized to polynomials of larger degree (Toom, 1963, Cook, 1966) • Product of two n word integers with A ( x ) , B ( x ) of degree d : 2 d + 1 products of n/ ( d + 1) word integers • For fixed d : complexity O ( n log d +1 (2 d +1) ) , e.g. d = 2 : O ( n 1 . 465 ) • Interpolation/Evaluation costly ( O ( dn log( d )) ), cannot increase d arbitrarily for given n • Choosing d as function of n allows algorithm in O ( n 1+ ǫ ) , for any ǫ > 0 . Small exponents need very large n Alexander Kruppa 8 INRIA Rocquencourt

  9. Evaluation/Interpolation with FFT Evaluation/Interpolation with FFT • FFT solves problem of costly evaluation/interpolation a j = A ( ω j • Length- ℓ DFT of a 0 , ..., a ℓ − 1 in R computes ˜ ℓ ) , 0 ≤ j < ℓ , with ω ℓ an ℓ -th principal root of unity in R : ℓ -point polynomial evaluation • Length- ℓ IDFT computes a i from given ˜ a j : ℓ -point polynomial interpolation • With FFT algorithm, algebraic complexity only O ( ℓ log( ℓ )) • Problem: R needs to support length- ℓ FFT (preferably ℓ a power of 2 ): needs ℓ -th principal root of unity, ℓ a unit Alexander Kruppa 9 INRIA Rocquencourt

  10. Weighted Transform Weighted Transform ℓ ) ℓ = 1 for all j ∈ N , C 1 ( x ) x ℓ + C 0 ( x ) has same DFT • Since ( ω j coefficients as C 1 ( x ) + C 0 ( x ) : implicit modulus x ℓ − 1 in DFT • FFT convolution gives C ( x ) = ( A ( x ) B ( x )) mod ( x ℓ − 1) : cyclic convolution • Can change that modulus with weighed transform: compute C ( wx ) = ( A ( wx ) B ( wx )) mod ( x ℓ − 1) . Then A ( wx ) B ( wx ) = C 1 ( wx ) x ℓ w ℓ + C 0 ( wx ) C ( wx ) = C 1 ( wx ) x ℓ w ℓ + C 0 ( wx ) mod ( x ℓ − 1) = C 1 ( wx ) w ℓ + C 0 ( wx ) so that C ( x ) = ( A ( x ) B ( x )) mod ( x ℓ − w ℓ ) • With w ℓ = − 1 , we get modulus x ℓ +1 : negacyclic convolution, but need 2 ℓ -th root of unity in R Alexander Kruppa 10 INRIA Rocquencourt

  11. Sch¨ onhage-Strassen’s Algorithm: Basic Idea Sch¨ onhage-Strassen’s Algorithm: Basic Idea • First algorithms to use FFT (Sch¨ onhage and Strassen 1971) • Uses ring R n = Z / (2 n + 1) Z for transform, with ℓ = 2 k | n • Then 2 n/ℓ ≡ − 1 (mod 2 n +1) : so 2 n/ℓ ∈ R n is 2 ℓ -th root of unity, multiplication by powers of 2 is fast! ( O ( n ) ) • Allows length ℓ weighted transform for negacyclic convolution • Write input a = A (2 M ) , b = B (2 M ) , compute C ( x ) = A ( x ) B ( x ) (mod x ℓ + 1) . Then c = C (2 M ) = ab (mod 2 Mℓ + 1) • Point-wise products modulo 2 n + 1 use SSA recursively: choose next level’s ℓ ′ , M ′ so that M ′ ℓ ′ = n Alexander Kruppa 11 INRIA Rocquencourt

  12. Improvements to Sch¨ onhage- Strassen’s Algorithm Alexander Kruppa 12 INRIA Rocquencourt

  13. Motivation for Improving SSA Motivation for Improving SSA • Integer multiplication is fundamental to arithmetic, used in PRP testing, ECPP , polynomial multiplication • Sch¨ onhage-Strassen’s algorithm [SSA]: good asymptotic complexity O ( n log n log log n ) , fast in practice for large operands, exact (only integer arithmetic) • Used in GMP , widely deployed • We improved algorithmic aspects of Sch¨ onhage-Strassen • Validated by implementation based on GMP 4.2.1 [GMP] Alexander Kruppa 13 INRIA Rocquencourt

  14. Sch¨ onhage-Strassen’s Algorithm Sch¨ onhage-Strassen’s Algorithm • SSA reduces multiplication of two S -bit integers to ℓ multiplications of approx. 4 S/ℓ -bit integers • Example: multiply two numbers a, b of 2 20 bits each ⇒ product has at most 2 21 bits 1. Choose N = 2 21 and a good ℓ , for this example ℓ = 512 . We compute ab mod (2 N + 1) 2. Write a as polynomial of degree ℓ , coefficients a i < 2 M with M = N/ℓ , a = a (2 M ) . Same for b 3. ab = a (2 M ) b (2 M ) mod (2 N + 1) , compute c ( x ) = a ( x ) b ( x ) mod ( x ℓ + 1) 4. Convolution theorem: Fourier transform and pointwise multiplication Alexander Kruppa 14 INRIA Rocquencourt

  15. Sch¨ onhage-Strassen’s Algorithm 5. FFT needs ℓ -th root of unity: map to Z / (2 n + 1) Z [ x ] with ℓ | n . Then 2 2 n/ℓ has order ℓ 6. We need 2 n + 1 > c i : choose n ≥ 2 M + log 2 ( ℓ ) + 1 7. Compute c ( x ) = a ( x ) b ( x ) mod ( x ℓ + 1) , evaluate ab = c (2 M ) - and we’re done! 8. Benefits: - Root of unity is power of 2 - Reduction mod(2 n + 1) is fast - Point-wise products can use SSA recursively without padding ⇒ Z [ x ] mod ( x ℓ + 1) = ⇒ Z 2 n +1 [ x ] mod ( x ℓ + 1) = Z = ⇒ Z 2 N +1 = ⇒ Z 2 n +1 ✻ ❄ ✟ ❍ ✟✟✟✟ ❍ n ❍ ❍ No, recurse ❍ ❍❍❍❍ ✟ small? ✟ ✟ ✟ ❍ ✟ ❄ Yes, multiply Alexander Kruppa 15 INRIA Rocquencourt

  16. High-Level Optimizations High-Level Optimizations Alexander Kruppa 16 INRIA Rocquencourt

  17. Mersenne Transform Mersenne Transform • Convolution theorem implies reduction mod( x ℓ − 1) • Convolution mod( x ℓ + 1) needs weights θ i with ord ( θ ) = 2 ℓ , needs ℓ | n to get 2 ℓ -th root of unity in R n • Computing ab mod (2 N + 1) to allows recursive use of SSA, but is not required at top level • Map a and b to Z / (2 N − 1) Z instead: compute c ( x ) = a ( x ) b ( x ) mod ( x ℓ − 1) • Condition relaxes to ℓ | 2 n . Twice the transform length, smaller n • No need to apply/unapply weights Alexander Kruppa 17 INRIA Rocquencourt

  18. CRT Reconstruction CRT Reconstruction a, b a, b Z / (2 N + 1) Z Z / (2 rN + 1) Z Z / (2 sN − 1) Z Convolution Convolution Convolution ab CRT with 2 N + 1 > ab ab with (2 rN + 1)(2 sN − 1) > ab Alexander Kruppa 18 INRIA Rocquencourt

  19. CRT Reconstruction • At least one of (2 rN − 1 , 2 sN +1) and (2 rN +1 , 2 sN − 1) is coprime • Our implementation uses (2 rN +1 , 2 N − 1) : always coprime, good speed • Smaller convolution, finer-grained parameter selection Alexander Kruppa 19 INRIA Rocquencourt

  20. √ 2 Trick The √ The 2 Trick • If 4 | n , 2 is a quadratic residue in Z / (2 n + 1) Z √ √ 2 ≡ 2 3 n/ 4 − 2 n/ 4 : simple form, multiplication by • In that case, 2 takes only 2 shift, 1 subtraction modulo 2 n + 1 • Offers root of unity of order 4 n , allows ℓ | 2 n for Fermat transform, ℓ | 4 n for Mersenne transform • Sadly, higher roots of 2 usually not available in Z / (2 n + 1) Z , or have no simple form Alexander Kruppa 20 INRIA Rocquencourt

  21. Low-Level Optimizations Low-Level Optimizations Alexander Kruppa 21 INRIA Rocquencourt

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend