SLIDE 1

Fast Integer Multiplication with Schönhage-Strassen's Algorithm

Alexander Kruppa

CACAO team at LORIA, Nancy. Séminaire Algorithms, INRIA Rocquencourt

SLIDE 2

Contents

Contents of this talk:

  1. Basics of Integer Multiplication
     (a) by polynomial multiplication
     (b) Evaluation/Interpolation
     (c) Karatsuba's method
     (d) Toom-Cook method
     (e) FFT and weighted transforms
     (f) Schönhage-Strassen's Algorithm

Alexander Kruppa 2 INRIA Rocquencourt

SLIDE 3

Contents

  2. Motivation for Improving SSA
  3. Schönhage-Strassen's Algorithm
  4. High-Level Improvements
     (a) Mersenne Transform
     (b) CRT Reconstruction
     (c) √2 Trick
  5. Low-Level Improvements
     (a) Arithmetic modulo 2^n + 1
     (b) Cache Locality
     (c) Fine-Grained Tuning
  6. Timings, Comparisons and Untested Ideas

SLIDE 4

Integer Multiplication

  • Problem: given two n-word (word base β) integers

    a = Σ_{i=0}^{n−1} a_i β^i,  0 ≤ a_i < β,

    and likewise for b, compute

    c = ab = Σ_{i=0}^{2n−1} c_i β^i = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} a_i b_j β^{i+j},

    with 0 ≤ c_i < β.

SLIDE 5

by Polynomial Multiplication

  • We can rewrite the problem as polynomial arithmetic:

    A(x) = Σ_{i=0}^{n−1} a_i x^i

    so that a = A(β), likewise for B(x), then

    C(x) = A(x)B(x) = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} a_i b_j x^{i+j}

    so that c = ab = C(β).

  • Double sum has complexity O(n²) (grammar school algorithm); we can do much better

SLIDE 6

Evaluation/Interpolation

  • Unisolvence Theorem: a polynomial of degree d − 1 is determined uniquely by its values at d distinct points
  • Since C(k) = A(k)B(k) for all k ∈ R, for a ring R, reduce the polynomial multiplication to:

  1. Evaluate A(x), B(x) at 2n − 1 points k_0, ..., k_{2n−2}
  2. Pairwise multiply to get C(k_i) = A(k_i)B(k_i)
  3. Interpolate C(x) from its values C(k_i)

SLIDE 7

Karatsuba's Method

  • First algorithm to use this principle (Karatsuba and Ofman, 1962)
  • Multiplies polynomials of degree 1: A(x) = a0 + a1x
  • Suggested points of evaluation: 0, 1, ∞
  • A(0) = a0, A(1) = a0 + a1, A(∞) = a1 (same for B(x))
  • C(0) = a0b0, C(1) = (a0 + a1)(b0 + b1), C(∞) = a1b1
  • c0 = C(0), c2 = C(∞), c1 = C(1) − c0 − c2
  • Product of 2n words computed with 3 pointwise multiplications of n words each; applied recursively: O(n^{log₂ 3}) = O(n^{1.585})

SLIDE 8

Toom-Cook Method

  • Generalized to polynomials of larger degree (Toom, 1963; Cook, 1966)
  • Product of two n-word integers with A(x), B(x) of degree d: 2d + 1 products of n/(d + 1)-word integers
  • For fixed d: complexity O(n^{log_{d+1}(2d+1)}), e.g. d = 2: O(n^{1.465})
  • Evaluation/interpolation is costly (O(dn log d)); cannot increase d arbitrarily for given n
  • Choosing d as a function of n allows an algorithm in O(n^{1+ε}) for any ε > 0, but small exponents need very large n

SLIDE 9

Evaluation/Interpolation with FFT

  • FFT solves the problem of costly evaluation/interpolation
  • A length-ℓ DFT of a_0, ..., a_{ℓ−1} in R computes ã_j = A(ω_ℓ^j), 0 ≤ j < ℓ, with ω_ℓ an ℓ-th principal root of unity in R: ℓ-point polynomial evaluation
  • A length-ℓ IDFT computes the a_i from given ã_j: ℓ-point polynomial interpolation
  • With an FFT algorithm, algebraic complexity is only O(ℓ log ℓ)
  • Problem: R needs to support a length-ℓ FFT (preferably ℓ a power of 2): needs an ℓ-th principal root of unity, and ℓ must be a unit

SLIDE 10

Weighted Transform

  • Since (ω_ℓ^j)^ℓ = 1 for all j ∈ N, C_1(x)x^ℓ + C_0(x) has the same DFT coefficients as C_1(x) + C_0(x): implicit modulus x^ℓ − 1 in the DFT
  • FFT convolution gives C(x) = A(x)B(x) mod (x^ℓ − 1): cyclic convolution
  • Can change that modulus with a weighted transform: compute C(wx) = A(wx)B(wx) mod (x^ℓ − 1). Then

    A(wx)B(wx) = C_1(wx)x^ℓ w^ℓ + C_0(wx)
    C(wx) = C_1(wx)x^ℓ w^ℓ + C_0(wx) mod (x^ℓ − 1) = C_1(wx)w^ℓ + C_0(wx)

    so that C(x) = A(x)B(x) mod (x^ℓ − w^ℓ)
  • With w^ℓ = −1, we get modulus x^ℓ + 1: negacyclic convolution, but need a 2ℓ-th root of unity in R

SLIDE 11

Schönhage-Strassen's Algorithm: Basic Idea

  • First algorithm to use the FFT (Schönhage and Strassen, 1971)
  • Uses the ring R_n = Z/(2^n + 1)Z for the transform, with ℓ = 2^k | n
  • Then (2^{n/ℓ})^ℓ = 2^n ≡ −1 (mod 2^n + 1): so 2^{n/ℓ} ∈ R_n is a 2ℓ-th root of unity, and multiplication by powers of 2 is fast! (O(n))
  • Allows a length-ℓ weighted transform for negacyclic convolution
  • Write the inputs as a = A(2^M), b = B(2^M), compute C(x) = A(x)B(x) mod (x^ℓ + 1). Then c = C(2^M) ≡ ab (mod 2^{Mℓ} + 1)
  • Point-wise products modulo 2^n + 1 use SSA recursively: choose the next level's ℓ′, M′ so that M′ℓ′ = n

SLIDE 12

Improvements to Schönhage-Strassen's Algorithm

SLIDE 13

Motivation for Improving SSA

  • Integer multiplication is fundamental to arithmetic, used in PRP testing, ECPP, polynomial multiplication
  • Schönhage-Strassen's algorithm [SSA]: good asymptotic complexity O(n log n log log n), fast in practice for large operands, exact (only integer arithmetic)
  • Used in GMP, widely deployed
  • We improved algorithmic aspects of Schönhage-Strassen
  • Validated by an implementation based on GMP 4.2.1 [GMP]

SLIDE 14

Schönhage-Strassen's Algorithm

  • SSA reduces multiplication of two S-bit integers to ℓ multiplications of approx. 4S/ℓ-bit integers
  • Example: multiply two numbers a, b of 2^20 bits each ⇒ product has at most 2^21 bits
  1. Choose N = 2^21 and a good ℓ, for this example ℓ = 512. We compute ab mod (2^N + 1)
  2. Write a as a polynomial of degree ℓ with coefficients a_i < 2^M, where M = N/ℓ, so that a = a(2^M). Same for b
  3. ab ≡ a(2^M)b(2^M) (mod 2^N + 1); compute c(x) = a(x)b(x) mod (x^ℓ + 1)
  4. Convolution theorem: Fourier transform and pointwise multiplication

SLIDE 15

Schönhage-Strassen's Algorithm

  5. The FFT needs an ℓ-th root of unity: map to Z/(2^n + 1)Z[x] with ℓ | n. Then 2^{2n/ℓ} has order ℓ
  6. We need 2^n + 1 > c_i: choose n ≥ 2M + log₂(ℓ) + 1
  7. Compute c(x) = a(x)b(x) mod (x^ℓ + 1), evaluate ab = c(2^M), and we're done!
  8. Benefits:
     • Root of unity is a power of 2
     • Reduction mod (2^n + 1) is fast
     • Point-wise products can use SSA recursively without padding

[Diagram: Z ⇒ Z/(2^N + 1)Z ⇒ Z[x] mod (x^ℓ + 1) ⇒ Z/(2^n + 1)Z[x] mod (x^ℓ + 1) ⇒ Z/(2^n + 1)Z. Is n small? Yes: multiply directly. No: recurse.]

SLIDE 16

High-Level Optimizations

SLIDE 17

Mersenne Transform

  • The convolution theorem implies reduction mod (x^ℓ − 1)
  • Convolution mod (x^ℓ + 1) needs weights θ^i with ord(θ) = 2ℓ: needs ℓ | n to get a 2ℓ-th root of unity in R_n
  • Computing ab mod (2^N + 1) allows recursive use of SSA, but is not required at the top level
  • Map a and b to Z/(2^N − 1)Z instead: compute c(x) = a(x)b(x) mod (x^ℓ − 1)
  • Condition relaxes to ℓ | 2n: twice the transform length, smaller n
  • No need to apply/unapply weights

SLIDE 18

CRT Reconstruction

  Before: a, b ∈ Z/(2^N + 1)Z → convolution → ab, with 2^N + 1 > ab

  After: map a, b to both Z/(2^{rN} + 1)Z and Z/(2^{sN} − 1)Z → two smaller convolutions → CRT → ab, with (2^{rN} + 1)(2^{sN} − 1) > ab

SLIDE 19

CRT Reconstruction

  • At least one of the pairs (2^{rN} − 1, 2^{sN} + 1) and (2^{rN} + 1, 2^{sN} − 1) is coprime
  • Our implementation uses (2^{rN} + 1, 2^N − 1): always coprime, good speed
  • Smaller convolutions, finer-grained parameter selection

SLIDE 20

The √2 Trick

  • If 4 | n, 2 is a quadratic residue in Z/(2^n + 1)Z
  • In that case, √2 ≡ 2^{3n/4} − 2^{n/4}: a simple form, so multiplication by √2 takes only 2 shifts and 1 subtraction modulo 2^n + 1
  • Offers a root of unity of order 4n: allows ℓ | 2n for the Fermat transform, ℓ | 4n for the Mersenne transform
  • Sadly, higher roots of 2 are usually not available in Z/(2^n + 1)Z, or have no simple form

SLIDE 21

Low-Level Optimizations

SLIDE 22

Arithmetic modulo 2n + 1

  • Residues are stored semi-normalized (< 2^{n+1}), each with m = n/w full words plus one extra word ≤ 1
  • Adding two semi-normalized values:

      c = a[m] + b[m] + mpn_add_n (r, a, b, m);
      r[m] = (r[0] < c);
      MPN_DECR_U (r, m + 1, c - r[m]);

    This assures r[m] = 0 or 1; c − r[m] > r[0] ⇒ r[m] = 1, so carry propagation must terminate.
  • Conditional branches only in the mpn_add_n loop and (almost always non-taken) in the carry propagation of MPN_DECR_U
  • Similar for subtraction and multiplication by 2^k

SLIDE 23

Improving Cache Locality

  • SSA behaves differently from, say, a complex floating-point FFT: elements have hundreds or thousands of bytes instead of 16
  • A recursive implementation is preferable: it reduces the working data set size quickly, and the overhead is small compared to the arithmetic
  • A radix-4 transform fuses two levels of butterflies on four inputs: half as many recursion levels, and 4 operands usually fit into level-1 cache

[Diagram: the radix-2 butterfly maps (a_i, a_{i+k}) to (a_i + a_{i+k}, w^k(a_i − a_{i+k})); the radix-4 butterfly fuses two such levels across four inputs]

SLIDE 24

Improving Cache Locality

  • Bailey's 4-step algorithm (radix-√ℓ transform)
    • groups half of the recursive levels into a first pass, the other half into a second pass
    • Each pass is again a set of FFTs, each of length √ℓ
    • If a length-√ℓ transform fits in level-2 cache: only two passes over memory per transform
  • Extremely effective for complex floating-point FFTs; we found it useful for SSA with large ℓ as well
  • Fusing different stages of SSA
    • Do as much work as possible on data while it is in cache
    • When cutting the input a into M-bit coefficients, a = Σ_{0≤i<ℓ} a_i 2^{iM}, also apply the weights for the negacyclic transform and perform the first FFT level (likewise for b)
    • Similar ideas for other stages

SLIDE 25

Fine-Grained Tuning

  • Up to version 4.2.1, GMP uses a simple tuning scheme: transform length grows monotonically with input size
  • Not optimal: the time-over-input-size graphs for different transform lengths intersect multiple times

  [Plot: time in seconds vs. input size in 64-bit words (700-900), for transform lengths K=32 and K=64]

SLIDE 26

Fine-Grained Tuning

  • The new tuning scheme determines intervals of input size and the optimal transform length for each
  • Also determines pairs of Mersenne/Fermat transform lengths for CRT
  • Time-consuming (ca. 1h up to 1M words) but yields significant speedup

SLIDE 27

Timings and Comparisons

  • Our code is about 40% faster than GMP 4.2.1 and Magma 2.13-6, and more than twice as fast as GMP 4.1.4

  [Plot: multiplication time in seconds vs. input size in 64-bit words (2.5e6 to 1.5e7), for GMP 4.1.4, Magma V2.13-6, GMP 4.2.1 and the new GMP code]

SLIDE 28

Timings and Comparisons

  • New code by William Hart for FLINT is competitive with ours, up to 30% faster for some input sizes
  • Prime95 and Glucas implement complex floating-point FFTs for integer multiplication, mostly for arithmetic mod 2^n − 1 (Lucas-Lehmer test for Mersenne numbers)
  • Considerably faster: Prime95 10x on Pentium 4, 2.5x on Opteron; Glucas 5x on Pentium 4, 2x on Opteron
  • Danger of round-off error due to floating-point arithmetic
  • Provably correct rounding possible with about 2x the transform length
  • Prime95 written in assembly, non-portable

SLIDE 29

Untested Ideas

  • Special code for point-wise multiplication:
    • Length 3·2^k transforms
    • Karatsuba and Toom-Cook with reduction mod 2^n + 1 in the interpolation step
    • Short-length, proven-correct complex floating-point FFT
  • Truncated Fourier transform
  • Fürer's new algorithm has lower theoretical complexity O(n log(n) 2^{O(log*(n))}). How fast is it in practice?

SLIDE 30

References

[GMP] T. Granlund, The GNU Multiple Precision Arithmetic Library, http://gmplib.org/
[SSA] A. Schönhage and V. Strassen, Schnelle Multiplikation großer Zahlen, Computing 7 (1971)

Source tarball with the new code available at http://www.loria.fr/~kruppaal
