  1. The tangent FFT D. J. Bernstein University of Illinois at Chicago

  2. Advertisement: “SPEED: Software Performance Enhancement for Encryption and Decryption.” A workshop on software speeds for secret-key cryptography and public-key cryptography. Amsterdam, June 11–12, 2007. http://www.hyperelliptic.org/SPEED

  3. The convolution problem. How quickly can we multiply polynomials in the ring R[x]? Answer depends on degrees, representation of polynomials, number of polynomials, etc. Answer also depends on definition of “quickly.” Many models of computation; many interesting cost measures.

  4. Assume two inputs f, g ∈ R[x] with deg f < m, deg g ≤ n − m, so deg fg < n. Assume f, g, fg represented as coefficient sequences. How quickly can we compute the n coeffs of fg, given f’s m coeffs and g’s n − m + 1 coeffs? Inputs (f_0, f_1, …, f_{m−1}) ∈ R^m and (g_0, g_1, …, g_{n−m}) ∈ R^{n−m+1}. Output (h_0, h_1, …, h_{n−1}) ∈ R^n with h_0 + h_1 x + ⋯ + h_{n−1} x^{n−1} = (f_0 + f_1 x + ⋯)(g_0 + g_1 x + ⋯).
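The input/output convention above can be made concrete with a small Python sketch of the schoolbook product (the function name is mine, for illustration):

```python
def poly_mul(f, g):
    """Schoolbook product of coefficient sequences: f has m coeffs,
    g has n-m+1 coeffs; returns the n coeffs of f*g."""
    n = len(f) + len(g) - 1
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h
```

This costs Θ(m(n−m+1)) ring operations; the rest of the talk is about doing much better.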

  5. Assume R-algebraic algorithms (without divisions, branches): chains of u, v ↦ u + v, binary R-adds; u, v ↦ u − v, binary R-subs; u, v ↦ uv, binary R-mults; starting from inputs and constants. Example (1963 Karatsuba): a dataflow diagram computing h_0 = f_0 g_0; h_2 = f_1 g_1; h_1 = (f_0 + f_1)(g_0 + g_1) − h_0 − h_2, using 3 mults.
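The Karatsuba dataflow on the slide can be written directly as code (a sketch; the function name is mine):

```python
def karatsuba_deg1(f0, f1, g0, g1):
    """Karatsuba's 1963 trick: multiply f0 + f1*x by g0 + g1*x
    using 3 binary mults instead of 4, plus 2 adds and 2 subs."""
    h0 = f0 * g0
    h2 = f1 * g1
    h1 = (f0 + f1) * (g0 + g1) - h0 - h2
    return h0, h1, h2
```

Each operation in the body is one node of the algebraic chain: 3 R-mults, 2 R-adds, 2 R-subs.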

  6. Cost measure for this talk: total R-algebraic complexity. Cost 1 for binary R-add; cost 1 for binary R-sub; cost 1 for binary R-mult; cost 0 for constant in R. Many real-world computations use (e.g.) Pentium M’s floating-point operations to approximate operations in R. Properly scheduled operations achieve Pentium M cycles ≈ total R-algebraic complexity.

  7. Fast Fourier transforms. Define ζ_n ∈ C as exp(2πi/n). Define T: C[x]/(x^n − 1) → C^n as f ↦ (f(1), f(ζ_n), …, f(ζ_n^{n−1})). Can very quickly compute T. First publication of fast algorithm: 1866 Gauss. Easy to see that Gauss’s FFT uses O(n lg n) arithmetic operations if n ∈ {1, 2, 4, 8, …}. Several subsequent reinventions, ending with 1965 Cooley–Tukey.
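The O(n lg n) recursion can be seen in a minimal radix-2 sketch in Python (illustrative only, not an optimized FFT; the function name is mine):

```python
import cmath

def fft(f):
    """Evaluate f (coefficient list, length a power of 2) at all
    n-th roots of unity: the map T: C[x]/(x^n - 1) -> C^n."""
    n = len(f)
    if n == 1:
        return f[:]
    even = fft(f[0::2])   # f's even-index coefficients
    odd = fft(f[1::2])    # f's odd-index coefficients
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w            # f(zeta_n^k)
        out[k + n // 2] = even[k] - w   # f(zeta_n^{k+n/2}) = f(-zeta_n^k)
    return out
```

Each level of recursion does n/2 C-mults, n/2 C-adds, and n/2 C-subs, giving the per-level cost analyzed on the next slides.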

  8. Inverse map is also very fast. Multiplication in C^n is very fast. 1966 Sande, 1966 Stockham: Can very quickly multiply in C[x]/(x^n − 1) or C[x] or R[x] by mapping C[x]/(x^n − 1) to C^n. Given f, g ∈ C[x]/(x^n − 1): compute fg as T^{−1}(T(f)T(g)). Given f, g ∈ C[x] with deg fg < n: compute fg from its image in C[x]/(x^n − 1). R-algebraic complexity O(n lg n).

  9. A closer look at costs. More precise analysis of Gauss FFT (and Cooley–Tukey FFT): (1/2) n lg n binary C-adds, (1/2) n lg n binary C-subs, (1/2) n lg n binary C-mults to map C[x]/(x^n − 1) → C^n, if n ∈ {1, 2, 4, 8, …}. (a, b) ∈ R^2 represents a + bi ∈ C. C-add, C-sub, C-mult cost 2, 2, 6: (a, b) + (c, d) = (a + c, b + d); (a, b) − (c, d) = (a − c, b − d); (a, b)(c, d) = (ac − bd, ad + bc).
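The cost-6 accounting for a C-mult is just the formula on the slide written out: 4 real mults plus 1 sub plus 1 add (function name mine):

```python
def c_mult(a, b, c, d):
    """(a + bi)(c + di) as (ac - bd, ad + bc):
    4 R-mults + 1 R-sub + 1 R-add = total R-algebraic cost 6."""
    return (a * c - b * d, a * d + b * c)
```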

  10. Total cost 5 n lg n. Easily save some time: eliminate mults by 1, −1, i, −i; absorb mults by ±√i, ±√−i into subsequent operations, simplifying via, e.g., (a, b)(1/√2, 1/√2) = ((a − b)/√2, (a + b)/√2). Cost 5 n lg n − 10 n + 16 to map C[x]/(x^n − 1) → C^n, if n ∈ {4, 8, 16, 32, …}.
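The √i simplification on the slide can be checked numerically: √i = e^{iπ/4} = (1 + i)/√2, so multiplying by it needs only 2 real mults and 2 adds/subs (function name mine):

```python
def mult_by_sqrt_i(a, b):
    """Multiply a + bi by sqrt(i) = (1 + i)/sqrt(2):
    ((a - b)/sqrt 2, (a + b)/sqrt 2), i.e. 2 R-mults + 2 R-add/subs,
    cheaper than the generic cost-6 C-mult."""
    s = 1 / 2 ** 0.5
    return ((a - b) * s, (a + b) * s)
```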

  11. What about cost of convolution? 5 n lg n + O(n) to compute T(f); 5 n lg n + O(n) to compute T(g); O(n) to multiply in C^n; similar 5 n lg n + O(n) for T^{−1}. Total cost 15 n lg n + O(n) to compute fg ∈ C[x]/(x^n − 1) given f, g ∈ C[x]/(x^n − 1). Total cost (15/2) n lg n + O(n) to compute fg ∈ R[x]/(x^n − 1) given f, g ∈ R[x]/(x^n − 1): map R[x]/(x^n − 1) → R^2 × C^{n/2 − 1} (Gauss) to save half the time.

  12. 1968 R. Yavne: Can do better! Cost 4 n lg n − 6 n + 8 to map C[x]/(x^n − 1) → C^n, if n ∈ {2, 4, 8, 16, …}.

  13. 1968 R. Yavne: Can do better! Cost 4 n lg n − 6 n + 8 to map C[x]/(x^n − 1) → C^n, if n ∈ {2, 4, 8, 16, …}. 2004 James Van Buskirk: Can do better! Cost (34/9) n lg n + O(n). Expositions of the new algorithm: Frigo, Johnson, in IEEE Trans. Signal Processing; Lundy, Van Buskirk, in Computing; Bernstein, this talk, expanding an old idea of Fiduccia.

  14. Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?”

  15. Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?” Bernstein, comp.arch: “What do you mean, ‘better opcounts’? The algebraic complexity of a size-2^k complex DFT has stood at (3k − 3)2^k + 4 additions and (k − 3)2^k + 4 multiplications since 1968.”

  16. Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?” Bernstein, comp.arch: “What do you mean, ‘better opcounts’? The algebraic complexity of a size-2^k complex DFT has stood at (3k − 3)2^k + 4 additions and (k − 3)2^k + 4 multiplications since 1968.” Van Buskirk, comp.arch: “Oh, you’re so 20th century, Dan. Read the handwriting on the wall.”
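The counts Bernstein quotes are consistent with the 4 n lg n − 6 n + 8 total of slide 12: the additions and multiplications sum to it exactly. A quick check (function names are mine):

```python
import math

def yavne_adds(k):
    """(3k - 3)2^k + 4 additions for a size-2^k complex DFT (Yavne 1968)."""
    return (3 * k - 3) * 2 ** k + 4

def yavne_mults(k):
    """(k - 3)2^k + 4 multiplications for a size-2^k complex DFT."""
    return (k - 3) * 2 ** k + 4

def yavne_total(n):
    """Total cost 4 n lg n - 6 n + 8 from slide 12 (n a power of 2)."""
    return 4 * n * int(math.log2(n)) - 6 * n + 8
```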

  17. Understanding the FFT. The FFT trick: C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1) by the unique C[x]-algebra morphism. Cost 2n: n/2 C-adds, n/2 C-subs. E.g. n = 4: C[x]/(x^4 − 1) → C[x]/(x^2 − 1) × C[x]/(x^2 + 1) by g_0 + g_1 x + g_2 x^2 + g_3 x^3 ↦ (g_0 + g_2) + (g_1 + g_3)x, (g_0 − g_2) + (g_1 − g_3)x. Representation: (g_0, g_1, g_2, g_3) ↦ ((g_0 + g_2), (g_1 + g_3)), ((g_0 − g_2), (g_1 − g_3)).
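One step of the FFT trick, on coefficient lists, is just the butterfly on the slide (function name mine):

```python
def fft_split(g):
    """One FFT-trick step: reduce g modulo x^{n/2} - 1 and x^{n/2} + 1.
    Costs n/2 C-adds and n/2 C-subs, i.e. 2n real operations."""
    h = len(g) // 2
    lo = [g[j] + g[h + j] for j in range(h)]   # g mod (x^{n/2} - 1)
    hi = [g[j] - g[h + j] for j in range(h)]   # g mod (x^{n/2} + 1)
    return lo, hi
```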

  18. Recurse: C[x]/(x^{n/2} − 1) → C[x]/(x^{n/4} − 1) × C[x]/(x^{n/4} + 1); similarly C[x]/(x^{n/2} + 1) → C[x]/(x^{n/4} − i) × C[x]/(x^{n/4} + i); continue down to C[x]/(x − 1) × ⋯. General case: C[x]/(x^n − ζ^2) → C[x]/(x^{n/2} − ζ) × C[x]/(x^{n/2} + ζ) by g_0 + ⋯ + g_{n−1} x^{n−1} ↦ (g_0 + ζ g_{n/2}) + (g_1 + ζ g_{n/2+1})x + ⋯, (g_0 − ζ g_{n/2}) + (g_1 − ζ g_{n/2+1})x + ⋯. Cost 5n: n/2 C-mults, n/2 C-adds, n/2 C-subs. Recurse, eliminate easy mults. Cost 5 n lg n − 10 n + 16.
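The general splitting step can be sketched the same way as the n = 4 example, now with a twiddle factor ζ (function name mine):

```python
def fft_split_general(g, zeta):
    """Map C[x]/(x^n - zeta^2) -> C[x]/(x^{n/2} - zeta) x C[x]/(x^{n/2} + zeta):
    replacing x^{n/2} by +/-zeta sends coefficient j to g_j +/- zeta*g_{n/2+j}.
    Costs n/2 C-mults, n/2 C-adds, n/2 C-subs: 5n real operations."""
    h = len(g) // 2
    t = [zeta * g[h + j] for j in range(h)]
    return ([g[j] + t[j] for j in range(h)],
            [g[j] - t[j] for j in range(h)])
```

With ζ = 1 this degenerates to the cost-2n step of the previous slide, since the mults by 1 are free.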

  19. Alternative: the twisted FFT. After previous C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1), apply the unique C-algebra morphism C[x]/(x^{n/2} + 1) → C[y]/(y^{n/2} − 1) that maps x to ζ_n y. So g_0 + g_1 x + ⋯ + g_{n−1} x^{n−1} ↦ (g_0 + g_{n/2}) + (g_1 + g_{n/2+1})x + ⋯, (g_0 − g_{n/2}) + ζ_n (g_1 − g_{n/2+1})y + ⋯. Again cost 5n: n/2 C-mults, n/2 C-adds, n/2 C-subs. Eliminate easy mults, recurse. Cost 5 n lg n − 10 n + 16.
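One twisted-FFT step in code: split as before, then multiply coefficient j of the x^{n/2} + 1 half by ζ_n^j so both halves live in rings of the form y^{n/2} − 1 (function name mine; a sketch, not optimized):

```python
import cmath

def twisted_step(g):
    """Split C[x]/(x^n - 1), then twist the x^{n/2}+1 half into
    C[y]/(y^{n/2} - 1) via x -> zeta_n * y: coefficient j of that
    half picks up a factor zeta_n^j."""
    n = len(g)
    h = n // 2
    zeta = cmath.exp(2j * cmath.pi / n)
    lo = [g[j] + g[h + j] for j in range(h)]
    hi = [(g[j] - g[h + j]) * zeta ** j for j in range(h)]
    return lo, hi
```

Both halves can now be handled by the same recursion on x^{n/2} − 1, at the price of the n/2 twiddle mults.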

  20. The split-radix FFT. FFT and twisted FFT end up with same number of mults by ζ_n, same number of mults by ζ_{n/2}, same number of mults by ζ_{n/4}, etc. Is this necessary? No! Split-radix FFT: more easy mults. “Don’t twist until you see the whites of their i’s.” (Can use same idea to speed up Schönhage–Strassen algorithm for integer multiplication.)

  21. Cost 2n: C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1). Cost n: C[x]/(x^{n/2} + 1) → C[x]/(x^{n/4} − i) × C[x]/(x^{n/4} + i). Cost 6(n/4): C[x]/(x^{n/4} − i) → C[y]/(y^{n/4} − 1) by x ↦ ζ_n y. Cost 6(n/4): C[x]/(x^{n/4} + i) → C[y]/(y^{n/4} − 1) by x ↦ ζ_n^{−1} y. Overall cost 6n to split into 1/2, 1/4, 1/4; entropy 1.5 bits. Eliminate easy mults, recurse. Cost 4 n lg n − 6 n + 8, exactly as in 1968 Yavne.
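The per-step costs above give the recurrence C(n) = C(n/2) + 2C(n/4) + 6n, and eliminating the easy mults (the two j = 0 twists by 1, plus the two cheaper √i-type twists at j = n/8) saves 16 per step for n ≥ 8. A sketch checking that this reproduces Yavne's closed form (function name and the exact bookkeeping of the −16 savings are my reconstruction):

```python
def split_radix_cost(n):
    """Real-op count of the split-radix FFT, n a power of 2:
    C(n) = C(n/2) + 2 C(n/4) + 6n - 16 for n >= 8, where the -16
    comes from the two free twists by 1 and the two cost-4 twists
    by sqrt(i)-type constants; small sizes handled directly."""
    if n == 1:
        return 0
    if n == 2:
        return 4    # one C-add + one C-sub
    if n == 4:
        return 16   # 6n = 24 minus the two free twists by 1
    return split_radix_cost(n // 2) + 2 * split_radix_cost(n // 4) + 6 * n - 16
```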

  22. The tangent FFT. Several ways to achieve cost 6 for mult by e^{iθ}. One approach: factor e^{iθ} as (1 + i tan θ) cos θ. Cost 2 for mult by cos θ. Cost 4 for mult by 1 + i tan θ. For stability and symmetry, use max{|cos θ|, |sin θ|} instead of cos θ. Surprise (Van Buskirk): Can merge some cost-2 mults!
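The factorization e^{iθ} = (1 + i tan θ) cos θ can be checked directly in code (function name mine; this uses cos θ itself, not the max{|cos θ|, |sin θ|} variant the slide recommends for stability, so it misbehaves near θ = π/2):

```python
import math

def mult_by_unit(a, b, theta):
    """Multiply a + bi by e^{i theta} = (1 + i tan theta) cos theta.
    The (1 + i tan) factor costs 4 (2 R-mults + 2 R-add/subs);
    the scalar cos factor costs 2 (2 R-mults), and it is this
    cost-2 part that the tangent FFT merges across stages."""
    t = math.tan(theta)
    c = math.cos(theta)
    # (a + bi)(1 + i t) = (a - b t) + (b + a t) i, then scale by c
    return ((a - b * t) * c, (b + a * t) * c)
```

Total cost is still 6, matching the generic C-mult; the gain comes only when the cost-2 scalings are merged, which is Van Buskirk's surprise.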
