SLIDE 1 The tangent FFT
University of Illinois at Chicago
SLIDE 2
Advertisement: “SPEED: Software Performance Enhancement for Encryption and Decryption.” A workshop on software speeds for secret-key cryptography and public-key cryptography. Amsterdam, June 11–12, 2007. http://www.hyperelliptic.org/SPEED
SLIDE 3
The convolution problem How quickly can we multiply polynomials in the ring R[x]? Answer depends on degrees, representation of polynomials, number of polynomials, etc. Answer also depends on definition of “quickly.” Many models of computation; many interesting cost measures.
SLIDE 4 Assume two inputs
f, g ∈ R[x], deg f < m, deg g ≤ n − m, so deg fg < n,
with f, g, fg represented as coefficient sequences.
How quickly can we compute the n coeffs of fg, given f's m coeffs and g's n − m + 1 coeffs?
Inputs (f_0, f_1, …, f_{m−1}) ∈ R^m, (g_0, g_1, …, g_{n−m}) ∈ R^{n−m+1}.
Output (h_0, h_1, …, h_{n−1}) ∈ R^n with
h_0 + h_1 x + ⋯ + h_{n−1} x^{n−1} =
(f_0 + f_1 x + ⋯ + f_{m−1} x^{m−1})(g_0 + g_1 x + ⋯ + g_{n−m} x^{n−m}).
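The problem statement can be made concrete with a schoolbook convolution, the Θ(m(n−m+1))-mult baseline that the fast algorithms below improve on. A minimal sketch (my illustration, not from the slides; the helper name `convolve` is mine):

```python
def convolve(f, g, n):
    """Return the n coeffs of f*g, given f's coeffs and g's coeffs.

    Schoolbook method: one R-mult and one R-add per pair (f_j, g_k)."""
    h = [0] * n
    for j, fj in enumerate(f):
        for k, gk in enumerate(g):
            h[j + k] += fj * gk
    return h

# m = 3, n = 4: (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(convolve([1, 2, 3], [4, 5], 4))  # -> [4, 13, 22, 15]
```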
SLIDE 5 Assume R-algebraic algorithms (without divisions, branches): chains of binary R-adds u, v ↦ u + v, binary R-subs u, v ↦ u − v, binary R-mults u, v ↦ uv, starting from inputs and constants.
Example (1963 Karatsuba): compute (f_0 + f_1 x)(g_0 + g_1 x) with 3 mults: f_0 g_0, f_1 g_1, and (f_0 + f_1)(g_0 + g_1), whose difference from f_0 g_0 + f_1 g_1 gives the middle coefficient.
[Diagram: arithmetic circuit on inputs f_0, f_1, g_0, g_1.]
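Karatsuba's trick from the slide above, as a short sketch (my code; the function name is mine):

```python
def karatsuba_deg1(f0, f1, g0, g1):
    """(f0 + f1 x)(g0 + g1 x) using 3 R-mults instead of 4."""
    lo = f0 * g0                               # mult 1
    hi = f1 * g1                               # mult 2
    mid = (f0 + f1) * (g0 + g1) - lo - hi      # mult 3, plus 2 adds, 2 subs
    return [lo, mid, hi]                       # coeffs of h0 + h1 x + h2 x^2

print(karatsuba_deg1(1, 2, 3, 4))  # (1 + 2x)(3 + 4x) -> [3, 10, 8]
```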
SLIDE 6
Cost measure for this talk: total R-algebraic complexity. Cost 1 for binary R-add; cost 1 for binary R-sub; cost 1 for binary R-mult; cost 0 for constant in R.
Many real-world computations use (e.g.) the Pentium M's floating-point operations to approximate operations in R. Properly scheduled operations achieve Pentium M cycles ≈ total R-algebraic complexity.
SLIDE 7 Fast Fourier transforms
Define ζ_n ∈ C as exp(2πi/n).
Define T : C[x]/(x^n − 1) → C^n as f ↦ (f(1), f(ζ_n), …, f(ζ_n^{n−1})).
Can very quickly compute T.
First publication of fast algorithm: 1866 Gauss. Easy to see that Gauss's FFT uses O(n lg n) arithmetic operations if n ∈ {1, 2, 4, 8, …}.
Several subsequent reinventions, ending with 1965 Cooley–Tukey.
SLIDE 8
Inverse map T^{−1} is also very fast. Multiplication in C^n is very fast.
1966 Sande, 1966 Stockham: Can very quickly multiply in C[x]/(x^n − 1) or C[x] or R[x] by mapping C[x]/(x^n − 1) to C^n.
Given f, g ∈ C[x]/(x^n − 1): compute fg as T^{−1}(T(f) T(g)).
Given f, g ∈ C[x] with deg fg < n: compute fg from its image in C[x]/(x^n − 1).
R-algebraic complexity O(n lg n).
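The Sande/Stockham recipe T^{−1}(T(f) T(g)) can be sketched in a few lines with a textbook radix-2 FFT (my code, not djbfft; real implementations schedule operations far more carefully):

```python
import cmath

def fft(f, sign=1):
    """Radix-2 FFT. With sign=+1 this is T: f in C[x]/(x^n - 1) maps to
    (f(1), f(z), ..., f(z^(n-1))), z = exp(2*pi*i/n).
    With sign=-1 it is the conjugate transform, i.e. n * T^(-1)."""
    n = len(f)
    if n == 1:
        return list(f)
    even = fft(f[0::2], sign)   # values of the even-index half
    odd = fft(f[1::2], sign)    # values of the odd-index half
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def cyclic_mul(f, g):
    """fg in C[x]/(x^n - 1), computed as T^(-1)(T(f) T(g))."""
    n = len(f)
    prod = [a * b for a, b in zip(fft(f), fft(g))]
    return [h / n for h in fft(prod, sign=-1)]
```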
SLIDE 9 A closer look at costs
More precise analysis of Gauss FFT (and Cooley–Tukey FFT): C[x]/(x^n − 1) → C^n using
(1/2) n lg n binary C-adds,
(1/2) n lg n binary C-subs,
(1/2) n lg n binary C-mults,
if n ∈ {1, 2, 4, 8, …}.
(a, b) ∈ R^2 represents a + bi ∈ C. C-add, C-sub, C-mult cost 2, 2, 6:
(a, b) + (c, d) = (a + c, b + d),
(a, b) − (c, d) = (a − c, b − d),
(a, b)(c, d) = (ac − bd, ad + bc).
SLIDE 10 Total cost 5n lg n.
Easily save some time: eliminate mults by 1; absorb mults by −1, i, −i into subsequent operations; simplify mults by √i using, e.g.,
(a, b)(1/√2, 1/√2) = ((a − b)/√2, (a + b)/√2).
Cost 5n lg n − 10n + 16 to map C[x]/(x^n − 1) → C^n, if n ∈ {4, 8, 16, 32, …}.
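The √2 identity above is easy to check directly: a mult by √i = (1 + i)/√2 costs only 2 real mults and 2 real adds instead of the generic 6 operations. A sketch (my illustration):

```python
import cmath
import math

def mul_sqrt_i(a, b):
    """(a + bi) * (1 + i)/sqrt(2): 2 real mults (by the constant 1/sqrt(2))
    and 2 real adds, versus cost 6 for a generic C-mult."""
    return ((a - b) / math.sqrt(2), (a + b) / math.sqrt(2))
```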
SLIDE 11
What about cost of convolution?
5n lg n + O(n) to compute T(f),
5n lg n + O(n) to compute T(g),
O(n) to multiply in C^n,
similar 5n lg n + O(n) for T^{−1}.
Total cost 15n lg n + O(n) to compute fg ∈ C[x]/(x^n − 1) given f, g ∈ C[x]/(x^n − 1).
Total cost (15/2) n lg n + O(n) to compute fg ∈ R[x]/(x^n − 1) given f, g ∈ R[x]/(x^n − 1): map R[x]/(x^n − 1) → R^2 × C^{n/2−1} (Gauss) to save half the time.
SLIDE 12
1968 R. Yavne: Can do better!
Cost 4n lg n − 6n + 8 to map C[x]/(x^n − 1) → C^n, if n ∈ {2, 4, 8, 16, …}.
SLIDE 13
1968 R. Yavne: Can do better!
Cost 4n lg n − 6n + 8 to map C[x]/(x^n − 1) → C^n, if n ∈ {2, 4, 8, 16, …}.
2004 James Van Buskirk: Can do better! Cost (34/9) n lg n + O(n).
Expositions of the new algorithm: Frigo, Johnson, in IEEE Trans. Signal Processing; Lundy, Van Buskirk, in Computing; Bernstein, this talk, expanding an old idea of Fiduccia.
SLIDE 14
Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?”
SLIDE 15 Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?”
Bernstein, comp.arch: “What do you mean, ‘better opcounts’? The algebraic complexity … of a size-2^k complex DFT has stood at (3k − 3)2^k + 4 additions and (k − 3)2^k + 4 multiplications since 1968.”
SLIDE 16 Van Buskirk, comp.arch, January 2005: “Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?”
Bernstein, comp.arch: “What do you mean, ‘better opcounts’? The algebraic complexity … of a size-2^k complex DFT has stood at (3k − 3)2^k + 4 additions and (k − 3)2^k + 4 multiplications since 1968.”
Van Buskirk, comp.arch: “Oh, you're so 20th century, Dan. Read the handwriting on the wall.”
SLIDE 17 Understanding the FFT
The FFT trick: C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1) by the unique C[x]-algebra morphism. Cost 2n: n/2 C-adds, n/2 C-subs.
e.g. n = 4: C[x]/(x^4 − 1) → C[x]/(x^2 − 1) × C[x]/(x^2 + 1) by
g_0 + g_1 x + g_2 x^2 + g_3 x^3 ↦
(g_0 + g_2) + (g_1 + g_3) x,
(g_0 − g_2) + (g_1 − g_3) x.
Representation: (g_0, g_1, g_2, g_3) ↦ ((g_0 + g_2), (g_1 + g_3)), ((g_0 − g_2), (g_1 − g_3)).
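In coefficient terms the FFT trick is a fold of the top half of the vector onto the bottom half. A sketch of the cost-2n step (my code), matching the n = 4 example above:

```python
def fft_trick(g):
    """C[x]/(x^n - 1) -> C[x]/(x^(n/2) - 1) x C[x]/(x^(n/2) + 1):
    reduce g mod x^(n/2) -+ 1, using x^(n/2) = +1 resp. -1."""
    n = len(g)
    lo, hi = g[: n // 2], g[n // 2 :]
    return ([a + b for a, b in zip(lo, hi)],   # image mod x^(n/2) - 1
            [a - b for a, b in zip(lo, hi)])   # image mod x^(n/2) + 1
```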
SLIDE 18 Recurse: C[x]/(x^{n/2} − 1) → C[x]/(x^{n/4} − 1) × C[x]/(x^{n/4} + 1);
similarly C[x]/(x^{n/2} + 1) → C[x]/(x^{n/4} − i) × C[x]/(x^{n/4} + i);
continue down to linear factors C[x]/(x − ζ) for nth roots ζ of 1.
General case: C[x]/(x^n − α^2) → C[x]/(x^{n/2} − α) × C[x]/(x^{n/2} + α) by
g_0 + ⋯ + g_{n/2} x^{n/2} + ⋯ ↦
(g_0 + α g_{n/2}) + (g_1 + α g_{n/2+1}) x + ⋯,
(g_0 − α g_{n/2}) + (g_1 − α g_{n/2+1}) x + ⋯.
Cost 5n: n/2 C-mults, n/2 C-adds, n/2 C-subs.
Recurse, eliminate easy mults. Cost 5n lg n − 10n + 16.
SLIDE 19 Alternative: the twisted FFT
After previous C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1), apply the unique C-algebra morphism C[x]/(x^{n/2} + 1) → C[y]/(y^{n/2} − 1) that maps x to ζ_n y.
g_0 + g_1 x + ⋯ + g_{n/2} x^{n/2} + ⋯ ↦
(g_0 + g_{n/2}) + (g_1 + g_{n/2+1}) x + ⋯,
(g_0 − g_{n/2}) + ζ_n (g_1 − g_{n/2+1}) y + ⋯.
Again cost 5n: n/2 C-mults, n/2 C-adds, n/2 C-subs.
Eliminate easy mults, recurse. Cost 5n lg n − 10n + 16.
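On coefficient vectors, the twist x ↦ ζ_n y just scales coefficient k by ζ_n^k. A sketch (my code), checked by comparing evaluations on both sides of the morphism:

```python
import cmath

def twist(c, n):
    """Twisted-FFT step: image of c under the C-algebra morphism
    C[x]/(x^(n/2) + 1) -> C[y]/(y^(n/2) - 1), x -> zeta_n * y.
    c holds the n/2 coefficients of a residue mod x^(n/2) + 1."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [ck * zeta**k for k, ck in enumerate(c)]
```

Evaluating the twisted polynomial at a square root y of 1 must agree with evaluating the original at x = ζ_n y, which is a square root of −1.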
SLIDE 20 The split-radix FFT
FFT and twisted FFT end up with same number of mults by ζ_n, same number of mults by ζ_n^2, same number of mults by ζ_n^3, etc. Is this necessary? No!
Split-radix FFT: more easy mults. “Don't twist until you see the whites of their i's.”
(Can use same idea to speed up Schönhage–Strassen for integer multiplication.)
SLIDE 21 Cost 2n: C[x]/(x^n − 1) → C[x]/(x^{n/2} − 1) × C[x]/(x^{n/2} + 1).
Cost n: C[x]/(x^{n/2} + 1) → C[x]/(x^{n/4} − i) × C[x]/(x^{n/4} + i).
Cost 6(n/4): C[x]/(x^{n/4} − i) → C[y]/(y^{n/4} − 1) by x ↦ ζ_n y.
Cost 6(n/4): C[x]/(x^{n/4} + i) → C[y]/(y^{n/4} − 1) by x ↦ ζ_n^{−1} y.
Overall cost 6n to split into 1/2, 1/4, 1/4; entropy 1.5 bits.
Eliminate easy mults, recurse. Cost 4n lg n − 6n + 8, exactly as in 1968 Yavne.
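A quick consistency check on Yavne's count (my derivation, not from the slides): subtracting Y(n/2) + 2Y(n/4) from the closed form shows that after eliminating easy mults, the split above must cost 6n − 16 per level, and the closed form satisfies that recurrence exactly:

```python
def Y(n):
    """Yavne's split-radix operation count 4 n lg n - 6 n + 8, n a power of 2."""
    lg = n.bit_length() - 1
    return 4 * n * lg - 6 * n + 8

# One split-radix level: one size-n/2 subproblem, two size-n/4 subproblems,
# plus 6n operations minus a constant 16 saved on easy mults.
for k in range(2, 20):
    n = 2 ** k
    assert Y(n) == Y(n // 2) + 2 * Y(n // 4) + 6 * n - 16
```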
SLIDE 22 The tangent FFT
Several ways to achieve cost 6 for mult by e^{iθ}.
One approach: Factor e^{iθ} as (1 + i tan θ) cos θ.
Cost 2 for mult by cos θ.
Cost 4 for mult by 1 + i tan θ.
For stability and symmetry, use max{|cos θ|, |sin θ|} instead of cos θ.
Surprise (Van Buskirk): Can merge some cost-2 mults!
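The factorization e^{iθ} = (1 + i tan θ) cos θ splits a cost-6 C-mult into a cost-4 part and a cost-2 part; the point of the tangent FFT is that the cost-2 scalings can sometimes be merged. A sketch of the split (my code; tan θ and cos θ are precomputed constants, cost 0):

```python
import cmath
import math

def mul_phase(a, b, theta):
    """(a + bi) * e^(i*theta) via (1 + i tan theta) * cos theta:
    cost 4 for the (1 + i tan theta) part, cost 2 for the cos theta scaling."""
    t = math.tan(theta)              # precomputed constant
    c = math.cos(theta)              # precomputed constant
    re, im = a - b * t, b + a * t    # mult by 1 + i tan theta: 2 mults, 2 adds
    return (re * c, im * c)          # mult by cos theta: 2 mults
```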
SLIDE 23 Rethink basis of C[x]/(x^n − 1).
Instead of 1, x, …, x^{n−1} use 1/s_{n,0}, x/s_{n,1}, …, x^{n−1}/s_{n,n−1} where
s_{n,k} = max{|cos(2πk/n)|, |sin(2πk/n)|}
· max{|cos(2πk/(n/4))|, |sin(2πk/(n/4))|}
· max{|cos(2πk/(n/16))|, |sin(2πk/(n/16))|} ⋯.
Now (g_0, g_1, …, g_{n−1}) represents g_0/s_{n,0} + ⋯ + g_{n−1} x^{n−1}/s_{n,n−1}.
Note that s_{n,k} = s_{n,k+n/4}.
Note that ζ_n^k (s_{n/4,k}/s_{n,k}) is ±(1 + i tan(2πk/n)) or ±(cot(2πk/n) + i).
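The two "Note that" facts above can be checked numerically. A sketch (my code) of s_{n,k}; the product is effectively finite, since the factors with denominator ≤ 4 equal 1 for integer k:

```python
import cmath
import math

def s(n, k):
    """s_{n,k}: product of max(|cos(2 pi k/m)|, |sin(2 pi k/m)|)
    over m = n, n/4, n/16, ... (trailing factors are 1)."""
    prod, m = 1.0, n
    while m >= 2:
        angle = 2 * math.pi * k / m
        prod *= max(abs(math.cos(angle)), abs(math.sin(angle)))
        m //= 4
    return prod
```

The ratio ζ_n^k s_{n/4,k}/s_{n,k} always has one component of absolute value exactly 1 (the real part for ±(1 + i tan θ), the imaginary part for ±(cot θ + i)), which is what makes it a cost-4 mult.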
SLIDE 24 Cost 2n:
C[x]/(x^n − 1), basis x^k/s_{n,k} →
C[x]/(x^{n/2} − 1), basis x^k/s_{n,k} × C[x]/(x^{n/2} + 1), basis x^k/s_{n,k}.
Cost n:
C[x]/(x^{n/2} − 1), basis x^k/s_{n,k} →
C[x]/(x^{n/4} − 1), basis x^k/s_{n,k} × C[x]/(x^{n/4} + 1), basis x^k/s_{n,k}.
Cost n:
C[x]/(x^{n/2} + 1), basis x^k/s_{n,k} →
C[x]/(x^{n/4} − i), basis x^k/s_{n,k} × C[x]/(x^{n/4} + i), basis x^k/s_{n,k}.
SLIDE 25 Cost n/2 − 2:
C[x]/(x^{n/4} − 1), basis x^k/s_{n,k} →
C[x]/(x^{n/4} − 1), basis x^k/s_{n/4,k}.
Cost n/2 − 2:
C[x]/(x^{n/4} + 1), basis x^k/s_{n,k} →
C[x]/(x^{n/4} + 1), basis x^k/s_{n/2,k}.
Cost n/2:
C[x]/(x^{n/4} + 1), basis x^k/s_{n/2,k} →
C[x]/(x^{n/8} − i), basis x^k/s_{n/2,k} × C[x]/(x^{n/8} + i), basis x^k/s_{n/2,k}.
SLIDE 26 Cost 4(n/4) − 6:
C[x]/(x^{n/4} − i), basis x^k/s_{n,k} →
C[y]/(y^{n/4} − 1), basis y^k/s_{n/4,k}.
Cost 4(n/4) − 6:
C[x]/(x^{n/4} + i), basis x^k/s_{n,k} →
C[y]/(y^{n/4} − 1), basis y^k/s_{n/4,k}.
Cost 4(n/8) − 6:
C[x]/(x^{n/8} − i), basis x^k/s_{n/2,k} →
C[y]/(y^{n/8} − 1), basis y^k/s_{n/8,k}.
Cost 4(n/8) − 6:
C[x]/(x^{n/8} + i), basis x^k/s_{n/2,k} →
C[y]/(y^{n/8} − 1), basis y^k/s_{n/8,k}.
SLIDE 27 Overall cost 8.5n − 28 to split into 1/4, 1/4, 1/4, 1/8, 1/8; entropy 9/4 bits.
Recurse: (34/9) n lg n + O(n).
What if input is in C[x]/(x^n − 1) with usual basis 1, x, …, x^{n−1}? Could scale immediately, but faster to scale upon twist. Cost
(34/9) n lg n − (124/27) n − 2 lg n − (2/9)(−1)^{lg n} lg n + (16/27)(−1)^{lg n} + 8,
exactly as in 2004 Van Buskirk.
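The leading constant 34/9 = 8.5/(9/4) can be sanity-checked numerically from the five subproblems above. A sketch (my code; the small base-case costs are placeholders of my choosing and only shift the O(n) term, not the leading constant):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def C(k):
    """Cost on size n = 2^k from the tangent FFT split:
    C(n) = 3 C(n/4) + 2 C(n/8) + 8.5 n - 28."""
    if k <= 2:
        return Fraction([0, 4, 16][k])   # placeholder base cases
    return 3 * C(k - 2) + 2 * C(k - 3) + Fraction(17, 2) * 2**k - 28

# For C(n) = a n lg n + O(n), the combination C(4n) - 4 C(n) = 8 a n + o(n)
# cancels the linear term, exposing the leading constant a.
n = 2 ** 20
a = (C(22) - 4 * C(20)) / (8 * n)   # should be very close to 34/9
```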
SLIDE 28 Easily handle R[x]/(x^n + 1) by mapping to C[x]/(x^{n/2} − i).
Easily handle R[x]/(x^n − 1) by mapping to R[x]/(x^{n/2} − 1) × R[x]/(x^{n/2} + 1).
Cost (17/9) n lg n + O(n) for R[x]/(x^n − 1) → R^2 × C^{n/2−1},
so cost (17/3) n lg n + O(n) to compute fg ∈ R[x]/(x^n − 1) given f, g ∈ R[x]/(x^n − 1).
Cost (17/3) n lg n + O(n) for size-n convolution.
Open: Can 17/3 be improved?