the fft via matrix factorizations

The FFT Via Matrix Factorizations A Key to Designing High - PowerPoint PPT Presentation

The FFT Via Matrix Factorizations A Key to Designing High Performance Implementations Charles Van Loan Department of Computer Science Cornell University A High Level Perspective... Blocking For Performance A 11 A 12 A 1 q


  1. The FFT Via Matrix Factorizations A Key to Designing High Performance Implementations Charles Van Loan Department of Computer Science Cornell University

  2. A High Level Perspective...

  3. Blocking For Performance   A 11 A 12 · · · A 1 q } n 1   A 21 A 22 · · · A 2 q } n 2   A = . . . ...  . . .  . . .   A p 1 A p 2 · · · A pq } n q ���� ���� ���� n 1 n 2 n q A well known strategy for high-performance Ax = b and Ax = λx solvers.

  4. Factoring for Performance One way to execute a matrix-vector product y = F n x when F n = A t · · · A 2 A 1 is as follows: y = x for k = 1: t y = A k x end ˜ t · · · ˜ A different factorization F n = A ˜ A 1 would yield a different algorithm.

  5. The Discrete Fourier Transform ( n = 8 )   ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 8 8 8 8 8 8 8 8   ω 0 ω 1 ω 2 ω 3 ω 4 ω 5 ω 6 ω 7   8 8 8 8 8 8 8 8     ω 0 ω 2 ω 4 ω 6 ω 8 ω 10 ω 12 ω 14   8 8 8 8 8 8 8 8     ω 0 ω 3 ω 6 ω 9 ω 12 ω 15 ω 18 ω 21   8 8 8 8 8 8 8 8   y = F 8 x = x   ω 0 ω 4 ω 8 ω 12 ω 16 ω 20 ω 24 ω 28   8 8 8 8 8 8 8 8     ω 0 ω 5 ω 10 ω 15 ω 20 ω 25 ω 30 ω 35   8 8 8 8 8 8 8 8     ω 0 ω 6 ω 12 ω 18 ω 24 ω 30 ω 36 ω 42   8 8 8 8 8 8 8 8   ω 0 ω 7 ω 14 ω 21 ω 28 ω 35 ω 42 ω 49 8 8 8 8 8 8 8 8 ω 8 = cos(2 π/ 8) − i · sin(2 π/ 8)

  6. The DFT Matrix In General... If ω n = cos(2 π/n ) − i · sin(2 π/n ) then [ F n ] pq = ω pq n = (cos(2 π/n ) − i · sin(2 π/n )) pq = cos(2 pqπ/n ) − i · sin(2 pqπ/n ) Fact: F H n F n = nI n Thus, F n / √ n is unitary.

  7. Data Sparse Matrices An n -by- n matrix A is data sparse if it can be represented with many fewer than n 2 numbers. Example 1. A has lots of zeros. (“Traditional Sparse”) Example 2. A is Toeplitz...   a b c d e a b c   A =   f e a b   g f e a

  8. More Examples of Data Sparse Matrices A is a Kronecker Product B ⊗ C , e.g., � � b 11 C b 12 C A = b 21 C b 22 C If B ∈ IR m 1 × m 1 and C ∈ IR m 2 × m 2 then A = B ⊗ C has m 2 1 m 2 2 entries but is parameterized by just m 2 1 + m 2 2 numbers.

  9. Extreme Data Sparsity n n n n � � � � S ( i, j, k, ℓ ) · (2-by-2) ⊗ · · · ⊗ (2-by-2) A = � �� � i =1 j =1 k =1 ℓ =1 d times A is 2 d -by-2 d but is parameterized by O ( dn 4 ) numbers.

  10. Factorization of F n The DFT matrix can be factored into a short product of sparse matrices, e.g., F 1024 = A 10 · · · A 2 A 1 P 1024 where each A -matrix has 2 nonzeros per row and P 1024 is a per- mutation.

  11. From Factorization to Algorithm If n = 2 10 and F n = A 10 · · · A 2 A 1 P n then y = P n x for k = 1:10 y = A k x ← 2 n flops. end computes y = F n x and requires O ( n log n ) flops.

  12. Recursive Block Structure F 8 (: , [ 0 2 4 6 1 3 5 7 ]) =   1 0 0 0 1 0 0 0 0 1 0 0 0 ω 8 0 0     ω 2   0 0 1 0 0 0 0 8 � F 4 0   � ω 3   0 0 0 1 0 0 0 8     1 0 0 0 − 1 0 0 0 0 F 4     0 1 0 0 0 − ω 8 0 0   − ω 2   0 0 1 0 0 0 0   8 − ω 3 0 0 0 1 0 0 0 8 F n/ 2 “shows up” when you permute the columns of F n so that the odd-indexed columns come first.

  13. Recursion... We build an 8-point DFT from two 4-point DFTs...   1 0 0 0 1 0 0 0  0 1 0 0 0 ω 8 0 0    ω 2   0 0 1 0 0 0 0 8   � � F 4 x (0:2:7) ω 3   0 0 0 1 0 0 0  8  F 8 x =   1 0 0 0 − 1 0 0 0 F 4 x (1:2:7)     0 1 0 0 0 − ω 8 0 0   − ω 2   0 0 1 0 0 0 0   8 − ω 3 0 0 0 1 0 0 0 8

  14. Radix-2 FFT: Recursive Implementation function y = fft ( x, n ) if n = 1 y = x else m = n/ 2; ω = exp( − 2 πi/n ) Ω = diag(1 , ω, . . . , ω m − 1 ) z T = fft ( x (0:2: n − 1) , m ) z B = Ω · fft ( x (1:2: n − 1) , m ) � � � � I m I m z T y = Overall: 5 n log n flops. I m − I m z B end

  15. The Divide-and-Conquer Picture (0:1:15) ✟ ❍❍❍❍❍❍❍❍ ✟ ✟ ✟ ✟ ✟ ✟ ✟ (0:2:15) (1:2:15) ✑ ◗◗◗◗ ✑ ◗◗◗◗ ✑ ✑ ✑ ✑ ✑ ✑ (0:4:15) (2:4:15) (1:4:15) (3:4:15) � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ (0:8:15) (4:8:15) (2:8:15) (6:8:15) (1:8:15) (5:8:15) (3:8:15) (7:8:15) ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]

  16. Towards a Nonrecursive Implementation The Radix-2 Factorization... If n = 2 m and Ω m = diag(1 , ω n , . . . , ω m − 1 ) , n then � � � � F m Ω m F m I m Ω m ( I 2 ⊗ F m ) . F n Π n = = F m − Ω m F m I m − Ω m where Π n = I n (: , [0:2: n 1:2: n ]). � � F m 0 I 2 ⊗ F m = Note: . 0 F m

  17. The Cooley-Tukey Factorization n = 2 t F n = A t · · · A 1 P n P n = the n -by- n “bit reversal ” permutation matrix � � I L/ 2 Ω L/ 2 L = 2 q , r = n/L A q = I r ⊗ I L/ 2 − Ω L/ 2 Ω L/ 2 = diag(1 , ω L , . . . , ω L/ 2 − 1 ) ω L = exp( − 2 πi/L ) L

  18. The Bit Reversal Permutation (0:1:15) ✟ ❍❍❍❍❍❍❍❍ ✟ ✟ ✟ ✟ ✟ ✟ ✟ (0:2:15) (1:2:15) ✑ ◗◗◗◗ ✑ ◗◗◗◗ ✑ ✑ ✑ ✑ ✑ ✑ (0:4:15) (2:4:15) (1:4:15) (3:4:15) � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ (0:8:15) (4:8:15) (2:8:15) (6:8:15) (1:8:15) (5:8:15) (3:8:15) (7:8:15) ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]

  19. Bit Reversal         x (0) x (0000) x (0000) x (0) x (1) x (0001) x (1000) x (8)                 x (2) x (0010) x (0100) x (4)                 x (3) x (0011) x (1100) x (12)                 x (4) x (0100) x (0010) x (2)                 x (5) x (0101) x (1010) x (10)                 x (6) x (0110) x (0110) x (6)                 x (7) x (0111) x (1110) x (14)         = → =         x (8) x (1000) x (0001) x (1)                 x (9) x (1001) x (1001) x (9)                 x (10) x (1010) x (0101) x (5)                 x (11) x (1011) x (1101) x (13)                 x (12) x (1100) x (0011) x (3)                 x (13) x (1101) x (1011) x (11)                 x (14) x (1110) x (0111) x (7)         x (15) x (1111) x (1111) x (15)

  20. Butterfly Operations This matrix is block diagonal... � � I L/ 2 Ω L/ 2 L = 2 q , r = n/L A q = I r ⊗ I L/ 2 − Ω L/ 2 r copies of things like this   1 × 1 ×     1 ×     1 ×     1 ×     1 ×     1 ×   1 ×

  21. At the Scalar Level... ❍❍❍ ✟ a a + ωb s s ✟ ✟ ω ✟✟✟ ❍ ❍ ❍ a − ωb b s s

Recommend


More recommend