the fft via matrix factorizations
play

The FFT Via Matrix Factorizations A Key to Designing High - PowerPoint PPT Presentation

The FFT Via Matrix Factorizations A Key to Designing High Performance Implementations Charles Van Loan Department of Computer Science Cornell University A High Level Perspective... Blocking For Performance A 11 A 12 A 1 q


  1. The FFT Via Matrix Factorizations A Key to Designing High Performance Implementations Charles Van Loan Department of Computer Science Cornell University

  2. A High Level Perspective...

  3. Blocking For Performance   A 11 A 12 · · · A 1 q } n 1   A 21 A 22 · · · A 2 q } n 2   A = . . . ...  . . .  . . .   A p 1 A p 2 · · · A pq } n q ���� ���� ���� n 1 n 2 n q A well known strategy for high-performance Ax = b and Ax = λx solvers.

  4. Factoring for Performance One way to execute a matrix-vector product y = F n x when F n = A t · · · A 2 A 1 is as follows: y = x for k = 1: t y = A k x end ˜ t · · · ˜ A different factorization F n = A ˜ A 1 would yield a different algorithm.

  5. The Discrete Fourier Transform ( n = 8 )   ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 ω 0 8 8 8 8 8 8 8 8   ω 0 ω 1 ω 2 ω 3 ω 4 ω 5 ω 6 ω 7   8 8 8 8 8 8 8 8     ω 0 ω 2 ω 4 ω 6 ω 8 ω 10 ω 12 ω 14   8 8 8 8 8 8 8 8     ω 0 ω 3 ω 6 ω 9 ω 12 ω 15 ω 18 ω 21   8 8 8 8 8 8 8 8   y = F 8 x = x   ω 0 ω 4 ω 8 ω 12 ω 16 ω 20 ω 24 ω 28   8 8 8 8 8 8 8 8     ω 0 ω 5 ω 10 ω 15 ω 20 ω 25 ω 30 ω 35   8 8 8 8 8 8 8 8     ω 0 ω 6 ω 12 ω 18 ω 24 ω 30 ω 36 ω 42   8 8 8 8 8 8 8 8   ω 0 ω 7 ω 14 ω 21 ω 28 ω 35 ω 42 ω 49 8 8 8 8 8 8 8 8 ω 8 = cos(2 π/ 8) − i · sin(2 π/ 8)

  6. The DFT Matrix In General... If ω n = cos(2 π/n ) − i · sin(2 π/n ) then [ F n ] pq = ω pq n = (cos(2 π/n ) − i · sin(2 π/n )) pq = cos(2 pqπ/n ) − i · sin(2 pqπ/n ) Fact: F H n F n = nI n Thus, F n / √ n is unitary.

  7. Data Sparse Matrices An n -by- n matrix A is data sparse if it can be represented with many fewer than n 2 numbers. Example 1. A has lots of zeros. (“Traditional Sparse”) Example 2. A is Toeplitz...   a b c d e a b c   A =   f e a b   g f e a

  8. More Examples of Data Sparse Matrices A is a Kronecker Product B ⊗ C , e.g., � � b 11 C b 12 C A = b 21 C b 22 C If B ∈ IR m 1 × m 1 and C ∈ IR m 2 × m 2 then A = B ⊗ C has m 2 1 m 2 2 entries but is parameterized by just m 2 1 + m 2 2 numbers.

  9. Extreme Data Sparsity n n n n � � � � S ( i, j, k, ℓ ) · (2-by-2) ⊗ · · · ⊗ (2-by-2) A = � �� � i =1 j =1 k =1 ℓ =1 d times A is 2 d -by-2 d but is parameterized by O ( dn 4 ) numbers.

  10. Factorization of F n The DFT matrix can be factored into a short product of sparse matrices, e.g., F 1024 = A 10 · · · A 2 A 1 P 1024 where each A -matrix has 2 nonzeros per row and P 1024 is a per- mutation.

  11. From Factorization to Algorithm If n = 2 10 and F n = A 10 · · · A 2 A 1 P n then y = P n x for k = 1:10 y = A k x ← 2 n flops. end computes y = F n x and requires O ( n log n ) flops.

  12. Recursive Block Structure F 8 (: , [ 0 2 4 6 1 3 5 7 ]) =   1 0 0 0 1 0 0 0 0 1 0 0 0 ω 8 0 0     ω 2   0 0 1 0 0 0 0 8 � F 4 0   � ω 3   0 0 0 1 0 0 0 8     1 0 0 0 − 1 0 0 0 0 F 4     0 1 0 0 0 − ω 8 0 0   − ω 2   0 0 1 0 0 0 0   8 − ω 3 0 0 0 1 0 0 0 8 F n/ 2 “shows up” when you permute the columns of F n so that the odd-indexed columns come first.

  13. Recursion... We build an 8-point DFT from two 4-point DFTs...   1 0 0 0 1 0 0 0  0 1 0 0 0 ω 8 0 0    ω 2   0 0 1 0 0 0 0 8   � � F 4 x (0:2:7) ω 3   0 0 0 1 0 0 0  8  F 8 x =   1 0 0 0 − 1 0 0 0 F 4 x (1:2:7)     0 1 0 0 0 − ω 8 0 0   − ω 2   0 0 1 0 0 0 0   8 − ω 3 0 0 0 1 0 0 0 8

  14. Radix-2 FFT: Recursive Implementation function y = fft ( x, n ) if n = 1 y = x else m = n/ 2; ω = exp( − 2 πi/n ) Ω = diag(1 , ω, . . . , ω m − 1 ) z T = fft ( x (0:2: n − 1) , m ) z B = Ω · fft ( x (1:2: n − 1) , m ) � � � � I m I m z T y = Overall: 5 n log n flops. I m − I m z B end

  15. The Divide-and-Conquer Picture (0:1:15) ✟ ❍❍❍❍❍❍❍❍ ✟ ✟ ✟ ✟ ✟ ✟ ✟ (0:2:15) (1:2:15) ✑ ◗◗◗◗ ✑ ◗◗◗◗ ✑ ✑ ✑ ✑ ✑ ✑ (0:4:15) (2:4:15) (1:4:15) (3:4:15) � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ (0:8:15) (4:8:15) (2:8:15) (6:8:15) (1:8:15) (5:8:15) (3:8:15) (7:8:15) ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]

  16. Towards a Nonrecursive Implementation The Radix-2 Factorization... If n = 2 m and Ω m = diag(1 , ω n , . . . , ω m − 1 ) , n then � � � � F m Ω m F m I m Ω m ( I 2 ⊗ F m ) . F n Π n = = F m − Ω m F m I m − Ω m where Π n = I n (: , [0:2: n 1:2: n ]). � � F m 0 I 2 ⊗ F m = Note: . 0 F m

  17. The Cooley-Tukey Factorization n = 2 t F n = A t · · · A 1 P n P n = the n -by- n “bit reversal ” permutation matrix � � I L/ 2 Ω L/ 2 L = 2 q , r = n/L A q = I r ⊗ I L/ 2 − Ω L/ 2 Ω L/ 2 = diag(1 , ω L , . . . , ω L/ 2 − 1 ) ω L = exp( − 2 πi/L ) L

  18. The Bit Reversal Permutation (0:1:15) ✟ ❍❍❍❍❍❍❍❍ ✟ ✟ ✟ ✟ ✟ ✟ ✟ (0:2:15) (1:2:15) ✑ ◗◗◗◗ ✑ ◗◗◗◗ ✑ ✑ ✑ ✑ ✑ ✑ (0:4:15) (2:4:15) (1:4:15) (3:4:15) � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ � ❅ (0:8:15) (4:8:15) (2:8:15) (6:8:15) (1:8:15) (5:8:15) (3:8:15) (7:8:15) ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ ✁ ❆ [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]

  19. Bit Reversal         x (0) x (0000) x (0000) x (0) x (1) x (0001) x (1000) x (8)                 x (2) x (0010) x (0100) x (4)                 x (3) x (0011) x (1100) x (12)                 x (4) x (0100) x (0010) x (2)                 x (5) x (0101) x (1010) x (10)                 x (6) x (0110) x (0110) x (6)                 x (7) x (0111) x (1110) x (14)         = → =         x (8) x (1000) x (0001) x (1)                 x (9) x (1001) x (1001) x (9)                 x (10) x (1010) x (0101) x (5)                 x (11) x (1011) x (1101) x (13)                 x (12) x (1100) x (0011) x (3)                 x (13) x (1101) x (1011) x (11)                 x (14) x (1110) x (0111) x (7)         x (15) x (1111) x (1111) x (15)

  20. Butterfly Operations This matrix is block diagonal... � � I L/ 2 Ω L/ 2 L = 2 q , r = n/L A q = I r ⊗ I L/ 2 − Ω L/ 2 r copies of things like this   1 × 1 ×     1 ×     1 ×     1 ×     1 ×     1 ×   1 ×

  21. At the Scalar Level... ❍❍❍ ✟ a a + ωb s s ✟ ✟ ω ✟✟✟ ❍ ❍ ❍ a − ωb b s s

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend