fast fourier transform
play

Fast Fourier transform Prof. Richard Vuduc Georgia Institute of - PowerPoint PPT Presentation

Fast Fourier transform Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA, Spring 2008 [L.11] Tuesday, February 12, 2008 1 Sources for todays material CS 267 (Yelick & Demmel, UCB) Applied Numerical Linear Algebra , by


  1. Fast Fourier transform Prof. Richard Vuduc Georgia Institute of Technology CSE/CS 8803 PNA, Spring 2008 [L.11] Tuesday, February 12, 2008 1

  2. Sources for today’s material CS 267 (Yelick & Demmel, UCB) Applied Numerical Linear Algebra , by Demmel Xiaoye (Sherry) Li (LBNL) Van Loan (Cornell) “The ubiquitous Kronecker product” Computational frameworks for the FFT Heath (UIUC) 2

  3. Review: Sparse direct solvers 3

  4. Anatomy of a sparse direct solver P r · A · P T c = LU Order equations & variables to minimize fill in L, U [Combinatorial] Symbolic factorization [Allocate memory for factorization] Numerical factorization [Dominates run-time] Triangular solves [Small fraction, unless many RHS] 4

  5. Sparse LU software See also: http://www.netlib.org/utk/people/JackDongarra/la-sw.html LU SuperLU [Xiaoye “Sherry” Li @ LBNL] MUMPS [Amestoy, et al ., INPT-ENSEEIHT-IRIT, France] UMFPACK [Davis @ U. Florida] Cholesky PSPASES / WSMP [Gupta @ IBM] TAUCS [Toledo @ Tel Aviv U.] Oblio [Dobrian, formerly @ Columbia / ODU] Also: MUMPS, UMFPACK 5

  6. Source: Gould, Yu, Scott (2005); http://www.numerical.rl.ac.uk/reports/reports.shtml 6

  7. SuperLU MUMPS (LU) Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 7

  8. Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 8

  9. Source: Amestoy, Duff, L’Excellent, & Li (2001). http://mumps.enseeiht.fr/doc.html 9

  10. Algorithms for 2-D (3-D) Poisson, N=n 2 (=n 3 ) Algorithm Serial PRAM Memory # procs Dense LU N 3 N N 2 N 2 Band LU N 2 (N 7/3 ) N N 3/2 (N 5/3 ) N (N 4/3 ) Jacobi N 2 (N 5/3 ) N (N 2/3 ) N N Explicit inverse N 2 log N N 2 N 2 Sparse LU N 3/2 (N 2 ) N 1/2 N log N (N 4/3 ) N Conj. grad. N 3/2 (N 4/3 ) N 1/2(1/3) log N N N RB SOR N 3/2 (N 4/3 ) N 1/2 (N 1/3 ) N N FFT N log N log N N N Multigrid N log 2 N N N N log N N Lower bound PRAM = idealized parallel model with zero communication cost. Source: Demmel (1997) 10

  11. Review: Poisson’s equation 11

  12. Discretizing 1-D Poisson − d 2 u ( x ) = f ( x ) , 0 < x < 1 , u (0) = u (1) = 0 dx 2 1 h = n + 1 i = 0 n + 1 Discretize: u i = u ( i · h ) Approximate: − d 2 u ( x ) | x = x i ≈ 2 u i − u i − 1 − u i +1 dx 2 h 2 “Stencil”: -1 -1 2 12

  13. Express stencil in matrix notation ≈ − u i − 1 + 2 u i − u i +1 f i � f ( ih ) Approximation: = h 2     u 1 f 1   2 − 1 u 2 f 2 − 1 2 − 1           u 3 f 3   h 2 − 1 2 − 1 =       · . .       . .     . . · · ·       − 1 2 u n f n ⇓ h 2 f = T n · u 13

  14. Discretizing 2-D Poisson Graph and stencil 2 4 5 6 8   4 − 1 − 1 − 1 4 − 1 − 1 1 4 7     − 1 4 − 1     − 1 4 − 1 − 1   2 5 8   − 1 − 1 4 − 1 − 1     − 1 − 1 4 − 1   3 6 9   − 1 4 − 1     − 1 − 1 4 − 1   − 1 − 1 4 14

  15. Serial time complexity for RB SOR, CG, and sparse LU In 2-D: O ( n 3 ) O ( n 6 ) Na¨ ıve (dense) : O ( n 2 ) Optimal : 15

  16. Exploiting structure to obtain fast algorithms for 2-D Poisson Dense LU : Assume no structure ⇒ O( n 6 ) Sparse LU : Sparsity ⇒ O( n 3 ), need extra memory CG : Symmetric positive definite ⇒ O( n 3 ), a little extra memory RB SOR : Fixed sparsity pattern ⇒ O( n 3 ), no extra memory Today — FFT : Exploit numerical structure ⇒ O( n 2 log n ) 16

  17. Numerical properties of the Poisson stencil 17

  18. Express stencil in matrix notation ≈ − u i − 1 + 2 u i − u i +1 Approximation: f i � f ( ih ) = h 2     u 1 f 1   2 − 1 u 2 f 2 − 1 2 − 1           u 3 f 3   h 2 − 1 2 − 1 =       · . .       . .     . . · · ·       − 1 2 u n f n ⇓ h 2 f = T n · u 18

  19. 1-D Poisson: Eigendecomposition of T n Lemma : Eigenvalues and eigenvectors of T n are Z Λ Z T = T n T n z j λ j z j ⇒ = = ZZ T I = � � π j = 2 1 − cos λ j n + 1 � 2 n + 1 sin π jk z j ( k ) = n + 1 Exercise : Verify . Hint: sin a + sin b = 2 sin a + b cos a − b 2 2 19

  20. Eigendecomposition for discretized 2-D Poisson stencil matrix? Graph and stencil 2 4 5 6 8   4 − 1 − 1 − 1 4 − 1 − 1 1 4 7     − 1 4 − 1     − 1 4 − 1 − 1   2 5 8   − 1 − 1 4 − 1 − 1 T n × n =     − 1 − 1 4 − 1   3 6 9   − 1 4 − 1     − 1 − 1 4 − 1   − 1 − 1 4 20

  21. 2-D Poisson − ∂ 2 u ( x, y ) − ∂ 2 u ( x, y ) f ( x, y ) = ∂ x 2 ∂ y 2 ⇓ h 2 f ij = 4 u ij − u i +1 ,j − u i − 1 ,j − u i,j +1 − u i,j − 1 21

  22. 2-D Poisson − ∂ 2 u ( x, y ) − ∂ 2 u ( x, y ) f ( x, y ) = ∂ x 2 ∂ y 2 ⇓ h 2 f ij = 4 u ij − u i +1 ,j − u i − 1 ,j − u i,j +1 − u i,j − 1 ⇓ = ( f ij ) F = ( u ij ) U h 2 F = T n · U + U · T n 22

  23. 2-D Poisson: Eigendecomposition h 2 F = T n · U + U · T n ⇓ z i z T X ≡ j � z i z T � � z i z T � T n · X + X · T n = + T n · · T n j j λ i z i z T z i z T � � = j + λ j j = ( λ i + λ j ) · X ⇓ z i z T = “eigenvector” ∴ j λ i + λ j = eigenvalue 23

  24. Relating T n and T nxn : “vec(X)” and Kronecker products Let vec(X) = stacks columns of matrix X into a single vector � � x 1 · · · x n X ≡   x 1 . . vec( X )   ≡ .   x n Kronecker product   a 1 , 1 B a 1 ,n B · · · A ⊗ B ≡  . .  . .   . . a m, 1 B a m,n B · · · 24

  25. Facts about “vec( X )” and “ ⊗ ” vec( A + B ) = vec( A ) + vec( B ) vec( A · X ) = ( I ⊗ A ) · vec( X ) vec( X · B ) = ( B ⊗ I ) · vec( X ) A ⊗ ( B ⊗ C ) = ( A ⊗ B ) ⊗ C ( A · C ) ⊗ ( B · D ) = ( A ⊗ B ) · ( C ⊗ D ) A T ⊗ B T ( A ⊗ B ) T = A − 1 ⊗ B − 1 ( A ⊗ B ) − 1 = Note : In MATLAB, kron(A,B) computes A ⊗ B 25

  26. Relating T n and T nxn h 2 F = T n · U + U · T n ⇓ h 2 vec( F ) = (( I ⊗ T n ) + ( T n ⊗ I )) · vec( U ) ⇓ ( I ⊗ T n ) + ( T n ⊗ I ) T n × n ≡ ( Z ⊗ Z ) · ( I ⊗ Λ + Λ ⊗ I ) · ( Z ⊗ Z ) T = 26

  27. Extending to higher dimensions Z Λ Z T = T n ⇓ = ( I ⊗ T n ) + ( T n ⊗ I ) T n × n = ( Z ⊗ Z ) · ( I ⊗ Λ + Λ ⊗ I ) ⇓ = ( I ⊗ I ⊗ T n ) + ( I ⊗ T n ⊗ I ) + ( T n ⊗ I ⊗ I ) T n × n × n ( Z ⊗ Z ⊗ Z ) · ( I ⊗ I ⊗ Λ + ... ) · ( Z ⊗ Z ⊗ Z ) T = 27

  28. Using the eigendecomposition to solve 1-D Poisson T − 1 · h 2 f = u n ( Z Λ Z T ) − 1 h 2 f = h 2 Z Λ − 1 Z T f = Need a fast Z T *f , Z*x 28

  29. Using the eigendecomposition to solve 2-D Poisson h 2 · ( Z ⊗ Z )( I ⊗ Λ + Λ ⊗ I ) − 1 ( Z ⊗ Z ) T · vec( F ) vec( U ) = ⇓ F ′ = Z T FZ 1 . h 2 f ′ jk u ′ 2 . jk = ∀ j, k λ j + λ k U = ZU ′ Z T 3 . Need a fast Z T *f , Z*x 29

  30. Administrivia 30

  31. Two joint classes with CS 8803 SC Tues 2/19 : Floating-point issues in parallel computing by me Tues 2/26 : GPGPUs by Prof. Hyesoon Kim Both classes meet in Klaus 1116E 31

  32. Administrative stuff Front-end login node: ccil.cc.gatech.edu (CoC Unix account) We “own” warp43—warp56 Some docs ( MPI ): http://www-static.cc.gatech.edu/projects/ihpcl/mpi.html Sign-up for mailing list: https://mailman.cc.gatech.edu/mailman/listinfo/ihpc-lab 32

  33. Homework 1: Parallel conjugate gradients Implement a parallel solver for Ax = b (serial C version provided) Evaluate on three matrices: 27-pt stencil, and two application matrices “Simplified:” No preconditioning Bonus : Reorder, precondition Performance models to understand scalability of your implementation Make measurements Build predictive models Collaboration encouraged: Compare programming models or platforms 33

  34. Fast Fourier transform 34

  35. Discrete Fourier transform (DFT) Recall “Z” � n + 1 sin π ( j + 1)( k + 1) 2 Note: = z jk j, k 0-based n + 1 Multiplying by “Z” is almost the (m-point) DFT: Φ ( m ) x y = Φ ( m ) ω jk � � ≡ 0 ≤ j,k<m = cos 2 π m − i · sin 2 π √ − 2 π i = e i = − 1 m ω m 35

  36. DFT and its inverse ω jk � � Φ ≡ 0 ≤ j,k<m Φ · x = y ⇓ m − 1 � ω jk x k = y j k =0 ⇓ 1 Φ − 1 m Φ ∗ = m − 1 1 � ω − jk y j = x k m j =0 36

  37. DFT as polynomial evaluation Assume m = power of 2 m − 1 m − 1 ω j � k � � ω jk x k = � = y j x k · k =0 k =0 X ( ω j ) ≡ m − 1 � x k · w k X ( w ) ≡ k =0 x 0 + x 2 w 2 + x 4 w 4 + · · · ⇐ Even terms = x 1 + x 3 w 2 + x 5 w 4 + · · · � � ⇐ Odd terms + w · X even( w 2 ) + w · X odd( w 2 ) = 37

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend