
My Parallel Electromagnetic Solver: A Star-Maxwell-P FDTD Propagator


  1. My Parallel Electromagnetic Solver: A Star-Maxwell-P FDTD Propagator
     18.337 Parallel Computing, Alejandro W. Rodriguez

  2. Outline
     - Overview of nanophotonics
     - Statement of the problem
     - Parallelization: a data-parallel approach
     - Minimizing temporal and memory scalings
     - Future work

  3. “New-School” Electromagnetism: beyond the geometric-optics limit
     - Omniguides and fiber optical guiding
     - Design of frequency-selective structures
     - Optical “insulators”
     [P. Vukusic et al., Proc. Roy. Soc.: Bio. Sci. 266, 1403 (1999)]; [S. Y. Lin et al., Nature 394, 251 (1998)]; [A. Rodriguez et al., Opt. Lett. 30, (2005)]

  4. Nanophotonics: λ-scale (wavelength-scale) geometries
     - Periodic geometries (lattice constant a)
     - Quasi-periodic geometries
     Coherent scattering! Any 1D-periodic (layered) structure has a band gap (at normal incidence, for any nonzero index contrast).
     [Notomi, 2004]; [Lederman, 2006]; [A. Rodriguez et al., Phys. Rev. B 77, 104201 (2008)]; [J. Zi et al., Proc. Nat. Acad. Sci. USA 100, 12576 (2003)]; figs: [Blau, Physics Today 57, 18 (2004)]
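The layered-structure claim can be checked with the standard transfer-matrix dispersion relation for a two-layer stack at normal incidence: cos(Ka) = cos(k1 d1) cos(k2 d2) - ½(n1/n2 + n2/n1) sin(k1 d1) sin(k2 d2). A frequency lies in a band gap when the right-hand side has magnitude greater than 1, because then no real Bloch wavevector K exists. The following is a minimal Python sketch. It assumes units with c = 1, and the helper names (`bloch_rhs`, `in_gap`) and index values are illustrative choices that do not come from the slides.

```python
import math

def bloch_rhs(omega, n1, d1, n2, d2):
    """RHS of cos(K a) for a bilayer (indices n1, n2; thicknesses d1, d2), normal incidence, c = 1."""
    k1d1 = n1 * omega * d1
    k2d2 = n2 * omega * d2
    return (math.cos(k1d1) * math.cos(k2d2)
            - 0.5 * (n1 / n2 + n2 / n1) * math.sin(k1d1) * math.sin(k2d2))

def in_gap(omega, n1, d1, n2, d2):
    """True when |cos(K a)| would have to exceed 1, i.e. no propagating Bloch mode."""
    return abs(bloch_rhs(omega, n1, d1, n2, d2)) > 1.0

# Weak-contrast quarter-wave stack with period a = d1 + d2 = 1 (illustrative values).
n1, n2 = 1.0, 1.1
d1, d2 = n2 / (n1 + n2), n1 / (n1 + n2)   # makes n1*d1 equal to n2*d2
omega_mid = (math.pi / 2) / (n1 * d1)      # each layer is a quarter wave thick here
print(in_gap(omega_mid, n1, d1, n2, d2))   # True: inside the first gap
print(in_gap(0.3, n1, d1, n2, d2))         # False: low frequency, first band
```

At the quarter-wave frequency both cosines vanish, so the right-hand side reduces to -½(n1/n2 + n2/n1). Its magnitude exceeds 1 whenever n1 ≠ n2, which is why even a tiny index contrast opens a gap.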

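  5. A Light-Speed Introduction to FDTD
     Maxwell's equations:
     $$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t},\qquad \nabla\times\mathbf{H} = \frac{\partial\mathbf{D}}{\partial t} + 4\pi\mathbf{J},\qquad \mathbf{D} = \varepsilon\mathbf{E},\qquad \mathbf{B} = \mu_0\mathbf{H}$$
     2D Yee grid (spacings Δx, Δy): E and B are sampled on interleaved grids offset by half a cell, so each field is stored between the samples of the field that drives it:
     $$\mathbf{E}(x_i, y_j, t_n) = \mathbf{E}^n_{i,j},\qquad \mathbf{B}(x_{i+1/2}, y_{j+1/2}, t_n) = \mathbf{B}^n_{i+1/2,\,j+1/2}$$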

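  6. A Light-Speed Introduction to FDTD: 2D Maxwell's equations (E_z, B_x, B_y)
     Continuum:
     $$\frac{\partial B_x}{\partial t} = -\frac{\partial E_z}{\partial y},\qquad \frac{\partial B_y}{\partial t} = \frac{\partial E_z}{\partial x},\qquad \frac{\partial D_z}{\partial t} = \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} - 4\pi J_z$$
     Discrete (B_x stored at (i, j+1/2), B_y at (i+1/2, j), E_z and D_z at (i, j)):
     $$B^n_{x,(i,j+\frac12)} = B^{n-1}_{x,(i,j+\frac12)} - \frac{\Delta t}{\Delta y}\left(E^{n-1}_{z,(i,j+1)} - E^{n-1}_{z,(i,j)}\right)$$
     $$B^n_{y,(i+\frac12,j)} = B^{n-1}_{y,(i+\frac12,j)} + \frac{\Delta t}{\Delta x}\left(E^{n-1}_{z,(i+1,j)} - E^{n-1}_{z,(i,j)}\right)$$
     $$H^n = \mu_0^{-1} B^n$$
     $$D^n_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \Delta t\left[\frac{H^n_{y,(i+\frac12,j)} - H^n_{y,(i-\frac12,j)}}{\Delta x} - \frac{H^n_{x,(i,j+\frac12)} - H^n_{x,(i,j-\frac12)}}{\Delta y} - 4\pi J_{z,(i,j)}\right]$$
     $$E^n_{z,(i,j)} = \varepsilon^{-1}_{(i,j)} D^n_{z,(i,j)}$$
     Updating B first and using the new H^n in the D update gives the leapfrog time ordering of the Yee scheme.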
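The discrete update equations map almost line for line onto array code. The sketch below is a serial NumPy transcription of one TM step, not the author's Star-P code. The function name `tm_step`, the array layout (B_x at (i, j+1/2), B_y at (i+1/2, j)), the PEC boundary (E_z held at zero on the edges), and the Gaussian point source are all assumptions made for illustration.

```python
import numpy as np

def tm_step(Ez, Dz, Bx, By, eps, mu0, dt, dx, dy, Jz):
    """Advance the 2D TM fields by one leapfrog step, updating the arrays in place.

    Assumed layout for this sketch:
      Ez, Dz, eps, Jz : shape (nx, ny),   node (i, j)
      Bx              : shape (nx, ny-1), node (i, j+1/2)
      By              : shape (nx-1, ny), node (i+1/2, j)
    Dz is never updated on the outer edges, so Ez stays 0 there (a PEC wall).
    """
    # dBx/dt = -dEz/dy, using E from the previous step
    Bx -= dt / dy * (Ez[:, 1:] - Ez[:, :-1])
    # dBy/dt = +dEz/dx
    By += dt / dx * (Ez[1:, :] - Ez[:-1, :])
    # H = B / mu0 from the freshly updated B (leapfrog ordering)
    Hx, Hy = Bx / mu0, By / mu0
    # dDz/dt = dHy/dx - dHx/dy - 4*pi*Jz on interior nodes
    Dz[1:-1, 1:-1] += dt * (
        (Hy[1:, 1:-1] - Hy[:-1, 1:-1]) / dx
        - (Hx[1:-1, 1:] - Hx[1:-1, :-1]) / dy
        - 4 * np.pi * Jz[1:-1, 1:-1]
    )
    # E = D / eps
    Ez[:] = Dz / eps

# Usage: Gaussian current pulse at the centre of a vacuum cell (eps = mu0 = 1, so c = 1).
nx = ny = 41
dx = dy = 1.0
dt = 0.5 * dx                      # below the 2D Courant limit dx / sqrt(2)
eps, mu0 = np.ones((nx, ny)), 1.0
Ez, Dz, Jz = (np.zeros((nx, ny)) for _ in range(3))
Bx, By = np.zeros((nx, ny - 1)), np.zeros((nx - 1, ny))
for n in range(60):
    Jz[nx // 2, ny // 2] = np.exp(-((n - 20) / 6.0) ** 2)
    tm_step(Ez, Dz, Bx, By, eps, mu0, dt, dx, dy, Jz)
```

Because the source sits on the central node of a symmetric grid, the resulting E_z should be mirror-symmetric in x and y. This is a quick sanity check on the staggered indexing.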

  7. So… why is this a hard problem?
     Maxwell's equations: $\nabla\times\mathbf{E} = -\partial\mathbf{B}/\partial t$, $\nabla\times\mathbf{H} = \partial\mathbf{D}/\partial t + 4\pi\mathbf{J}$, $\mathbf{D} = \varepsilon\mathbf{E}$, $\mathbf{B} = \mu_0\mathbf{H}$
     Temporal complexity:
     - 2 complex (discrete) 3D fields: ~100 flops / pixel / step
     - 3D size ~ 20 × 18 × 18 a³
     - resolution (pixels / a) ~ 25 ⟹ ~10^8 pixels
     - 100-400 time steps ⟹ ~10^12 flops!!
     Memory ~ 20 GB
     [Figure: FCC photonic crystal built from rod layers and hole layers (solved 1995)]
     Now, let's create our own parallel code!
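The slide's estimate is easy to reproduce. The sketch below multiplies out the quoted numbers. Two assumptions are not on the slide and are used only for the memory line: each vector component is stored as a double-precision complex value (16 bytes), and the count of stored vector fields is a free parameter. Two complex vector fields alone need about 9.7 GB. Keeping four (for example E, D, B and H) gives about 19 GB, in line with the ~20 GB quoted.

```python
# Inputs quoted on the slide
cell_a = (20, 18, 18)          # computational cell, in units of the lattice constant a
res = 25                       # pixels per a
flops_per_pixel_step = 100
time_steps = (100, 400)

pixels = (cell_a[0] * res) * (cell_a[1] * res) * (cell_a[2] * res)
total_flops = [pixels * flops_per_pixel_step * steps for steps in time_steps]

def memory_gb(n_vector_fields):
    """Assumption: 3 components per field, 16 bytes (complex128) per component."""
    return pixels * n_vector_fields * 3 * 16 / 1e9

print(pixels)                  # 101250000, i.e. ~10^8
print(total_flops)             # [1012500000000, 4050000000000], i.e. ~10^12
print(memory_gb(2), memory_gb(4))
```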

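  8. Parallelization Schemes: optimizing complexity
     [Figure: 1D (strip) and 2D (block) domain decompositions]
     For res pixels per unit length in d dimensions on n_p processors:
     $$\#\,\text{op. counts} \sim \alpha\,\frac{\mathrm{res}^d}{n_p} + \beta\,\frac{\mathrm{res}^{d-1}}{n_p}$$
     task ~ volume, comm. ~ area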

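  9. Parallelization Schemes: optimizing complexity
     For an n × n grid on n_p processors:
     1D (strips): $\sim\mathcal{O}\!\left(\frac{n^2 + n\,n_p}{n_p}\right)$        2D (blocks): $\sim\mathcal{O}\!\left(\frac{n^2 + 4n\sqrt{n_p}}{n_p}\right)$
     Most common scenario: n ≫ n_p ⟹ 2D wins!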
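The two big-O expressions can be compared numerically. This Python sketch takes them literally as per-processor cost models. Setting the constant factors to 1 is an assumption, and `cost_1d` and `cost_2d` are illustrative names.

```python
import math

def cost_1d(n, n_p):
    """Strip decomposition of an n x n grid: (n^2 + n*n_p) / n_p."""
    return (n**2 + n * n_p) / n_p

def cost_2d(n, n_p):
    """sqrt(n_p) x sqrt(n_p) block decomposition: (n^2 + 4*n*sqrt(n_p)) / n_p."""
    return (n**2 + 4 * n * math.sqrt(n_p)) / n_p

def cheaper(n, n_p):
    c1, c2 = cost_1d(n, n_p), cost_2d(n, n_p)
    return "1D" if c1 < c2 else "2D" if c2 < c1 else "tie"

n = 1000
for n_p in (4, 16, 64, 256):
    print(n_p, cost_1d(n, n_p), cost_2d(n, n_p), cheaper(n, n_p))
```

The work term n²/n_p is identical in both models, so the comparison comes down to the communication terms n·n_p and 4n√n_p. These cross at n_p = 16, and above that the 2D blocks communicate less.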

  10. Star-P Implementation: the power of vectorization
      Simple 1D example: B, E ∈ ℝ^{N×N}; consider E_{N+1,j} = E_{0,j} = 0 and the update (with a = α in the code)
      $$B_{i,j} = B_{i,j} + \alpha\left(E_{i,j+1} - E_{i,j}\right)$$
      Looped:
          for k = 2:(N-1)
              % update one column at a time
              B(:,k) = B(:,k) + a*(E(:,k+1) - E(:,k));
          end
      Vectorized:
          % one slice expression covers columns 2..N-1
          B(:,2:end-1) = B(:,2:end-1) + a*(E(:,3:end) - E(:,2:end-1));
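A NumPy transcription of the two MATLAB versions is a convenient way to check that the vectorized slice reproduces the loop. NumPy indexing is 0-based and its slices exclude the end point, so MATLAB's `2:end-1` becomes `1:-1` and `3:end` becomes `2:`. The function names are illustrative.

```python
import numpy as np

def update_looped(B, E, a):
    """Column-by-column update, mirroring the MATLAB loop k = 2:(N-1) (0-based: 1 .. N-2)."""
    B = B.copy()
    N = B.shape[1]
    for k in range(1, N - 1):
        B[:, k] = B[:, k] + a * (E[:, k + 1] - E[:, k])
    return B

def update_vectorized(B, E, a):
    """Same update as one slice expression: E columns 2..N-1 minus columns 1..N-2."""
    B = B.copy()
    B[:, 1:-1] = B[:, 1:-1] + a * (E[:, 2:] - E[:, 1:-1])
    return B

rng = np.random.default_rng(0)
B0 = rng.standard_normal((6, 6))
E0 = rng.standard_normal((6, 6))
print(np.allclose(update_looped(B0, E0, 0.3), update_vectorized(B0, E0, 0.3)))  # True
```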

  11. Star-P Implementation: the power of vectorization (1D parallelization)
      Communication over the second index: $B_{i,j} = B_{i,j} + \alpha\left(E_{i,j+1} - E_{i,j}\right)$
      [Figure: timings, looped vs. vectorized]
      Communication cost too great?

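  12. Star-P Implementation: 1D parallelization
      Update: $B_{i,j} = B_{i,j} + \alpha\left(E_{i,j+1} - E_{i,j}\right)$, with E distributed either as E(N,N*p) or as E(N*p,N). In Star-P the `*p` tag marks the distributed dimension, so E(N,N*p) splits the second index (the one being differenced) and E(N*p,N) splits the first.
      [Figure: direction of the costly operation]
      Distributing along the direction of the difference: $\sim\mathcal{O}\!\left(\frac{n^2 + n\,n_p}{n_p}\right)$
      Parallelizing over the direction perpendicular to the operation: $\sim\mathcal{O}\!\left(\frac{n^2}{n_p} + 1\right)$, i.e., constant communication cost!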
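The effect of the choice of distributed dimension can be shown with a simple count. For the update above, the sketch counts how many E values a processor must fetch from a neighbouring block. It is a pure-Python model that assumes the array is cut into contiguous blocks of rows or columns; Star-P's internal layout of a `*p` dimension may differ. `remote_reads` is a hypothetical helper.

```python
def remote_reads(N, n_p, distributed_axis):
    """Count E entries read across a block boundary by the update
    B[i, j] += a * (E[i, j+1] - E[i, j]) for j = 0 .. N-2 (a difference along axis 1).

    The N x N array is split into n_p contiguous blocks along `distributed_axis`
    (0 splits rows, 1 splits columns); each crossing is one value to communicate.
    """
    def owner(i, j):
        index = i if distributed_axis == 0 else j
        return index * n_p // N          # block id in 0 .. n_p-1
    return sum(
        1
        for i in range(N)
        for j in range(N - 1)
        if owner(i, j + 1) != owner(i, j)
    )

print(remote_reads(12, 4, distributed_axis=0))  # 0: rows split, the difference stays local
print(remote_reads(12, 4, distributed_axis=1))  # 36: 12 rows x 3 column boundaries
```

Splitting the differenced index costs N(n_p - 1) transfers per update, which grows with both N and n_p. Splitting the other index costs none. This is the contrast the slide draws between the two layouts.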

  13. 2D Maxwell Equations: back to our problem
      $$B^n_{x,(i,j+\frac12)} = B^{n-1}_{x,(i,j+\frac12)} - \frac{\Delta t}{\Delta y}\left(E^{n-1}_{z,(i,j+1)} - E^{n-1}_{z,(i,j)}\right) \tag{1}$$
      $$B^n_{y,(i+\frac12,j)} = B^{n-1}_{y,(i+\frac12,j)} + \frac{\Delta t}{\Delta x}\left(E^{n-1}_{z,(i+1,j)} - E^{n-1}_{z,(i,j)}\right) \tag{2}$$
      $$H^n = \mu_0^{-1} B^n$$
      $$D^n_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \Delta t\left[\frac{H^n_{y,(i+\frac12,j)} - H^n_{y,(i-\frac12,j)}}{\Delta x} - \frac{H^n_{x,(i,j+\frac12)} - H^n_{x,(i,j-\frac12)}}{\Delta y} - 4\pi J_{z,(i,j)}\right] \tag{3}$$
      $$E^n_{z,(i,j)} = \varepsilon^{-1}_{(i,j)} D^n_{z,(i,j)}$$
      Our problem mixes the directions of the costly operations: (1) differences along y, (2) along x, and (3) along both. A 1D parallelization will therefore be susceptible to communication costs due to either (1) or (2), and certainly due to (3).
      Let's try 2D parallelization!
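The argument reduces to one bookkeeping rule: an update communicates when it differences along a distributed axis. The sketch applies that rule to updates (1) to (3). It assumes each 1D layout splits exactly one axis. The dictionary keys and the layout labels (written in the slides' N*p notation) are illustrative.

```python
# Finite-difference directions of updates (1)-(3): axis 0 = x (index i), axis 1 = y (index j).
STENCIL_AXES = {
    "(1) Bx from Ez": {1},          # dEz/dy
    "(2) By from Ez": {0},          # dEz/dx
    "(3) Dz from Hx, Hy": {0, 1},   # dHy/dx and dHx/dy
}
LAYOUTS = {0: "N*p x N", 1: "N x N*p"}  # which axis each 1D layout splits

def needs_comm(update, distributed_axis):
    """An update needs neighbour data if it differences along the split axis."""
    return distributed_axis in STENCIL_AXES[update]

def comm_free_layouts(update):
    """1D layouts in which the update is purely local."""
    return [LAYOUTS[axis] for axis in LAYOUTS if not needs_comm(update, axis)]

for update in STENCIL_AXES:
    print(update, "->", comm_free_layouts(update))
# (1) Bx from Ez -> ['N*p x N']
# (2) By from Ez -> ['N x N*p']
# (3) Dz from Hx, Hy -> []
```

No single 1D layout makes all three updates local. Update (3) is local in neither, which is the observation that motivates keeping fields in more than one layout.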

  14. 2D Maxwell Equations: 2D parallelization?
      [Figure: timings, with the curve expected from previous results and the regime of interest marked]

  15. 2D Maxwell Equations: a solution, hybridization!
      Hybrid parallelization scheme: each update runs on arrays distributed perpendicular to its finite difference, and auxiliary fields hold copies of E_z in the layout each update needs:
      - B_x update (difference along y): $B^n_x,\ \tilde{E}^n_{z,x} \in N{*}p \times N$
      - B_y update (difference along x): $B^n_y,\ \tilde{E}^n_{z,y} \in N \times N{*}p$
      - D_z update (differences along x and y): $D^n_z \in N{*}p \times N{*}p$, then $E^n_z = \varepsilon^{-1} D^n_z$, from which the auxiliary fields $\tilde{E}^n_{z,x}$ and $\tilde{E}^n_{z,y}$ are refreshed

  16. 2D Maxwell Equations: hybridization wins!
      [Figure: timings] ~ an order of magnitude

  17. Example 1: field visualization
      Metallic geometry driven by a current J_z ~ e^{iωt}; steady-state field E_z. A quadrupole is born!
      res = 100 ⟹ N_pixels = 10,000
      Serial ~ 1 minute; parallel ~ 10 minutes (at this small size, parallel overhead dominates)
      Let's go to a more interesting problem…

  18. Example 2: field visualization
      PhC-metal geometry (lattice constant a, system size L ≫ a), driven by J_z
      Using moderate resolutions: res ~ 30 ⟹ N_pixels = O(10^8)
      Is this a job for Star-Maxwell? Serial: impossible; parallel: ~ an hour

  19. Future Work (coming months)
      - C++ for loops are much faster than MATLAB's (try an MPI implementation: a task-parallel approach)
      - A promising optimization (3D geometries): [figure]
