My Parallel Electromagnetic Solver: A Star-Maxwell-P FDTD Propagator



SLIDE 1

My Parallel Electromagnetic Solver

A Star-Maxwell-P FDTD Propagator

Alejandro W. Rodriguez

18.337 Parallel Computing

SLIDE 2

Outline

  • Overview of nanophotonics
  • Statement of the problem
  • Parallelization: a data-parallel approach
  • Minimizing temporal and memory scalings
  • Future work

SLIDE 3

“New-School” Electromagnetism

Beyond the geometric-optics limit

[Figure: scale bar 3 µm]
[ P. Vukusic et al., Proc. Roy. Soc: Bio. Sci. 266, 1403 (1999) ]

  • Design of frequency-selective structures
  • Omniguides and fiber optical guiding
  • Optical “insulators”

[ A. Rodriguez et al., Opt. Lett. 30, (2005) ]
[ S. Y. Lin et al., Nature 394, 251 (1998) ]

SLIDE 4

Nanophotonics: λ-scale (wavelength-scale) geometries

Quasi-periodic geometries

[ Notomi, 2004 ] [ Lederman, 2006 ]
[ A. Rodriguez et al., Phys. Rev. B 77, 104201 (2008) ]

Periodic geometries (period a): coherent scattering!

Any 1D-periodic (layered) structure has a band gap.

[ J. Zi et al., Proc. Nat. Acad. Sci. USA 100, 12576 (2003) ] [ figs: Blau, Physics Today 57, 18 (2004) ]

SLIDE 5

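A Light-Speed Introduction to FDTD

Maxwell’s Equations

$$\nabla \times \mathbf{H} = \frac{\partial \mathbf{D}}{\partial t} + 4\pi \mathbf{J}, \qquad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$

$$\mathbf{D} = \varepsilon \mathbf{E}, \qquad \mathbf{B} = \mu_0 \mathbf{H}$$

[Figure: 2D Yee grid with cell sizes $\Delta x$ and $\Delta y$]

$$\mathbf{E}(x_i, y_j, t_n) = \mathbf{E}^{n}_{i,j}, \qquad \mathbf{B}(x_i, y_j, t_n) = \mathbf{B}^{n}_{i+1/2,\,j+1/2}$$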

SLIDE 6

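A Light-Speed Introduction to FDTD

2D Maxwell’s Equations

Continuum:

$$\frac{\partial D_z}{\partial t} = \left( \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} \right), \qquad \frac{\partial B_x}{\partial t} = \frac{\partial E_z}{\partial y}, \qquad \frac{\partial B_y}{\partial t} = -\frac{\partial E_z}{\partial x}$$

$$\mathbf{H} = \mu_0^{-1} \mathbf{B}, \qquad E_z = \varepsilon^{-1} D_z$$

Discrete:

$$B^{n}_{x,(i+1/2,\,j+1/2)} = B^{n-1}_{x,(i+1/2,\,j+1/2)} + \frac{\Delta t}{\Delta y} \left( E^{n-1}_{z,(i,\,j+1)} - E^{n-1}_{z,(i,\,j-1)} \right)$$

$$B^{n}_{y,(i+1/2,\,j+1/2)} = B^{n-1}_{y,(i+1/2,\,j+1/2)} - \frac{\Delta t}{\Delta x} \left( E^{n-1}_{z,(i+1,\,j)} - E^{n-1}_{z,(i-1,\,j)} \right)$$

$$H^{n}_{(i+1/2,\,j+1/2)} = \mu_0^{-1} B^{n}_{(i+1/2,\,j+1/2)}$$

$$D^{n}_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \frac{\Delta t}{\Delta x} \left( H^{n-1}_{y,(i+1/2,\,j)} - H^{n-1}_{y,(i-1/2,\,j)} \right) - \frac{\Delta t}{\Delta y} \left( H^{n-1}_{x,(i,\,j+1/2)} - H^{n-1}_{x,(i,\,j-1/2)} \right) + 4\pi J^{n-1}_{z,(i,j)}$$

$$E^{n}_{(i,j)} = \varepsilon^{-1}_{(i,j)} D^{n}_{(i,j)}$$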
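The update sequence above can be sketched as a serial, plain-Python time step. This is only an illustration under stated assumptions, not the Star-P code from the talk: it takes $\mu_0 = 1$ (so H = B), keeps $E_z = 0$ on the outer boundary, omits the current source $J_z$, and uses the textbook Faraday sign $\partial \mathbf{B} / \partial t = -\nabla \times \mathbf{E}$ with the one-cell staggered differences of the standard Yee scheme. The names `make_grid`, `fdtd_step`, and `run_pulse` are hypothetical.

```python
def make_grid(nx, ny, value=0.0):
    """Return an nx-by-ny grid (list of lists) filled with `value`."""
    return [[value] * ny for _ in range(nx)]


def fdtd_step(Ez, Dz, Bx, By, eps, dt, dx=1.0, dy=1.0):
    """Advance the 2D fields (Ez, Dz, Bx, By) by one time step, in place.

    Assumptions: mu0 = 1 so H = B, no current source, Ez stays 0 on the boundary.
    """
    nx, ny = len(Ez), len(Ez[0])
    # Faraday's law, textbook sign: dBx/dt = -dEz/dy and dBy/dt = +dEz/dx.
    for i in range(nx):
        for j in range(ny - 1):
            Bx[i][j] -= dt / dy * (Ez[i][j + 1] - Ez[i][j])
    for i in range(nx - 1):
        for j in range(ny):
            By[i][j] += dt / dx * (Ez[i + 1][j] - Ez[i][j])
    # Ampere's law on interior cells: dDz/dt = dHy/dx - dHx/dy, then Ez = Dz / eps.
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            curl_h = (By[i][j] - By[i - 1][j]) / dx - (Bx[i][j] - Bx[i][j - 1]) / dy
            Dz[i][j] += dt * curl_h
            Ez[i][j] = Dz[i][j] / eps[i][j]


def run_pulse(steps, n=21, dt=0.5):
    """Start from a unit Ez spike at the grid center and take `steps` steps."""
    Ez, Dz = make_grid(n, n), make_grid(n, n)
    Bx, By = make_grid(n, n), make_grid(n, n)
    eps = make_grid(n, n, 1.0)  # vacuum everywhere (assumed)
    c = n // 2
    Ez[c][c] = Dz[c][c] = 1.0
    for _ in range(steps):
        fdtd_step(Ez, Dz, Bx, By, eps, dt)
    return Ez
```

With unit cells, `dt = 0.5` satisfies the 2D stability limit $\Delta t \le 1/\sqrt{2}$ (for c = 1), so calling `run_pulse(100)` should give a field that spreads from the center and stays bounded. Each step moves information by at most one cell, which is why only nearest-neighbor data is needed per update.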

SLIDE 7

So…why is this a hard problem?

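Maxwell’s Equations

$$\nabla \times \mathbf{H} = \frac{\partial \mathbf{D}}{\partial t} + 4\pi \mathbf{J}, \qquad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$

$$\mathbf{D} = \varepsilon \mathbf{E}, \qquad \mathbf{B} = \mu_0 \mathbf{H}$$

[Figure: FCC crystal (solved 1995), with rod layer and hole layer labeled]

  • 2 complex (discrete) 3D fields, ~ 100 flops / pixel / step
  • 3D size ~ 20 × 18 × 18 $a^3$
  • resolution (pixels / a) ~ 25, giving ~ $10^{8}$ pixels
  • 100-400 time steps

Temporal complexity: ~ $10^{12}$ flops!!
Memory: ~ 20 GB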
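The estimate above is plain arithmetic and can be checked in a few lines. This sketch only multiplies the numbers quoted on the slide; the function name `fdtd_cost` and its parameter names are hypothetical.

```python
def fdtd_cost(size_in_a=(20, 18, 18), resolution=25,
              flops_per_pixel_step=100, steps=(100, 400)):
    """Return (pixels, min_flops, max_flops) for the quoted 3D problem."""
    pixels = 1
    for length in size_in_a:
        pixels *= length * resolution  # pixels along each axis
    per_step = pixels * flops_per_pixel_step
    return pixels, per_step * steps[0], per_step * steps[1]


pixels, low, high = fdtd_cost()
print(f"pixels ~ {pixels:.2e}")
print(f"flops  ~ {low:.2e} to {high:.2e}")
```

The pixel count comes out near $10^{8}$ and the total work between roughly $10^{12}$ and $4 \times 10^{12}$ flops, in line with the slide's order-of-magnitude figures.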

Now, let’s create our own parallel code!

SLIDE 8

Parallelization Schemes

Optimizing complexity

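[Figure: 1D and 2D decompositions]

$$\#\,\text{op. counts} \sim a\,\frac{\mathrm{res}^{d}}{n_p} + b\,\mathrm{res}^{d-1}\,n_p$$

  • first term: task ~ volume
  • second term: comm. ~ area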
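The trade-off in this cost model can be explored numerically. The sketch below assumes unit constants a = b = 1 (the slide does not give values) and uses hypothetical function names. Setting the derivative with respect to $n_p$ to zero gives the balance point $n_p^{*} = \sqrt{a\,\mathrm{res}/b}$, which the brute-force search reproduces.

```python
import math


def op_count(res, d, n_p, a=1.0, b=1.0):
    """Model cost: work a*res**d/n_p plus communication b*res**(d-1)*n_p."""
    return a * res**d / n_p + b * res**(d - 1) * n_p


def best_n_p(res, d, a=1.0, b=1.0, max_n_p=1024):
    """Integer processor count in 1..max_n_p that minimizes the model cost."""
    return min(range(1, max_n_p + 1), key=lambda n_p: op_count(res, d, n_p, a, b))


def analytic_n_p(res, a=1.0, b=1.0):
    """Continuous minimizer of the model: sqrt(a * res / b)."""
    return math.sqrt(a * res / b)


print(best_n_p(100, 2), analytic_n_p(100))
```

Beyond that processor count the communication term grows faster than the work term shrinks, which is the motivation for looking at how the data is distributed.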
SLIDE 9

Parallelization Schemes

Optimizing complexity

1D: $\sim O\!\left( \frac{n^2}{n_p} + n\,n_p \right)$

2D: $\sim O\!\left( \frac{n^2}{n_p} + 4n\sqrt{n_p} \right)$

Most common scenario: $n \gg n_p$

⇒ 2D wins!

SLIDE 10

Star-P Implementation

Power of vectorization

Simple 1D example

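$$B_{i,j} = B_{i,j} + a \left( E_{i,j+1} - E_{i,j} \right), \qquad B, E \in \mathbb{R}^{N \times N}$$

Consider $E_{N+1,j} = E_{0,j} = 0$.

Looped:

    for k = 2:(N-1)
        % update one interior column at a time
        B(:,k) = B(:,k) + a * ( E(:,k+1) - E(:,k) );
    end

Vectorized:

    % same update applied to all interior columns at once
    B(:,2:end-1) = B(:,2:end-1) + a * ( E(:,3:end) - E(:,2:end-1) );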
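As a quick consistency check, here is a plain-Python analogue of the looped and range-indexed forms of the update $B_{i,j} \leftarrow B_{i,j} + a\,(E_{i,j+1} - E_{i,j})$. It uses lists and slices rather than Star-P or MATLAB arrays, and the function names are hypothetical. The slice ranges are chosen so that both forms touch exactly the same interior columns.

```python
def update_looped(B, E, a):
    """Column-by-column update of the interior columns (MATLAB k = 2:(N-1))."""
    out = [row[:] for row in B]
    n_cols = len(B[0])
    for k in range(1, n_cols - 1):
        for i in range(len(B)):
            out[i][k] += a * (E[i][k + 1] - E[i][k])
    return out


def update_sliced(B, E, a):
    """Same update written with whole-range slices.

    b_row[1:-1] plays the role of B(:,2:end-1), e_row[2:] of E(:,3:end),
    and e_row[1:-1] of E(:,2:end-1).
    """
    out = []
    for b_row, e_row in zip(B, E):
        interior = [b + a * (e_next - e_here)
                    for b, e_next, e_here in zip(b_row[1:-1], e_row[2:], e_row[1:-1])]
        out.append([b_row[0]] + interior + [b_row[-1]])
    return out


B = [[i + j for j in range(5)] for i in range(4)]
E = [[(i + 1) * j * j for j in range(5)] for i in range(4)]
print(update_looped(B, E, 2) == update_sliced(B, E, 2))
```

The three ranges must all have the same length, N - 2; a mismatch there is the usual way a vectorized rewrite goes wrong.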

SLIDE 11

Star-P Implementation

Power of vectorization

[Figure: looped vs. vectorized]

$$B_{i,j} = B_{i,j} + a \left( E_{i,j+1} - E_{i,j} \right)$$

Communication over the second index (1D parallelization)

Communication cost too great?

SLIDE 12

Star-P Implementation

1D parallelization

$$B_{i,j} = B_{i,j} + a \left( E_{i,j+1} - E_{i,j} \right)$$

E(N*p,N)   E(N,N*p)

Direction of costly operation

$$\sim O\!\left( \frac{n^2}{n_p} + n\,n_p \right) \qquad\qquad \sim O\!\left( \frac{n^2}{n_p} + 1 \right)$$

Parallelizing over the direction perpendicular to the operation means constant communication cost!
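The claim can be illustrated by counting, for each grid point, whether the neighbor value $E_{i,j+1}$ lives on a different worker. This is a toy model in plain Python, not Star-P: it assumes contiguous block ownership and that `*p` marks the distributed dimension, so that `E(N*p,N)` corresponds to `split_axis=0` and `E(N,N*p)` to `split_axis=1`. The function names are hypothetical.

```python
def owner(index, n, p):
    """Worker that owns `index` when n indices are split into p contiguous blocks."""
    return index * p // n


def values_fetched(n, p, split_axis):
    """Count E values fetched from another worker for the update
    B[i][j] += a * (E[i][j+1] - E[i][j]) on an n-by-n grid.

    split_axis = 0 distributes the first index (rows),
    split_axis = 1 distributes the second index (columns).
    """
    fetched = 0
    for i in range(n):
        for j in range(n - 1):  # E[i][j+1] must exist
            here = owner((i, j)[split_axis], n, p)
            neighbor = owner((i, j + 1)[split_axis], n, p)
            if here != neighbor:
                fetched += 1
    return fetched


print(values_fetched(8, 4, split_axis=0), values_fetched(8, 4, split_axis=1))
```

Splitting along the differenced index costs n values at each of the p - 1 internal boundaries, which grows like $n\,n_p$. Splitting along the other index needs no neighbor data at all.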

SLIDE 13

2D Maxwell Equations

Back to our problem

$$B^{n}_{x,(i+1/2,\,j+1/2)} = B^{n-1}_{x,(i+1/2,\,j+1/2)} + \frac{\Delta t}{\Delta y} \left( E^{n-1}_{z,(i,\,j+1)} - E^{n-1}_{z,(i,\,j-1)} \right) \tag{1}$$

$$B^{n}_{y,(i+1/2,\,j+1/2)} = B^{n-1}_{y,(i+1/2,\,j+1/2)} - \frac{\Delta t}{\Delta x} \left( E^{n-1}_{z,(i+1,\,j)} - E^{n-1}_{z,(i-1,\,j)} \right) \tag{2}$$

$$H^{n}_{(i+1/2,\,j+1/2)} = \mu_0^{-1} B^{n}_{(i+1/2,\,j+1/2)}$$

$$D^{n}_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \frac{\Delta t}{\Delta x} \left( H^{n-1}_{y,(i+1/2,\,j)} - H^{n-1}_{y,(i-1/2,\,j)} \right) - \frac{\Delta t}{\Delta y} \left( H^{n-1}_{x,(i,\,j+1/2)} - H^{n-1}_{x,(i,\,j-1/2)} \right) + 4\pi J^{n-1}_{z,(i,j)} \tag{3}$$

$$E^{n}_{(i,j)} = \varepsilon^{-1}_{(i,j)} D^{n}_{(i,j)}$$

Our problem mixes the directions of the costly operations: 1D parallelization will be susceptible to communication costs due to either (1) or (2), and certainly due to (3).

Let’s try 2D parallelization!

SLIDE 14

2D Maxwell Equations

2D parallelization?

Expected from previous results

Regime of interest

SLIDE 15

2D Maxwell Equations

A solution: hybridization!

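$$B^{n}_{x,(i+1/2,\,j+1/2)} = B^{n-1}_{x,(i+1/2,\,j+1/2)} + \frac{\Delta t}{\Delta y} \left( E^{n-1}_{z,(i,\,j+1)} - E^{n-1}_{z,(i,\,j-1)} \right)$$

$$B^{n}_{y,(i+1/2,\,j+1/2)} = B^{n-1}_{y,(i+1/2,\,j+1/2)} - \frac{\Delta t}{\Delta x} \left( E^{n-1}_{z,(i+1,\,j)} - E^{n-1}_{z,(i-1,\,j)} \right)$$

$$H^{n}_{(i+1/2,\,j+1/2)} = \mu_0^{-1} B^{n}_{(i+1/2,\,j+1/2)}$$

$$D^{n}_{z,(i,j)} = D^{n-1}_{z,(i,j)} + \frac{\Delta t}{\Delta x} \left( H^{n-1}_{y,(i+1/2,\,j)} - H^{n-1}_{y,(i-1/2,\,j)} \right) - \frac{\Delta t}{\Delta y} \left( H^{n-1}_{x,(i,\,j+1/2)} - H^{n-1}_{x,(i,\,j-1/2)} \right) + 4\pi J^{n-1}_{z,(i,j)}$$

$$E^{n}_{(i,j)} = \varepsilon^{-1}_{(i,j)} D^{n}_{(i,j)}$$

Hybrid parallelization scheme

  • $B^{n}_{x} \in (N{*}p \times N) \;\sim\; E^{n}_{z,x} \in (N{*}p \times N)$
  • $B^{n}_{y} \in (N \times N{*}p) \;\sim\; E^{n}_{z,y} \in (N \times N{*}p)$
  • $D^{n}_{z} \in (N{*}p \times N{*}p) \;\sim\; B^{n}_{y} \in (N \times N{*}p) + B^{n}_{x} \in (N{*}p \times N)$
  • $E^{n}_{z,x} \in (N \times N{*}p) \;\sim\; D^{n}_{z} \in (N{*}p \times N{*}p)$
  • $E^{n}_{z,y} \in (N{*}p \times N) \;\sim\; D^{n}_{z} \in (N{*}p \times N{*}p)$

Auxiliary fields: $E_{z,x}$, $E_{z,y}$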

SLIDE 16

2D Maxwell Equations

hybridization wins!

~ order of magnitude

SLIDE 17

Example 1

Field visualization

[Figure: $E_z$ field]

  • Source: $J_z \sim e^{i\omega t}$
  • Metallic geometry
  • res = 100 ⇒ N pixels = 10,000
  • Steady-state field

A quadrupole is born!

Serial ~ 1 minute
Parallel ~ 10 minutes

Let’s go to a more interesting problem…

SLIDE 18

Example 2

Field visualization

PhC-Metal geometry

[Figure labels: $J_z$, a, L >> a]

Using moderate resolutions, res ~ 30 ⇒ N pixels = $O(10^{8})$

Is this a job for star-Maxwell?

serial = impossible
parallel ~ hour

SLIDE 19

Future Work

(coming months)

  • C++ for loops are much faster than Matlab’s (try an MPI implementation: a task-parallel approach)
  • A promising optimization (3D geometries):