 
              Advances in Self-Consistent Accelerator modeling John R. Cary University of Colorado and Tech-X Corporation Presented at TJNAF 25 May 06 and the VORPAL TEAM Dan Barnes, David Bruhwiler, Richard Busby, Johan Carlsson, John R. Cary, Dimitre Dimitrov, Eugene Kashdan, Peter Messmer, Chet Nieter, Viktor Przebinda, Nate Sizemore, Peter Stoltz, Raoul Trines, Seth Veitzer, Wen-Lan Wang, Nong Xiang NSF, the DOE SBIR program
Advances in Self-Consistent Electromagnetic Modeling • Complex cavity computations with particles have been improved through algorithms, including parallelization, making possiblle computations of wakefields in complex structures, intrabunch effects, injectors, … • Summary of some of what has made this possible – Local charge and current deposition methods – Parallelization – Improved stability – Boundary representations • Comparison with – Finite element approaches – Unitary separation approaches 2
The goals of modeling? • Part of the design process – Create – Simulate – Build – Test • Simulation for prediction of – Cavity losses – Instability • In general for – Exploration – Confirmation – Elucidation 3
Modeling allows one to answer questions without construction cost NLC ILC (Tesla) 4
Basic problem in charge particles moving in EM fields Auxiliary equations • Maxwell � B � t = ��� E � • B = 0 � E � t = c 2 �� B � µ 0 j � • E = � / � 0 [ ] • Particle sources � � j = q i v i � ( x � x i ) q i � ( x � x i ) � = • Particle dynamics d � v ( ) d x i = q i [ ] dt = v i E ( x i , t ) + v i � B ( x i , t ) dt m i 5
With much other physics added for a complete model • Particle injection • Dark currents • Multipactoring • Photon (short wavelength) production • Surface resistance • Secondary emission 6
ELECTROMAGNETICS 7
Yee: 2nd order accurate spatial differentiation E z � y + � E y � t = �� E z � B x � z • At the midpoint � y = E z , j + 1 � E z , j � E z y j y j+1 + O ( � y 2 ) � y B z • Leads to special layout E y of values in a cell E z B y • Yee mesh gives 2nd z B x order accuracy of spatial E z derivatives y E y x E x 8
Second-order in time by leap frog E n E n+1 E n+2 B n+1/2 B n+3/2 B n+1 t n+1/2 t n+3/2 t n t n+1 � y + � E y � t = �� E z � B x � z � n n n n � n � 1/2 = � t E z , i , j , k � E z , i , j + 1, k + E y , i , j , k + 1 � E y , i , j , k n + 1/2 � B x , i , j , k � � B x , i , j , k � � � y � z � � • Time centered differences give second order accuracy in Δ t • Can get time-collocated values by half-stepping in B • Similar for E update, except c 2 factor 9
Matrix representation useful for stability � � dB x , i , j , k E z , i , j , k � E z , i , j + 1, k E y , i , j , k + 1 � E y , i , j , k � � = + � � dt � y � z � � d 2 b d b d e dt = c 2 � dt 2 = � c 2 C • dt = � C • e C • b C • b = � D • b � • Magnetic and electric spaces are different • C , C ’ are adjoints, so D is self-adjoint (symmetric) • Diagonalize into separate harmonic oscillators • Leap frog for harmonic oscillator, stability limit at 1 � t CFL = � max � t CFL = 2 � x 2 + 1 1 � y 2 + 1 c � z 2 10
Gershgorin Circle Theorem gives stability bound • Frequencies are eigenmodes of D = c 2 C’C • Eigenvalues in range � � 0 < � 2 < max � � � D ij � over i � � � j • Gives precise range for infinite grid • Points to relation between coefficients and frequencies for other cases 11
Many other methods available • Finite element - later • Hamiltonian splitting (de Raedt): into exactly solvable parts d b , e ( ) ( ) = M • N • b , e ( ) = A • b , e dt d U M d U N – known: = M • U M = N • U N dt dt – stable approximate solution (since unitary): U ( � t ) = U N ( � t /2) • U M ( � t ) • U N ( � t /2) – Similar to drift-kick of symplectic integration – Lee and Fornberg (2005) have improved method based on Zheng et al (1999) • Smith, Cary, Carlsson now have implicit, charge- conserving algorithm 12
PARTICLES 13
Computing particle-particle interactions is prohibitive • Coulomb interaction leads to N p2 force computations x i � x j d � i v i q i � q j = 3 dt � 0 m i x i � x j j • Lenard-Weichert (retarded potentials) - worse due to need to keep history d � i v i q i � q j F ij ( x i , x j ( t � � )) = dt � 0 m i j 14
Particle In Cell (PIC) reduces to N p scaling • Particle contributions to charges and currents are added to each cell: O(N p ) operations • Forces on a particle are found from interpolation of the cell values: O(N p ) operations 15
Finding the force: interpolation (gather) • Linear weighting for each dimension – 1D: linear – 2D: bilinear = area weighting – 3D: trilinear = volume weighting • Force obtained through 1st order, error is 2nd order • For simplicity, no loss of accuracy, weight first to nodal points E x,node E x,yee E x,yee 16
Efficiency Avoiding Poisson
Only certain EM algorithms ensure Poisson satisfied � x + � E y � y + � E z � E x satisfied � • E = � / � 0 � z = � / � 0 always if � E � t = c 2 �� B � µ 0 j [ ] finite difference version and � x + � E y � y + � E z � E x initially � • E = � / � 0 � z = � / � 0 and � x + � j y �� � y + � j z � t = �� j x � � � t = �� • j � z 18
A special scatter ensures finite difference charge conservation • Principle: apportion via some weighting • Computing the charge density – Compute the current density and find the charge density from finite difference – Directly weight particles to the grid • If these two methods do not agree, then Current contrib. to this interface must match charge one can have false charge buildup from difference change across the Ampere-Maxwell equation. separated cells Requires Poisson solve to remove. j • Villasenor/Buneman explicitly conserves charge, but is noisier t 19
EM algorithm must take numerically divergenceless to numerically divergenceless n n n � x + � E y � y + � E z � E x � z = 0 and E n + 1 = M • E n implies n + 1 n + 1 n + 1 + � E y + � E z � E x = 0 � x � y � z 20
Mardahl and Verboncoeur show importance of getting this right 21
Parallelism: domain decomposition Domain 3 Domain 1 Domain 4 Domain 2 22
Parallelism rules of thumb • Communication is expensive • Global solves are really expensive 23
Overlap of communication and computation needed for speed • Non overlap algorithms: GuardPlus – Compute domain – Send skin (outer edge) – Receive guard – Repeat • Overlap algorithms – Compute skin Skin Guard – Send skin Body – Compute interior – Receive guard – Repeat 24
Similar overlap possible for particles • Move particles and weight currents to grid • Send currents needed by neighboring processors • Send particles to neighboring processors • Update B for half step • Receive currents and add in • Update E, B • Receive particles Without charge conserving current deposition, further costly global solve 25
VORPAL implements basic algorithms in highly scalable manner Object-oriented and flexible ( Arbitrary dimensional) 10 4 • Self-consistent EM modeling – Full EM or electrostatic + cavity s(N) mode – Particle in cell with relativistic or nonrelativistic dynamics • But has other capabilities 1 – Impact and field ionization 10 4 # processors – Fluid methods for plasma or neutral VORPAL scales well to gases 1,000’s of processors (strong scaling) – Implicit EM – Secondary emission • And is modern – Serial or Parallel (general domain decomposition) – Cross-platform (Linux, AIX, OS X, Windows) – Cross-platform binary data (HDF5) 26
Simplest algorithm allows complex computations • Example: formation of beams in laser-plasma interaction 27
Elucidation: long pulses shorten to resonance, capture, loading, acceleration 28
Simulations have found the hosing problem 29
Complications: boundaries 30
Modes computed with combination of FFT and fitting • Spherical cavity • Resonant current driver • FFT measurement of frequency, for accuracy by fitting 31
Early work on structured meshes had stair-step boundary conditions 120x24x24 = 71,424 cells = 215,000 degrees of freedom • N (L/ Δ x) cells in each direction • Error of ( Δ x/L) 3 at each surface cell • O(N 2 ) cells on surface • Error = N 2 ( Δ x/L) 3 = O(1/N) 32
Convergence studies confirm result, indicate modeling problem • Stair-step error is 10 -3 at 5000 cells per dimension, error linear with cell size • 10 11 cells for 3D problem This approach will not give answer even on large, parallel hardware 33
Recommend
More recommend