Efficient Particle-In-Cell modeling of laser-plasma accelerators - - PowerPoint PPT Presentation
Efficient Particle-In-Cell modeling of laser-plasma accelerators - - PowerPoint PPT Presentation
Efficient Particle-In-Cell modeling of laser-plasma accelerators J.-L. Vay Lawrence Berkeley National Laboratory Johns Adams Institute, Oxford, United Kingdom June 12, 2014 In collaboration with the LOASIS program Lawrence Berkeley
Program head – Wim Leemans Deputy – Eric Esarey
In collaboration with
- the LOASIS program – Lawrence Berkeley National Laboratory
- B. Godfrey, I. Haber – U. of Maryland
- D. Grote – Lawrence Livermore National Laboratory
- T. Drummond, A. Koniges – Lawrence Berkeley National Laboratory
Laser plasma acceleration (LPA) offers path to shorter, cheaper particle accelerators
boat wake surfer
3
water laser e- beam plasma Strong electric fields ~1,000x that of conventional accelerators
Based in first principles
- first principles includes nonlinear, 3D, kinetic effects,
- particle push and EM solver locals scales well to >100ks CPUs.
4
Primary method for simulations of beams and plasmas is the Particle-In-Cell (PIC) method
Deposit current Push particles time Gather forces Newton-Lorentz Field solve Maxwell Clouds of particles fields
Filtering Filtering
currents + filtering (currents and/or fields).
Charged particles EM fields
Particle-In-Cell (PIC) Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Usually ‘Yee’ staggered mesh
Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Ex Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Ex Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Ey Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Ey Ex Ex Ex Ex Ex Ex Ey Ey Ey Ey Ey Ey Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y Ex,y
“Energy-Conserving”
(a.k.a. “Galerkin”)
“Momentum-Conserving” “Uniform”
- interpolates from staggered grid
- one order down in // direction
- interpolates from staggered grid
- same order in all directions
- first interpolate at grid nodes
- same order in all directions
Field interpolation options
(Bz – not shown – at cell centers)
Ez (V/m) 20
- 10
10
injection laser pulse pump laser pulse trapped electron bunch
Example: 3-D simulation of a new concept for injection
- f very low emittance beams
*L.-L. Yu, E. Esarey, C. B. Schroeder, J.-L. Vay, C. Benedetti, C. G. R. Geddes, M. Chen, and W. P. Leemans, Phys. Rev. Lett. 112, 125001 (2014)
Two-color laser-ionization injection*:
0 200 400 600 800 0.01 0.02 0.03
Emittance [mm.mrad] Z [microns]
−2 −1 1 2 −2 −1 1 2 −2 −1 1 2 −0.1 0.0 0.1 −2 −1 1 2 −0.1 0.0 0.1 1.5 2.0 2.5 3.0 3.5 10+6 0.0 0.5 1.0 1.5 10+5
10 20 30 40 50
Y vs X X Y
10 20 30 40 50
Py/mc vs Y Y Py/mc
50 100 150 200
Px/mc vs X X Px/mc
Simulations confirm that new scheme enables the production of very high quality beams. Emittance < 0.03 mm-mrad for HEP and light sources
e- bunch projections
energy
Low emittance enables focusability to tight spot.
Warp-3D
7
- Numerical limitations
- discretization errors (finite cell size, finite time step, staggering of quantities)
high resolution, small time steps
- sampling errors (noise)
many macroparticles and/or smoothing/filtering
Laser plasma acceleration is especially challenging
- Large space and time scale disparities
- short wavelength laser propagates into long plasma channel
Lab (full) Lab (w/ moving window)
Warp-3D Warp-3D
even with moving window, many time steps needed for first principles simulations (tens of millions of time steps for 10 GeV stage)
8
Many techniques available:
- downscaled parameters
- e.g. LPA: 100 MeV/1019cm-3 stage proxy for 10 GeV/1017cm-3 stage
- reduced dimensionality: 1D, 2D, 2D-RZ, 2D-RZ-multimodes, fluid
- large spatial disparities
- parallelization
- moving window
- mesh refinement
- large time scale disparities
- envelope/ponderomotive – averages over shortest time scale
- quasistatic – separates slow macro- & fast micro-evolutions
- large spatial & time scale disparities
- Lorentz boosted frame – reduces scale disparities
Dealing with large spatial/time scale disparities
9
Many techniques available:
- downscaled parameters
- e.g. LPA: 100 MeV/1019cm-3 stage proxy for 10 GeV/1017cm-3 stage
- reduced dimensionality: 1D, 2D, 2D-RZ, 2D-RZ-multimodes, fluid
- large spatial disparities
- parallelization
- moving window
- mesh refinement
- large time scale disparities
- envelope/ponderomotive – averages over shortest time scale
- quasistatic – separates slow macro- & fast micro-evolutions
- large spatial & time scale disparities
- Lorentz boosted frame – reduces spatial/time scale disparities
Dealing with large spatial/time scale disparities
10
L≈1. m l≈1. m
- 1. m/1. m=1,000,000.
Lab frame compaction
X20,000.
l’=200. m 0.01 m/200. m=50. Boosted frame = 100
Hendrik Lorentz
L’=0.01 m
Lorentz boosted frame reduces scale range by orders of magnitude1
1J.-L. Vay, Phys. Rev. Lett. 98, 130405 (2007) 2J.-L. Vay, et al., Phys. Plasmas 18,123103 (2011)
LBF predicted speedup1,2:
- > 10,000 for single 10 GeV (Bella) stage,
- > 1,000,000 for single 1 TeV stage.
e- beam energy
11
Warp-3D – a0=1, n0=1019cm-3 (~100 MeV) scaled to 1017cm-3 (~10 GeV)
LBF method carefully validated in deeply depleted beam loaded stages
- - Excellent agreement between runs at various boost
*J.-L. Vay, et al., Phys. Plasmas 18, 123103 (2011)
Wake early late momentum spread XR.M.S.
12
But two difficulties were identified at high boost:
- relative shortening of Rayleigh length complicates laser injection,
- instability developing at entrance of plasma.
13
Laser injection through moving plane solves initialization issue in LBF
Lab frame Standard laser injection from left boundary or all at once plasma Boosted frame Shorter Rayleigh length LR/boost prevents standard laser injection plasma
*J.-L. Vay, et al., Phys. Plasmas 18, 123103 (2011)
Solution: injection through a moving planar antenna in front of plasma*
- vboost
- Laser injected using macroparticles
using Esirkepov current deposition ==> verifies Gauss’ Law.
- For high boost, backward radiation
is blue shifted and unresolved.
Method developed in Warp*, and implemented in other codes (e.g. Osiris and V-Sim).
14
Is it numerical Cherenkov instability? BTW, what is “numerical Cherenkov instability”?
Warp 2D simulation 10 GeV LPA (ne=1017cc, =130)
Longitudinal electric field laser plasma
Short wavelength instability observed at entrance of plasma for large 100)
Numerical Cherenkov discovered by B. Godfrey in 1974*
Lagrangian plasma streaming through Eulerian grid at relativistic velocity
*B. B. Godfrey, “Numerical Cherenkov instabilities in electromagnetic particle codes”, J. Comput. Phys. 15 (1974)
Exact Standard PIC Numerical dispersion leads to crossing of EM field and plasma modes -> instability.
16
Non-Standard Finite-Difference solver offers tunability of numerical dispersion
FD (Yee) NSFD (Karkkainen)
No dispersion along axes
a b b b b
- a
- b
- b
- b
- b
-
-
-
-
Dx
NSFD1,2: weighted average of quantities transverse to FD (ab). NSFD=FD if a, b=0. Pukhov algo3 for 1 set of a,b,
- 1J. B. Cole, IEEE Trans. Microw. Theory Tech. 45 (1997).
- J. B. Cole, IEEE Trans. Antennas Prop. 50 (2002).
- 2M. Karkkainen et al., Proc. ICAP, Chamonix, France (2006).
- 3A. Pukhov, J. Plasma Physics 61 (1999) 425
Comparing runs using Yee or NSFD (Karkkainen) solvers
17
Power spectrum (a.u.)
Surprise: Instability mostly insensitive to tuning of numerical dispersion… …but very sensitive to time step*!
*J.-L. Vay, et al., J. Comput. Phys. 230, 5908 (2011).
Sharp decrease of instability level around cδt=δz/√2
Used special time step to reduce instability but was not enough: wideband filtering?
18
Lab frame Frame of wake (=130)
spectrum spectrum
Does physics allow to use wideband filtering?
Time history of laser spectrum (relative to laser l0 in vacuum)
Dephasing time
Content concentrated around l0 Content concentrated at much larger l
More filtering possible without altering physics*.
*J.-L. Vay, et al., PoP Lett. 18 (2011).
Spectrum very different in lab and boosted frames
19
Time Time
Laser field Laser field Lab frame Wake frame Hyperbolic rotation from Lorentz Transformation converts laser…
…spatial oscillations
into
time beating
20
Digital filtering of current density and/or fields
- - commonly used for improving stability and accuracy
Multiple pass of bilinear filter + compensation routinely used
100% absorption at Nyquist freq.
Bilinear (B) Bilinear (B) + compensation (C) 1/2 1/4 1/4
Bilinear filter
Wideband filtering difficult in parallel (footprint limited by size of local domains)
- r expensive
Example: wideband filters using N repetitions of bilinear filter
1×B + C 4×B + C 20×B + C 50×B + C 80×B + C
21
“Strided” bilinear filters enable more efficient filtering*
4×BC stride 1 (G1) 4×BC stride 2 (G2) 4×BC stride 3 (G3) 4×BC stride 4 (G4)
Bilinear filter with stride 2
1/2 1/4 1/4
Using a stride N shifts the 100% absorption frequency to Fnyquist/N Succession of passes with different strides wideband filter:
- G1G2 20*B+C; speedup ×2
- G1G2G3 50*B+C; speedup ×3.5
- G1G2G4 80*B+C; speedup ×5.5
G1 × G2 G1 × G2 × G3 G1 × G2 × G4 20×B+C 50×B+C 80×B+C
*J.-L. Vay, et al., J. Comput. Phys. 230, 5908 (2011).
81 pass standard wideband filter down to 15 passes with strided filters
22
Magic time step + filtering enabled high boosted frame
>1 million x speedup
Warp:
- 1. J.-L. Vay, et al., Phys. Plasmas 18
123103 (2011)
- 2. J.-L. Vay, et al., Phys. Plasmas
(letter) 18 030701 (2011)
- 3. J.-L. Vay, et al., J. Comput. Phys.
230 5908 (2011)
- 4. J.-L. Vay et al, PAC Proc. (2009)
Osiris:
- 1. S. Martins, et al., Nat. Phys. 6
311 (2010)
- 2. S. Martins, et al., Comput. Phys.
- Comm. 181 869 (2010)
- 3. S. Martins, et al., Phys. Plasmas
17 056705 (2010)
- 4. S. Martins et al, PAC Proc. (2009)
Vorpal:
- 1. D. Bruhwiler, et al., AIP Conf. Proc
1086 29 (2009)
22
Speedup verified by us and others to over a million
Enabling simulations previously untractable
23
Simulation of 10 GeV stage*
*J.-L. Vay, et al., Phys. Plasmas 18 123103 (2011)
Warp 2-D
State-of-the-art PIC simulations of 10 GeV stages:
2006 (lab) in 1D: ~ 5k CPU-hours 2011 (boost) in 3D: ~ 1k CPU-hours
Mitigating instability was nice but understanding it is better
Godfrey extended his theory to 2-D* (applies readily to 3-D):
- 2-D numerical dispersion relation (in Fourier space):
- The ξ depend on w, k, Dt, Dx, Dz, the deposition and gather weighting functions,
and the streaming velocity
- The [w], [k] , and D* depend on the form of the finite differences (D* = 1 for
standard Yee EM field solver) xz,z +[w] xz,x xz,y +[kx] xx,x +[w] xx,y -[kz] Dx
*[kx]
- Dz
*[kz]
[w] æ è ç ç ç ç ö ø ÷ ÷ ÷ ÷ Ez Ex By æ è ç ç ç ç ö ø ÷ ÷ ÷ ÷ =0
*B. B. Godfrey, J. L. Vay, “Numerical Stability of Relativistic Beam Multidimensional PIC Simulations Employing the Esirkepov Algorithm”, J. Comput. Phys., 248 (2013) 33–46
kz kx w Need to consider at least first aliases mx={-3…+3} to study stability.
Exact FDTD-Yee
kz kx w
Numerical Cherenkov in 2-D (for cDt=Dx/sqrt(2))
light plasma at b=0.99 light plasma at b=0.99
25
kz kx w Need to consider at least first aliases mx={-3…+3} to study stability. kz kx w Need to consider at least first aliases mx={-3…+3} to study stability.
Full treatment requires taking aliases into account
PSATD FDTD-Yee light plasma at b=0.99 light plasma at b=0.99 aliases aliases
26
Maps of unstable modes
Normal modes at kx=0.5p/Dx for cDt=0.7Dz
EM modes Plasma modes
Projection of normal modes intersection
Growth rates from theory match Warp simulations
Theory Warp kx kz kz kx
Warp run uses uniform drifting plasma with periodic BC.
Yee finite difference, energy conserving gather (cDt/Dx=0.7)
Analysis recovers magic time step observed with Warp
Warp 2D simulation 10 GeV LPA (ne=1017cc, =130)
Yee solver Cole-Karkkainen solver
“Energy conserving” gather
Also found that alternate field gather procedure lowers growth rate significantly
Warp 2D simulation 10 GeV LPA (ne=1017cc, =130)
Alternative gather gives vanishing growth rate around Dt/Dz=0.5.
- growth rate goes to zero as tends to infinity.
Yee Cole-Karkkainen
Was confirmed in 2-D and 3-D Warp simulations of LPA (linear deposition of currents, no smoothing)
Final field energy normalized to reference case with no instability (100 MeV LPA stages at =13) 2-D 3-D
Using cubic interpolation and smoothing widens stability region
Final field energy normalized to reference case with no instability (100 MeV LPA stages at =13) 2-D 3-D
Latest insight led to new scheme for suppression of instability
New scheme* corrects fields seen by particles using 9-points linear smoothing in direction of plasma drift; e.g. for uniform gather
- key feature is slightly different
smoothing for Ex and By (except at magical time step)
- inexpensive and easy to implement
- amount of smoothing similar to
standard practice
1 2 3 0.0 0.5 1.0
Uniform, spectrum (dt/dx=0.05) kx s
1 2 3 0.0 0.5 1.0
Uniform, spectrum (dt/dx=0.5) kx s
1 2 3 0.0 0.5 1.0
Uniform, spectrum (dt/dx=0.95) kx s
*B. Godfrey & J.-L. Vay, J. Comput. Phys. 267 (2014) Correction to Ex Correction to By 3-pass bilinear filter
New scheme yields excellent stability
Superior choices for smoothing function may produce even better results. Theory Warp simulations of LPA Enables stability at all time steps.
- B. Godfrey & J.-L. Vay, J. Comput. Phys. 267 (2014)
Time to revisit PIC algorithm for upcoming new computer architectures
- In the early 70s, it was shown by Haber et al1 that an exact solution exists for
Maxwell in Fourier space if source J assumed constant over time step: “Pseudo-Spectral Analytical Time-Domain” or PSATD
- Note: taking first terms of Taylor expansion of C and S leads to pseudo-spectral formulation:
“Pseudo-Spectral Time-Domain” or PSTD2
- 1I. Haber, R. Lee, H. Klein & J. Boris, Proc. Sixth Conf. on Num. Sim. Plasma, Berkeley, CA, 46-48 (1973)
2Similar to UPIC solver, V. Decyck, UCLA
35
New analysis enables the exploration of other schemes
Pseudo-spectral solvers offer higher accuracy
36
- 1I. Haber, R. Lee, H. Klein & J. Boris, Proc. Sixth Conf. on Num. Sim. Plasma, Berkeley, CA, 46-48 (1973)
2J.-L. Vay, I. Haber, B. Godfrey, J. Comput. Phys. 243, 260-268 (2013)
Electromagnetic solvers
Finite-Difference Time Domain
Warp (LBNL/LLNL/U. Maryland), Osiris (UCLA), V-sim (Tech-X), etc.
Pseudo-Spectral Time Domain
UPIC-EMMA (UCLA)
Pseudo-Spectral Analytic Time Domain1,2
Warp (LBNL/LLNL/U. Maryland)
Kronecker d pulse
- Numerical dispersion,
- anisotropy,
- Courant condition:
1
- Numerical dispersion,
- isotropy,
- Courant condition:
- Exact dispersion,
- isotropy,
- Courant condition:
None
Godfrey theory applied to PSATD shows improved stability*
FDTD-CK simulation results included for comparison
*B. B. Godfrey, J.-L. Vay, I. Haber, “Numerical stability analysis of the Pseudo-Spectral Analytical Time-Domain PIC algorithm”, http://arxiv.org/abs/1305.7375v2
37
Stability over wide range of time steps confirmed with Warp*
Numerical energy growth in 3 cm, =13 LPA segment
38
*B. B. Godfrey, J.-L. Vay, I. Haber, “Numerical stability analysis of the Pseudo-Spectral Analytical Time-Domain PIC algorithm”, http://arxiv.org/abs/1305.7375v2
39
But spectral solvers involve global operations: harder to scale to large # of cores
Spectral Finite Difference (FDTD) global “costly” local “cheap” communications communications vs Harder to scale Easier to scale
New concept* opens the way to spectral accuracy with FDTD scaling: finite speed of light local FFTs.
*J.-L. Vay, I. Haber, B. Godfrey, J. Comput. Phys. 243, 260-268 (2013)
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
Copy unaffected data
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
Separate calculation in two domains guard regions
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
Advance to time T+DT spurious signal limited to guard regions
40
Example: unit pulse expansion at time T
*J.-L. Vay, I. Haber, B. Godfrey, J. Comput. Phys. 243, 260-268 (2013)
New concept on single pulse – part 1
Global FFT Local FFT Local FFT Local FFT Local FFT
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
Ready for next time step
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
zero out remaining areas
−50 −50
−1 −0.5
−50 −50
−1 −0.5
41
*J.-L. Vay, I. Haber, B. Godfrey, J. Comput. Phys. 243, 260-268 (2013)
−50 −50
−1 −0.5
−50 50 −50
−1 −0.5
−50 −50
−1 −0.5
−50 −50
−1 −0.5
To affected areas Successfully tested on 7x7 domain
New concept on single pulse – part 2
Local FFTs
PSATD-PIC
Improved phase space accuracy
correct behavior
0.
- 0.5
0.5
vz/c
0.1 0.11
z(mm)
0.
- 0.5
0.5
vz/c
0.1 0.11
z(mm)
spurious heating Artificial trapping
FDTD-PIC
Lab frame Improved stability
instability
PSATD-PIC FDTD-PIC
Lorentz boosted frame (wake)
Short laser propagates into long plasma channel, electron beam accelerated in wake.
Warp-3D Warp-3D Warp-2D Warp-2D
Modeling in a boosted frame reduces # time steps. Plasma drifting near C leads to Num. Cherenkov.
42
Successfully tested on 2-D modeling of short LPA stages
Errors appear lower than standard PIC
e.g., smoothing, guards cells are effective
Challenge
- - novel method makes a small approximation
Mix global with new local exchanges
reduces further impact of approximation
5 10 15 10−6 10−4 10−2 10+0
−6 −4 −2
dW/W0 npass
Smoothing (bilinear, # passes) Dw/w0 Relative error spectral standard 4 guard cells 8 guard cells 16 guard cells Typical PIC Test unit pulse
- grid 126×126
- T=200Dx/(cDt)
- // runs: 3 CPUs
43
Error accumulation and mitigations are being studied in detail.
e.g. openmp + mpi
New more efficient & accurate envelope solver
- superior envelope representation:
cartesian â = Re{â} + i Im{â} polar|â|exp(iθ) → reduced numerical errors → improves efficiency by x30 for 10 GeV stage.
- more complete wave operator** enables study of
deep laser depletion (essential for BELLA) .
*2D-RZ parallel PIC/fluid code - Benedetti et al., Proc. of AAC10 (2010); **Benedetti et al., Proc. of PAC11 (2011)
Other algorithmic advances: New envelope & quasistatic solvers implemented in INF&RNO*
Quasistatic solver allows large time steps → parallelization w/ F. Rossi (U. Bologna)
→ sped up >x10,000 parametric study of self-guiding for BELLA laser in ~10 cm gas jet
usually neglected
Ez/E0 kp(z-ct)
– explicit – QS explicit QS
Benchmarked agains explicit PIC
44
Summary
- Laser Plasma Accelerators promise shorter & cheaper accelerators.
- Challenges of efficient modeling of LPAs has fostered algorithmic advances.
- Those are exciting times: both accelerators and computing are in the midst of a
revolution (or rapid evolution), with the emergence of:
- high-gradient field technologies for shorter & cheaper particle accelerators,
- manycore/GPU technologies for more powerful and energy efficient supercomputers.
- Other informations:
- Warp is open source; available at warp.lbl.gov
- Supporting material (with Mathematical source) for Numerical Cherenkov available at
http://hifweb.lbl.gov/public/BLAST/Godfrey
45
Extras
46
- Geometry: 3-D (x,y,z) axisym. (r,z) 2-D (x,z) 2-D (x,y)
- Reference frame: lab moving-window Lorentz boosted
z z-vt (z-vt); (t-vz/c2)
- Field solvers
- electrostatic/magnetostatic - FFT, multigrid; AMR; implicit; cut-cell boundaries
- Fully electromagnetic - Yee mesh, PML, MR
- Accelerator lattice: general; non-paraxial; can read MAD files
- solenoids, dipoles, quads, sextupoles, linear maps, arbitrary fields, acceleration.
- Particle emission & collisions
- particle emission: space charge limited, thermionic, hybrid, arbitrary,
- secondary e- emission (Posinst), ion-impact electron emission (Txphysics) & gas emission,
- Monte Carlo collisions: ionization, capture, charge exchange.
47
Warp: Particle-In-Cell framework for accelerator modeling
y z x
Z (m) R (m)
Automatic meshing around ion beam source emitter Versatile conductor generator accommodates complicated structures
- Parallellization: MPI (1, 2 and 3D domain decomposition)
- Python and FORTRAN*: “steerable,” input decks are programs
48
Warp is parallel, combining modern and efficient programming languages
Parallel strong scaling of Warp 3D PIC-EM solver on Franklin supercomputer (NERSC)
From warp import * … nx = ny = nz = 32 dt = 0.5*dz/vbeam … initialize() step(zmax/(dt*vbeam)) … Imports Warp modules and routines in memory Sets # of grid cells Sets time step Initializes internal FORTRAN arrays Pushes particles for N time steps with FORTRAN routines *http://hifweb.lbl.gov/Forthon (wrapper supports FORTRAN90 derived types) – dpgrote@lbl.gov
49 Space charge dominated beams
Injection Transport Neutralization Warp-Posinst SPS
Electron cloud effects Multi-charge state beams
LEBT – Project X
Traps
Warp Alpha anti-H trap Paul trap
Courtesy H. Sugimoto
3D Coherent Synchrotron Radiation Laser plasma acceleration Free Electron Lasers Beam dynamics in rings
UMER
Multi-pacting
“Ping-Pong” effect