Why is it worth spending 1.5 million CPU-hours on Relativistic Astrophysics?
Miguel Ángel Aloy Torás
Departamento de Astronomía y Astrofísica Universidad de Valencia Collaborators: Petar Mimica, Javier Vargas
Barcelona, 19-05-09 SciComp09
1
We have employed a numerical approach in order to understand the actual physics at work in relativistic astrophysical sources. Over the last two years we have used ~10^6 CPU-hours/year of computing time on RES facilities. In this period we have accomplished several tasks in the field.
Tasks:
1. Calculation of the early phases of propagation of the ultrarelativistic shells of plasma that give rise to the afterglow phase of gamma-ray bursts (GRBs), which are among the most energetic events in our Universe.
2. Study of the physical parameters of blazars by comparing synthetic-emission models with observations.
3. Influence of cooling on the collimation properties of relativistic outflows.
Common link: Relativistic (Resistive) (Magneto-)Hydrodynamics RHD, RMHD, RRMHD
2
Special Relativistic Hydrodynamics (SRHD): Equations
$$\frac{\partial D}{\partial t} + \nabla\cdot(D\mathbf{v}) = 0 \quad \text{(mass conservation)}$$
$$\frac{\partial \mathbf{S}}{\partial t} + \nabla\cdot(\mathbf{S}\otimes\mathbf{v} + p\,\mathbf{I}) = 0 \quad \text{(momentum conservation)}$$
$$\frac{\partial \tau}{\partial t} + \nabla\cdot(\mathbf{S} - D\mathbf{v}) = 0 \quad \text{(energy conservation)}$$
The state vector and the flux vectors are:
$$\mathbf{U} = (D, S^1, S^2, S^3, \tau), \qquad \mathbf{F}^i = \left(Dv^i,\; S^1 v^i + p\,\delta^{1i},\; S^2 v^i + p\,\delta^{2i},\; S^3 v^i + p\,\delta^{3i},\; S^i - Dv^i\right)$$
- $D = \rho W$: relativistic rest-mass density.
- $\mathbf{S} = \rho h W^2 \mathbf{v}$: relativistic momentum density.
- $\tau = \rho h W^2 c^2 - p - \rho W c^2$: relativistic energy density.
- $\mathbf{v}$: flow velocity; $W = 1/\sqrt{1 - v^2/c^2}$: Lorentz factor.
- $\rho$: rest-mass density; $h = 1 + \varepsilon/c^2 + p/(\rho c^2)$: specific enthalpy; $\varepsilon$: specific internal energy; $p$: pressure.
Relativistic effects: $h \gg 1$ ($\varepsilon \gg c^2$), $W \gg 1$ ($v \to c$).
5 equations, 5 unknowns + EoS $p(\rho, \varepsilon)$
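To make the mapping between primitive and conserved variables concrete, here is a minimal sketch (my own illustration, not an MRGENESIS routine), assuming c = 1 and an ideal-gas equation of state with adiabatic index 4/3:

```python
import numpy as np

def prim_to_cons(rho, v, p, gamma_ad=4.0 / 3.0):
    """SRHD primitives (rho, v, p) -> conserved (D, S, tau), in units with c = 1."""
    v = np.asarray(v, dtype=float)
    W = 1.0 / np.sqrt(1.0 - np.dot(v, v))      # Lorentz factor
    eps = p / ((gamma_ad - 1.0) * rho)         # ideal-gas specific internal energy
    h = 1.0 + eps + p / rho                    # specific enthalpy
    D = rho * W                                # relativistic rest-mass density
    S = rho * h * W**2 * v                     # relativistic momentum density
    tau = rho * h * W**2 - p - D               # energy density minus rest-mass density
    return D, S, tau

# Example: v = 0.9c along x, so W ~ 2.3 and D, S are strongly boosted
print(prim_to_cons(1.0, [0.9, 0.0, 0.0], 0.1))
```

The inverse map (conserved to primitive) has no closed form, which is part of what makes relativistic codes expensive: it must be recovered iteratively (e.g. with a Newton-Raphson solver) in every cell at every time step.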
3
$$\frac{\partial \mathbf{U}}{\partial t} + \frac{\partial \mathbf{F}^x(\mathbf{U})}{\partial x} + \frac{\partial \mathbf{F}^y(\mathbf{U})}{\partial y} + \frac{\partial \mathbf{F}^z(\mathbf{U})}{\partial z} = 0, \qquad \nabla\cdot\mathbf{B} = 0,$$
with state vector
$$\mathbf{U} = \left(\rho u^0,\; (\rho h + b^2)u^0u^x - b^0b^x,\; (\rho h + b^2)u^0u^y - b^0b^y,\; (\rho h + b^2)u^0u^z - b^0b^z,\; (\rho h + b^2)(u^0)^2 - \left(p + \tfrac{b^2}{2}\right) - b^0b^0,\; B^x,\; B^y,\; B^z\right)^T$$
and flux vector along $x$
$$\mathbf{F}^x(\mathbf{U}) = \left(\rho u^x,\; (\rho h + b^2)u^xu^x + p + \tfrac{b^2}{2} - b^xb^x,\; (\rho h + b^2)u^xu^y - b^xb^y,\; (\rho h + b^2)u^xu^z - b^xb^z,\; (\rho h + b^2)u^0u^x - b^0b^x,\; 0,\; B^yv^x - B^xv^y,\; B^zv^x - B^xv^z\right)^T;$$
$\mathbf{F}^y$ and $\mathbf{F}^z$ follow by cyclic permutation of $(x, y, z)$.
8 equations, 8 unknowns + EoS $p(\rho, \varepsilon)$
RMHD requires 3 more variables than RHD and differs substantially from RHD at the numerical level: there is a mixture of volume-averaged variables (v, p, etc.) and surface-averaged variables (B). The volume-averaged variables follow the same numerical scheme as in RHD, while the magnetic field components are evolved using constrained transport, which preserves ∇·B = 0 numerically (see the sketch below).
4
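To illustrate why constrained transport keeps the divergence constraint to machine precision, here is a minimal 2D toy sketch (my own example, not the MRGENESIS implementation), with face-centred Bx, By and a corner-centred electric field Ez:

```python
import numpy as np

def ct_update(Bx, By, Ez, dt, dx, dy):
    """One constrained-transport update of face-centred (Bx, By) from the corner EMF Ez.

    Shapes: Bx is (Nx+1, Ny), By is (Nx, Ny+1), Ez is (Nx+1, Ny+1).
    """
    Bx = Bx - dt / dy * (Ez[:, 1:] - Ez[:, :-1])   # dBx/dt = -dEz/dy
    By = By + dt / dx * (Ez[1:, :] - Ez[:-1, :])   # dBy/dt = +dEz/dx
    return Bx, By

def div_B(Bx, By, dx, dy):
    """Cell-centred divergence built from the face-centred fields."""
    return (Bx[1:, :] - Bx[:-1, :]) / dx + (By[:, 1:] - By[:, :-1]) / dy

# Toy test: random fields and EMF; the discrete divergence does not change.
rng = np.random.default_rng(0)
Nx = Ny = 32
dx = dy = 1.0 / Nx
Bx, By = rng.normal(size=(Nx + 1, Ny)), rng.normal(size=(Nx, Ny + 1))
Ez = rng.normal(size=(Nx + 1, Ny + 1))
d0 = div_B(Bx, By, dx, dy)
Bx, By = ct_update(Bx, By, Ez, dt=1e-3, dx=dx, dy=dy)
print(np.max(np.abs(div_B(Bx, By, dx, dy) - d0)))   # ~ machine precision
```

The contributions of each corner EMF to the four surrounding faces cancel exactly in the discrete divergence, which is the essence of the constrained-transport idea.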
14 equations, 14 unknowns + EoS $p(\rho, \varepsilon)$
$$\frac{\partial \mathbf{Q}(\mathbf{P})}{\partial t} + \frac{\partial \mathbf{F}^m(\mathbf{P})}{\partial x^m} = \mathbf{S}(\mathbf{P}),$$
where $\mathbf{Q}$, $\mathbf{P}$ and $\mathbf{S}$ are the vectors of conserved quantities, primitive quantities and sources, respectively. Besides the hydrodynamic quantities (relativistic rest-mass density $\rho\gamma$, energy density $e$ and momentum density $P_i$), the conserved vector contains the magnetic field $B^i$, the electric field $E^i$, the charge density $q$ and two auxiliary scalar fields $\psi$ and $\phi$; the primitive vector contains $\rho$, $p$ and $u^i$ together with $q$, $E^i$, $B^i$, $\psi$ and $\phi$. The Maxwell part of the augmented system reads
$$\partial_t \mathbf{B} + \nabla\times\mathbf{E} + \nabla\psi = 0, \qquad \partial_t \psi + \nabla\cdot\mathbf{B} = -\kappa\psi,$$
$$\partial_t \mathbf{E} - \nabla\times\mathbf{B} + \nabla\phi = -\mathbf{J}, \qquad \partial_t \phi + \nabla\cdot\mathbf{E} = q - \kappa\phi,$$
$$\partial_t q + \nabla\cdot\mathbf{J} = 0,$$
with total current $\mathbf{J} = q\mathbf{v} + \mathbf{J}_c$. The source term splits into a non-stiff part $\mathbf{S}_a(\mathbf{P})$ (containing $q$ and $-qv^i$) and a potentially stiff part $\mathbf{S}_b(\mathbf{P})$ (containing $-\kappa\psi$, $-\kappa\phi$ and $-J_c^i$), where $\mathbf{J}_c = \sigma\gamma\,[\mathbf{E} + \mathbf{v}\times\mathbf{B} - (\mathbf{E}\cdot\mathbf{v})\mathbf{v}]$ is the conduction current. The source term $\mathbf{S}$ is potentially stiff (in the limit of high conductivity $\sigma$).
From Komissarov (2007)
RRMHD is treated in a way very similar to RHD (Komissarov 2007): e.g., no constrained transport is needed, since all variables are volume-averaged. Drawback: two extra scalar fields (ψ, φ) are required. Stiffness: semi-analytic integration of the source terms involving the resistivity and/or the scalar fields (see the sketch below).
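The idea behind a semi-analytic treatment of the stiff terms can be illustrated with a simple relaxation equation, dE/dt = -(E - E_ideal)/τ with τ = 1/(σγ): rather than an explicit update, which would force Δt ≲ τ, the decay over one step is applied exactly. A minimal sketch of the idea only (my own illustration, not the actual operator splitting of Komissarov 2007 or MRGENESIS):

```python
import numpy as np

def stiff_relax_exact(E, E_ideal, dt, tau):
    """Exact (semi-analytic) integration of dE/dt = -(E - E_ideal)/tau over a step dt."""
    return E_ideal + (E - E_ideal) * np.exp(-dt / tau)

def stiff_relax_euler(E, E_ideal, dt, tau):
    """Explicit Euler update of the same equation: unstable once dt > 2*tau."""
    return E + dt * (-(E - E_ideal) / tau)

E, E_ideal = 1.0, 0.0
dt, tau = 1.0, 1e-3          # hydrodynamic time step >> resistive relaxation time
print(stiff_relax_exact(E, E_ideal, dt, tau))   # ~0: relaxed towards the ideal-MHD value
print(stiff_relax_euler(E, E_ideal, dt, tau))   # -999: the explicit update blows up
```

In the high-conductivity limit the electric field thus relaxes towards its ideal-MHD value within a single hydrodynamic time step, without imposing any extra time-step restriction.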
[Figure: sketch of the computational domain (size Lx × Ly) with a background magnetic field B0.]
5
MRGENESIS is a multidimensional (1D, 2D or 3D), parallel (MPI) code which allows us to perform relativistic (magneto-)hydrodynamic simulations. In the newest development, not yet fully integrated with the RMHD branch, RRMHD is also included. The parallelization is based on MPI domain decomposition (see the sketch below).
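Domain decomposition means that each MPI process evolves its own slab of the grid and exchanges a few ghost zones with its neighbours every step. A minimal 1D sketch of that communication pattern using mpi4py (an illustration only, not actual MRGENESIS code):

```python
# Run with e.g.: mpirun -n 4 python ghost_exchange.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

ng, nx = 2, 64                                  # ghost zones and local interior cells
u = np.full(nx + 2 * ng, float(rank))           # local 1D slab, filled with the rank id

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send my rightmost interior cells to the right neighbour, receive my left ghosts.
recv = np.empty(ng)
comm.Sendrecv(np.ascontiguousarray(u[nx:nx + ng]), dest=right, sendtag=0,
              recvbuf=recv, source=left, recvtag=0)
if left != MPI.PROC_NULL:
    u[:ng] = recv

# Send my leftmost interior cells to the left neighbour, receive my right ghosts.
comm.Sendrecv(np.ascontiguousarray(u[ng:2 * ng]), dest=left, sendtag=1,
              recvbuf=recv, source=right, recvtag=1)
if right != MPI.PROC_NULL:
    u[-ng:] = recv

print(rank, u[:ng], u[-ng:])
```

The scaling curves on the next slides essentially measure how well this exchange (plus the I/O) overlaps with the local computation on each architecture.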
6
[Figure: normalized wall-clock time as a function of the number of threads (1-1000) for: iMac Intel Core 2 Duo 2.8 GHz (Uni. Valencia), iMac Intel Core 2 Duo 2.8 GHz with ifort 11.0 (Uni. Valencia), Intel Xeon 5140 2.33 GHz (Uni. Valencia), Intel Xeon E6405 2.0 GHz (HYBRID cluster, Uni. Split), AMD Opteron 8350 2.9 GHz (Uni. Valencia).]
Small and multicore systems
7
Best performance: different threads on different processors. Worst performance: different threads on the same processor but on different cores.
[Figure: normalized wall-clock time as a function of the number of threads (1-1000) for: SiCortex SC072 (Megware Computer GmbH), Tirant (Uni. Valencia), MareNostrum (BSC).]
Large multiprocessor systems
8
[Figure: normalized wall-clock time as a function of the number of threads (1-1000) for all tested systems: iMac Intel Core 2 Duo 2.8 GHz (Uni. Valencia), iMac Intel Core 2 Duo 2.8 GHz with ifort 11.0 (Uni. Valencia), Intel Xeon 5140 2.33 GHz (Uni. Valencia), SiCortex SC072 (Megware Computer GmbH), Intel Xeon E6405 2.0 GHz (HYBRID cluster, Uni. Split), Tirant (Uni. Valencia), MareNostrum (BSC), AMD Opteron 8350 2.9 GHz (Uni. Valencia).]
All systems
9
The use of cutting-edge facilities such as MareNostrum (and also CeSViMa and Tirant) has allowed us to simulate with unprecedented resolution the early phase of the afterglow associated with some of the most luminous events in the Universe: GRBs. Because of the ultrarelativistic character of the ejecta flow, we need very high numerical resolution and, hence, enormous computational resources.
[Figure: regions of weak/no versus substantial reverse-shock dissipation in the parameter plane; the curves correspond to relative Lorentz factors Γrel = 1.04, 1.1 and 1.25.]
Long-standing issue: do afterglows result from magnetized or unmagnetized ejecta sweeping up the interstellar medium (ISM)? Our simulations have quantified the approximate ejecta magnetization (σ0) that still allows the production of a reverse shock in the ejecta; such a shock may accelerate particles, whose optical emission (the optical flash) is envisioned to be its observational signature.
$$\sigma_0 := \frac{B_0^2}{4\pi\gamma_0\rho_0 c^2}, \qquad \xi = 0.73\,\frac{E_{53}^{1/6}}{n^{1/6}\,\Delta_{12}^{1/2}\,\gamma_{2.5}^{4/3}}$$
Mimica, Giannios, Aloy (2009)
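A quick numerical check of the ξ parameter defined above (the specific parameter values below are illustrative, not one of the models from the talk):

```python
def xi(E53, n, Delta12, gamma2p5):
    """xi = 0.73 * E53^(1/6) / (n^(1/6) * Delta12^(1/2) * gamma2.5^(4/3)),
    with E = E53 * 1e53 erg, Delta0 = Delta12 * 1e12 cm, gamma0 = 316 * gamma2p5."""
    return 0.73 * E53**(1.0 / 6.0) / (n**(1.0 / 6.0) * Delta12**0.5 * gamma2p5**(4.0 / 3.0))

# Illustrative thin-shell example: E = 3.33e53 erg, n = 10 cm^-3,
# Delta0 = 1e15 cm (Delta12 = 1000), gamma0 = 15 (gamma2p5 ~ 0.047)
print(xi(3.33, 10.0, 1000.0, 15.0 / 316.0))   # of order unity
```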
10
Evolution of magnetized and non-magnetized afterglows
1 light-day = 2.6×10^15 cm = 174 AU; Earth's surface magnetic field strength = 0.03-0.06 mT.
11
Evolution of magnetized and non-magnetized afterglows
1 light-day = 2.6×10^15 cm = 174 AU; Earth's surface magnetic field strength = 0.03-0.06 mT.
12
The simulated models correspond to combinations of: γ0 = 10, 15, 20; Δ0 = 10^15 cm (thin shell), 10^16 cm (thick shell); E53 = 3.33, 33.3; n0 = 10 cm^-3.
Our simulations allowed us to verify the existence of scaling laws, which enabled us to extrapolate the results to parameters in the right range for applications to GRB afterglows, namely γ0,AG ~ 100 and Δ0,AG ~ 3×10^12 cm.
[Figure: radial profiles around r ≈ 1.56-1.57 × 10^17 cm. Lines show the results for models with γ0 = 10, 15 and 20; full, dashed and dot-dashed lines show the rest-mass density, σ and the fluid Lorentz factor, respectively, rescaled according to Eqs. (10)-(12). After rescaling, the profiles of all models almost overlap. Only around the RS are there large deviations, due to the limited numerical resolution; therefore, we have to rescale our models using the closest discrete time available to the one requested by the transformation expressed in Eq. (11).]
[Figure: test of the rescaling hypothesis. Black, blue and red lines in the upper (lower) panel show the relative Lorentz factor of the fluid at the FS (RS) as a function of the normalized observing time tobs for models with γ0 = 10, 15 and 20, respectively. As expected, the relative Lorentz factor at the forward shock scales with γ0, while at the reverse shock the curves coincide. The dashed line in the upper panel shows the result of rescaling the γ0 = 15 model to γ0 = 100.]
Mimica, Giannios, Aloy (2009)
13
Note on resolution: we need to resolve Δ0 with more than 10^4 zones for γ0 ~ 10. Since Rtot/Δ0 ~ 100-1000 ⇒ Ntot ~ 10^6-10^7 zones ⇒ Niter ~ 10^7-10^8 iterations (CFL ~ 0.1); see the helper below. Currently we use ~500-1000 CPUs per job, and each job needs ~200 h to finish. Since the required zone size shrinks as γ0 grows, we would need ~2000 h on 500-1000 CPUs to run realistic models with γ0 ~ 100. But the physical problem, if a realistic parameter set were needed, demands (in 1D!): Rtot/Δ0,AG ~ 3×10^5 ⇒ Ntot ~ 3×10^9 ⇒ Niter ~ 3×10^10 ⇒ 6×10^5 h on 500 CPUs ⇒ 3×10^4 h on 10,000 CPUs ⇒ 3×10^5 h on 10,000 CPUs for γ0 ~ 100.
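The chain of grid estimates above can be written down explicitly. A trivial helper using only the numbers stated on this slide (zones per shell width, radial range in units of Δ0, CFL number):

```python
def grid_requirements(r_tot_over_delta0, zones_per_delta0=1e4, cfl=0.1):
    """Total zones and iterations for a 1D run that resolves the shell width Delta0
    with `zones_per_delta0` zones while covering a radial range Rtot = ratio * Delta0.
    Niter ~ Ntot / CFL, as in the estimate on this slide."""
    n_tot = zones_per_delta0 * r_tot_over_delta0
    n_iter = n_tot / cfl
    return n_tot, n_iter

# Current simulations: Rtot/Delta0 ~ 100 - 1000
print(grid_requirements(100), grid_requirements(1000))   # ~1e6-1e7 zones, 1e7-1e8 iterations
# Realistic afterglow parameters (Delta0,AG ~ 3e12 cm): Rtot/Delta0 ~ 3e5
print(grid_requirements(3e5))                             # ~3e9 zones, 3e10 iterations
```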
[Figures: number of iterations Niter (lower panel) needed to resolve the Riemann problem in planar coordinates, shown as a function of the spatial discretization Δx, and as a function of the initial Lorentz factor Γ0 for Δx = 1.5625 × 10^-5.]
14
Blazars and pc-scale jets are observed because they emit huge amounts of non-thermal (synchrotron, IC, etc.) radiation.
Kovalev et al. (2007)
Connecting observations (i.e., non-thermal radiation) with simulations of an R(M)HD (i.e., thermal) flow is an unavoidable task. Physically, this means solving the Boltzmann equation to couple the space-time/momentum evolution of the non-thermal particles to the dynamical evolution of the thermal jet flow,
$$p^b \frac{\partial f}{\partial x^b} - \Gamma^a_{\;bc}\, p^b p^c\, \frac{\partial f}{\partial p^a} = \frac{\delta f}{\delta \tau}.$$
Solving the Boltzmann equation is a 7D problem (t, x, p)! Assuming one needs ~100 points per dimension ⇒ 10^14 points!
Simplifications: 2D spatial axial symmetry (r, z) + 1D time evolution (t) + 1D momentum evolution (p = p(E)). Total: a 4D problem (t, r, z, E). This calls for supercomputing!
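The gain from this reduction of dimensionality is easy to quantify (simple counting, using the ~100 points per dimension quoted above):

```python
points_per_dim = 100

full_boltzmann = points_per_dim ** 7   # (t, x, y, z, p_x, p_y, p_z)
reduced_spev = points_per_dim ** 4     # (t, r, z, E) after the simplifications above

print(f"{full_boltzmann:.1e} vs {reduced_spev:.1e} points "
      f"(a factor {full_boltzmann / reduced_spev:.0e} fewer)")
```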
15
We have developed a new algorithm to deal with such a huge 4D problem: SPEV. It has taken more than two years of development to include lepton transport fully coupled to R(M)HD. SPEV properties:
➡ Tailored to the reduced 4D (t, r, z, E) transport problem.
➡ Uses analytic expressions (in terms of special functions) for piece-wise power-law electron distribution functions [about 100 times faster than direct numerical integration].
➡ Includes relativistic effects (e.g., time dilation, Doppler boosting).
➡ Non-thermal particles are injected at chosen positions and their whole evolution is computed.
➡ Axisymmetric 2D data are converted to 3D + time.
➡ The energy evolution is I/O-intensive.
[Figure: emitting plasma moving with v ~ c towards the observer; its radiation is collected on a virtual detector screen at observer times tobs = t1, t2, t3.]
16
Radio imaging of a pc-scale jet. Simulation data:
➡ Computing time: 1 week with 512 CPUs.
➡ RHD grid: 1600 × 80 × 20,000 (time steps) ~ 0.5 Tb, plus 2×10^5 particles.
➡ Detector: 270 × 18 (screen area) × 128 (obs. time intervals) × 3 (frequencies) ~ 2×10^6.
17
The parameter space in SPEV simulations spans variations of:
1. Magnetic field strength.
2. Jet-to-external-medium pressure ratio.
3. Initial proportionality between the energy and number densities of the thermal and non-thermal particle distributions.
4. Initial power-law index of the non-thermal particle energy distribution.
5. Hydrodynamic properties of the beam and of the perturbation.
All in all, the minimum dimension of the parameter space is 6 ⇒ many models need to be run!
Supercomputing is needed because of:
1. The large number of models.
2. The large dimensionality of the problem.
3. Huge I/O needs.
18
The problem: Doppler factor crisis
➡ The size of an emitting region is bounded by the distance light travels during the smallest variability timescale.
➡ Example: the HST-1 knot, located at Rprojected ~ 60 pc from the core of M87:
➡ Radio/optical/X-ray variability: t_var(X-rays) < 0.14 yr ⇒ R_emiss(X-rays) < D·t_var(X-rays)·c ~ 0.044 D pc ≪ Rprojected (D ~ 4-20 is the Doppler factor inferred from superluminal motions).
➡ TeV variability: correlated with the X-ray variability, but t_var(TeV) ≪ t_var(X-rays) ⇒ R_emiss(TeV) ≪ R_emiss(X-rays), with R_emiss(TeV) < D·t_var(TeV)·c ~ 0.002 D pc (see the quick check below).
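The numbers in the HST-1 example follow directly from the causality bound R ≲ D·c·t_var. A quick check (the TeV variability timescale used below is an assumed ballpark value of order a couple of days, chosen only to recover the ~0.002 D pc figure):

```python
C_CM_S = 2.998e10    # speed of light [cm/s]
PC_CM = 3.086e18     # parsec [cm]
YR_S = 3.156e7       # year [s]

def size_bound_pc(t_var_yr, doppler=1.0):
    """Causality bound on the emitting-region size, R < D * c * t_var, in parsecs."""
    return doppler * C_CM_S * t_var_yr * YR_S / PC_CM

print(size_bound_pc(0.14))          # ~0.043 pc  -> the 0.044 D pc X-ray bound
print(size_bound_pc(2.0 / 365.25))  # ~0.0017 pc -> the ~0.002 D pc TeV bound (t_var ~ 2 days)
```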
It has been proposed (Bromberg & Levinson 2008) that strong radiative cooling in the emitting region (the so-called blazar zone) may produce substantial collimation of a jet whose beam crosses a conical shock. We have tested this theoretical hypothesis with MRGENESIS (2D, RHD, spherical coordinates).
19
[Figure panels: No Cooling / Strong Cooling.]
20
[Figure panels: Strong Cooling / No Cooling.]
With strong cooling:
➡ Slightly better collimation.
➡ Recession of the recollimation shock.
➡ The recollimation shock changes geometry from planar (more effective at decelerating the beam) to bi-conical.
➡ Slower beam before the shock.
21
The need for supercomputing: huge resolution is required because of the stiffness of the cooling.
➡ Each model needs a minimum resolution of 20,000 × 1,000 zones (r × θ).
➡ 72 hours per run for stabilization.
➡ Around 400 CPUs per run at the minimum resolution.
➡ In time-dependent models (still to be run), a proper time resolution will require an increased spatial resolution of 40,000 × 2,000 ⇒ 4 times as many zones (and half the time step) ⇒ 8 times longer runs: ~576 h/run with 400 CPUs, or ~230 h/run with 1000 CPUs.
22
The results of the work related to the tasks carried out on the RES can be summed up with the following numbers:
➡ 6 publications in journals (among them: Spectral Evolution of Superluminal Components in Parsec-Scale Jets, ApJ, 696, 1142, 2009).
➡ 2 conference proceedings.
➡ Results presented at 4 different conferences (two of them invited talks).
★ All in all, it has paid off to use more than 1.5 million hours of computing time on MareNostrum.
★ We hope to be granted at least a similar amount in the near future!
23